- Subject: Re: [Q] handling 0xC2A0 (space in utf8)
- From: "V S P" <toreason@...>
- Date: Thu, 16 Oct 2008 22:05:05 -0400
Thank you Roberto,
this
+ string.char(0xc2, 0xa0)
worked.
Also, thank you for all the responses.
Now I understand that 0xC2A0 is not an ordinary space
but the UTF-8 encoding of the non-breaking space (U+00A0,
HTML's &nbsp;), which web browsers render as a space.
Somehow it was not handled by PHP's html_entity_decode
(this function is supposed to get rid of all the
HTML stuff for me).
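
For reference, a minimal plain-Lua sketch of what those two bytes are and one way to turn them into ordinary spaces before further processing. The helper name normalize_spaces is hypothetical and not part of any code in this thread.

```lua
-- U+00A0 (NO-BREAK SPACE, HTML's &nbsp;) is encoded in UTF-8 as the two
-- bytes 0xC2 0xA0. string.char builds that byte sequence (works in Lua 5.1).
local NBSP = string.char(0xC2, 0xA0)

-- Hypothetical helper: replace every non-breaking space with a plain space.
local function normalize_spaces(s)
  -- Neither byte is a magic character in Lua patterns, so NBSP can be
  -- used directly as the pattern. The parentheses drop gsub's second
  -- return value (the replacement count).
  return (s:gsub(NBSP, " "))
end

local input = "local" .. NBSP .. "x"
print(#NBSP)                    -- length of the sequence in bytes
print(normalize_spaces(input))  -- same text with an ordinary space
```

This only rewrites the exact two-byte sequence, so other UTF-8 characters that happen to contain the byte 0xA0 are left alone.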
Peter, I use the dijit.Editor JavaScript editor because it lets me
define buttons, which I am going to use to let users
insert blocks of code instead of just typing
in text. I only discard the HTML tags when passing the text to my
compiler written in Lua; otherwise, I will be saving the text as is
in a UTF-8-enabled PostgreSQL database.
... by the way, I added to my online resume that I have programmed in Lua
(my compiler is just over 1.6k lines, but I have also used the luabind C++
library for another project)
and got a call today from a recruiter about my LUA experience :-).
On Thu, 16 Oct 2008 17:02:33 -0300, "Roberto Ierusalimschy"
<roberto@inf.puc-rio.br> said:
> > In Lua, I have specified for LPeg the following grammar for space
> >
> > local space=lpeg.S('\r\n\f\t ')^1
> >
> > [...]
> >
> > I am thinking now that this messes up LPeg when trying to match
> > for the space. I would like to tell LPeg to also understand
> > 0xC2A0 as a space.
>
> local space = (lpeg.S('\r\n\f\t ') + string.char(0xc2, 0xa0))^1
>
> -- Roberto
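
A small sketch of how the pattern above might be exercised, assuming the lpeg module is installed. The names NBSP, word and words are only for illustration and do not come from the original grammar.

```lua
local lpeg = require("lpeg")

-- The two-byte UTF-8 encoding of the non-breaking space (U+00A0).
local NBSP = string.char(0xC2, 0xA0)

-- The pattern from the reply: ASCII whitespace or the NBSP byte sequence,
-- one or more times. LPeg converts the plain string into a pattern
-- because the other operand of + is already a pattern.
local space = (lpeg.S("\r\n\f\t ") + NBSP)^1

-- Hypothetical tokenizer: capture runs of non-space bytes into a table.
local word  = lpeg.C((1 - space)^1)
local words = lpeg.Ct(space^-1 * (word * space^-1)^0)

-- match returns the position just after the matched whitespace run:
-- one space + two NBSP bytes + one tab = 4 bytes, so the result is 5.
print(lpeg.match(space, " " .. NBSP .. "\tfoo"))

local t = words:match("if" .. NBSP .. "x then")
print(table.concat(t, ","))  -- the captured words, comma-separated
```

Adding the sequence as a string alternative rather than putting the bytes into lpeg.S matters: lpeg.S treats its argument as a set of single bytes, so it would accept 0xC2 or 0xA0 on their own and could split other multi-byte UTF-8 characters.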
--
V S P
toreason@fastmail.fm