Re: [Q] handling 0xC2A0 (space in utf8)

lua-l archive


Subject: Re: [Q] handling 0xC2A0 (space in utf8)
From: "V S P" <toreason@...>
Date: Thu, 16 Oct 2008 22:05:05 -0400

Thank you Roberto,

Adding this alternative to the pattern
   + string.char(0xc2, 0xa0)
worked.

Also, thank you for all the responses.
Now I understand that 0xC2 0xA0 is not an ordinary space
but the UTF-8 encoding of the no-break space (U+00A0, the
HTML &nbsp; entity), which web browsers render as a space.
Somehow it was not handled by PHP's html_entity_decode
(this function is supposed to get rid of all the
HTML stuff for me).
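
To show where the two bytes come from, here is a minimal sketch in plain Lua (no LPeg needed). It derives the UTF-8 encoding of U+00A0 and then replaces that byte pair with an ordinary space. The helper names utf8_encode_small and normalize_spaces are made up for illustration; they are not part of Lua, LPeg, or the compiler discussed in this thread.

```lua
-- Encode a code point below U+0800 as UTF-8 (one or two bytes).
-- utf8_encode_small is a hypothetical helper, not a standard Lua function.
local function utf8_encode_small(cp)
  if cp < 0x80 then
    return string.char(cp)                    -- plain ASCII, one byte
  end
  assert(cp < 0x800, "only code points below U+0800 are handled")
  local lead = 0xC0 + math.floor(cp / 0x40)   -- 110xxxxx: upper bits
  local cont = 0x80 + cp % 0x40               -- 10xxxxxx: lower six bits
  return string.char(lead, cont)
end

-- U+00A0 (no-break space) becomes the byte pair 0xC2 0xA0.
local nbsp = utf8_encode_small(0xA0)
assert(nbsp == string.char(0xC2, 0xA0))

-- Replace every no-break space with an ordinary space.
-- normalize_spaces is likewise a hypothetical helper.
local function normalize_spaces(s)
  return (s:gsub(nbsp, " "))                  -- parentheses drop the match count
end
```

Normalizing the input this way before parsing is an alternative to teaching the grammar about the extra byte pair; which one fits better depends on whether the no-break spaces need to be preserved in the stored text.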

Peter, I use the dijit.Editor JavaScript editor because it lets me
define buttons, which I am going to use to let users
insert 'blocks of code' instead of just typing
in text.  I am only discarding HTML tags when passing the text to my
compiler written in Lua; otherwise, I will be saving the text as is
in a UTF-8 enabled PostgreSQL database.


... by the way, I added to my online resume that I have programmed in Lua
(my compiler is just over 1.6k lines, but I have also used the luabind C++
library for another project)
and got a call today from a recruiter about my LUA experience :-).





On Thu, 16 Oct 2008 17:02:33 -0300, "Roberto Ierusalimschy"
<roberto@inf.puc-rio.br> said:
> > In Lua, I have specified for LPeg the following grammar for space:
> > 
> > local space=lpeg.S('\r\n\f\t ')^1
> > 
> > [...]
> > 
> > I am thinking now that this messes up LPeg when trying to match
> > for the space.  I would like to tell LPeg to also understand
> > 0xC2A0 as a space.
> 
> local space = (lpeg.S('\r\n\f\t ') + string.char(0xc2, 0xa0))^1
> 
> -- Roberto
-- 
  V S P
  toreason@fastmail.fm

