Peter Loveday wrote:
>
> I don't see how this deals with UTF-8 at all ?
> 
> Surely to do so you need to combine characters that
> are multi-byte prefixes, otherwise it's just 8 bit
> ASCII ?
> 
> Love, Light and Peace,
> 
> - Peter Loveday
> Director of Development, eyeon Software

Well, Lua need not validate or interpret
the UTF-8 multibyte character sequences.
It only has to /detect/ them. And that
is the beauty of UTF-8: any byte that has the
eighth bit set, apart from 0xfe and 0xff,
is part of a multibyte encoding of a Unicode character.
Furthermore, UTF-8 is compatible with 7-bit ASCII,
so we can be sure that these multibyte encodings of
Unicode characters never contain a byte that encodes a 7-bit
whitespace, digit or nonprinting character.
So any such byte belongs to a multibyte sequence that represents
a Unicode character, and for all practical purposes is
valid for use in an identifier.
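
To make the detection rule concrete, here is a rough sketch in Python.
It is not Lua's actual lexer code; the names is_identifier_byte and
scan_identifier are made up for illustration, and the ASCII rule
(letters, digits, underscore) is my assumption of the usual identifier
rule. It ignores the restriction that an identifier may not start with
a digit, and it does no UTF-8 validation at all, which is the point.

```python
def is_identifier_byte(b: int) -> bool:
    """Hypothetical helper: may byte value b (0..255) appear in an identifier?"""
    if b < 0x80:
        # 7-bit ASCII: assumed usual rule (letters, digits, underscore).
        return chr(b).isalnum() or b == ord("_")
    # High bit set: treated as part of a UTF-8 multibyte sequence,
    # except 0xFE and 0xFF, the two exceptions named in this message.
    return b not in (0xFE, 0xFF)


def scan_identifier(src: bytes, pos: int = 0) -> bytes:
    """Return the run of identifier bytes starting at pos (no validation)."""
    end = pos
    while end < len(src) and is_identifier_byte(src[end]):
        end += 1
    return src[pos:end]
```

For example, scan_identifier("Björn = 1".encode("utf-8")) stops at the
space and returns the bytes of "Björn", without ever decoding the
two-byte sequence for the o-umlaut.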

I assume you know the UTF-8 specs. If not,
take a gander here: http://czyborra.com/utf/#UTF-8


-- 
"No one knows true heroes, for they speak not of their greatness." -- 
Daniel Remar.
Björn De Meyer 
bjorn.demeyer@pandora.be