- Subject: Re: Native unicode support?
- From: Björn De Meyer <bjorn.demeyer@...>
- Date: Thu, 27 Jun 2002 21:42:39 +0200
Peter Loveday wrote:
> I don't see how this deals with UTF-8 at all ?
> Surely to do so you need to combine characters that
> are multi-byte prefixes, otherwise it's just 8-bit
> ASCII ?
> Love, Light and Peace,
> - Peter Loveday
> Director of Development, eyeon Software
Well, Lua need not validate or interpret
the UTF-8 multibyte character sequences.
It only has to /detect/ them, and that
is the beauty of UTF-8. Any byte that has the
eighth bit set, apart from 0xFE and 0xFF (which never
occur in UTF-8), is part of the multibyte encoding of a Unicode character.
Furthermore, UTF-8 is compatible with 7-bit ASCII:
every byte of a multibyte sequence has the high bit set,
so no part of such a sequence can be mistaken for a 7-bit
whitespace, digit, or nonprinting character.
So each byte of a multibyte sequence that represents
a Unicode character can, for all practical purposes,
be treated as valid for use in an identifier.
I assume you know the UTF-8 spec. If not,
take a gander here: http://czyborra.com/utf/#UTF-8
"No one knows true heroes, for they speak not of their greatness." --
Björn De Meyer