I haven't pounded on it extensively, but I've wired my simple Lua
environment (built in Cocoa on Mac OS X) to work with UTF-8 encoded strings
for input and output. I expect this to be fine so long as I:

* Don't need to disassemble strings into individual characters
* Don't use patterns (regular expressions) that match on anything other than
low (7-bit) ASCII
* Don't compare strings for anything other than equality
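A minimal sketch of why those three restrictions matter, assuming Lua 5.1 or later (`#` and `string.gmatch` do not exist in 5.0). The UTF-8 bytes are written as decimal escapes so the snippet does not depend on how the source file itself is encoded. Ordering comparisons (`<`, `>`) are left out: Lua implements them with the C library's `strcoll`, so their results depend on the current locale.

```lua
-- "cafe" with an acute e: the final U+00E9 is the two UTF-8 bytes 195, 169.
local s = "caf\195\169"

-- 1. Disassembling: length and sub operate on bytes, not characters.
local len = #s                      -- 5 bytes, although a reader sees 4 characters
local head = string.sub(s, 1, 4)    -- "caf" plus a lone lead byte: not valid UTF-8

-- 2. Patterns: "." matches a single byte, so it can grab half a character...
local last_byte = string.match(s, "(.)$")   -- only the continuation byte 169
-- ...but splitting on an ASCII-only delimiter leaves multibyte characters intact.
local fields = {}
for field in string.gmatch("na\195\175ve,caf\195\169", "[^,]+") do
  fields[#fields + 1] = field
end

-- 3. Comparisons: equality is a plain byte-for-byte comparison.
local same = (s == "caf" .. "\195\169")
```

Only the second half of each numbered case is "safe" in the sense the list above describes; the first half shows what goes wrong when the restriction is ignored.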

What this relies on is that:

* Lua strings are 8-bit clean, so Lua supports essentially any 8-bit
character set; from a parsing standpoint it only cares about characters in
the 7-bit ASCII set

* UTF-8 encodes every non-ASCII character entirely with bytes that have the
high bit set (values 128-255), i.e., the bytes of a multibyte character can
never be mistaken for ASCII.
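A sketch that checks both points, again assuming Lua 5.1 or later. It first compiles a chunk whose string literal and comment contain raw UTF-8 bytes, then uses a small encoder to confirm that every byte of a multibyte character is 128 or above, so a plain search for an ASCII delimiter cannot land inside one. `encode_utf8` and `all_high_bytes` are hypothetical helpers written for this example; Lua 5.3 and later ship `utf8.char`, which does the encoding job.

```lua
-- Part 1: the lexer passes high bytes through untouched inside string
-- literals and comments. (loadstring in Lua 5.1, load in 5.2 and later.)
local compile = loadstring or load
local src = 'local word = "na\195\175ve" -- caf\195\169\nreturn word'
local word = assert(compile(src))()   -- the 6-byte string "na\195\175ve"

-- Part 2: hypothetical helper that encodes a code point (0 .. 0x10FFFF) as UTF-8.
local function encode_utf8(cp)
  if cp < 0x80 then
    return string.char(cp)
  elseif cp < 0x800 then
    return string.char(0xC0 + math.floor(cp / 0x40), 0x80 + cp % 0x40)
  elseif cp < 0x10000 then
    return string.char(0xE0 + math.floor(cp / 0x1000),
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  else
    return string.char(0xF0 + math.floor(cp / 0x40000),
                       0x80 + math.floor(cp / 0x1000) % 0x40,
                       0x80 + math.floor(cp / 0x40) % 0x40,
                       0x80 + cp % 0x40)
  end
end

-- Hypothetical helper: true if every byte of s is in the range 128..255.
local function all_high_bytes(s)
  for i = 1, #s do
    if string.byte(s, i) < 0x80 then
      return false
    end
  end
  return true
end

-- U+00E9, U+03B1, U+20AC, U+4E2D, U+1F600: 2-, 3- and 4-byte sequences.
local samples = { 0xE9, 0x3B1, 0x20AC, 0x4E2D, 0x1F600 }
local all_samples_high = true
for _, cp in ipairs(samples) do
  if not all_high_bytes(encode_utf8(cp)) then
    all_samples_high = false
  end
end

-- Consequence: a plain (non-pattern) search for "," only finds the real comma.
local line = encode_utf8(0x20AC) .. "5,caf\195\169"
local comma = string.find(line, ",", 1, true)   -- byte 5: three bytes of U+20AC, then "5"
```

The encoder mirrors the bit layout of UTF-8 with arithmetic rather than bit operators so that it runs unchanged on Lua 5.1, which has no integer bitwise operators.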