Quoth Lorenzo Donati <lorenzodonatibz@interfree.it>, on 2011-01-06 00:05:41 +0100:
> I know that Lua in itself isn't Unicode compliant, but does the
> interpreter behave well if the only non-ASCII Unicode chars are in
> string literals (and in comments sometimes)? Is it a guaranteed
> behaviour?

"Unicode compliant" doesn't mean a whole lot here.  As far as I know,
arbitrary octets can be embedded in string literals and they'll just
be passed through transparently.  This means if the source encoding is
UTF-8 then non-ASCII UTF-8 sequences will show up as the same octet
sequences.  I interpret « Strings in Lua can contain any 8-bit value,
including embedded zeros, which can be specified as '\0'. » from the
Lua 5.1 manual (section 2.1) to imply that this is true for source as
well, but I didn't write the manual, so...
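
A small sketch of what "passed through transparently" looks like in
practice, assuming a stock Lua 5.1 interpreter. The string below is
written with decimal escapes so that the example does not depend on
how this message itself is encoded; the variable name is only for
illustration.

```lua
-- Four octets: 'a', an embedded zero, 'b', and 0xFF.
local s = "a\0b\255"

-- The length operator counts octets; the embedded zero does not end the string.
print(#s)             -- 4

-- string.byte returns the raw octet values unchanged.
print(s:byte(1, -1))  -- 97  0  98  255
```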

This does mean that if your source files are ever re-encoded into some
other charset, your literals will break, because the octets that end up
in the strings at run time will have changed along with the file.  If
this is important, you can test the octets of a known string early on
and raise an error if they don't look correct.
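
One way such a check might look, as a minimal sketch.  The helper name
octets_match and the choice of "é" as the probe character are my own
assumptions, not anything Lua provides; the only fixed fact used is
that U+00E9 encodes as the octets C3 A9 in UTF-8.

```lua
-- Hypothetical helper: true if string s consists of exactly the listed octets.
local function octets_match(s, expected)
  if #s ~= #expected then return false end
  for i = 1, #expected do
    if s:byte(i) ~= expected[i] then return false end
  end
  return true
end

-- The probe is typed literally, so it gets re-encoded along with the file.
-- In a UTF-8 source file, "é" (U+00E9) must be the two octets C3 A9.
if not octets_match("é", { 0xC3, 0xA9 }) then
  error("source file does not look like UTF-8; string literals may be wrong")
end
```

If the file were converted to Latin-1, for instance, the probe would
become the single octet E9 and the check would raise the error before
any other literal gets used.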

Things like the length operator and stock Lua string operations will
neither respect nor choke on UTF-8 sequences; they will blindly treat
them as their component octets, with all the blessings and curses that
entails.
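
To illustrate both sides, here is a rough sketch assuming Lua 5.1 and
UTF-8 data.  The function utf8_len is a hypothetical helper of mine,
not part of the standard library, and it assumes its input is already
valid UTF-8.  The string is written with escapes so the example does
not depend on the encoding of this message.

```lua
-- "naïve" with the ï spelled out as its UTF-8 octets C3 AF.
local s = "na\195\175ve"

-- The length operator counts octets, not characters.
print(#s)  -- 6

-- string.sub cuts at octet positions, so it can split a sequence:
-- this keeps the lead octet C3 and drops the continuation octet AF.
local cut = s:sub(1, 3)

-- Hypothetical helper: count code points by counting every octet that
-- is not a UTF-8 continuation octet (0x80 to 0xBF).
local function utf8_len(str)
  local _, count = str:gsub("[^\128-\191]", "")
  return count
end

print(utf8_len(s))  -- 5
```

The curse is that cut is no longer valid UTF-8; the blessing is that
nothing in the string library rejected or altered the octets along
the way.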

Does that answer your question?

   ---> Drake Wilson