On Wed, Jan 5, 2011 at 6:57 PM, Lorenzo Donati
<lorenzodonatibz@interfree.it> wrote:
> I know Lua can _store_ any octet sequence in a string. The doubt is with the
> interpreter executable: can it read and always parse a utf8 file with
> non-ASCII chars in some literals/comments?

According to [1], the lexer does not guarantee reliable preservation
of arbitrary octets in string literals, so you may need to encode
these octets with escape sequences.  This is in particular because
each ASCII newline sequence (\r, \n, \r\n, or \n\r) is normalized to
a single '\n' (so that string literals have the same meaning
regardless of the newline encoding of the source file).  There's a
lexer change in 5.2.0-alpha eliminating dependence
on locales [2], but that doesn't alter the newline normalization--see
the `inclinenumber` call in `read_long_string` in llex.c.
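
To make the normalization concrete, here is a minimal sketch that compiles two chunks from strings.  The names `compile`, `src`, and `escaped_src` are mine, purely for illustration; the `loadstring or load` fallback is only there so the same text runs on 5.1 and on 5.2 and later.

```lua
-- Compile a chunk from a string: loadstring in Lua 5.1, load in 5.2+.
local compile = loadstring or load

-- Chunk source whose long-string literal holds a real CR LF pair
-- between "a" and "b" (the escapes below are expanded before compiling).
local src = "return [[a\r\nb]]"
local value = assert(compile(src))()

-- The lexer folds the CR LF pair into a single "\n".
print(#value, value == "a\nb")          --> 3   true

-- Chunk source that spells the same octets as escape sequences,
-- so no raw newline ever reaches the lexer.
local escaped_src = 'return "a\\r\\nb"'
local escaped = assert(compile(escaped_src))()

-- Both octets survive.
print(#escaped, escaped == "a\r\nb")    --> 4   true
```

The long-string form loses the CR, while the escaped form keeps both octets.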

This is indeed sometimes unfortunate.  It means that Lua syntax is not
an ideal binary encoding format.
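
If you do need to embed arbitrary octets in Lua source, one workaround is to emit the literal with decimal escapes only.  The following is just a sketch: `to_literal` is a hypothetical helper of mine, not something Lua provides, and it escapes far more than strictly necessary to stay on the safe side.

```lua
-- Hypothetical helper: render any octet sequence as a Lua string literal
-- made only of printable ASCII, so newline conversion cannot alter it.
local function to_literal(s)
  -- Explicit ranges instead of %w, to avoid any dependence on the locale.
  local body = s:gsub("[^0-9A-Za-z ]", function(c)
    -- Three-digit decimal escapes stay unambiguous even before a digit.
    return string.format("\\%03d", c:byte())
  end)
  return '"' .. body .. '"'
end

local compile = loadstring or load  -- loadstring in Lua 5.1, load in 5.2+
local data = "\0\r\n\255 binary\r\r\n"
local literal = to_literal(data)
local roundtrip = assert(compile("return " .. literal))()

print(literal)
print(roundtrip == data)  --> true
```

The cost is size: every escaped octet takes four characters in the source.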

BTW, [3] had an interesting related request about whether it's possible
to force memory alignment of string literals (and the answer is no).

[1] http://lua-users.org/lists/lua-l/2009-10/msg00846.html
[2] http://lua-users.org/lists/lua-l/2009-11/msg00999.html
[3] http://lua-users.org/lists/lua-l/2001-02/msg00112.html