Re: Lua interpreter and Lua files encoding

lua-l archive


Subject: Re: Lua interpreter and Lua files encoding
From: Lorenzo Donati <lorenzodonatibz@...>
Date: Thu, 06 Jan 2011 11:04:48 +0100

jgiors@threeeyessoftware.com wrote:
[snip]


Caveats:
(a) Windows build
(b) Lua version 5.1.2
(c) Wikipedia
(d) Verify my test code and reasoning  :)


I tested your code and it works for me (WinXP + Lua 5.1.4/Lua 5.2.0-alpha).

As for the reasoning, I find no fault in it. I'm no expert on Unicode/utf8, though.

It seems that if one sticks to literals with no octet in the range 0-31 (to be safe), utf8 Lua files should be safe.
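A minimal sketch of how this could be checked, assuming only the standard library. The helper name `roundtrips` is made up for illustration, and the source text is built with string.char so the result does not depend on the encoding of the test file itself. It relies on the fact that every octet of a multi-byte UTF-8 sequence is in the range 128-255, so none of them can fall in 0-31 or collide with a quote or backslash.

```lua
-- Lua 5.1 has loadstring; in 5.2 and later, load accepts a string.
local load_chunk = loadstring or load

-- Hypothetical helper: embed raw octets in a quoted literal, compile the
-- chunk, and report whether the resulting string equals the input.
local function roundtrips(bytes)
  local chunk = assert(load_chunk('return "' .. bytes .. '"'))
  return chunk() == bytes
end

-- UTF-8 encodings of U+00E8 (C3 A8) and U+20AC (E2 82 AC).
local sample = string.char(0xC3, 0xA8, 0xE2, 0x82, 0xAC)
assert(roundtrips(sample))

-- Every octet that can appear in a multi-byte UTF-8 sequence is >= 128;
-- check each value in that range on its own.
for octet = 128, 255 do
  assert(roundtrips(string.char(octet)), "octet " .. octet .. " was altered")
end
```

This only exercises quoted literals passed through the string loader; it does not prove anything about other tools (editors, BOM handling) that touch the file.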

The only problem may be the normalization algorithm cited by David in a branch of this thread:


David Manura wrote:

According to [1], the lexer does not guarantee reliable preservation
of arbitrary octets in string literals, so you may need to encode
these octets with escape sequences.  This is particularly due to ASCII
newlines ([\r\n]+) being normalized to '\n' (so that string literals
have the same meaning regardless of the newline encoding of the source
file).  There's a lexer change in 5.2.0-alpha eliminating dependence
on locales [2], but that doesn't alter the newline normalization--see
the `inclinenumber` in `read_long_string` in llex.c.
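The newline normalization described here can be observed from Lua itself. The following is a small sketch, assuming a stock interpreter; the variable names are illustrative only. It compiles a long-bracket string whose source text contains a CR LF pair, then compares it with the same octets written as escape sequences.

```lua
-- Lua 5.1 has loadstring; in 5.2 and later, load accepts a string.
local load_chunk = loadstring or load

-- Source text with a raw CR LF pair inside a long-bracket string.
local source = "return [[a\r\nb]]"
local value = assert(load_chunk(source))()

-- The lexer folds the CR LF pair into a single "\n",
-- so one octet of the original source is lost.
assert(value == "a\nb")
assert(#value == 3)

-- Writing the same octets as escape sequences keeps both of them.
local escaped = assert(load_chunk('return "a\\r\\nb"'))()
assert(escaped == "a\r\nb")
assert(#escaped == 4)
```

So a literal that must carry an exact CR or LF octet needs the escaped form, while octets outside the newline characters are not affected by this step.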


I'm no C expert, so I cannot comment on the Lua internals cited. But...


This indeed is sometimes unfortunate.  It means that Lua syntax is not
an ideal binary encoding format.

...even if a general binary stream cannot be encoded as a Lua file, can we at least depend on the fact that a stream of utf-8 octets (trusting what Wikipedia said) can be safely embedded in a string literal, as John's test seems to prove?

Anyway, is this only an implementation artifact? Or is it something that will last? In the latter case a mention in the reference manual could be useful, since utf8 is very common nowadays and generating utf8 files using Lua, _without specialized libraries_ and without the hassle of encoding literals with escape sequences, is really useful!


Thanks.

--
Lorenzo

Follow-Ups:
- Re: Lua interpreter and Lua files encoding, Peter Odding
- Re: Lua interpreter and Lua files encoding, Javier Guerra Giraldez

References:
- RE: Lua interpreter and Lua files encoding, jgiors
