Roberto Ierusalimschy wrote:
I know Lua can _store_ any octet sequence in a string. My doubt is
about the interpreter executable: can it always read and parse a UTF-8
file with non-ASCII chars in some literals/comments?

The problem is that it depends on the system's file manipulation. Lua
uses regular fread functions in text mode to read a source file. If
these functions manipulate the file contents in any way (e.g., changing
newlines), there is not much that Lua can do about it.

I see. So it all depends on the particular C implementation's fread.


The manual says:

  You should not use long strings for non-text data;
  Use instead a regular quoted literal with explicit escape sequences
  for control characters.


Yes. I did know that.

The notion of "non-text data" and "control characters" is system
dependent. If your system does not corrupt UTF-8 sequences (that is,
does not treat them as some kind of "control characters"), all will
be fine.

This is just what I missed, thanks! I didn't realize that my notion of text data was too high-level, whereas the Lua manual is implicitly referring to what _C's fread_ regards as text data (if I got the above explanation right).

So I suppose the test by John Giors (i.e., write all the octets from 0x20 to 0xFF to a file, read them back, and check for alterations) should be sufficient to tell whether a particular fread implementation is safe for UTF-8 literals.
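
A minimal sketch of such a round-trip test in C, for illustration only. The function name text_mode_roundtrip_ok and the scratch file path are hypothetical, and this is not John Giors's original code. It writes the octets 0x20 to 0xFF in text mode, reads them back with fread in text mode, and compares the two buffers.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes every octet from 0x20 to 0xFF to `path`
   in text mode, reads the file back in text mode with fread, and
   compares what was read against what was written.
   Returns 1 if every octet survived unchanged, 0 if the contents were
   altered, and -1 on an I/O error. */
int text_mode_roundtrip_ok(const char *path)
{
    unsigned char out[0xFF - 0x20 + 1];
    unsigned char in[sizeof out + 1];  /* one spare byte to detect growth */
    size_t i, n;
    FILE *f;

    for (i = 0; i < sizeof out; i++)
        out[i] = (unsigned char)(0x20 + i);

    f = fopen(path, "w");              /* text mode, not "wb" */
    if (f == NULL)
        return -1;
    n = fwrite(out, 1, sizeof out, f);
    if (fclose(f) != 0 || n != sizeof out)
        return -1;

    f = fopen(path, "r");              /* text mode, not "rb" */
    if (f == NULL)
        return -1;
    n = fread(in, 1, sizeof in, f);
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Same length and same bytes means nothing was translated. */
    return n == sizeof out && memcmp(in, out, sizeof out) == 0;
}
```

Calling text_mode_roundtrip_ok("roundtrip.tmp") from a small main and printing the result would show whether the local C library leaves these octets alone. A stricter variant, arguably closer to a source file saved by an editor, would write with "wb" and read with "r", so that a translation applied symmetrically on write and read cannot hide itself.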


-- Roberto


Thank you very much! The above explanation really hit the spot!

--
Lorenzo