Roberto Ierusalimschy wrote:
I know Lua can _store_ any octet sequence in a string. The doubt is with the interpreter executable: can it read and always parse a utf8 file with non-ASCII chars in some literals/comments?
The problem is that it depends on the system's file manipulation. Lua uses regular fread functions in text mode to read a source file. If these functions manipulate the file contents in any way (e.g., changing newlines), there is not much that Lua can do about it.
I see. So it all depends on the particular compiler's implementation of C's fread.
The manual says: You should not use long strings for non-text data; Use instead a regular quoted literal with explicit escape sequences for control characters.
Yes. I did know that.
The notion of "non-text data" and "control characters" is system dependent. If your system does not corrupt UTF-8 sequences (that is, does not treat them as some kind of "control characters"), all will be fine.
This is just what I missed! Thanks! I didn't realize that my notion of text data was too high-level, whereas the Lua manual is implicitly referring to what _C's fread_ regards as text data (if I understood the explanation above correctly).
So I suppose the test by John Giors (i.e. write and reread from a file all the octets from 0x20 to 0xFF and test for alterations) should be sufficient to test whether the particular fread implementation is fine for utf8 literals.
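For the record, here is roughly how I would sketch that round-trip check in C. The function name and the temporary file path are my own inventions (nothing from Lua or from John's post), and it only exercises the octets 0x20 to 0xFF mentioned above, so treat it as a rough probe of the local fread rather than a definitive test:

```c
#include <stdio.h>

/* Hypothetical helper, not part of Lua: writes the octets 0x20..0xFF to
   `path` using `write_mode`, reads them back with fread using `read_mode`,
   and returns how many positions differ (missing or extra octets count as
   differences), or -1 on an I/O error. The file is removed afterwards. */
int count_altered_octets(const char *path, const char *write_mode,
                         const char *read_mode)
{
    enum { FIRST = 0x20, LAST = 0xFF, COUNT = LAST - FIRST + 1 };
    unsigned char out[COUNT];
    unsigned char in[COUNT + 16];   /* slack so extra octets are noticed */
    size_t got;
    int i, altered = 0;
    FILE *f;

    for (i = 0; i < COUNT; i++)
        out[i] = (unsigned char)(FIRST + i);

    /* Write the test pattern. */
    f = fopen(path, write_mode);
    if (f == NULL)
        return -1;
    if (fwrite(out, 1, COUNT, f) != (size_t)COUNT) {
        fclose(f);
        remove(path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(path);
        return -1;
    }

    /* Read it back through the mode under test. */
    f = fopen(path, read_mode);
    if (f == NULL) {
        remove(path);
        return -1;
    }
    got = fread(in, 1, sizeof in, f);
    if (ferror(f)) {
        fclose(f);
        remove(path);
        return -1;
    }
    fclose(f);
    remove(path);

    /* Count octets that were changed, dropped, or added. */
    for (i = 0; i < COUNT; i++)
        if ((size_t)i >= got || in[i] != out[i])
            altered++;
    if (got > (size_t)COUNT)
        altered += (int)(got - (size_t)COUNT);
    return altered;
}
```

Calling count_altered_octets("probe.tmp", "wb", "r") writes the octets untranslated and reads them back in text mode, which is the mode Roberto says Lua uses for source files. A result of 0 would suggest that fread on that system leaves those octets alone; any other positive value means something was altered on the way in.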
-- Roberto
Thank you very much! The above explanation really hit the spot!
-- Lorenzo