lua-l archive

> -----Original Message-----
> From: [] On
> Behalf Of Tim Hill
> Sent: Friday, 21 February 2014 23:51
> To: Lua mailing list
> Subject: Re: io:lines() and \0
> (my) thread summary:
> - Using the Lua library to read text files yields unpredictable/unexpected
> results if the file contains embedded NUL characters.
> - A patch has been suggested that fixes this, at the expense of some subtle
> behavior changes that only occur if you rely on the old NUL behavior
> Argument A:
> - A file containing a NUL is not a valid text file as per the ANSI spec, so
> this is garbage-in, garbage-out and expected behavior.
> - You should not try to read non-text files in text mode.
> - Lua behaves this way, live with it.

I understand Lua is built on ANSI C, but ANSI C is also full of compromises. What I care about is what someone might reasonably expect, 'someone' being the average Lua user. IMO, given Lua's embedded nature, the average Lua user is not someone who knows about the ANSI standard, C runtimes, or their quirks.

UTF-8 support was mentioned as a possible feature for future versions. If that happens, the argument for handling control characters without mangling data gets a lot stronger.

Still, the requested functionality is not clearly defined. Lua reads source files platform-independently, accepting either single-byte or two-byte EOL markers. How should this apply to regular text files? What should io.lines() return if you open a Windows text file on Linux?
1 Should it silently drop the extra EOL character Windows uses?
  Then we're back to square one, with data being silently discarded.
2 Should it return a second result, a string with the EOL
  characters it stripped?
  Then on Linux, if no Windows input is expected, you would still
  need extra Lua-side code, because the extra character returned
  as the second result would have to be reinserted into the line.
3 Should you pass io.lines() the line-end markers it may use?
  It's pretty common to have multiple types of line endings
  in a single file, so it would have to be a list... and how
  would you get any performance out of checking against that?
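To make option 2 concrete, here is a minimal sketch, written in Python purely for illustration; it is not a proposed Lua API, and the helper name lines_with_eol is hypothetical. It assumes the file has already been read as raw bytes (binary mode, so the C runtime does no newline translation). It accepts CR LF, LF and a lone CR in the same input, and treats NUL as an ordinary byte, so nothing is truncated.

```python
from typing import Iterator, Tuple


def lines_with_eol(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (line, eol) pairs from raw bytes (hypothetical helper).

    Sketch of option 2: every line comes back together with the
    terminator that was stripped from it. A line ends at CR LF, LF,
    or a lone CR, so mixed line endings in one file are accepted.
    NUL bytes are plain data. A final line without a terminator is
    returned with eol == b"".
    """
    start = 0
    i = 0
    n = len(data)
    while i < n:
        byte = data[i]
        if byte == 0x0A:  # LF: Unix-style terminator
            yield data[start:i], b"\n"
            i += 1
            start = i
        elif byte == 0x0D:  # CR: Windows CR LF, or a lone CR
            if i + 1 < n and data[i + 1] == 0x0A:
                yield data[start:i], b"\r\n"
                i += 2
            else:
                yield data[start:i], b"\r"
                i += 1
            start = i
        else:
            i += 1  # ordinary byte, including NUL
    if start < n:
        yield data[start:], b""  # trailing line with no terminator


if __name__ == "__main__":
    raw = b"one\r\ntwo\x00three\nfour\rlast"
    for line, eol in lines_with_eol(raw):
        print(line, eol)
```

Because nothing is discarded, joining each line with its eol reproduces the input byte for byte. The caller still has to decide what to do with eol, though, which is exactly the extra code the objection to option 2 refers to.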

There is no easy solution.

The most reasonable solution seems to be an external module designed specifically to handle all the idiosyncrasies of text files (and that would be no small feat by any means...).

My 2cts