- Subject: Re: io:lines() and \0
- From: Sean Conner <sean@...>
- Date: Fri, 21 Feb 2014 16:26:08 -0500
It was thus said that the Great René Rebe once stated:
>
> Who said it was impossible? A simple loop fixing this issue was presented by
> Francisco.
As I see it, here are the possible solutions:
1) Mention in the manual that using f:lines() (or io.lines()) on a non-text
file may produce inconsistent results. This maintains the status quo, in
which Lua cannot correctly read a line containing embedded NUL bytes (that
this comes up is interesting, but that the last time it came up was eight
years ago also says something). This is the easiest solution (and for now,
looks to be the one picked).
2) Keep the call to fgets() but add additional code to figure out the actual
end of the buffer [1]; this degrades performance more the larger the buffer
is. This is the fix that René Rebe proposed.
3) Switch to using fgetc() in a loop. This, too, degrades performance, if
only because fgets() can use implementation details it is privy to in order
to increase performance, but it still uses only standard C calls. Francisco
Olarte proposed this fix. And if a fix is to be used, this is the one I
would actually prefer (it's cleaner than #2, the performance degradation is
constant (not based on buffer size), it's portable, and it provides
consistent behavior across C implementations).
4) On POSIX systems (or systems that have it), use getline() instead of
fgets(); on systems without this function, fall back to either fgets() or
solutions #2 or #3. Just falling back to fgets() means you get inconsistent
Lua implementations (some work on files with embedded NULs, some don't).
But even if you fall back to #2 or #3, getline() itself costs some
performance: it works on a malloc()ed buffer (supplied by the caller, or
allocated by getline() when passed NULL), and unless that buffer can be
reused, a call to free() must also be made (in order to avoid memory leaks).
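
For illustration, here is one way the idea behind #2 could be sketched in C.
This is not René's actual patch; fgets_len() is a hypothetical name, and the
sketch assumes fgets() leaves the bytes after its terminating NUL untouched,
which the C standard does not explicitly promise. Both the memset() and the
backwards scan touch the whole buffer, which is where the cost that grows
with buffer size comes from.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line (at most size-1 bytes) into buf and
 * return the number of bytes read, counting embedded NULs.  Returns 0 at
 * end of file or on error. */
size_t fgets_len(char *buf, size_t size, FILE *fp)
{
    size_t i;

    if (size < 2)
        return 0;
    memset(buf, 1, size);              /* pre-fill with a non-zero byte */
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;
    /* The last NUL in the buffer is the terminator fgets() wrote; any
     * NUL before it came from the file itself. */
    for (i = size; i-- > 0; )
        if (buf[i] == '\0')
            return i;
    return 0;                          /* not reached if fgets() behaved */
}
```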
Of the four that I see, #1 or #3 are the best solutions in my mind. All
(except #1) degrade performance somewhat, but I would prefer a correct
solution [2] over a fast solution.
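
As a rough sketch of #3 (again, not Francisco's actual code; the name
read_line_fgetc() is made up for this example), the loop only needs fgetc()
and a byte count, so embedded NULs are copied like any other byte:

```c
#include <stdio.h>

/* Hypothetical helper: read one line, up to and including the '\n', into
 * buf (at most size bytes, no terminator added).  Returns the number of
 * bytes stored; 0 means end of file or error.  A line longer than size
 * is returned in pieces across successive calls. */
size_t read_line_fgetc(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int c;

    while (n < size && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;
    }
    return n;
}
```

The length comes from the return value rather than from strlen(), which is
why the cost per byte is constant regardless of buffer size.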
-spc (Surprisingly, I'm not against #3 ... )
[1] Had getline() (or getdelim()) not existed, and René used fgets() in
a C implementation, I suspect he might have had to resort to strace
to figure out why he wasn't getting all the data, or more data than
expected. I only point this out as an observation.
[2] Even if I feel that reading a binary file "line by line" is silly.
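
For comparison, a minimal sketch of the getline() approach from #4, assuming
a POSIX.1-2008 system. count_lines_with_nul() is a hypothetical function
written only for this example. getline() returns the true length of each
line, and the same buffer is reused across calls and freed once at the end:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: count the lines in fp that contain an embedded NUL. */
size_t count_lines_with_nul(FILE *fp)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;
    ssize_t len;
    size_t count = 0;

    while ((len = getline(&line, &cap, fp)) != -1)
        if (memchr(line, '\0', (size_t)len) != NULL)
            count++;
    free(line);          /* one free() for the reused buffer */
    return count;
}
```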