- Subject: Re: io:lines() and \0
- From: Andrew Starks <andrew.starks@...>
- Date: Tue, 18 Feb 2014 09:20:23 -0600
On Tuesday, February 18, 2014, Enrico Colombini <erix@erix.it> wrote:
On 18/02/2014 15.04, Roberto Ierusalimschy wrote:
ANSI C says that about text files:
Data read in from a text stream will necessarily compare equal to
the data that were earlier written out to that stream only if: the
data consist only of printing characters and the control characters
horizontal tab and new-line; no new-line character is immediately
preceded by space characters; and the last character is a new-line
character.
So, there are no guarantees that a text file with embedded zeros will be
read correctly, no matter how we implement it.
Another example is the "end-of-text" character (0x04 in Unix, 0x1a in Windows) that terminates text files.
Just for the record: on my machine, the following program,
local count = 0
for l in io.lines() do
  count = count + #l
end
print(count)
reading the Bible, takes ~0.07s with the current implementation and
~0.14s with this proposal.
I really can't see any advantage in making text file reading worse, just to handle the nonstandard case "read a binary file using text file functions".
Especially because the nonstandard case can be easily handled in other ways, either in C or in pure Lua.
--
Enrico
The OP did suggest that a clarification in the documentation might also be an improvement. Given that Lua strings are 8-bit clean and that they can contain \0, this seems wise. I agree that strings ~= lines, of course.
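
For what it's worth, here is a rough sketch of the "pure Lua" route mentioned above. It is only an illustration, not anything proposed in this thread: `binary_lines` is a made-up helper name, and it assumes the whole file fits in memory. It opens the file in binary mode, reads it into a single Lua string (which can hold \0), and splits on "\n" by hand.

```lua
-- Sketch only: iterate over the "lines" of a file opened in binary mode,
-- so that embedded "\0" bytes survive. binary_lines is a hypothetical
-- helper name, not part of the standard io library.
local function binary_lines(path)
  local f = assert(io.open(path, "rb"))  -- "rb": no text-mode translation
  local data = f:read("*a")              -- whole file as one Lua string
  f:close()

  local pos = 1
  return function()
    if pos > #data then return nil end     -- nothing left to return
    local nl = data:find("\n", pos, true)  -- plain search, no patterns
    local line
    if nl then
      line = data:sub(pos, nl - 1)         -- drop the newline itself
      pos = nl + 1
    else
      line = data:sub(pos)                 -- last line, no trailing newline
      pos = #data + 1
    end
    return line
  end
end

-- Small demonstration with a temporary file containing an embedded zero.
local name = os.tmpname()
local out = assert(io.open(name, "wb"))
out:write("ab\0cd\nef\n")
out:close()

local count = 0
for l in binary_lines(name) do
  count = count + #l
end
print(count)  --> 7 (5 bytes for "ab\0cd" plus 2 for "ef")
os.remove(name)
```

Because the file is read in binary mode, a "\r" before each "\n" is kept on files with Windows line endings; strip it yourself if you need to.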