On Tuesday, February 18, 2014, Enrico Colombini <erix@erix.it> wrote:
On 18/02/2014 15.04, Roberto Ierusalimschy wrote:
ANSI C says that about text files:

   Data read in from a text stream will necessarily compare equal to
   the data that were earlier written out to that stream only if: the
   data consist only of printing characters and the control characters
   horizontal tab and new-line; no new-line character is immediately
   preceded by space characters; and the last character is a new-line
   character.

So, there is no guarantee that a text file with embedded zeros will be
read correctly, no matter how we implement it.
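
To make the quoted conditions concrete, here is a rough sketch of a checker in Lua. The function name `roundtrip_safe` is made up for illustration, and it assumes ASCII in the "C" locale, so "printing characters" is taken to mean bytes 0x20 through 0x7E. It also treats an empty string as failing the "last character is a new-line" condition, which is only one reading of the standard.

```lua
-- Hypothetical helper: does `s` satisfy the ANSI C conditions under which
-- a text stream is guaranteed to read back exactly what was written?
-- Assumes ASCII / "C" locale: printing characters are bytes 0x20..0x7E.
local function roundtrip_safe(s)
  -- The last character must be a new-line.
  if s:sub(-1) ~= "\n" then return false end
  -- No new-line may be immediately preceded by a space (plain search).
  if s:find(" \n", 1, true) then return false end
  -- Only printing characters, horizontal tab and new-line are allowed.
  if s:find("[^\t\n -~]") then return false end
  return true
end

print(roundtrip_safe("hello\tworld\n"))  --> true
print(roundtrip_safe("a\0b\n"))          --> false
```

An embedded zero fails the third check, which is the case discussed in this thread.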

Another example is the "end-of-text" character (0x04 in Unix, 0x1a in Windows) that terminates text files.

Just for the record: On my machine, the following program,

   local count = 0
   for l in io.lines() do
     count = count + #l
   end
   print(count)

reading the Bible, takes ~0.07s with the current implementation and
~0.14s with this proposal.

I really can't see any advantage in making text file reading worse, just to handle the nonstandard case "read a binary file using text file functions".
Especially because the nonstandard case can be easily handled in other ways, either in C or in pure Lua.
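
As one possible pure-Lua approach (a sketch, not anything proposed in this thread), the file can be opened in binary mode, read in one piece, and split on "\n" by hand. The name `binary_lines` is hypothetical, and the sketch assumes the whole file fits in memory and that lines end in "\n" (a "\r" before it is left in place).

```lua
-- Hypothetical helper: iterate over the lines of a file opened in binary
-- mode, so that embedded "\0" bytes are preserved.
-- Assumes the whole file fits in memory.
local function binary_lines(path)
  local f = assert(io.open(path, "rb"))
  local data = f:read("*a")   -- "*a" is accepted by Lua 5.1 through 5.4
  f:close()
  local pos = 1
  return function()
    if pos > #data then return nil end        -- no more lines
    local nl = data:find("\n", pos, true)     -- plain search, no patterns
    local line
    if nl then
      line = data:sub(pos, nl - 1)            -- strip the new-line
      pos = nl + 1
    else
      line = data:sub(pos)                    -- last line, no trailing "\n"
      pos = #data + 1
    end
    return line
  end
end
```

It is used like `io.lines`, for example `for l in binary_lines("data.bin") do ... end`, where "data.bin" is a placeholder file name.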

--
  Enrico


The OP did suggest that a clarification in the documentation might also be an improvement. Given that Lua strings are 8-bit clean and can contain \0, this seems wise. I agree that strings ~= lines, of course.
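
For anyone who has not seen it, a small illustration of what "8-bit clean" means for Lua strings themselves (this says nothing about how the io library reads text files):

```lua
-- A Lua string carries its own length, so an embedded zero is just
-- another byte and does not terminate the string.
local s = "ab\0cd"
print(#s)                          --> 5
print(s:byte(3))                   --> 0
print(s == "ab" .. "\0" .. "cd")   --> true
```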