lua-l archive



On Feb 19, 2014, at 22:30, the Great Sean Conner wrote:

It was thus said that the Great René Rebe once stated:

Given that this I/O function may not be the most performance-critical code
path in Lua, that may be acceptable. But your mileage may vary, and I get the
feeling nobody here will like it in any case.

 You really should be testing this under Microsoft Windows.  Windows
definitely treats text and binary files differently than Unix.  For
starters, you will never get a '\r' (CR, or character 13) in text mode.  You
will never see SUB (character 26) in text mode---and most likely, you won't
see *any* input past SUB (you'll just get an EOF marker) [1].  These are just
a few reasons why treating a binary file as text is problematic.  [5]

My modifications do not harm Windows or "text" mode files in any way.
They only make sure that whatever the C library returns actually ends
up in the Lua string.

Most programs open all files in binary mode on Windows anyway, to
avoid exactly these surprises. And with the binary mode flag, DOS
behaves the way modern Unix and Mac OS X implementations always
behave.

And to add to your anecdotes about this infamous text mode, I
have seen multiple Windows program bugs caused by it, including
a scanning program that failed to open JPEG files when they were imported
through one menu entry (but not through other functions), because on that
code path the file was not opened in binary mode. So \r\n was translated to
just \n, which obviously corrupted the JPEG stream while reading.
Many people consider this non-binary text mode harmful and make
sure the binary mode flag is never forgotten when opening files
in portable software that may eventually run on DOS and similar
systems.

 -spc (I don't care to support Windows, and you may not care to support
Windows, but there are some here who do care … )

Again, my proposed changes do not harm Windows. The Lua
code already checked for the trailing \n before, and so do my
changes, so no new problem arises from that. The only change is
that all data returned by the C function is actually added to
the Lua string. And again, we only need this extra hiccup
because this really old C function does not directly return the
number of bytes it wrote into the buffer. This is why the Lua code
already had to scan for the terminating 0; it is just that the first 0
is not a reliable indicator of the bytes written, since the line itself
may contain 0 bytes.

With a better C API such as POSIX getline, none of this would be
needed, since the actual length is returned directly:

 ssize_t getline(char **lineptr, size_t *n, FILE *stream);

[1] Some older, simpler operating systems [2] never bothered storing the
actual length of a file on disk, but rather, just indicated the
number of blocks (or rather, disk sectors) the file used (or even
worse---linked blocks together, so getting a "size" of a file was a
very expensive operation).  All file sizes were multiples of the sector size.
For text files (or text-like files) where not all of the last sector
was actually used, a marker byte was used to indicate the actual end
of file [4] if it mattered.  The byte used depended upon the
operating system (or even just a convention, not a real standard).

[2] Say, CP/M, the direct ancestor of MS-DOS [3].

[3] Which is the direct ancestor of Microsoft Windows.

[4] Executable files never had this marker, so executable sizes were
always divisible by the default sector (or block) size.  Other
binary formats could contain a field for the actual size, but it was
up to the format.  The operating system didn't care.

[5] Yes, it just works under Unix.  But that's because Unix treats all
files as a sequence of bytes.  

Great. My code does not alter any of this. If the C run-time of those
ancient systems does not handle this internally in the fgets function, then
the current Lua implementation probably already returns random
garbage at the end, since it scans for \0 and checks for \n to
decide whether to read more.

-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de
 DE Legal: Amtsgericht Berlin (Charlottenburg) HRB 105123B, Tax-ID#: DE251602478
 Managing Director: René Rebe