lua-l archive



Hi,

On Feb 20, 2014, at 22:11 , Craig Barnes wrote:

lines operates on streams, which on most platforms these days only operate
in binary mode anyway.

What has "most platforms" got to do with anything? The criterion for inclusion
is "conforms to the C89 standard", not "works on most platforms".

It has to do with the fact that artificial limitations are being discussed that simply do not exist.
Not even on Windows, which only translates line endings, not \0.

My popular system, Linux (running most Internet servers and such, you know),
does not know about text files and treats all bytes equally. So do all the
BSDs, Mac OS X and whatnot.

Well, the standard does make a distinction between text files and binary files,
as does Windows.

For the line endings! And the ISO C standard draft I found says:

The external representations in a text file need not be identical to the internal representations, and are outside the scope of this International Standard.

NOTHING ELSE! Aside from requiring support for lines of at least 254 characters:

An implementation shall support text files with lines containing at least 254 characters, including the terminating new-line character. The value of the macro BUFSIZ shall be at least 256.
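To illustrate the point about POSIX systems, here is a minimal sketch, assuming a POSIX platform where the "b" flag to fopen() has no effect. The helper roundtrip_text_mode() and any scratch file name passed to it are hypothetical, made up for this example. It writes bytes containing a \0 through a text-mode stream and reads them back, again in text mode. ISO C itself only guarantees such a round-trip for printing characters, horizontal tab and new-line, so on other platforms the result may differ.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `len` bytes (which may include '\0') to `path`
   in TEXT mode ("w"), read them back in TEXT mode ("r") and compare.
   Returns 1 if the bytes came back unchanged, 0 if they differ, -1 on error.
   On POSIX, where the "b" mode flag has no effect, 1 is expected. */
int roundtrip_text_mode(const char *path, const char *data, size_t len)
{
    char buf[256];
    size_t got;
    FILE *f;

    if (len > sizeof buf)
        return -1;

    f = fopen(path, "w");                 /* text mode: no "b" */
    if (f == NULL)
        return -1;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        remove(path);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(path);
        return -1;
    }

    f = fopen(path, "r");                 /* text mode again */
    if (f == NULL) {
        remove(path);
        return -1;
    }
    got = fread(buf, 1, sizeof buf, f);   /* read everything back */
    fclose(f);
    remove(path);                         /* clean up the scratch file */

    return got == len && memcmp(buf, data, len) == 0;
}
```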

3. \0 is not whitespace.
Is there an official standard for this?

Yes, the C standard:

"white-space characters are the following: space (' '), form feed ('\f'), new-line ('\n'), carriage return ('\r'), horizontal tab ('\t'), and vertical tab ('\v'). In the "C" locale, isspace returns true only for the standard white-space characters."

(ISO/IEC 9899:1999 §7.4.1.10)
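The quoted rule is easy to check mechanically by asking isspace() about every byte value in the "C" locale. A small sketch; the function name only_standard_whitespace() is made up for illustration, and note that it switches LC_CTYPE to "C" as a side effect.

```c
#include <ctype.h>
#include <limits.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if, in the "C" locale, isspace() accepts
   exactly the six standard white-space characters (so '\0' is rejected),
   otherwise 0. Changes the global LC_CTYPE setting to "C". */
int only_standard_whitespace(void)
{
    static const char standard[] = " \f\n\r\t\v";
    int c;

    if (setlocale(LC_CTYPE, "C") == NULL)
        return 0;

    for (c = 0; c <= UCHAR_MAX; c++) {
        /* strchr() would match the string's own terminator for c == 0,
           so '\0' is excluded explicitly. */
        int expected = (c != '\0') && strchr(standard, c) != NULL;
        if ((isspace(c) != 0) != expected)
            return 0;
    }
    return 1;
}
```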

I just want everything the C library returned in the string. And I did not
complain; I proposed working patches to do so.

Also known as "implementation defined behaviour" or "not standard".

No, it is probably called "according to the standard". The same ISO draft copy I found says (as do the man pages on Linux and Mac):

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

It does NOT say:

MAY stop on \0

It does NOT say:

MAY copy, or clobber, additional bytes after the \n is encountered.

Nor any similar magic that some try to imply here.

It also does not say to interpret the returned string only up to the first \0 terminator. It explicitly mentions the NEW-LINE CHARACTER (WHICH IS RETAINED), which makes it the obvious delimiter when reading newline-delimited lines.
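Taking the quoted wording at face value, the number of bytes fgets() actually read can be recovered even when the line contains a \0. A minimal sketch, assuming fgets() writes nothing beyond the characters it read plus the one terminating null (the point argued above): fill the buffer with a non-zero byte first, and the terminator fgets() wrote is then the last \0 in the buffer. The helper name fgets_len() is hypothetical; this is not necessarily what the patches in this thread do, nor Lua's liolib code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one chunk (at most n-1 bytes, stopping after a
   '\n') into buf and return how many bytes were actually read, including any
   embedded '\0' bytes. Returns 0 at end-of-file or on a read error.
   ASSUMPTION: fgets() leaves buffer bytes past its terminator untouched. */
size_t fgets_len(char *buf, int n, FILE *f)
{
    size_t i;

    if (n < 2)
        return 0;

    /* Pre-fill with a non-zero byte so fgets' terminator is the last '\0'. */
    memset(buf, '\n', (size_t)n);
    if (fgets(buf, n, f) == NULL)
        return 0;

    /* On success fgets wrote a '\0' somewhere in buf[1..n-1]; everything after
       it still holds the fill byte, so scanning backwards finds it. */
    i = (size_t)n - 1;
    while (buf[i] != '\0')
        i--;
    return i;
}
```

With the input "ab\0cd\nxy", the first call returns 6 while strlen() on the same buffer reports only 2, which is exactly the truncation being discussed; the second call returns 2 for "xy".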

The next time you parse a text file that accidentally has a \0 somewhere, you
will probably want this bug fix, too ;-) Especially after you have spent hours
figuring out what is going on, ...

Or you could just use the 4 line alternative I already posted in this thread.

Why do you insist so much on refusing to improve a Lua core function so that it returns more precisely what the underlying system actually returned?

-- 
 ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
 http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de