lua-users home
lua-l archive

On 2014-02-20 21:15, René Rebe wrote:

On Feb 20, 2014, at 21:03 , Dirk Laurie wrote:

1. io.lines operates on text files.

lines operates on streams, which on most platforms these days only
operate in binary mode anyway.

Is there an official standard for this?

2. Text files may not contain any control character except whitespace.

Not necessarily. ISO C does not describe what the physical
representation of a text stream may contain. Loosely speaking, the
standard states (C11 7.21.2[2] / C99 7.19.2[2] / C89 7.9.2[2]):

if ( fgetc(textmodeF) == 'A' )
    printf("Hello, somebody called fputc('A', ...); earlier\n"
           "... but I do not know what value is in the underlying file\n"
           "... I do not even know how many bytes were read\n"
           "    to receive this 'A'\n");

if ( fgetc(textmodeF) == '\0' )
    printf("... bloooorp ...\n");

if ( fgetc(binmodeF) == 'A' )
    printf("Hello, somebody called fputc('A', ...); earlier\n"
           "... and there is 'A' in the underlying file\n"
           "... and I read 1 byte\n");

There is a problem with '\0' even in a binary stream (C11 7.21.2[3] /
C99 7.19.2[3] / C89 7.9.2[3]): an implementation may append an
implementation-defined number of null characters to the end of a binary
stream.

As mentioned above, there are no real text files nowadays (on most
platforms). AFAIR, the last widespread ones were on CP/M 8080/Z80
floppy disk toys.

All files (even on Windows) are binary files. The common way of choosing
between text and binary mode (under Windows, probably the only
implementation that distinguishes them) is somewhat mistaken: "use
text mode for .BAT, .CMD, .TXT files, and binary mode for .DAT, .EXE,
.DOC, and others; otherwise you should expect a nuclear launch".

No. According to the standard (C11 7.21.2[2] / C99 7.19.2[2] / C89
7.9.2[2]), use binary mode if you want to read the exact content of a
file. Use text mode if you accept that "Characters may have to be added,
altered, or deleted on input and output to conform to differing
conventions for representing text in the host environment." and you
intend to use only printable characters, '\n' and '\t'. "Use" here
means: be sure that writing X means reading back the same X.

However, what does the C library's stream mode have to do with the way
"fgets" operates? Admittedly, "io.open" opens a file in text mode by
default, but the same problem occurs when we use "file:lines" or
"file:read" on files opened in binary mode, or with a C library that
does not distinguish between the modes. Simply put, "fgets" is not
meant for reading zeroes.

My popular system Linux (running most internet servers and such, you
know) does not know about text files and treats all bytes equally. So do
all the BSDs, Mac OS X and whatnot.

Is there an official standard for this?

3. \0 is not whitespace.

Is there an official standard for this?

Just the thing! :) The C Standard:

C89-C11 5.2.1[2]: "... A byte with all bits set to 0, called the null
character, shall exist in the basic execution character set; it is used
to terminate a character string."

C99-C11 6.4.4.4[12] / C89 6.1.3.4[12]: "The construction '\0' is
commonly used to represent the null character."

Unless, of course, somebody allows printable characters to be used to
terminate a character string.

In other words, the behaviour complained of is that a standard library
routine when given data that does not conform to specification gives
undefined results.

Actually the C standard library of all platforms, including Windows,
works just fine.

Indeed, the standard does not define this as "undefined behavior",
either explicitly or implicitly (i.e. through a "shall" or "shall not"
requirement).

I just want everything the C library returned in the string. And I did
not complain, I proposed working patches to do so.

The next time you parse a text file which accidentally has a \0
somewhere you probably want this bug fix, too ;-) Especially after you
have spent hours figuring out what is going on, …

Certainly, "... is '\0' aware" sounds like Zeus' thunderbolt and makes
the world more beautiful. IMHO, at the very least a note should be added
to the documentation of "io.lines", "file:lines" and "file:read"; it
would have greatly shortened that figuring-out time.

-- best regards

Cezary H. Noweta