Re: io:lines() and \0

On Feb 20, 2014, at 20:29 , Cezary H. Noweta wrote:

On 2014-02-20 12:13, René Rebe wrote:
On Feb 19, 2014, at 19:42 , Roberto Ierusalimschy wrote:

First of all:

C99 7.19.3[11]:
C11 7.21.3[11]: "...The byte input functions read characters from the
stream as if by successive calls to the fgetc function."

So, if successive ,,fgetc()'' calls return { 0, 0, '\n', ... }, then
,,fgets'' must store { 0, 0, '\n', 0, ... }; likewise { 0, 0, EOF } =>
{ 0, 0, 0, ... }; and so on.
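
To make the as-if rule concrete, here is a minimal sketch of ,,fgets''
written purely in terms of ,,fgetc''. The name ,,model_fgets'' is
hypothetical and this is not how any particular C library implements
it; it only models the observable result described above.

```c
#include <stdio.h>

/* Hypothetical model of fgets built on successive fgetc calls.
 * NUL bytes are copied like any other byte; only '\n', EOF, or a
 * full buffer stops the loop. */
char *model_fgets(char *buf, int n, FILE *fp)
{
    int i = 0;
    int c = 0;

    if (n <= 0)
        return NULL;
    while (i < n - 1 && (c = fgetc(fp)) != EOF) {
        buf[i++] = (char)c;
        if (c == '\n')
            break;              /* the newline is stored, then we stop */
    }
    /* EOF before any byte was stored, or a read error: report failure
     * and leave the buffer without a new terminator. */
    if (c == EOF && (i == 0 || ferror(fp)))
        return NULL;
    buf[i] = '\0';              /* terminator right after the last byte */
    return buf;
}
```

Note that nothing in the result tells the caller how many bytes were
stored when the data itself contains NULs.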

So for the special case of files not terminated by a newline we
unfortunately need to pre-fill the whole buffer with \n.

There is no guarantee that the buffer is not trashed beyond the
terminating NUL. With ,,fgets'' and embedded '\0's it is impossible to
determine the number of chars read unambiguously. Even if we could fill
the buffer with some magical NaN (NaC?) value, which ,,fgets'' could
never reproduce, we would still have an ambiguity: suppose ,,fgets''
returns { 0, 0, 0, '\n', 0, #, #, ... }, where ,,#'' is an untouched
placeholder. Such a buffer can come from the ,,fgetc'' sequence
0, EOF; or 0, 0, EOF; or 0, 0, 0, '\n'. The trailing '\n', 0 could be
remnants of some strange binary=>text decoding done inside the buffer.

The sole guarantees about the buffer content are: (1) it is untouched
if EOF occurs at the beginning; (2) the data is valid up to the appended
NUL, which is hard to locate if the data itself contains NULs.
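
A small sketch of point (2): it compares what ,,strlen'' sees with the
number of bytes actually consumed. The helper name ,,compare_lengths''
is made up for illustration, and using ,,ftell'' as ground truth is an
assumption that only holds for seekable streams opened in binary mode.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets and report both the
 * length strlen sees and the number of bytes consumed from the stream.
 * Returns 0 on success, -1 on EOF, error, or a non-seekable stream. */
int compare_lengths(FILE *fp, char *buf, int n,
                    size_t *by_strlen, long *by_ftell)
{
    long before = ftell(fp);

    if (before < 0 || fgets(buf, n, fp) == NULL)
        return -1;
    *by_strlen = strlen(buf);        /* stops at the first NUL in the data */
    *by_ftell = ftell(fp) - before;  /* bytes fgets really consumed */
    return 0;
}
```

For a line consisting of the bytes 'a', 0, 'b', '\n', ,,strlen'' reports
1 while four bytes were consumed.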

IMHO, a discussion about using ,,fgets'' to read zeroes is like a
discussion about using a microwave to boil an egg. Using ,,strlen'' on
the result of ,,fgets'' is perfectly fine, as ,,fgets'' is not meant for
reading 0s. Just my few cents.

Btw., aside from the fact that you are entering a field of high
speculation about touching bytes beyond the actual string copy (which no
C library we have looked at so far does), we only have to pre-fill the
buffer with \n for a highly exceptional case, namely the last line of
the file (EOF) not having a newline. And yet my proposed patch handles
this case if the C library does not do your additional magical buffer
trashing. For all other lines of the file the termination is guaranteed
to be a newline:

If a newline is read, it is stored into the buffer.
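
A rough sketch of the pre-fill idea, not the patch itself: the function
name ,,fgets_len'' is hypothetical, and the last branch relies on the
assumption stated above, namely that the C library leaves every byte
after its terminating NUL untouched.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with fgets and return the number
 * of bytes stored (embedded NULs included), or -1 at EOF or on error.
 * ASSUMPTION: fgets writes nothing beyond the terminating NUL. */
int fgets_len(char *buf, int n, FILE *fp)
{
    char *nl;

    if (n < 2)
        return -1;
    memset(buf, '\n', (size_t)n);            /* pre-fill with newlines */
    if (fgets(buf, n, fp) == NULL)
        return -1;
    nl = (char *)memchr(buf, '\n', (size_t)n);
    if (nl == NULL)
        return n - 1;                        /* buffer full, no newline read */
    if (nl + 1 < buf + n && nl[1] == '\0')
        return (int)(nl - buf) + 1;          /* newline read and stored */
    return (int)(nl - buf) - 1;              /* EOF without newline: the NUL
                                                sits just before the first
                                                surviving pre-fill byte */
}
```

If a library did scribble past the terminator, the last branch would
return a wrong length, which is exactly the disputed point.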

René

--
ExactCODE GmbH, Jaegerstr. 67, DE-10117 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de