Re: io:lines() and \0

lua-l archive

Subject: Re: io:lines() and \0
From: Sean Conner <sean@...>
Date: Thu, 20 Feb 2014 13:29:15 -0500

It was thus said that the Great Francisco Olarte once stated:
> 
> As I hinted above, if in a concrete implementation fgetc, or its macro
> cousin getc, can return a null byte from a text-opened ( that would be
> non-binary opened ) FILE, that means it defines nulls as chars to me,
> so fgets should handle them the same way and the programmer can be
> understood, but not forgiven. For me fgets(buf, size, file) should be
> equivalent to a getc loop with some checkings for size, \n and EOF.
> And, from what we've seen on this thread, it seems the libC
> implementation do it that way. Is lua lib which does not.

  No, the Lua lib uses fgets().  The issue is that fgets() returns a
pointer, not a size, and thus any embedded '\0' in the data is
problematic, because in C, strings are terminated by '\0'.
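
  To make that concrete, here is a minimal sketch of the getc() loop
described in the quoted text, written so that it returns a length instead
of a pointer.  The name read_line_len() is made up for illustration; it is
not what the Lua lib does, only one way such a reader could look.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line with getc() and report how many
 * bytes were stored, so embedded '\0' bytes are not lost.
 * Stores at most size-1 bytes plus a terminating '\0'.
 * Returns 0 when nothing could be read (EOF or error). */
size_t read_line_len(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int    c;

    assert(size > 0);
    while (n + 1 < size && (c = getc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;              /* keep the newline, as fgets() does */
    }
    buf[n] = '\0';
    return n;
}
```

  Given a line containing the six bytes 'a' 'b' '\0' 'c' 'd' '\n', this
reports a length of 6, while strlen() on the same buffer stops at the
embedded '\0' and reports 2.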

> I could tolerate if it interpreted '\0' as '\n', heck, I did tolerate
> MSC discarding \015 ( which is not the same as mapping '\015\012' to
> '\n' ), but reading past the null and then discarding the chars is too
> much.

  How is discarding '\015' any different from mapping "\015\012" to "\012"?

> I think the main problem with lua now would be it does not clearly
> specify file with embedded nuls are not safe to read by lines.

  I'm not even sure the C Standard covers that.

> And it is a shame The C library does not say anything about whether
> fgets() modifies any part of the buf PAST the null it inserted,
> otherwise we could use memset(anything) and then search for the nul
> from the end of the string:

  The problem with that shows up when the file in question has runs of NUL
bytes (enough to fill a buffer, or even an unfortunate alignment where the
last byte read into the buffer is NUL).
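
  For reference, the technique under discussion can be sketched roughly as
below.  This is only an illustration: fgets_len() is a made-up name, the
fill byte 0xFF is an arbitrary non-NUL choice, and the whole thing rests on
the assumption that fgets() leaves everything past its terminating '\0'
untouched, which is exactly what the C library does not promise.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: recover the number of bytes fgets() stored by
 * pre-filling the buffer with a non-NUL byte and scanning backwards for
 * the last '\0', taken to be the terminator fgets() wrote.
 * ASSUMPTION: fgets() does not modify buf past that terminator.
 * Returns 0 when fgets() reports EOF or an error. */
size_t fgets_len(char *buf, size_t size, FILE *fp)
{
    size_t i;

    assert(size > 0);
    memset(buf, 0xFF, size);            /* fill byte must not be '\0' */
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    i = size;
    while (i > 0 && buf[i - 1] != '\0') /* skip untouched fill bytes */
        i--;
    return (i > 0) ? i - 1 : 0;         /* index of the last '\0' */
}
```

  If the library ever writes a '\0' past the terminator, or the fill byte
is itself '\0', the backwards scan has nothing reliable to find and the
length comes out wrong.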

> But i would bet one day after putting this on the wild someone fires
> it to a library which, say, helpfully zeroes the whole buf before
> reading to aid in debug.

  Nah, for debugging purposes, you fill memory (via malloc() or
on the stack) with a non-0 pattern [1].
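
  A minimal sketch of that kind of fill, assuming a wrapper around malloc()
(the names debug_malloc() and DEBUG_FILL are made up for illustration;
real debugging allocators do considerably more):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill byte: 0xCC as suggested for x86; 0xA5 would be the
 * choice elsewhere. */
#define DEBUG_FILL 0xCC

/* Hypothetical wrapper: allocate n bytes and fill them with a non-zero
 * pattern, so code that forgets to initialize the memory reads
 * conspicuous garbage instead of convenient zeros. */
void *debug_malloc(size_t n)
{
    void *p = malloc(n);

    if (p != NULL)
        memset(p, DEBUG_FILL, n);
    return p;
}
```

  An uninitialized 32-bit field in such a block reads as 0xCCCCCCCC, which
is hard to mistake for real data.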

  -spc

[1]	0xCC on x86; 0xA5 on just about anything else.  Why?  On x86, 0xCC
	is INT 3 (a single-byte instruction), which will be caught by the OS.
	Read as a number, it is either a large positive value or a large
	negative value, so it stands out.  It's also most likely *not* to be
	a valid address (0xCCCCCCCC).

	On the 68k, 0xA5 is an illegal instruction so that will be trapped.
	It makes for an odd address, so reads larger than a byte will trap.
	And again, it makes for a large unsigned number, or a large negative
	number, which is distinctive, and even for a byte read, it will most
	likely be an invalid address.

	For debugging, you want distinctive values like that, which bring
	bugs to the surface.
