- Subject: Re: io:lines() and \0
- From: Sean Conner <sean@...>
- Date: Thu, 20 Feb 2014 13:29:15 -0500
It was thus said that the Great Francisco Olarte once stated:
>
> As I hinted above, if in a concrete implementation fgetc, or its macro
> cousin getc, can return a null byte from a text-opened (that is,
> non-binary opened) FILE, that means to me it defines nulls as chars,
> so fgets should handle them the same way, and the programmer can be
> understood, but not forgiven. For me fgets(buf, size, file) should be
> equivalent to a getc loop with some checks for size, \n and EOF.
> And, from what we've seen on this thread, it seems the libc
> implementations do it that way. It's the Lua lib which does not.
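The "getc loop with some checks for size, \n and EOF" can be sketched as below. This is a minimal illustration, not libc's or Lua's code: the name getc_line and the choice to return a byte count are assumptions. Returning a length instead of a pointer is what lets embedded NULs survive.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: fgets() written as a getc() loop, but returning
 * the number of bytes stored so embedded '\0' bytes are not lost.
 * Returns 0 only when EOF (or an error) occurs before any byte is read.
 */
static size_t getc_line(char *buf, size_t size, FILE *f)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    /* Leave room for the terminating '\0', as fgets() does. */
    while (n < size - 1 && (c = getc(f)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;          /* stop after the newline, keeping it */
    }
    buf[n] = '\0';
    return n;
}
```

The caller then uses the returned count, never strlen(), to know how much of buf is line data.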
No, the Lua lib uses fgets(). The issue is that fgets() returns a
pointer, not a length, so any embedded '\0' in the data is a problem:
in C, strings are terminated by '\0', and the caller can only find the
end of the line by scanning for the first one.
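A small illustration of the problem, assuming the caller measures the line with strlen() (the helper name fgets_strlen is hypothetical): fgets() consumes all six bytes of "ab\0cd\n", but strlen() reports only two.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a line with fgets() and measure it the only
 * way a pointer return allows, with strlen(). Returns -1 on EOF/error.
 */
static long fgets_strlen(char *buf, int size, FILE *f)
{
    if (fgets(buf, size, f) == NULL)
        return -1;
    return (long)strlen(buf);   /* stops at the first '\0', not at '\n' */
}
```

Everything after the embedded NUL ("cd\n" here) has been read from the stream, yet it is invisible to the caller, which matches the "reading past the null and then discarding the chars" complaint.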
> I could tolerate it if it interpreted '\0' as '\n'; heck, I did tolerate
> MSC discarding \015 (which is not the same as mapping '\015\012' to
> '\n'), but reading past the null and then discarding the chars is too
> much.
How is discarding '\015' any different from mapping "\015\012" to "\012"?
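For reference, the two translations only diverge on a CR that is not immediately followed by LF. A minimal sketch over an in-memory buffer (both helper names are hypothetical, and chunk boundaries of a real buffered reader are ignored):

```c
#include <stddef.h>

/* Variant A: discard every CR (\015), as described for MSC above. */
static size_t drop_all_cr(char *s, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (s[i] != '\r')
            s[out++] = s[i];
    return out;                 /* new length, compacted in place */
}

/* Variant B: map only the pair CR LF (\015\012) to LF; a lone CR stays. */
static size_t map_crlf(char *s, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\r' && i + 1 < n && s[i + 1] == '\n')
            continue;           /* drop the CR; the LF is copied next */
        s[out++] = s[i];
    }
    return out;
}
```

On "x\r\ny\rz" variant A yields "x\nyz" while variant B yields "x\ny\rz"; on text whose only CRs come from CR LF pairs they give the same result.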
> I think the main problem with Lua now is that it does not clearly
> state that files with embedded NULs are not safe to read by lines.
I'm not even sure the C Standard covers that.
> And it is a shame the C library does not say anything about whether
> fgets() modifies any part of buf PAST the null it inserted;
> otherwise we could memset() the buffer with any non-null byte and then
> search for the nul from the end of the buffer.
The problem with that arises if the file in question has runs of NUL
bytes (enough to fill a buffer, or even an unfortunate alignment where the
last byte read into the buffer is NUL).
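The pre-fill idea can be sketched as follows. It is only a sketch: fgets_len is a hypothetical name, the filler byte 'X' is arbitrary, and the whole thing rests on the unguaranteed assumption that fgets() leaves every byte after the '\0' it appends untouched.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pre-fill buf with a non-NUL byte, call fgets(),
 * then scan backwards for the last '\0', taking it to be the terminator
 * fgets() appended. ASSUMES fgets() never writes past that terminator,
 * which the C library does not promise. Returns the byte count read
 * (embedded NULs included), or -1 on EOF/error.
 */
static long fgets_len(char *buf, int size, FILE *f)
{
    if (size <= 0)
        return -1;
    memset(buf, 'X', (size_t)size);     /* any non-NUL filler byte */
    if (fgets(buf, size, f) == NULL)
        return -1;
    for (int i = size - 1; i >= 0; i--)
        if (buf[i] == '\0')
            return i;                   /* index of terminator == length */
    return -1;                          /* fgets() broke the assumption */
}
```

A libc that zeroed the whole buffer inside fgets() would make the backward scan stop at the last byte of buf and report a wrong length.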
> But I would bet that one day, after putting this in the wild, someone
> runs it against a library which, say, helpfully zeroes the whole buf
> before reading to aid in debugging.
Nah, for debugging purposes you fill memory (whether from malloc() or
on the stack) with a non-zero pattern [1].
-spc
[1] 0xCC on x86; 0xA5 on just about anything else. Why? On x86, 0xCC
is INT 3 (a single-byte instruction), so executing it will be caught
by the OS. As data, it reads as a large positive number or a
significantly large negative number, so it stands out. It's also most
likely *not* a valid address (0xCCCCCCCC).
On the 68k, 0xA5 is an illegal instruction, so executing it will be
trapped. It also makes for an odd address, so reads larger than a byte
will trap. And again, it makes for a large unsigned number or a
significant negative number, so it is distinctive, and even for a byte
read, it will most likely be an invalid address.
For debugging, you want distinctive values like these that make bugs
show up.
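A minimal sketch of filling fresh allocations with such a pattern. The wrapper debug_malloc and the DEBUG_FILL macro are hypothetical names, and the architecture test covers only the common GCC/Clang and MSVC x86 macros.

```c
#include <stdlib.h>
#include <string.h>

/* Pick the fill byte per the footnote: 0xCC (INT 3) on x86, else 0xA5. */
#if defined(__i386__) || defined(__x86_64__) || defined(_M_IX86) || defined(_M_X64)
#define DEBUG_FILL 0xCC
#else
#define DEBUG_FILL 0xA5
#endif

/*
 * Hypothetical debug allocator: memory is never handed out zeroed, so a
 * read of uninitialised data shows up as a distinctive non-zero value.
 */
static void *debug_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        memset(p, DEBUG_FILL, size);
    return p;
}
```

Under an allocator like this, a buffer that "happens" to contain zeros past the data is far less likely, which is why code relying on that would fail early rather than in production.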