Re: io:lines() and \0

lua-l archive

Subject: Re: io:lines() and \0
From: Sean Conner <sean@...>
Date: Thu, 20 Feb 2014 16:51:43 -0500

It was thus said that the Great Francisco Olarte once stated:
> 
> I should have made it more explicit. If you read 'ab\000cd\012' on
> unix, you end up with a buffer containing 'ab\0cd\n'. In C you are not
> going to be able to distinguish it with str*, ok, that's a C problem,
> that's why you never use fgets when you need to be null resistant in
> C.

  And what I (and a few others) are arguing is that C makes a distinction
between a text file and a binary file.  Using functions meant for text
files on binary files is not specified to return meaningful results.
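
  To make the "C problem" from the quoted text concrete, here is a minimal
sketch of mine (not anything from the Lua sources).  The helper name
fgets_visible_length() is made up for illustration, and it uses tmpfile()
only to have something to read.  It feeds the bytes 'ab\0cd\n' through
fgets() and reports what strlen() sees afterwards:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the 6 bytes 'a' 'b' '\0' 'c' 'd' '\n' to a
 * temporary file, reads them back with fgets(), and returns what strlen()
 * reports for the result.  Returns 0 if the temporary file is unavailable. */
size_t fgets_visible_length(char *buf, size_t bufsize)
{
    static const char data[] = { 'a', 'b', '\0', 'c', 'd', '\n' };
    FILE *fp;
    size_t len = 0;

    assert(buf != NULL && bufsize >= sizeof data + 1);
    memset(buf, 0, bufsize);

    fp = tmpfile();
    if (fp == NULL)
        return 0;

    if (fwrite(data, 1, sizeof data, fp) == sizeof data) {
        rewind(fp);
        if (fgets(buf, (int)bufsize, fp) != NULL)
            len = strlen(buf);      /* stops at the embedded NUL */
    }
    fclose(fp);
    return len;
}
```

Since fgets() reports no length, the caller has only the str* functions to
go on, and those stop at the first NUL; whatever followed it on the line is
invisible to them.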

  The reason for the distinction is that different systems used different
methods to mark the end of a line, and different methods to mark the end of
a text file.  The distinction lets C standardize on a known set of behaviors
across wildly different systems [2] without breaking too much existing code.

  The fact that it *almost* works in C is irrelevant.  Lua targets C89,
and expecting functions that work for text files to work for binary files
is expecting too much.  And granted, the Lua documentation should probably
mention this.

> >> I could tolerate if it interpreted '\0' as '\n', heck, I did tolerate
> >> MSC discarding \015 ( which is not the same as mapping '\015\012' to
> >> '\n' ), but reading past the null and then discarding the chars is too
> >> much.
> >
> >   How is discarding '\015' any different from mapping "\015\012" to "\012"?
> 
> Isn't obvious? The simpler example: '\015' => '', some more:
> 'ab\015cd\015\012' => abcd\n vs. ab\rcd\n
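
  For reference, the two transformations being compared can be sketched
like this.  Both function names are made up for illustration, and neither
is meant to describe what any particular C runtime actually does:

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, dropping every CR (\015).  Returns the output length.
 * dst must have room for len bytes. */
size_t discard_cr(const char *src, size_t len, char *dst)
{
    size_t i, n = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++)
        if (src[i] != '\015')
            dst[n++] = src[i];
    return n;
}

/* Copies src to dst, turning each CR LF pair into a single LF and leaving a
 * lone CR in place.  Returns the output length. */
size_t map_crlf(const char *src, size_t len, char *dst)
{
    size_t i, n = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] == '\015' && i + 1 < len && src[i + 1] == '\012')
            continue;       /* skip the CR; the LF is copied next pass */
        dst[n++] = src[i];
    }
    return n;
}
```

Fed the 7 bytes 'ab\015cd\015\012', discard_cr() produces the 5 bytes
'abcd\012', while map_crlf() produces the 6 bytes 'ab\015cd\012'.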

  I think this is where we differ---if I know I'm reading a binary file, I
open it as a binary file, and avoid fgets(), since it's a binary file---either
it has no structure so using fgets() is silly (and I use fread() or
fgetc()), or it has a (to me) known structure, so using fgets() is still
silly (and I use fread() or fgetc()).
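
  A minimal sketch of the fread() approach, assuming a hypothetical helper
named roundtrip_fread() that writes some bytes to a tmpfile() (which is
opened in binary update mode) and reads them back.  The point is only that
the length comes from fread()'s return value, not from a terminator:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes len bytes to a temporary binary file, reads
 * up to outsize bytes back with fread(), and returns the number of bytes
 * read.  Returns 0 if the temporary file is unavailable. */
size_t roundtrip_fread(const unsigned char *data, size_t len,
                       unsigned char *out, size_t outsize)
{
    FILE *fp;
    size_t got = 0;

    assert(data != NULL && out != NULL);
    fp = tmpfile();                 /* binary update mode ("wb+") */
    if (fp == NULL)
        return 0;

    if (fwrite(data, 1, len, fp) == len) {
        rewind(fp);
        got = fread(out, 1, outsize, fp);   /* explicit byte count */
    }
    fclose(fp);
    return got;
}
```

Because the count is returned separately, NUL bytes (or any other value)
in the data are just data.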
  
> >   The problem with that is if the file in question has multiple NUL byte
> > runs (enough to fill a buffer, or even an unfortunate alignment where the
> > last byte read in the buffer is NUL).
> 
> Not an issue. If C guaranteed me fgets would not touch the buffer
> after the null, I can fill it with ones, and as I know it MUST have a
> null at the end I can scan backwards, the first one is the terminating
> null.

  Sigh.  That *still* wouldn't work.  Assume (for the sake of argument) a
buffer size of 8 bytes.  You fill it with all ones (0xFF):

	FF FF FF FF FF FF FF FF

And you read the following binary file using your version of fgets():

	34 89 00 FF 23 08 FF FF

So the buffer now contains:

	34 89 00 FF 23 08 FF FF

and thus you return:

	34 89 00 FF 23 08

which is *NOT* the correct data (it's truncated).  

  A binary file can be expected to have any value, so any value you use as a
"filler" can lead to data truncation (I'm not saying it always will lead to
data truncation, but that it can).
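
  Here is a rough model of that failure, under the same simplification as
the 8-byte example (no terminating NUL is modelled, and a memcpy() stands
in for the read).  The name guess_length() and the FILLER macro are mine,
purely for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILLER 0xFF

/* Models the proposed scheme: pre-fill the buffer with FILLER, let the read
 * overwrite the front of it, then scan backwards past FILLER bytes to guess
 * where the data ends.  Returns the guessed length. */
size_t guess_length(unsigned char *buf, size_t bufsize,
                    const unsigned char *data, size_t datalen)
{
    size_t n = bufsize;

    assert(buf != NULL && data != NULL && datalen <= bufsize);
    memset(buf, FILLER, bufsize);   /* "fill it with ones" */
    memcpy(buf, data, datalen);     /* stand-in for the read itself */

    while (n > 0 && buf[n - 1] == FILLER)
        n--;                        /* cannot tell filler from real 0xFF */
    return n;
}
```

With the 8 bytes 34 89 00 FF 23 08 FF FF as input, the guess comes back as
6, not 8.  Picking a different FILLER only changes which inputs get
truncated.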

> >> But i would bet one day after putting this on the wild someone fires
> >> it to a library which, say, helpfully zeroes the whole buf before
> >> reading to aid in debug.
> >   Nah, for debugging purposes, you fill memory (via malloc() or
> > on the stack) with a non-0 pattern [1].
> 
> <I> do, and possibly <you> and <a lot of people> do, using your quoted
> 0xCC, the typical 0xdeadbeaf or 0xa5a5a5a5, but I'm nearly sure there
> is one which nullifies it.

  As I mentioned, I pick the value depending on the CPU architecture, with
an eye towards crashing if the value(s) is(are) executed, used as an index,
as a pointer, or printed (not likely for printing, but seeing odd results is
still helpful).  

  -spc
  
[1]	NOT USED HERE

[2]	For some real fun, check out old computer related magazines [3]
	prior to 1989 (ratification of the ANSI C Standard).

[3]	https://archive.org/details/computermagazines  A good one would be
	Byte Magazine [4].

[4]	https://archive.org/details/byte-magazine
