Hi Sean.

On Thu, Feb 20, 2014 at 10:51 PM, Sean Conner <sean@conman.org> wrote:
>   And what I (and a few others) are arguing, is that C makes a distinction
> between a text file, and a binary file.  Using functions meant for text
> files on binary files are not specified to return meaningful results.

I know the distinction. It's been beaten to death. The only thing I
really would want now is a paragraph in the docs stating that if you
read files with nuls, or any other control character, in Lua by lines
you are going to get strange results. In more than 30 years Lua is the
first language I've found with nul-safe strings which silently
truncates lines with nuls. And, what makes it worse, it works with
nearly every control char, which can lead to very hard to trace bugs.
These could be avoided by just stating 'DO NOT USE LINE FUNCTIONS ON
FILES WITH CONTROL CHARS'.
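
A minimal sketch of how such a truncation can arise, assuming the
line reader is built on fgets() followed by strlen() (an assumption
about the implementation, not something stated above). The helper
name naive_line_length is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line the way a naive fgets()-based
   reader would and returns the length it *believes* it read. */
size_t naive_line_length(FILE *f, char *buf, int size)
{
    assert(f != NULL && buf != NULL && size >= 2);
    if (fgets(buf, size, f) == NULL)
        return 0;               /* EOF or error: nothing read */
    /* strlen() stops at the first NUL, so a NUL embedded in the line
       hides every byte after it, although fgets() has already
       consumed those bytes from the stream. */
    return strlen(buf);
}
```

For a line holding the six bytes 'a' 'b' NUL 'c' 'd' '\n', fgets()
consumes all six, but the reported length is only 2.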

>   The reason for the distinction is that different systems used different
> methods to mark the end of a line, and different methods to mark the end of
> a text file.  In order to standardize to a known set of behaviors on wildly
> different systems [2] without breaking too much existing code.

You do not need to point me to pre-89 magazines, I was working in C
then and have my share of stories. And the most bizarre ones were
with systems which normally did not show up in Byte.

>   The fact that it *almost* works in C is irrelevant.  Lua is targetted
> towards C89, and expecting functions that work for text files to work for
> binary files is expecting too much.  And granted, the Lua documentation
> should probably mention this.

Well, now we are making progress.

....
>> >   How is discarding '\015' any different from mapping "\015\012" to "\012"?
>> Isn't obvious? The simpler example: '\015' => '', some more:
>> 'ab\015cd\015\012' => abcd\n vs. ab\rcd\n
>   I think this is where we differ---if I know I'm reading a binary file, I
> open as a binary file, and avoid fgets(), since it's a binary file---either
> it has no structure so using fgets() is silly (and I use fread() or
> fgetc()), or it has a (to me) known structure, so using fgets() is still
> silly (and I use fread() or fgetc()).

Are you kidding me? What does 'reading a binary' have to do with
'discarding \015 is not the same as mapping'? And, by the way, the
\015 discard happened in some msdos runtimes with fgets, fgetc, getc
and fread. It was quirky but at least it was consistent.
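
A small sketch of the difference between the two transformations,
for illustration only. The function names discard_cr and map_crlf are
hypothetical; both work in place on a byte buffer of known length.

```c
#include <assert.h>
#include <stddef.h>

/* Discards every '\015' byte. Returns the new length. */
size_t discard_cr(char *s, size_t n)
{
    size_t out = 0;
    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        if (s[i] != '\015')
            s[out++] = s[i];
    return out;
}

/* Maps each "\015\012" pair to "\012" and keeps a lone '\015'.
   Returns the new length. */
size_t map_crlf(char *s, size_t n)
{
    size_t out = 0;
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\015' && i + 1 < n && s[i + 1] == '\012')
            continue;   /* skip this CR; the LF is copied next turn */
        s[out++] = s[i];
    }
    return out;
}
```

On the seven bytes 'ab\015cd\015\012' the first yields 'abcd\012'
(5 bytes) and the second 'ab\015cd\012' (6 bytes).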


>> >   The problem with that is if the file in question has multiple NUL byte
>> > runs (enough to fill a buffer, or even an unfortunate alignment where the
>> > last byte read in the buffer is NUL).
>> Not an issue. If C guaranteed me fgets would not touch the buffer
>> after the null, I can fill it with ones, and as I know it MUST have a
>> null at the end I can scan backwards, the first one is the terminating
>> null.

>   Sigh.  That *still* wouldn't work.  Assume (for sake of argument) a buffer
> size of 8 bytes.  You fill it with all ones (0xFF):
>         FF FF FF FF FF FF FF FF
> And you read the following binary file using your version of fgets():
>         34 89 00 FF 23 08 FF FF
> So the buffer now contains:
>         34 89 00 FF 23 08 FF FF

Are you trying to kid me again? My buffer will contain a terminating
null, that is guaranteed by fgets. Where is it?

> and thus you return:
>         34 89 00 FF 23 08
> which is *NOT* the correct data (it's truncated).

No, my buffer would have 34 89 00 FF 23 08 FF 00, I would scan
backwards and hit the first 00, push 34 89 00 FF 23 08 FF, notice full
buffer, read again, get FF 00 FF FF ..., scan backwards again till the
00, add the FF and return as buffer was not full.
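
The prefill-and-scan-backwards idea can be sketched as follows. This
relies on the assumption discussed in this thread, which the C
standard does not guarantee: that fgets() leaves the bytes after its
terminating null untouched. The names fgets_len and
read_line_nul_safe are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One fgets() call that also reports how many bytes were stored,
   even when the data contains nulls.
   ASSUMPTION: fgets() does not touch the buffer after its terminator.
   Returns 1 on success, 0 on EOF or error. */
int fgets_len(char *buf, int size, FILE *f, size_t *len)
{
    assert(buf != NULL && size >= 2 && f != NULL && len != NULL);
    memset(buf, 0xFF, (size_t)size);    /* prefill with a non-null byte */
    if (fgets(buf, size, f) == NULL)
        return 0;
    /* Everything after the terminator is still 0xFF, so the last
       null in the buffer is the one fgets() wrote. */
    size_t i = (size_t)size;
    while (i > 0 && buf[i - 1] != '\0')
        i--;
    if (i == 0)
        return 0;                       /* no terminator found */
    *len = i - 1;
    return 1;
}

/* Reads one line (through '\n' or to EOF) into out[], at most cap
   bytes, using chunk-sized fgets() calls. Returns bytes stored. */
size_t read_line_nul_safe(FILE *f, char *out, size_t cap, int chunk)
{
    char buf[64];
    size_t total = 0, n;
    assert(out != NULL && chunk >= 2 && chunk <= (int)sizeof buf);
    while (total < cap && fgets_len(buf, chunk, f, &n)) {
        size_t take = n < cap - total ? n : cap - total;
        memcpy(out + total, buf, take);
        total += take;
        /* Stop when the chunk was not filled or the line ended. */
        if (n < (size_t)chunk - 1 || buf[n - 1] == '\n')
            break;
    }
    return total;
}
```

With an 8 byte chunk and the file 34 89 00 FF 23 08 FF FF, the first
call stores 7 bytes and the second stores the remaining FF.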

I'm beginning to feel like you are intentionally putting wrong words
in my mouth. Excuse me if that is not the case, but due to my past
experience this is how I'm feeling.

>   A binary file can be expected to have any value, so any value you use a
> "filler" can lead to data truncation (I'm not saying it always will lead to
> data truncation, but that it can).

No, it cannot if you use a function which guarantees a determined end
byte, like fgets, and you prefill the buffer with any other char. It
is the same as the padding of files in crypto: the source file can
have any final sequence, but if you add a \1 byte and then \0 bytes up
to a block size you can strip them. Remember fgets has to make some
guarantees, and it guarantees the terminating null byte (on non-error
returns).
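
The padding analogy can be sketched like this: a minimal example,
assuming a \1 marker byte followed by \0 bytes up to the next block
boundary. The names pad_marker and unpad_marker are hypothetical and
do not come from any particular crypto library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pads len bytes in buf (capacity cap) to a multiple of block by
   appending one 0x01 marker and then 0x00 bytes. At least one pad
   byte is always added. Returns the padded length, or 0 if the
   result does not fit in cap. */
size_t pad_marker(unsigned char *buf, size_t len, size_t cap, size_t block)
{
    assert(buf != NULL && block > 0);
    size_t padded = (len / block + 1) * block;
    if (padded > cap)
        return 0;
    buf[len] = 0x01;
    memset(buf + len + 1, 0x00, padded - len - 1);
    return padded;
}

/* Strips the padding: scans backwards over 0x00 bytes to the 0x01
   marker. Returns the original length, or (size_t)-1 if the buffer
   is not correctly padded. */
size_t unpad_marker(const unsigned char *buf, size_t padded)
{
    assert(buf != NULL);
    size_t i = padded;
    while (i > 0 && buf[i - 1] == 0x00)
        i--;
    if (i == 0 || buf[i - 1] != 0x01)
        return (size_t)-1;
    return i - 1;
}
```

The data may itself end in \1 \0 \0; because the marker is always
appended, the backward scan stops at the appended \1 and not at the
one inside the data.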


Francisco Olarte.