On 21.02.2014 at 23:51, Tim Hill wrote:

(my) thread summary:

— Using the Lua library to read text files yields unpredictable/unexpected results if the file contains embedded NUL characters.
— A patch has been suggested that fixes this, at the expense of some subtle behavior changes that only occur if you rely on the old NUL behavior.

Argument A:
— A file containing a NUL is not a valid text file as per the ANSI spec, so this is garbage-in, garbage-out and expected behavior.

Whoa, who said that?! The ANSI spec says that a NUL character *might* not show up in a text stream even if you have explicitly written it into the file, but of course it is allowed to show up (and apparently it does for some common libc implementations). But in times of UTF-8 and the internet it is better to open your files in binary mode anyway where it will definitely show up ...

One could argue that you read strings from text files, and C strings cannot contain embedded NUL bytes, so text files shouldn't either, but we are talking about Lua here.

The rest of the post I agree with.

— You should not try to read non-text files in text mode.
— Lua behaves this way, live with it.

Argument B:
— Programs should be robust when dealing with unexpected input (malformed text files), and this should be detectable.
— The patch allows this to be detected (the NUL will be in the Lua string and can be parsed).

Auxiliary argument:
— There are plenty of ways text file reading can fail (e.g. absurdly long lines); this is just one of them. We can’t fix them all, so we should not fix this one.

All arguments come down to preference and philosophy; there is nothing LOGICALLY wrong with any of them. I personally feel the auxiliary argument is bogus; to be honest, I’m surprised it was suggested here.

But there is a PRACTICAL issue here. Text files are EXTERNAL data, and are therefore outside the control of Lua and the developer. Arguing that it’s not the program’s fault it exploded because “you should not have fed it a non-text file” is bogus. Taken to the extreme, you might as well omit ALL error checking in code and just crash with “user error — aborted” panics.

imho, this all comes down to something simple: good code is robust code, and robust code handles malformed input as gracefully as possible.

So how do you handle malformed text files?

With Lua as written:
1. Open file in binary mode and scan it for embedded NUL characters. Fail if any found.
2. Reopen the file in text mode
3. Read lines, parsing and validating them as needed
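
The three steps above could be sketched in plain Lua roughly as follows. This is only an illustration using the standard io library: the name read_validated_lines and its error messages are made up here, and it assumes the input is a regular file that can be opened twice (so not a pipe).

```lua
-- Hypothetical helper: two-pass read that rejects files containing NUL bytes.
local function read_validated_lines(path)
  -- Pass 1: binary mode, scan the raw bytes for embedded NUL characters.
  local f, err = io.open(path, "rb")
  if not f then return nil, err end
  while true do
    local chunk = f:read(4096)          -- byte-count reads keep NULs intact
    if not chunk then break end         -- nil means end of file
    if chunk:find("\0", 1, true) then   -- plain find, no pattern matching
      f:close()
      return nil, "embedded NUL in " .. path
    end
  end
  f:close()

  -- Pass 2: text mode, read line by line.
  f, err = io.open(path, "r")
  if not f then return nil, err end
  local lines = {}
  for line in f:lines() do
    -- further parsing/validation of `line` would go here
    lines[#lines + 1] = line
  end
  f:close()
  return lines
end
```

The file is read from start to end twice, which is the cost discussed below.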

With the suggested patch:
1. Open file in text mode
2. Read lines, parsing and validating them as needed (including NUL checking)
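
With the patch, the validation might collapse into a single pass like the sketch below. It assumes that line reads return embedded NUL bytes intact, which is what the patch is meant to provide; on an unpatched Lua the check may never fire, because the line can already be truncated by the time the script sees it. The name read_lines_checked is hypothetical.

```lua
-- Hypothetical helper: single-pass read where the NUL check sits next to
-- any other per-line validation. Takes an already opened file handle.
local function read_lines_checked(f)
  local lines = {}
  local n = 0
  for line in f:lines() do
    n = n + 1
    if line:find("\0", 1, true) then   -- plain find, no pattern matching
      return nil, "embedded NUL on line " .. n
    end
    -- further parsing/validation of `line` would go here
    lines[n] = line
  end
  return lines
end
```

Because it takes a handle rather than a path, the same function could in principle be given io.stdin or some other stream that cannot be read a second time.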

Is the patched approach better than current Lua in any qualitative or quantitative way?
— Well, it only reads the source file once, so it is more efficient (though caching would probably help with the second read).
— It isolates validation in one stage, handling NUL and other validation in the same place.
— It works cleanly in situations (such as pipes) where it is impossible or inefficient to read the file multiple times.
— There is unlikely to be a detectable performance penalty.

So there are only two questions to answer:
(1) Is the patch a significant improvement?
(2) Is it going to be adopted?

I think the answer to (1) is yes, and the answer to (2) is no. I’ve not seen any good, unbiased arguments as to why the answer to (1) would be no.

—Tim


Philipp