- Subject: Re: io:lines() and \0
- From: Philipp Janda <siffiejoe@...>
- Date: Sat, 22 Feb 2014 10:07:20 +0100
On 22.02.2014 at 06:10, Dirk Laurie wrote:
2014-02-22 0:51 GMT+02:00 Tim Hill <drtimhill@gmail.com>:
So there are only two questions to answer:
(1) Is the patch a significant improvement?
(2) Is it going to be adopted?
I think the answer to (1) is yes, and the answer to (2) is no.
I’ve not seen any good, unbiased arguments as to why the
answer to (1) would be no.
Most arguments start from a position on whether the present
behaviour is a bug. The OP, for example, has chosen to win
friends and influence people by sarcastically describing what
Roberto is willing to do as "to cover a data corruption bug with
a change of the manual".
You can also see it as an improvement that enables you to detect invalid
lines containing a NUL character.
In that sense they are all biased. If you do not agree that it is
a bug, then a little clarification in the manual is fully satisfactory,
and the answer to (1) is no because: if it ain't broke, don't fix it.
But I will give you a good reason not based on that.
The change to the manual that Roberto intends to make
covers other non-portable characters too: "... nor any other
control character other than newlines and horizontal tabs."
This change to the manual:
a) applies only to text streams (except the NUL quirk).
b) applies only to certain implementations of libc (except the NUL
quirk), e.g. all POSIX implementations are immune
c) applies to other input functions when used on text streams as well
(except maybe the NUL quirk)
d) lacks a definition of "control character"
e) is not completely correct according to ISO C (unless you define "Lua
control character" == "ISO C non-printable character", and require that
all lines are terminated with a newline)
f) if corrected makes it hard to check whether your input files are
valid (the set of printable characters is implementation-defined and
locale-specific, and you can only check the bytes of a file after you've
read it)
g) if corrected and followed makes dealing with UTF-8 text files (or
ISO-xxxx) hard (all bytes >= 128 are considered non-printable in the
default locale here on my machine).
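To illustrate points f) and g), here is a minimal sketch of what such a validity check would have to look like. The function name `count_nonprintable` is made up for this example, and it assumes the ISO C reading that a portable text line consists of printing characters plus horizontal tab and newline. The result depends on the current locale; in the default "C" locale on typical implementations, every byte >= 128 is rejected, so UTF-8 text fails the check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: count the bytes in buf[0..len) that fall outside
   "printing characters, horizontal tab and newline" as judged by isprint()
   in the current locale. Note that the bytes must already be in memory,
   i.e. the file has to be read before it can be checked. */
size_t count_nonprintable(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\n' || buf[i] == '\t')
            continue;               /* the two control characters allowed */
        if (!isprint(buf[i]))
            n++;                    /* locale-dependent verdict */
    }
    return n;
}
```

Without a call to `setlocale()` the program runs in the "C" locale, so passing the six bytes of "café\n" encoded as UTF-8 would typically report two offending bytes, while a plain ASCII line with a tab in it reports none.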
So the manual change *is* an improvement (and c), d), and e) could be
fixed), but only in the sense that it makes you (the Lua developer)
realize that there may be something amiss when you use `file:lines()` or
`io.lines()` (or text streams) at all ...
And it probably won't help the end user of a Lua program.
Vertical tab, for example. Escape sequences for highlighting
text on your terminal. Page feeds. Ctrl-Z. All of them may
give unportable results.
The proposed patch caters for the promotion of \0 to be an
honorary non-control character.
You can't just liberate the beautiful butterfly called \0. There's
a whole Pandora's box full of creatures waiting to emerge.
Most of Pandora's box can be avoided using binary streams, and the only
thing Lua could do something about is the NUL quirk.
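For anyone who wants to see the NUL quirk in isolation, here is a small sketch. It assumes a line reader built on `fgets()` followed by `strlen()`, which is the usual source of this kind of truncation; the helper names are made up for the example, and the stream comes from `tmpfile()`, so it is a binary stream and the quirk shows up regardless.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store len bytes in a temporary file and rewind it.
   Returns NULL on failure. */
static FILE *make_temp_stream(const char *data, size_t len)
{
    FILE *f = tmpfile();            /* opened in binary update mode */
    if (f == NULL)
        return NULL;
    if (fwrite(data, 1, len, f) != len) {
        fclose(f);
        return NULL;
    }
    rewind(f);
    return f;
}

/* Length of the first line as seen through fgets() + strlen(). */
size_t first_line_len_fgets(const char *data, size_t len)
{
    char buf[256];
    size_t n = 0;
    FILE *f = make_temp_stream(data, len);

    assert(f != NULL);
    if (fgets(buf, sizeof buf, f) != NULL)
        n = strlen(buf);            /* stops at the first NUL, not at '\n' */
    fclose(f);
    return n;
}

/* Length of the first line counted byte by byte with getc(). */
size_t first_line_len_getc(const char *data, size_t len)
{
    size_t n = 0;
    int c;
    FILE *f = make_temp_stream(data, len);

    assert(f != NULL);
    while ((c = getc(f)) != EOF) {
        n++;                        /* NUL bytes count like any other byte */
        if (c == '\n')
            break;
    }
    fclose(f);
    return n;
}
```

Feeding both functions the six bytes `"ab\0cd\n"` shows the difference: `fgets()` does copy the whole line into the buffer, but it does not report how many bytes it copied, so `strlen()` sees only the two bytes before the NUL. Counting with `getc()` sees all six.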
Philipp