- Subject: Re: io:lines() and \0
- From: Tim Hill <drtimhill@...>
- Date: Sat, 22 Feb 2014 09:58:33 -0800
On Feb 22, 2014, at 6:47 AM, steve donovan <steve.j.donovan@gmail.com> wrote:
> On Sat, Feb 22, 2014 at 4:30 PM, Andrew Starks <andrew.starks@trms.com> wrote:
>> I don't want to be the guy that advocates for banning dogs, though. It just
>> seems that "lines" doesn't deserve the elevated status it enjoys, when a
>> simple and more general alternative could serve as well and be more robust.
>
> But it's a straightforward API, and it works well - assuming you have
> trusted, non-garbled and non-pathological ASCII text files. When I
> write text-wrangling scripts, practically the first thing I write is
> 'for line in io.lines(f)...'. So I'm particularly fond of this dog.
>
> I see the extension for those cases where you need to be paranoid, and
> to deal with odd files which contain nuls and yet are sufficiently
> structured that '\n' actually is a delimiter. Because naturally
> applying even the most intelligent readline() to an arbitrary file
> will give arbitrary results.
>
Being purely pragmatic for a moment, if it were me I’d have a version of io:lines() that had the following properties:
- Reads a byte stream until EOF, the first ‘\n’ (LF) character, or a specified maximum number of bytes has been read, whichever comes first.
- Returns those bytes along with an indication of the termination method (EOF, LF or max count).
- Has an optional “legacy” mode in which, if the termination method is LF and the last byte before the LF is CR, that CR is discarded from the returned bytes (and a flag is returned to indicate this).
… and that’s all. This simple model can handle ASCII, control characters, escape sequences, and UTF-8. It treats LF purely as a record separator (NOT a terminator) and as a convenience can handle legacy CR+LF separated lines (though not CR-only ones, which are truly a dying breed). Since it is inclusive rather than exclusive it can be used as a building block for more sophisticated text processing. It makes no attempt at interpretation of the byte stream beyond breaking it up at LF separators.
—Tim
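
For illustration, the three properties listed above could be sketched in plain Lua roughly as follows. This is only a sketch under stated assumptions, not an existing or proposed library function: the name read_record, its argument order, and the "lf"/"eof"/"max" strings are all hypothetical. It also assumes that the LF itself is not part of the returned bytes, and that an empty read at EOF is reported as nil so that a caller's loop can stop.

```lua
-- Hypothetical helper (not part of the standard io library).
-- Reads one record from the open file handle f and returns three values:
--   bytes      : the record's bytes, without the LF (nil once the stream is exhausted)
--   how        : "lf", "eof" or "max", i.e. why reading stopped
--   cr_dropped : true if legacy mode discarded a CR that preceded the LF
-- max is an optional byte limit (assumed >= 1); legacy enables CR+LF handling.
local function read_record(f, max, legacy)
  local buf, n = {}, 0
  while true do
    if max and n >= max then
      return table.concat(buf), "max", false
    end
    local c = f:read(1)            -- one byte at a time; "\0" is just another byte
    if c == nil then               -- EOF
      if n == 0 then return nil, "eof", false end
      return table.concat(buf), "eof", false
    end
    if c == "\n" then              -- LF separates records
      local cr_dropped = false
      if legacy and buf[n] == "\r" then
        buf[n] = nil               -- discard the CR of a CR+LF pair
        cr_dropped = true
      end
      return table.concat(buf), "lf", cr_dropped
    end
    n = n + 1
    buf[n] = c
  end
end

-- Usage: walk a file record by record, reporting how each one ended.
local f = io.tmpfile()
f:write("a\0b\r\nsecond\nlast")
f:seek("set", 0)
while true do
  local bytes, how, cr_dropped = read_record(f, 1024, true)
  if bytes == nil then break end
  print(#bytes, how, cr_dropped)
end
f:close()
```

Reading a byte at a time keeps the sketch short but is slow; a real implementation would buffer. Note also that when the limit is hit exactly before an LF, the next call returns an empty record with "lf", which the caller can use to tell a split line from a complete one.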