On Feb 22, 2014, at 6:47 AM, steve donovan <steve.j.donovan@gmail.com> wrote:

> On Sat, Feb 22, 2014 at 4:30 PM, Andrew Starks <andrew.starks@trms.com> wrote:
>> I don't want to be the guy that advocates for banning dogs, though. It just
>> seems that "lines" doesn't deserve the elevated status it enjoys, when a
>> simple and more general alternative could serve as well and be more robust.
> 
> But it's a straightforward API, and it works well  - assuming you have
> trusted, non-garbled and non-pathological ASCII text files.  When I
> write text-wrangling scripts, practically the first thing I write is
> 'for line in io.lines(f)...'.   So I'm particularly fond of this dog.
> 
> I see the extension for those cases where you need to be paranoid, and
> to deal with odd files which contain nuls and yet are sufficiently
> structured that '\n' actually is a delimiter.  Because naturally
> applying even the most intelligent readline() to an arbitrary file
> will give arbitrary results.
> 

Being purely pragmatic for a moment: if it were up to me, I’d have a version of io.lines() with the following properties:

— Reads a byte stream until either EOF, the first ‘\n’ (LF) character, or a specified maximum number of bytes is read.
— Returns those bytes along with an indication of the termination method (EOF, LF or max count).
— Has an optional “legacy” mode: if the read ended at an LF and the byte immediately before it is CR, that CR is dropped from the returned bytes, and a flag reports that it was dropped.

… and that’s all. This simple model can handle ASCII, control characters, escape sequences, and UTF-8. It treats LF purely as a record separator (NOT a terminator) and as a convenience can handle legacy CR+LF separated lines (though not CR-only ones, which are truly a dying breed). Since it is inclusive rather than exclusive it can be used as a building block for more sophisticated text processing. It makes no attempt at interpretation of the byte stream beyond breaking it up at LF separators.
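
A minimal sketch of the three properties above, in plain Lua. The name read_record, its argument order, and the reason strings "eof", "lf" and "max" are assumptions made for illustration, not an existing API. It also assumes the LF is consumed but not included in the returned bytes, and it reads one byte at a time for clarity rather than speed.

```lua
-- Hypothetical helper; not part of the standard io library.
-- Assumption: the terminating LF is consumed but not returned.
-- Returns: bytes (string), reason ("eof" | "lf" | "max"), cr_stripped (boolean)
local function read_record(f, max_bytes, strip_cr)
  local buf, n = {}, 0
  local reason = "max"              -- stays "max" if the loop runs to the limit
  while n < max_bytes do
    local c = f:read(1)             -- one byte, or nil at EOF
    if c == nil then reason = "eof"; break end
    if c == "\n" then reason = "lf"; break end
    n = n + 1
    buf[n] = c                      -- any other byte, including NUL, is kept as-is
  end
  local cr_stripped = false
  if strip_cr and reason == "lf" and buf[n] == "\r" then
    buf[n] = nil                    -- "legacy" mode: drop the CR of a CR+LF pair
    cr_stripped = true
  end
  return table.concat(buf), reason, cr_stripped
end

-- Usage: walk a file record by record until EOF is reported.
local f = assert(io.tmpfile())
f:write("ab\r\nc\0d\nlonger than four")
f:seek("set")
repeat
  local bytes, reason, cr = read_record(f, 4, true)
  print(#bytes, reason, cr)
until reason == "eof"
f:close()
```

Because the caller gets the termination reason back, it can rejoin the pieces of an over-long line (reason "max") or tell a final unterminated record (reason "eof") apart from one that ended at an LF.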

—Tim