On Thu, 24 Mar 2022 at 14:51, Thijs Schreijer <thijs@thijsschreijer.nl> wrote:
> With 'file:read’ the description of the “l” (or “*l” in previous
> editions) reads:
>  - "l": reads the next line skipping the end of line

End of line is tricky. Normally on Unix it is LF. On CP/M, MS-DOS,
Windows, and in internet protocols, it is supposed to be CR+LF, but
implementations do weird things, as it is not easy to get it right
without lookahead / lookbehind.

And I've worked on several old systems which had a built-in notion of
text files; they did not use line terminators at all.

> When testing this on Windows and Mac, it turns out only an LF (char
> 10) is recognised as a line end.

Are you sure it is this way? Mac uses the standard Unix convention, LF
is line end. On Windows it depends: I've used runtimes which accepted
ONLY CR+LF, others which accepted LF with an optional CR before it, and
others which just eat all CRs. ( "a\r\nb\nc\r\r\nd\re\n\r\n" will be
read as "a", "b\nc\r", "d\re\n" in the first; "a", "b", "c\r", "d\re",
"" in the second; "a", "b", "c", "de", "" in the third ) ( I may have
miscomputed something, but you can see the differences ).
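
To make the three behaviours concrete, here is a minimal Lua sketch of
them applied to that same sample string. The function names
( split_on, split_crlf_only, split_optional_cr, split_drop_cr ) are
made up for illustration; this is not how any particular C runtime or
Lua itself implements line reading.

```lua
-- The sample input from the paragraph above.
local sample = "a\r\nb\nc\r\r\nd\re\n\r\n"

-- Split `s` into lines ended by the plain string `term`.
-- A non-empty unterminated tail is kept as a final line.
local function split_on(s, term)
  local lines, pos = {}, 1
  while true do
    local i, j = s:find(term, pos, true)  -- plain find, no patterns
    if not i then break end
    lines[#lines + 1] = s:sub(pos, i - 1)
    pos = j + 1
  end
  if pos <= #s then lines[#lines + 1] = s:sub(pos) end
  return lines
end

-- 1) Only CR+LF ends a line.
local function split_crlf_only(s)
  return split_on(s, "\r\n")
end

-- 2) LF ends a line; a single CR directly before it is dropped.
local function split_optional_cr(s)
  local lines = split_on(s, "\n")
  for k, line in ipairs(lines) do
    lines[k] = (line:gsub("\r$", ""))
  end
  return lines
end

-- 3) Every CR is dropped first, then LF ends a line.
local function split_drop_cr(s)
  return split_on((s:gsub("\r", "")), "\n")
end

-- Show each result with %q so that CR and LF inside a line stay visible.
for _, split in ipairs { split_crlf_only, split_optional_cr, split_drop_cr } do
  local shown = {}
  for k, line in ipairs(split(sample)) do
    shown[k] = string.format("%q", line)
  end
  print(table.concat(shown, ", "))
end
```

The first variant yields three lines, the other two yield five, which
is the difference the example is meant to show.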

ALL the Windows runtimes I've used NEED a \n for end of line; some
need more stuff. IIRC dropping all \r was a popular choice because it
is so easy to code. What I mean is: is Windows recognizing only an LF
as line end, or is it recognizing several things as such, all of them
containing at least an LF or terminated by an LF?

> This means that reading a Windows based text file, with CRLF as
> line endings, the returned lines will have a trailing CR (char 13).

This is why you have ASCII mode in FTP and ZIP. If you transfer TEXT
files as binary between machines with different line ending
conventions, you are going to have problems.
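
If the file has already been transferred as binary, the trailing CR
from the quoted text can be removed after reading. A minimal sketch,
assuming the lines come from file:read("l") or file:lines();
chomp_cr and lines_any_eol are hypothetical helper names, not part of
the Lua library:

```lua
-- Remove a single trailing CR, if present, from an already read line.
local function chomp_cr(line)
  return (line:gsub("\r$", ""))
end

-- Iterate over the lines of an open file, accepting both LF and
-- CR+LF line ends. A CR in the middle of a line is left alone.
local function lines_any_eol(file)
  local next_line = file:lines()
  return function()
    local line = next_line()
    if line then return chomp_cr(line) end
  end
end
```

Usage would look like `for line in lines_any_eol(f) do ... end` with f
being a handle returned by io.open.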


> My expectations were that it would in all cases treat CRLF and LF
> The same way, similar to the way the Lua source code can be read.

Do not use compiler behaviour as a comparison. In Unix end of line is
LF, a CR is just a normal WHITESPACE character. In C source code
whitespace at end of line is nearly always not significant, so you
could normally compile CR+LF terminated sources UNLESS you used
continuation lines ( backslash+end of line ), where the trailing
whitespace bites you ( it may be solved in recent compilers, I have
not hit it in a decade or more ). In Lua whitespace at end of line is,
IIRC, not significant.
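
A quick way to check this for Lua source is to compile the same chunk
with LF and with CR+LF line ends. This is only a sketch: the chunk text
is made up, and `compile` is a local alias chosen here so that it runs
on both Lua 5.1 ( loadstring ) and later versions ( load ).

```lua
-- `load` accepts a string chunk in Lua 5.2 and later; 5.1 has `loadstring`.
local compile = loadstring or load

-- The same two-line chunk, once with LF and once with CR+LF line ends.
local lf_chunk   = assert(compile("local x = 1\nreturn x + 1\n"))
local crlf_chunk = assert(compile("local x = 1\r\nreturn x + 1\r\n"))

print(lf_chunk(), crlf_chunk())
```

Both chunks compile and return the same value, so the CR does not
matter to the source reader, whatever file:read does with data files.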

> Is there a specific reason for this behaviour? Did I misinterpret something?

There are several, but you'll need to post more details of exactly
what you are doing for them to be explained.

Francisco Olarte.