- Subject: Re: EndOfLine Pattern Matching
- From: Philippe Lhoste <PhiLho@...>
- Date: Thu, 4 Apr 2002 10:29:17 +0200 (MEST)
> How can I match all possible EndOfLines in strfind() with a Lua pattern?
> EndOfLine is \n\r
> or \r\n
> or \n
> or \r
> but \n\n are two EndOfLines with an empty string between
> also \r\r
> I have been trying for an hour to define a pattern that matches all possible
> EndOfLine definitions.
Except in badly formed text files, or in cases I am not aware of (but I
would like to know, if any), I doubt you will meet the \n\r combination. \r\n is
used in the DOS/Windows world, \n in the Unix-like world, and \r in the
Macintosh/Apple world.
Gavin stated that it is quite rare to see mixed EOLs in a text file.
That is true, or rather, it shouldn't occur.
But it is possible. I have seen source files on a network that could be
accessed from both Unix and Windows boxes, and some editors were able to manage
the foreign EOL (display it correctly) but still inserted the EOL of their own
system when the user hit Return...
SciTE, among others, automatically detects the EOL used in a file and sets
its EOL mode accordingly, but the user can choose to switch mode without
converting the whole file...
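One way to check whether a given file really has mixed EOLs is to count each style separately. The following is only a sketch, assuming Lua 5.1 or later (where the strfind/gsub functions used in this thread are spelled string.find/string.gsub); count_eols is a hypothetical name, not something from this thread.

```lua
-- Hypothetical helper: count each EOL style found in a string (Lua 5.1+).
local function count_eols(text)
  -- string.gsub returns the number of substitutions as its second result.
  local _, crlf = string.gsub(text, "\r\n", "")
  local _, cr = string.gsub(text, "\r", "")
  local _, lf = string.gsub(text, "\n", "")
  -- Each CRLF pair was also counted once as a CR and once as an LF.
  return { crlf = crlf, cr = cr - crlf, lf = lf - crlf }
end

local counts = count_eols("a\r\nb\nc\rd\r\n")
print(counts.crlf, counts.cr, counts.lf)
```

If more than one of the three counts is non-zero, the string mixes EOL styles. A \n\r sequence is counted here as one lone \n plus one lone \r.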
Anyway, I won't answer your question directly, as others did it more
precisely than I could.
But here is my own function that tries to detect the EOL mode in a given
string (which could be the whole content of a file...). Despite what I wrote
above, for ease of processing, I don't handle mixed EOLs...
-- Get the end of line used in the given string and return it.
-- Check only the first one, as we assume the string is consistent.
-- NOTE: the function header is missing from the archived post;
-- the name GetEOL is an assumption.
function GetEOL(string)
  local eol1, eol2, eol
  local b, _
  b, _, eol1 = strfind(string, "([\r\n])")
  if b == nil then
    return nil -- no EOL in this string
  end
  -- Care is taken in case the first line finishes with two EOLs
  eol2 = strsub(string, b+1, b+1)
  if eol1 == '\r' then
    if eol2 == '\n' then
      -- Windows style
      eol = '\r\n'
    else
      -- Mac style
      eol = '\r'
    end
  else -- eol1 == '\n'
    -- Unix style
    eol = '\n'
  end
  return eol
end
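As a usage illustration, the detected EOL can then be used to split a string into lines, so that two consecutive EOLs give an empty line between them, as the original question asks. This is a sketch only, assuming Lua 5.1 or later rather than the Lua 4.0 functions used above; detect_eol and split_lines are hypothetical names, and like the function above it assumes the EOLs are consistent throughout the string.

```lua
-- Sketch for Lua 5.1+; both function names are hypothetical.
local function detect_eol(text)
  -- Find the first CR or LF; pos is nil when the text has neither.
  local pos, _, first = string.find(text, "([\r\n])")
  if not pos then
    return nil
  end
  -- Only a CR immediately followed by an LF forms a two-character EOL.
  if first == "\r" and string.sub(text, pos + 1, pos + 1) == "\n" then
    return "\r\n"
  end
  return first
end

-- Split text on the EOL detected at the end of its first line.
local function split_lines(text)
  local eol = detect_eol(text)
  if not eol then
    return { text }
  end
  local lines, start = {}, 1
  while true do
    -- The fourth argument (true) requests a plain search, not a pattern match.
    local s, e = string.find(text, eol, start, true)
    if not s then
      break
    end
    lines[#lines + 1] = string.sub(text, start, s - 1)
    start = e + 1
  end
  -- Keep whatever follows the last EOL (possibly an empty string).
  lines[#lines + 1] = string.sub(text, start)
  return lines
end

local lines = split_lines("a\r\nb\r\n\r\nc")
print(#lines) -- prints 4: "a", "b", "", "c"
```

In a file with mixed EOLs this would split only on the first style found, so it is no better than the detection it relies on.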
Note: if preserving the exact content/format of the file isn't critical, and if
it is small enough to fit in memory, you can first replace every \r that has no
\n next to it with \n, then remove all remaining \r in the file.
Something like (untested) gsub(file, "[^\n]\r[^\n]", "\n") and gsub(file,
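A note on the untested pattern quoted above: "[^\n]\r[^\n]" also matches the characters on both sides of the \r, so replacing the match with "\n" would eat them, and it cannot match a \r at the very start or end of the string. The sketch below takes a different order than the one described in the note: it collapses \r\n pairs first, then converts whatever \r remain. It assumes Lua 5.1 or later (string.gsub instead of gsub), and normalize_eols is a hypothetical name.

```lua
-- Hypothetical helper: convert CRLF and lone CR to LF (Lua 5.1+).
local function normalize_eols(text)
  -- Collapse CRLF pairs first so their CR is not converted on its own.
  text = string.gsub(text, "\r\n", "\n")
  -- Any CR still present is a lone (old Mac style) EOL.
  text = string.gsub(text, "\r", "\n")
  return text
end

local unix = normalize_eols("a\r\nb\rc\nd")
print(unix == "a\nb\nc\nd") -- prints true
```

Unlike the two steps in the note, this treats a \n\r sequence as two line breaks rather than one.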
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist