- Subject: Re: EndOfLine Pattern Matching
- From: RLake@...
- Date: Thu, 4 Apr 2002 10:06:41 -0500
I really wasn't going to get into this but I ended up thinking about it
whilst walking to work this morning.
I also think that \n\r is not a real line ending. If that's the case, you
could use something like the following:
    local s, len = 1, strlen(str)
    while s <= len do  -- <= so that a final one-character line is not skipped
      local _, f, chomped = strfind(str, "([^\r\n]*)\r?\n?", s)
      s = f + 1
      -- do something with chomped, possibly exiting the loop with break
    end
If you didn't want to extract the line, you could put the parentheses
around the eol expression instead, capturing it as, say, eolchars. Then the
chomped line runs from s to f - strlen(eolchars), taking s before it is
advanced.
This will also handle the case where the string does not have a terminating
line end of any form. I haven't tested this code and I'm not certain what
would happen if you used gsub, since the pattern will match the zero-length
string. I suspect you will get a false match on the empty line at the end.
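
For anyone who wants to try the loop on a current interpreter, here is a
minimal sketch of the same idea wrapped in a function. It assumes Lua 5.1 or
later, so it uses string.find and the # operator in place of the Lua 4.0
strfind and strlen; split_lines is a hypothetical name, not something from
the original code, and it treats only \r\n, \n and \r as line ends.

```lua
-- Sketch only: split_lines is a hypothetical helper name.
-- Assumes Lua 5.1+ (string.find and # instead of strfind and strlen).
local function split_lines(str)
  local lines = {}
  local s, len = 1, #str
  while s <= len do
    -- The match always starts at s: zero or more non-EOL characters,
    -- followed by an optional \r and an optional \n.
    local _, f, chomped = string.find(str, "([^\r\n]*)\r?\n?", s)
    lines[#lines + 1] = chomped
    s = f + 1
  end
  return lines
end

-- Mixed line ends, and no terminator on the last line.
local t = split_lines("one\r\ntwo\nthree\rfour")
print(#t, table.concat(t, "|"))
```

Because the loop stops as soon as s passes the end of the string, a text
that does end with a line end does not produce an extra empty line.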
If you really believe that \n\r is possible, then you could use something
like this:
    local s, len = 1, strlen(str)
    while s <= len do  -- <= so that a final one-character line is not skipped
      local _, f, chomped, end_one, end_two =
        strfind(str, "([^\r\n]*)([\r\n]?)([\r\n]?)", s)
      -- \n\n or \r\r is two line ends: give the second one back
      if end_one == end_two then f = f - strlen(end_one) end
      s = f + 1
      -- do something with chomped, possibly exiting the loop with break
    end
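
The two-capture variant can be sketched the same way for a current
interpreter. As before, this assumes Lua 5.1 or later (string.find and #
rather than strfind and strlen), and split_lines_any is a hypothetical name
used only for illustration.

```lua
-- Sketch only: split_lines_any is a hypothetical helper name.
-- Assumes Lua 5.1+; accepts \r\n, \n\r, \n and \r as a single line end.
local function split_lines_any(str)
  local lines = {}
  local s, len = 1, #str
  while s <= len do
    local _, f, chomped, end_one, end_two =
      string.find(str, "([^\r\n]*)([\r\n]?)([\r\n]?)", s)
    -- Two identical EOL characters (\n\n or \r\r) are two line ends,
    -- so the second one is left for the next iteration. When both
    -- captures are empty, #end_one is 0 and f is unchanged.
    if end_one == end_two then f = f - #end_one end
    lines[#lines + 1] = chomped
    s = f + 1
  end
  return lines
end

local u = split_lines_any("a\n\rb\r\nc\n\nd")
print(#u, table.concat(u, "|"))
```

A run such as \n\r\n stays ambiguous under this rule: the first two
characters are taken as one line end and the trailing \n as another.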
There are various other variations on this theme but I won't go into them.
Hope this helps, and good luck.
Rici
From: Philippe Lhoste <PhiLho@gmx.net>
Sent by: owner-lua-l@tecgraf.puc-rio.br
To: Multiple recipients of list <lua-l@tecgraf.puc-rio.br>
Subject: Re: EndOfLine Pattern Matching
Date: 04/04/02 03.29
Please respond to lua-l
Markus wrote:
> How can I match all possible EndOfLines in strfind() with a Lua pattern?
>
> EndOfLine is \n\r
> or \r\n
> or \n
> or \r
>
> but \n\n are two EndOfLines with an empty string between
> also \r\r
>
> Since one hour I try to define a pattern that matches all possible
> EndOfLine definitions.
Except in badly formed text files, or in cases I am not aware of (but would
like to hear about, if any), I doubt you will meet the \n\r combination.
\r\n is used in the DOS/Windows world, \n in the Unix-like world, and \r in
the Macintosh/Apple world, but I am not aware of a \n\r combination.
Gavin stated that it is quite rare to see mixed EOLs in a text file.
That is true, or rather, it shouldn't occur.
But it is possible. I have seen source files on a network that could be
accessed from Unix or Windows boxes, and some editors were able to handle
the foreign EOL (display it correctly) but still inserted the EOL of their
own system when the user hit Return...
SciTE, among others, automatically detects the EOL used in a file and sets
its EOL mode accordingly, but the user can choose to switch mode without
converting the whole file...
Anyway, I won't answer your question directly, as others did so more
precisely than I could.
But here is my own function that tries to detect the EOL mode in a given
string (which could be a whole file's content...). Despite what I wrote
above, for ease of processing, I don't handle mixed EOLs...
    -- Get the end of line used in the given string and return it.
    -- Check only the first one, as we assume the string is consistent.
    function GetEol(str)
      local b, _, eol1, eol2, eol
      b, _, eol1 = strfind(str, "([\r\n])")
      if b == nil then
        return nil -- no EOL in this string
      end
      -- Look at one more character only, so that a first line ending
      -- with two EOLs (an empty second line) is not misread
      eol2 = strsub(str, b+1, b+1)
      if eol1 == '\r' then
        if eol2 == '\n' then
          -- Windows style
          eol = '\r\n'
        else
          -- Mac style
          eol = '\r'
        end
      else -- eol1 == '\n'
        -- Unix style
        eol = '\n'
      end
      return eol
    end
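Note: if preserving the content/format of the file isn't critical, and if
it is small enough to fit in memory, you can first replace every \r that
has no \n next to it by \n, then remove all remaining \r in the file.
Something like (untested) gsub(file, "([^\n])\r([^\n])", "%1\n%2") and then
gsub(file, "\r", ""). The captures are needed so that the characters on
either side of the \r are kept. Note that the first pattern does not catch
a \r at the very start or end of the string, or two \r in a row.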
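
As a rough illustration of normalizing a whole string to \n, here is a
sketch that uses a different ordering from the one suggested above: it
collapses \r\n first and then converts any \r that is left. It assumes
Lua 5.1 or later (string.gsub rather than the Lua 4.0 gsub) and that \n\r
does not occur in the input; normalize_eol is a hypothetical name.

```lua
-- Sketch only: normalize_eol is a hypothetical helper name.
-- Assumes Lua 5.1+ and that the input contains no \n\r pairs.
local function normalize_eol(text)
  -- DOS/Windows pairs first, so their \r is not converted separately.
  text = string.gsub(text, "\r\n", "\n")
  -- Any \r still present is a lone (Mac style) line end.
  -- The parentheses drop gsub's second result, the replacement count.
  return (string.gsub(text, "\r", "\n"))
end

local cleaned = normalize_eol("a\r\nb\rc\nd")
print(#cleaned)
```

Doing the two-character case first means that consecutive lone \r
characters, and a \r at the start or end of the string, need no special
handling.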
Regards.
--
--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/
--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--