- Subject: Re: Regular expression for matching lines
- From: "Aaron Brown" <arundelo@...>
- Date: Sat, 28 Jul 2007 12:28:00 -0400
Eike Decker wrote:
> I am trying to find a regular expression that matches each line of an input
> string but cannot find an expression that needs no additional checks.
Shmuel Zeigerman wrote:
> for line in (str.."\n"):gmatch("([^\n]*)\n") do
The standard way to use an EOL (at least in the Unix/C
tradition) is as a line terminator, so a three-line string
looks like this:
"foo\nbar\nbaz\n"
But in the wild you'll see EOLs used as line separators:
"foo\nbar\nbaz"
One problem with Shmuel's solution quoted above is that it
treats the first of the above two strings as four lines, the
last being empty. If you want to treat both of them as
three lines, you need to do something like this:
if string.sub(Str, -1) ~= "\n" then
  -- The last line doesn't have an EOL; give it one:
  Str = Str .. "\n"
end
for Line in Str:gmatch("([^\n]*)\n") do
  print("line: <" .. Line .. ">")
end
or this, which leaves Str unmodified:
local MissingEol = false
if string.sub(Str, -1) ~= "\n" then
  -- The last line doesn't have an EOL; remember to add one:
  MissingEol = true
end
for Line in (MissingEol and Str .. "\n" or Str):gmatch("([^\n]*)\n") do
  print("line: " .. Line)
end
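Treating both forms as three lines is impossible with just a
single pattern and no additional checks, but checking the
last character with string.sub avoids a linear scan of the
string. (In other words, don't use Str:match("\n$"), which
tries the pattern at every position in Str.)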
Also notice that in both examples the newline is
concatenated only if necessary, which saves time and memory
if Str is very long.
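To make the difference concrete, here is a small sketch that
counts the lines each approach sees. The helper names
CountNaive and CountLines are made up for this example and
are not from the thread.

```lua
-- Hypothetical helpers for illustration; names are not from the thread.

-- Always appends "\n" before matching (the approach quoted earlier).
local function CountNaive(Str)
  local N = 0
  for _ in (Str .. "\n"):gmatch("([^\n]*)\n") do
    N = N + 1
  end
  return N
end

-- Appends "\n" only when the last character is not already one.
local function CountLines(Str)
  if string.sub(Str, -1) ~= "\n" then
    Str = Str .. "\n"
  end
  local N = 0
  for _ in Str:gmatch("([^\n]*)\n") do
    N = N + 1
  end
  return N
end

print(CountNaive("foo\nbar\nbaz\n"), CountNaive("foo\nbar\nbaz"))  --> 4  3
print(CountLines("foo\nbar\nbaz\n"), CountLines("foo\nbar\nbaz"))  --> 3  3
```

The unconditional version counts a phantom fourth line for
the terminator-style string; the conditional version gives
three for both.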
Both examples treat the empty string as one blank line. If
you don't like that, it can be special-cased. (That
ambiguity is one reason why using line terminators instead
of line separators when possible is a good idea.)
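One possible way to special-case it, as a rough sketch: wrap
the check in a function that returns an iterator yielding
nothing for the empty string. The name Lines is made up for
this example.

```lua
-- Hypothetical helper: iterate over the lines of Str, treating ""
-- as zero lines rather than one blank line.
local function Lines(Str)
  if Str == "" then
    -- Special case: an iterator that ends immediately.
    return function() return nil end
  end
  if string.sub(Str, -1) ~= "\n" then
    -- The last line doesn't have an EOL; give it one:
    Str = Str .. "\n"
  end
  return Str:gmatch("([^\n]*)\n")
end

for Line in Lines("foo\nbar") do
  print("line: <" .. Line .. ">")
end
```

Note that under this sketch "\n" by itself still counts as
one blank line; only the completely empty string is treated
as having no lines.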
--
Aaron
http://arundelo.com/