lua-l archive



Eike Decker wrote:

I am trying to find a regular expression that matches each line of an input
string but cannot find an expression that needs no additional checks.

Shmuel Zeigerman wrote:

for line in (str.."\n"):gmatch("([^\n]*)\n") do

The standard way to use an EOL (at least in the Unix/C
tradition) is as a line terminator, so a three-line string
looks like this:

 "foo\nbar\nbaz\n"

But in the wild you'll see EOLs used as line separators:

 "foo\nbar\nbaz"

One problem with Shmuel's solution quoted above is that it
treats the first of the above two strings as four lines, the
last being empty.  If you want to treat both of them as
three lines, you need to do something like this:

 if string.sub(Str, -1) ~= "\n" then
   -- The last line doesn't have an EOL; give it one:
   Str = Str .. "\n"
 end
 for Line in Str:gmatch("([^\n]*)\n") do
   print("line: <" .. Line .. ">")
 end

or this (which leaves Str unmodified):

 -- Does the last line lack an EOL?
 local MissingEol = string.sub(Str, -1) ~= "\n"
 -- If so, iterate over a temporary copy that has one appended:
 for Line in (MissingEol and Str .. "\n" or Str):gmatch("([^\n]*)\n") do
   print("line: <" .. Line .. ">")
 end

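This is impossible to do with just a single pattern and no
additional checks, but the check can at least be cheap:
string.sub(Str, -1) looks only at the last character, whereas
Str:match("\n$") is unanchored and so retries the pattern at
every position in the string.  (In other words, don't use
Str:match("\n$").)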
Also notice that in both examples the newline is
concatenated only if necessary, which saves time and memory
if Str is very long.
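
To make the difference concrete, here is a minimal sketch that
collects the lines into a table instead of printing them.  The
helper names SplitNaive and SplitLines are made up for this
illustration (they don't come from the posts above): SplitNaive
is the quoted always-append loop, and SplitLines appends the
newline only when it is missing.

```lua
-- Hypothetical helper: the quoted approach, which always appends "\n".
local function SplitNaive(Str)
  local Lines = {}
  for Line in (Str .. "\n"):gmatch("([^\n]*)\n") do
    Lines[#Lines + 1] = Line
  end
  return Lines
end

-- Hypothetical helper: append "\n" only when the last line lacks one.
local function SplitLines(Str)
  if string.sub(Str, -1) ~= "\n" then
    -- This rebinds the local parameter; the caller's string is untouched.
    Str = Str .. "\n"
  end
  local Lines = {}
  for Line in Str:gmatch("([^\n]*)\n") do
    Lines[#Lines + 1] = Line
  end
  return Lines
end

print(#SplitNaive("foo\nbar\nbaz\n")) -- 4 (the last entry is "")
print(#SplitNaive("foo\nbar\nbaz"))   -- 3
print(#SplitLines("foo\nbar\nbaz\n")) -- 3
print(#SplitLines("foo\nbar\nbaz"))   -- 3
```

Wrapping the loop in a function also sidesteps the side-effect
question, since assigning to a parameter never changes the
caller's variable.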

Both examples treat the empty string as one blank line.  If
you don't like that, it can be special-cased.  (That
ambiguity is one reason why using line terminators instead
of line separators when possible is a good idea.)
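
One possible way to special-case the empty string is sketched
below.  The function name SplitLinesNoBlank is hypothetical, and
treating "" as zero lines is just one reasonable choice, not the
only one:

```lua
-- Hypothetical helper: same loop as before, but "" yields no lines at all.
local function SplitLinesNoBlank(Str)
  local Lines = {}
  if Str == "" then
    return Lines -- special case: empty input means zero lines
  end
  if string.sub(Str, -1) ~= "\n" then
    -- The last line doesn't have an EOL; give it one:
    Str = Str .. "\n"
  end
  for Line in Str:gmatch("([^\n]*)\n") do
    Lines[#Lines + 1] = Line
  end
  return Lines
end

print(#SplitLinesNoBlank(""))    -- 0
print(#SplitLinesNoBlank("\n"))  -- 1 (one blank line)
print(#SplitLinesNoBlank("foo")) -- 1
```

With this variant "\n" still counts as one blank line, so only
the truly empty string is treated differently.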

--
Aaron
http://arundelo.com/