I really wasn't going to get into this but I ended up thinking about it
whilst walking to work this morning.

I also think that \n\r is not a real line ending. If that's the case, you
could use something like the following:

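local s, len = 1, strlen(str)
while s <= len do   -- <= so that a final one-character line is not skipped
  -- capture the line body, then swallow an optional \r and an optional \n
  local _, f, chomped = strfind(str, "([^\r\n]*)\r?\n?", s)
  s = f + 1
  -- do something with chomped, possibly exiting the loop with break
end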

If you didn't want to extract the line, you could put the parentheses
around the eol expression instead. Then the chomped line goes from s to
f - strlen(eolchars), where eolchars is the captured line ending.
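A minimal sketch of that variant, for illustration only. It assumes the Lua 5.x spelling (string.find, string.sub, #str) rather than the strfind/strsub/strlen globals used in this thread, and each_line_with_eol is a hypothetical helper name, not something from the original post:

```lua
-- Sketch only: each_line_with_eol is a hypothetical helper (Lua 5.x spelling).
local function each_line_with_eol(str, callback)
  local s, len = 1, #str
  while s <= len do
    -- Only the line ending is captured; the line body is not copied here.
    local _, f, eol = string.find(str, "[^\r\n]*(\r?\n?)", s)
    -- The line body runs from s to f - #eol.
    callback(string.sub(str, s, f - #eol), eol)
    s = f + 1
  end
end

-- Usage: record each line together with the ending that terminated it.
local seen = {}
each_line_with_eol("one\r\ntwo\nthree", function(line, eol)
  seen[#seen + 1] = { line = line, eol = eol }
end)
```

The last entry gets an empty eol, because the sample string has no terminating line end.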

This will also handle the case where the string does not have a terminating
line end of any form. I haven't tested this code and
I'm not certain what would happen if you used gsub, since the pattern will
match the zero-length string. I suspect you will get a
false match on the empty line at the end.

If you really believe that \n\r is possible, then you could use something
like this:

local s, len = 1, strlen(str)
while s <= len do   -- <= so that a final one-character line is not skipped
  local _, f, chomped, end_one, end_two =
    strfind(str, "([^\r\n]*)([\r\n]?)([\r\n]?)", s)
  -- \n\n or \r\r is two line ends: hand the second one back to the next pass
  if end_one == end_two then f = f - strlen(end_one) end
  s = f + 1
  -- do something with chomped, possibly exiting the loop with break
end
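To see how the two-capture loop treats each kind of line end, here is a sketch that wraps it in a function and collects the lines into a table. It assumes Lua 5.x names (string.find, #str) in place of strfind/strlen, and split_lines is a hypothetical name chosen for this example:

```lua
-- Sketch only: split_lines is a hypothetical wrapper (Lua 5.x spelling).
local function split_lines(str)
  local lines = {}
  local s, len = 1, #str
  while s <= len do
    local _, f, chomped, end_one, end_two =
      string.find(str, "([^\r\n]*)([\r\n]?)([\r\n]?)", s)
    -- "\n\n" or "\r\r" is two line endings, so give the second one back.
    if end_one == end_two then f = f - #end_one end
    lines[#lines + 1] = chomped
    s = f + 1
  end
  return lines
end

-- Mixed input: \r\n, \n\r, \n\n (an empty line in between) and a lone \r.
local result = split_lines("a\r\nb\n\rc\n\nd\re")
-- result holds "a", "b", "c", "", "d", "e"
```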

There are other variations on this theme but I won't go into them.

Hope this helps, and good luck.

Rici



                                                                                                                          
From:     Philippe Lhoste <PhiLho@gmx.net>
Sent by:  owner-lua-l@tecgraf.puc-rio.br
Date:     04/04/02 03.29
To:       Multiple recipients of list <lua-l@tecgraf.puc-rio.br>
cc:
Subject:  Re: EndOfLine Pattern Matching
Please respond to lua-l




Markus wrote:
> How can I match all possible EndOfLines in strfind() with a Lua pattern?
>
> EndOfLine is \n\r
> or \r\n
> or \n
> or \r
>
> but \n\n are two EndOfLines with an empty string between
> also \r\r
>
> Since one hour I try to define a pattern that matches all possible
> EndOfLine definitions.

Except in badly formed text files, or in cases I am not aware of (but I
would like to know, if any), I doubt you will meet the \n\r combination.
\r\n is used in the DOS/Windows world, \n in the Unix-like world, and \r in
the Macintosh/Apple world, but I am not aware of a \n\r combination.

Gavin stated that it is quite rare to see mixed EOLs in a text file.
That is true, or rather, it shouldn't occur.
But it is possible. I saw source files on a network that could be accessed
from Unix or Windows boxes, and some editors were able to manage the foreign
EOL (display it correctly) but still inserted the EOL of their own system
when the user hit Return...
SciTE, among others, automatically detects the EOL used in a file and sets
its EOL mode accordingly, but the user can choose to switch mode without
converting the whole file...

Anyway, I won't answer your question directly, as others did it more
precisely than I could.
But I give my own function that tries to detect the EOL mode in a given
string (that could be a whole file content...). Despite what I wrote above,
for ease of processing, I don't manage mixed EOLs...

-- Get the end of line used in the given string and return it.
-- Check only the first one, as we assume the string is consistent.
function GetEol(str)
     local eol2, eol
     -- declared local so that b, _ and eol1 do not leak into globals
     local b, _, eol1 = strfind(str, "([\r\n])")
     if b == nil then
          return nil     -- no EOL in this string
     end
     -- Care is taken in case the first line finishes with two EOLs
     eol2 = strsub(str, b+1, b+1)
     if eol1 == '\r' then
          if eol2 == '\n' then
               -- Windows style
               eol = '\r\n'
          else
               -- Mac style
               eol = '\r'
          end
     else -- eol1 == '\n'
          -- Unix style
          eol = '\n'
     end
     return eol
end
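For readers on a later Lua, the same first-EOL check can be sketched more compactly. This assumes the Lua 5.x string library (string.find, string.sub), and get_eol and eol_names are hypothetical names for this example only:

```lua
-- Sketch only: get_eol is a hypothetical Lua 5.x rendering of the idea above.
local function get_eol(str)
  -- Position of the first \r or \n, if any.
  local b = string.find(str, "[\r\n]")
  if not b then return nil end          -- no EOL in this string
  -- A \r immediately followed by \n is the Windows pair.
  if string.sub(str, b, b + 1) == "\r\n" then return "\r\n" end
  -- Otherwise the single character is the EOL (lone \r or \n).
  return string.sub(str, b, b)
end

-- Usage: map the detected ending to a readable name.
local eol_names = { ["\r\n"] = "Windows", ["\r"] = "Mac", ["\n"] = "Unix" }
local detected = eol_names[get_eol("first\r\nsecond\r\n")]
```

As with the function above, only the first line ending is inspected, so a file with mixed EOLs is reported by whichever style comes first.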

Note: if preserving the content/format of the file isn't critical, and if it
is small enough to fit in memory, you can first turn every \r\n pair into a
single \n, then replace all remaining (lone) \r by \n.
Something like (untested) gsub(file, "\r\n", "\n") followed by
gsub(file, "\r", "\n"). Matching the neighbours of a lone \r in the pattern
itself, as in "[^\n]\r[^\n]", would also replace those neighbouring characters.
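A sketch of this kind of normalisation. Lua patterns have no look-around to test for "a \r with no \n next to it", so this version collapses \r\n pairs first and then converts whatever \r is left. It assumes Lua 5.x (string.gsub), and normalize_eol is a hypothetical name:

```lua
-- Sketch only: normalize_eol is a hypothetical helper (Lua 5.x spelling).
local function normalize_eol(text)
  -- gsub also returns a replacement count; assigning to one variable drops it.
  text = string.gsub(text, "\r\n", "\n")  -- Windows pairs first
  text = string.gsub(text, "\r", "\n")    -- then any lone \r (old Mac style)
  return text
end

local unix_text = normalize_eol("a\r\nb\rc\nd")
-- unix_text is "a\nb\nc\nd"
```

A \n\r pair, if one ever turned up, would come out of this as two line ends rather than one.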

Regards.

--
--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--
Philippe Lhoste (Paris -- France)
Professional programmer and amateur artist
http://jove.prohosting.com/~philho/
--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--=#=--
