First, my environment.  I run unmodified Lua; my only change is a patch
to the build so that it generates MacOS universal binaries.

   Lua 5.1.2
   LPeg 0.7
   Darwin 9.0.0

I would like to use LPeg for parsing CSV files and was delighted to
see that there are even two examples for doing so right in the
documentation.  However, I'm having difficulty using them and would
like some advice on how to proceed.  Please consider this example.
I'm using the two CSV recipes unmodified, directly from the
documentation for LPeg 0.7.


local lines =
{
   'somethingin,alllower',
   'SomethingIn,CamelCase',
   'SOMETHING_IN,ALLCAPS',
}

require('re')
require('lpeg')

record_re = re.compile[[
   record <- ( field (',' field)* ) -> {} ('\n' / !.)
   field <- escaped / nonescaped
   nonescaped <- { [^,"\n]* }
   escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
]]

local field =
   '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
  lpeg.C((1 - lpeg.S',\n"')^0)

record_lpeg = field * (',' * field)^0 * (lpeg.P'\n' + -1)

for k,impl in ipairs{'record_re', 'record_lpeg'} do
   print('Implementation: ' .. impl)
   local record = _G[impl]
   for i,line in ipairs(lines) do
      io.write('Attempting to match "' .. line .. '": ')
      local m = record:match(line)
      if type(m) == 'table' then
         print('match succeeded')
         for j,v in ipairs(m) do
            print(j,v)
         end
      else
         print('match failed')
         print(tostring(m))
      end
   end
   print('')
end


When I run this program, I get the following output.


Implementation: record_re
Attempting to match "somethingin,alllower": match failed
nil
Attempting to match "SomethingIn,CamelCase": match failed
nil
Attempting to match "SOMETHING_IN,ALLCAPS": match succeeded
1       SOMETHING_IN
2       ALLCAPS

Implementation: record_lpeg
Attempting to match "somethingin,alllower": match failed
somethingin
Attempting to match "SomethingIn,CamelCase": match failed
SomethingIn
Attempting to match "SOMETHING_IN,ALLCAPS": match failed
SOMETHING_IN


In the first case, I seem to get the fields only when the row is in
all capitals.  I discovered this by accident when I ran record_re on a
CSV file which contains lower case letters only in the first line, the
remainder of the file being parsed without errors.
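
One possible explanation, offered as an assumption rather than a
confirmed diagnosis: the grammar is written inside a Lua long string
([[ ... ]]), and long strings do not interpret escape sequences, so the
'\n' in the grammar would reach re as a backslash followed by the letter
n.  The sketch below uses only plain Lua; the string pattern in it is an
analogy built with Lua's own string.match, not an LPeg re pattern.

```lua
-- Long-bracket strings keep backslashes literally.
local long_text  = [[\n]]   -- two characters: a backslash and the letter n
local short_text = '\n'     -- one character: a newline

print(#long_text, #short_text)

-- Analogy only: a Lua string pattern (not an LPeg re pattern) written as a
-- long string.  Backslash is not special in Lua patterns, so this set
-- excludes ',', '"', '\' and the letter 'n'.
local set = [=[^[^,"\n]*]=]
local lower = string.match('somethingin', set)   -- stops before the first 'n'
local upper = string.match('SOMETHING_IN', set)  -- contains no lowercase 'n'

print(lower, upper)
```

If the same thing happens inside re, any field containing a lowercase n
would be cut short, which is consistent with only the all-capitals row
matching.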

In the second case, I expect to receive a table from the match but get
only the first field as a string.
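
A likely reason, stated as an assumption: record_lpeg has no table
capture, so each field capture comes back as a separate return value,
and "local m = ..." keeps only the first one.  The sketch below shows the
plain Lua behaviour with a hypothetical stand-in function (fake_match is
not part of LPeg); it does not call LPeg at all.

```lua
-- Hypothetical stand-in for a match that returns one value per capture.
local function fake_match()
   return 'SOMETHING_IN', 'ALLCAPS'
end

-- Assigning to a single variable discards every value after the first.
local m = fake_match()

-- A table constructor collects all of the returned values.
local t = { fake_match() }

print(type(m), m)
print(#t, t[1], t[2])
```

With LPeg itself, wrapping the field part of the pattern in lpeg.Ct
(LPeg's table capture) should gather the fields into a single table in
the same way, though that is untested here.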

I tried modifying record_re to accept lowercase letters, using the
Wikipedia article on CSV and RFC 4180 for reference, but I can't come up
with a good reason why it fails.

This is my first foray into LPeg and PEGs in general.  Any comments or
criticisms are appreciated.

   Ken Smith