- Subject: LPeg question: parsing CSV
- From: "Ken Smith" <kgsmith@...>
- Date: Fri, 16 Nov 2007 10:38:47 -0800
First, my environment. Lua is unmodified apart from a build patch that
generates Mac OS X universal binaries.
Lua 5.1.2
LPeg 0.7
Darwin 9.0.0
I would like to use LPeg for parsing CSV files and was delighted to
see that there are even two examples for doing so right in the
documentation. However, I'm having difficulty using them and would
like some advice on how to proceed. Please consider this example.
I'm using the two CSV recipes unmodified, directly from the
documentation for LPeg 0.7.

local lines =
{
  'somethingin,alllower',
  'SomethingIn,CamelCase',
  'SOMETHING_IN,ALLCAPS',
}

require('re')
require('lpeg')

record_re = re.compile[[
  record <- ( field (',' field)* ) -> {} ('\n' / !.)
  field <- escaped / nonescaped
  nonescaped <- { [^,"\n]* }
  escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
]]

local field =
  '"' * lpeg.Cs(((lpeg.P(1) - '"') + lpeg.P'""' / '"')^0) * '"' +
  lpeg.C((1 - lpeg.S',\n"')^0)

record_lpeg = field * (',' * field)^0 * (lpeg.P'\n' + -1)

for k,impl in ipairs{'record_re', 'record_lpeg'} do
  print('Implementation: ' .. impl)
  local record = _G[impl]
  for i,line in ipairs(lines) do
    io.write('Attempting to match "' .. line .. '": ')
    local m = record:match(line)
    if type(m) == 'table' then
      print('match succeeded')
      for j,v in ipairs(m) do
        print(j,v)
      end
    else
      print('match failed')
      print(tostring(m))
    end
  end
  print('')
end

When I run this program, I get the following output.

Implementation: record_re
Attempting to match "somethingin,alllower": match failed
nil
Attempting to match "SomethingIn,CamelCase": match failed
nil
Attempting to match "SOMETHING_IN,ALLCAPS": match succeeded
1	SOMETHING_IN
2	ALLCAPS

Implementation: record_lpeg
Attempting to match "somethingin,alllower": match failed
somethingin
Attempting to match "SomethingIn,CamelCase": match failed
SomethingIn
Attempting to match "SOMETHING_IN,ALLCAPS": match failed
SOMETHING_IN

In the first case, I seem to get the fields only when the row is in
all capitals. I discovered this by accident when I ran record_re on a
CSV file that contains lowercase letters only in its first line; the
rest of the file parsed without errors.
In the second case, I expect to receive a table from the match but get
only the first field as a string.
I tried adjusting record_re to accept lowercase letters, using the
Wikipedia article on CSV and RFC 4180 for reference, but I can't work
out why it fails.
This is my first foray into LPeg and PEGs in general. Any comments or
criticisms are appreciated.
Ken Smith