- Subject: Re: [ANN] LuaCSV
- From: Tobias Kieslich <tobias@...>
- Date: Mon, 3 Aug 2009 08:36:51 -0700
Hi,
thanks for trying. I have to head to work now, so I'll have to look into it
later. It's a week too late for me, as I was parsing big files in Lua the
other week. I found that eventually it was best for me to write it in
Lua, and that the bigger bottleneck was loading the big files into big
strings. So I came up with a file iterator that returns only one row at
a time, even for multiline fields. Note that the code below uses \ for
escaping, as I was parsing MySQL-based CSV files. It might help you with
the multiline stuff. And yes, it was a quick hack based on the PiL code.
-T
-- Smart generator that deals with multi-line fields and properly escaped quotes.
-- FIXME: This escapes single quotes, which can be here for two reasons:
--   - single quotes as such, or apostrophes -> escaping for the SQL insert here is cheaper than on every single word!
--   - numerical delimiters -> we don't support numerical formatting in CSV. Period!
function csv_generator(filename)
  local f = assert(io.open(filename, 'r'))
  return function()
    local line = f:read("*line")
    if not line then
      f:close()
      return nil
    end
    line = string.gsub(line, "'", "\\'") .. ','  -- ending comma
    local row = {}      -- table to collect fields
    local f_start = 1
    repeat
      -- next field is quoted? (starts with `"'?)
      if string.find(line, '^"', f_start) then
        local _, c
        local i = f_start
        repeat
          -- find closing quote; chew across escaped quotes
          _, i, c = string.find(line, '(\\?)"', i + 1)
        until c ~= '\\'  -- not an escaped quote?
        if i then
          local field = string.sub(line, f_start + 1, i - 1)
          table.insert(row, (string.gsub(field, '\\"', '"')))
          f_start = string.find(line, ',', i) + 1
        else
          -- multiline field: pull in the next line and rescan from f_start
          local more = f:read("*line")
          if not more then
            f:close()
            error('unmatched "')
          end
          -- turn the ending comma (and a trailing \, if present) back into a newline
          line = string.gsub(line, '\\?,$', '\n')
          line = line .. string.gsub(more, "'", "\\'") .. ','
        end
      else  -- unquoted; find next comma
        local nexti = string.find(line, ',', f_start)
        table.insert(row, string.sub(line, f_start, nexti - 1))
        f_start = nexti + 1
      end
    until f_start > string.len(line)
    return row
  end
end
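
The closing-quote scan used in the function above can also be tried in
isolation. Below is a minimal sketch, assuming backslash-escaped quotes as
in the MySQL-based files mentioned earlier. The helper name
find_closing_quote is hypothetical and not part of the original code. Like
the scan above, it treats any quote preceded by a backslash as escaped, so
a field that ends in an escaped backslash would be misread.

```lua
-- Hypothetical helper (not from the original post): given a line and the
-- index of a field's opening double quote, return the index of the matching
-- closing quote, or nil if the field is not closed on this line.
function find_closing_quote(line, start)
  local _, c
  local i = start
  repeat
    -- match a double quote and capture a directly preceding backslash, if any
    _, i, c = string.find(line, '(\\?)"', i + 1)
  until c ~= '\\'  -- stop at the first quote that is not backslash-escaped
  return i
end

print(find_closing_quote('"a\\"b",x', 1))  --> 6
print(find_closing_quote('"abc', 1))       --> nil
```

A nil result is the signal to read another line from the file and scan
again, which is how a multiline field gets stitched together.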