- Subject: Re: a question about csv parser
- From: Dave Nichols <dave.nichols@...>
- Date: Thu, 13 Sep 2007 10:11:14 +0100
This is a coroutine-based system I use:
--This is implemented as a coroutine that provides decoded lines.
--Usage is as follows:
--  NextLine = coroutine.wrap(CSVreader)
--  lineno,line = NextLine(FileName)
--  while lineno > 0 do
--    process line
--    lineno,line = NextLine()
--  end
--  if lineno > 0 then NextLine('stop') end  --only needed if the loop was left early
--The reader is given a file name; the file is opened, scanned and closed
--as required.
--The reader returns the line number of the first source line of the
--record decoded; if this is 0, the end of the file has been detected and
--the coroutine terminates.
--It also returns a table of the fields extracted. Fields in the source
--file may be multi-line according to the conventions used by Excel; line
--breaks are propagated.
--The returned line is an array of fields: [1] is the first column, [2]
--the second, etc.
function CSVreader(FileName)
  local file = assert(io.open(tostring(FileName),'r'))
  local lineno = 0
  local offset = 1
  local line = {}
  local rawline, nextline, nexti
  while true do
    rawline = file:read()
    if not rawline then break end --gone off end of file, that's it
    lineno = lineno + offset; offset = 1
    local fieldstart = 1
    repeat
      -- next field is quoted? (starts with `"'?)
      if string.find(rawline, '^"', fieldstart) then
        --{{{ got a quoted field
        local a, c, f
        local i = fieldstart
        while true do --keep iterating lines 'til get a proper field termination
          repeat
            -- find closing quote
            a, i, c = string.find(rawline, '"("?)', i+1)
          until c ~= '"' -- quote not followed by quote?
          if not i then --field unterminated
            --concatenate next line from file and start again
            i = string.len(rawline)+1 --continue search from start of next line
            nextline = file:read() --get next line
            if not nextline then error('unmatched "') end
            offset = offset + 1
            --add continuation (and line break) to the line
            rawline = rawline..'\r\n'..nextline
            --carry on the field termination search
          else
            f = string.sub(rawline, fieldstart+1, i-1)
            table.insert(line, (string.gsub(f, '""', '"')))
            --find end of this field
            fieldstart = string.find(rawline, ',', i)
            --gone off end of line
            if not fieldstart then fieldstart = string.len(rawline) end
            --move to next field
            fieldstart = fieldstart + 1
            break --got a proper line termination, so carry on
          end
        end
        --}}}
      else -- unquoted; find next comma
        nexti = string.find(rawline, ',', fieldstart)
        --pretend there is a comma on the end
        if not nexti then nexti = string.len(rawline)+1 end
        table.insert(line, string.sub(rawline, fieldstart, nexti-1))
        fieldstart = nexti + 1
      end
    until fieldstart > string.len(rawline)
    if coroutine.yield(lineno,line) == 'stop' then break end
    line = {} --empty line for next lot
  end
  file:close()
  return 0,{}
end
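
The calling convention described in the comments (first call passes the file name, a line number of 0 means end of file, and 'stop' asks the reader to close up early) can be wrapped so that it works with Lua's generic `for`. The following is only a sketch: `csv_rows` is a hypothetical helper that is not part of the reader above, and `fake_reader` is a stand-in that follows the same protocol without touching a file, so the example runs on its own. In real use `CSVreader` would be passed in its place.

```lua
-- Hypothetical helper (an assumption, not part of the reader above):
-- adapts a reader coroutine following the same protocol to a generic `for`.
-- Returns the iterator and a stop function for leaving the loop early.
local function csv_rows(reader, filename)
  local next_line = coroutine.wrap(reader)
  local started, finished = false, false

  local function iter()
    if finished then return nil end
    local lineno, fields
    if started then
      lineno, fields = next_line()
    else
      started = true
      lineno, fields = next_line(filename) -- first call passes the file name
    end
    if lineno == 0 then -- 0 signals end of file; the coroutine is now dead
      finished = true
      return nil
    end
    return lineno, fields
  end

  local function stop()
    -- only resume a coroutine that is suspended at a yield
    if started and not finished then
      finished = true
      next_line('stop')
    end
  end

  return iter, stop
end

-- Stand-in reader with the same yield protocol (no file involved).
local closed = false
local function fake_reader(name)
  local data = { {'a', 'b'}, {'c', 'd'}, {'e', 'f'} }
  for n, fields in ipairs(data) do
    if coroutine.yield(n, fields) == 'stop' then break end
  end
  closed = true -- stands in for file:close()
  return 0, {}
end

local rows, stop = csv_rows(fake_reader, 'example.csv')
local seen = {}
for lineno, fields in rows do
  seen[#seen + 1] = lineno .. ':' .. table.concat(fields, ',')
  if lineno == 2 then
    stop() -- lets the reader run its clean-up before we leave
    break
  end
end
```

Calling `stop()` before `break` matters because the reader only closes its file when it is resumed, either by running to the end or by receiving 'stop'.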
gary ng wrote:
Hi,
I have a question about csv parser.
Most of the code I can find (including lpeg and even Python's
module) shows how to break a "record" or CSV line into fields.
Now if I have a file like the following:
"this is a field with \r\n", 2, 3 \r\n
normal, 2, 3 \r\n
how do I read the correct line? The quoted \r\n would not be
treated specially by io.lines(), I assume.
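
The problem in the question can be illustrated with a simpler technique than the reader above: keep appending physical lines to a record while the number of double quotes seen so far is odd. This is a minimal sketch under stated assumptions: `logical_records` is a hypothetical name, quotes are assumed to appear only as field delimiters or as doubled escapes, and the physical lines are rejoined with '\n' rather than the original line ending. The sample lines come from a table so the example is self-contained.

```lua
-- Hypothetical helper: given a function that returns the next physical
-- line (or nil at the end), returns an iterator over logical CSV records.
local function logical_records(next_physical_line)
  return function()
    local record = next_physical_line()
    if not record then return nil end
    -- gsub's second result is the number of matches, i.e. the quote count
    local _, quotes = string.gsub(record, '"', '')
    -- an odd count means a quoted field is still open
    while quotes % 2 == 1 do
      local more = next_physical_line()
      if not more then error('unmatched "') end
      record = record .. '\n' .. more
      local _, n = string.gsub(more, '"', '')
      quotes = quotes + n
    end
    return record
  end
end

-- The two records from the question, as three physical lines.
local physical = { '"this is a field with ', '", 2, 3', 'normal, 2, 3' }
local pos = 0
local function next_physical_line()
  pos = pos + 1
  return physical[pos]
end

local records = {}
for record in logical_records(next_physical_line) do
  records[#records + 1] = record
end
```

With a real file, the iterator returned by `io.lines(filename)` could be passed in as the line source. Each record would still need to be split into fields afterwards.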
--
Regards,
Dave Nichols
Match-IT Limited
Tel: 0845 1300 510
Fax: 0845 1300 610
mailto:dave.nichols@make247.co.uk
http://www.make247.co.uk