lua-l archive



This is a coroutine-based system I use:

--this is implemented as a coroutine that provides decoded lines
--usage is as follows:

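--NextLine = coroutine.wrap(CSVreader)
--lineno,line = NextLine(FileName)
--while lineno > 0 do
--  process line
--  lineno,line = NextLine()
--end
--if the loop is left early (e.g. with break), lineno is still > 0, so tell
--the reader to stop; it then closes the file and terminates:
--if lineno > 0 then NextLine('stop') end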

--The reader is given a file name; the file is opened, scanned and closed as required.
--The reader returns the line number of the first source line of the decoded record;
--if this is 0, the end of the file has been reached and the coroutine terminates.
--It also returns a table of the fields extracted. Fields in the source file may be
--multi-line according to the conventions used by Excel; line breaks are propagated
--(as \r\n). The returned line is an array of fields: [1] is the first column,
--[2] the second, etc.

function CSVreader(FileName)
  local file = assert(io.open(tostring(FileName), 'r'))
  local lineno = 0
  local offset = 1 --number of source lines consumed by the previous record
  local line = {}
  local rawline, nextline, nexti
  while true do
    rawline = file:read()
    if not rawline then break end --gone off end of file, that's it
    lineno = lineno + offset; offset = 1
    local fieldstart = 1
    repeat
      local more = false --set when the field just read is followed by a comma
      -- next field is quoted? (starts with `"')
      if string.find(rawline, '^"', fieldstart) then
        --{{{  got a quoted field
        local a, c, f
        local i = fieldstart
        while true do --keep iterating lines 'til we get a proper field termination
          repeat
            -- find closing quote
            a, i, c = string.find(rawline, '"("?)', i+1)
          until c ~= '"'    -- quote not followed by quote?
          if not i then --field unterminated
            --concatenate next line from file and start again
            i = string.len(rawline)+1 --continue search from start of next line
            nextline = file:read()                --get next line
            if not nextline then file:close(); error('unmatched "') end
            offset = offset + 1
            rawline = rawline..'\r\n'..nextline --add continuation (and line break) to the line
            --carry on the field termination search
          else
            f = string.sub(rawline, fieldstart+1, i-1)
            table.insert(line, (string.gsub(f, '""', '"')))
            fieldstart = string.find(rawline, ',', i) --find end of this field
            if fieldstart then
              more = true
            else
              fieldstart = string.len(rawline) --gone off end of line
            end
            fieldstart = fieldstart + 1 --move to next field
            break --got a proper field termination, so carry on
          end
        end
        --}}}
      else                -- unquoted; find next comma
        nexti = string.find(rawline, ',', fieldstart)
        if nexti then
          more = true
        else
          nexti = string.len(rawline)+1 --pretend there is a comma on the end
        end
        table.insert(line, string.sub(rawline, fieldstart, nexti-1))
        fieldstart = nexti + 1
      end
    until not more --stop after the last field (a trailing empty field is kept)
    if coroutine.yield(lineno,line) == 'stop' then break end
    line = {}                            --empty line for next lot
  end
  file:close()
  return 0,{}
end
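The calling convention in the header comments, including the 'stop' handshake, can be tried without a CSV file. This is a minimal sketch that assumes a hypothetical `TableReader` standing in for `CSVreader`: it yields rows from an in-memory table instead of reading a file, and the `closed` flag stands in for `file:close()`.

```lua
-- Hypothetical stand-in for CSVreader: yields (lineno, line) pairs from a
-- table instead of a file and honours the same 'stop' request.
local closed = false

local function TableReader(rows)
  for lineno, line in ipairs(rows) do
    -- the value passed to the caller's next NextLine(...) call arrives here
    if coroutine.yield(lineno, line) == 'stop' then break end
  end
  closed = true            -- CSVreader calls file:close() at this point
  return 0, {}
end

local NextLine = coroutine.wrap(TableReader)
local seen = {}
local lineno, line = NextLine({ {'a', '1'}, {'b', '2'}, {'c', '3'} })
while lineno > 0 do
  seen[#seen + 1] = line[1]
  if line[1] == 'b' then break end       -- leave the loop early
  lineno, line = NextLine()
end
if lineno > 0 then NextLine('stop') end  -- reader stops and cleans up

print(#seen, closed)                     -- prints 2 and true
```

Because `coroutine.wrap` raises an error ("cannot resume dead coroutine") if a finished coroutine is called again, the caller should only send 'stop' while `lineno > 0`, which is what the final `if` checks.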


gary ng wrote:
Hi,

I have a question about CSV parsers.

Most of the code I can find (including LPeg-based
code and even Python's csv module) shows how to break a
"record" or CSV line into fields.

Now, suppose I have a file like the following:

"this is a field with \r\n", 2, 3 \r\n
normal, 2, 3 \r\n

How do I read the correct line? I assume io.lines()
would not treat the quoted \r\n specially.
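For the question above, one alternative to the field-by-field scan in `CSVreader` (a different technique, not the one used there) is to glue physical lines together until the record holds an even number of `"` characters, since Excel writes a literal quote as `""`, and only then split the complete record into fields. This is a minimal sketch; `records`, `splitfields` and `countquotes` are hypothetical names, and the line source is assumed to be any function returning successive lines without the newline.

```lua
-- Count the '"' characters in a string.
local function countquotes(s)
  local _, n = string.gsub(s, '"', '')
  return n
end

-- Iterator over logical CSV records. nextline is any function that returns
-- successive physical lines (without the newline) and nil at the end.
-- Assumption: a record is unfinished while its quote count is odd.
local function records(nextline)
  return function()
    local rec = nextline()
    if not rec then return nil end
    local quotes = countquotes(rec)
    while quotes % 2 == 1 do
      local more = nextline()
      if not more then error('unmatched "') end
      rec = rec .. '\r\n' .. more          -- keep the embedded line break
      quotes = quotes + countquotes(more)
    end
    return rec
  end
end

-- Split one complete record into fields: "..." fields may contain commas
-- and line breaks, and "" inside them stands for a single ".
local function splitfields(rec)
  local fields, pos = {}, 1
  while true do
    local comma
    if string.find(rec, '^"', pos) then
      local i = pos + 1
      while true do
        local q = string.find(rec, '"', i, true)
        if not q then error('unmatched "') end
        if string.sub(rec, q + 1, q + 1) ~= '"' then   -- closing quote
          local f = string.sub(rec, pos + 1, q - 1)
          fields[#fields + 1] = (string.gsub(f, '""', '"'))
          comma = string.find(rec, ',', q + 1, true)   -- text before it is ignored
          break
        end
        i = q + 2                                      -- escaped "": skip both
      end
    else
      comma = string.find(rec, ',', pos, true)
      fields[#fields + 1] = string.sub(rec, pos, (comma or #rec + 1) - 1)
    end
    if not comma then return fields end    -- that was the last field
    pos = comma + 1
  end
end

-- Usage with the two records from the question
local lines = { '"this is a field with ', '", 2, 3', 'normal, 2, 3' }
local k = 0
local function nextline() k = k + 1; return lines[k] end
local out = {}
for rec in records(nextline) do out[#out + 1] = splitfields(rec) end
```

With a real file, `records(io.lines(FileName))` would supply the physical lines. The parity test is cheap, but unlike the scan in `CSVreader`, which only treats a `"` at the start of a field as special, it is thrown off by a stray `"` inside an unquoted field.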







--
Regards,

Dave Nichols
Match-IT Limited
Tel: 0845 1300 510
Fax: 0845 1300 610
mailto:dave.nichols@make247.co.uk
http://www.make247.co.uk
