Hi,

thanks for trying. I have to head to work now, so I'll have to look into it
later. It's a week too late for me, as I was parsing big files in Lua the
other week. I found that in the end it was best for me to write the parser
in Lua, and that the bigger bottleneck was reading the big files into big
strings. So I came up with a file iterator that returns only one row at
a time, even for multiline fields. Note that the code below uses \ for
escaping, as I was parsing MySQL-based CSV files. It might help you with
the multiline stuff. And yes, it was a quick hack based on the PiL code.


	-T
-- smart generator that deals with multi-line fields and properly escaped quotes
-- FIXME: This escapes single quotes, which can be here for two reasons:
--   - single quotes as such or apostrophes -> escaping once here for the SQL insert is cheaper than on every single word!
--   - numerical delimiter -> we don't support numerical formatting in CSV. Period!
function csv_generator(filename)
	local f = assert(io.open(filename, 'r'))
	return function()
		local line = f:read("*line")
		if line then
			local ml = false
			line = string.gsub(line, "'", "\\'") .. ','         -- ending comma
			local row = {}             -- table to collect fields
			local f_start = 1
			repeat
				-- multiline field: append the next physical line, then let the
				-- quoted-field branch below rescan the field from f_start
				if ml then
					local nextline = f:read("*line")
					if not nextline then
						-- file ended inside a quoted field
						f:close()
						error('unmatched " in ' .. filename)
					end
					line = line .. string.gsub(nextline, "'", "\\'") .. ','
					ml = false
				end

				-- next field is quoted? (start with `"'?)
				if string.find(line, '^"', f_start) then
					local a, c
					local i  = f_start
					repeat
						-- find closing quote; chew across escaped quotes
						a, i, c = string.find(line, '(\\?)"', i+1)
					until c ~= '\\'    -- not an escaped quote?
					if not i then
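						-- no closing quote yet: the field continues on the next line.
						-- A trailing '\,' is a backslash-escaped line break followed by
						-- our ending comma; turn it back into a newline and read more.
						line = string.gsub(line, '\\,$', '\n')
						ml = true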
					else
						-- named `field` so it does not shadow the file handle `f`
						local field = string.sub(line, f_start+1, i-1)
						table.insert(row, (string.gsub(field, '\\"', '"')))
						f_start = string.find(line, ',', i) + 1
					end
				else                 -- unquoted; find next comma
					local nexti = string.find(line, ',', f_start)
					table.insert(row, string.sub(line, f_start, nexti-1))
					f_start = nexti + 1
				end
			until f_start > string.len(line)

			return row
		else
			f:close()
		end
	end
end
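
The closing-quote search is the step that is easiest to get wrong, so here is a minimal standalone sketch of just that loop. It assumes the same convention as the code above: a quote is escaped only by a backslash directly in front of it. The helper name find_closing_quote is made up for illustration and is not part of the generator.

```lua
-- Hypothetical helper, not part of the generator above.
-- Given a line and the position of an opening quote, return the position of
-- the matching closing quote, or nil if the line ends before the field does.
-- Assumption: a quote is escaped only by a backslash directly in front of it.
local function find_closing_quote(line, open_pos)
	local i = open_pos
	local s, esc
	repeat
		-- esc is '\' when the quote found is escaped, '' when it is bare
		s, i, esc = string.find(line, '(\\?)"', i + 1)
	until esc ~= '\\'   -- also true when nothing was found (esc is nil)
	return i
end

print(find_closing_quote('"a\\"b",', 1))   --> 6   (the quote at 4 is escaped)
print(find_closing_quote('"abc', 1))       --> nil (field continues on the next line)
```

A nil result is what sends the generator off to read another line. One gap in this approach, as far as I can tell: a field that ends in an escaped backslash, such as "a\\", is read as ending in an escaped quote, so the scan runs on past the real end of the field.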