Anyone know how I can optimize this? Or if I even wrote this correctly? I wanted
to create a function that would return a table keyed by the fields
that the CSV was created with.
If you don't need a table with actual named fields, but only a way to access the fields by name, you could set a metatable on each row whose __index does the named-field lookup. Since you don't need to create another table for each row and copy every field, it's much faster to build and also uses roughly half the memory in my test case.
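function CSVToTableMT(file_in)
  local file=assert(io.open(file_in,"r"))
  -- the first line holds the column names
  local line=assert(file:read("*l"))
  -- fromCSV(line) must return an array of the fields in that line
  local header=fromCSV(line)
  -- map each column name to its position in a row
  local headerlookup={}
  for i,field in ipairs(header) do
    headerlookup[field]=i
  end
  -- one metatable shared by all rows: row.name resolves to row[position]
  local mt={
    __index=function(tbl,key)
      local idx=headerlookup[key]
      if idx==nil then
        return nil
      else
        return tbl[idx]
      end
    end
  }
  -- each row stays a plain array of fields; only the metatable is attached
  local tbl={}
  line=file:read("*l")
  while line~=nil do
    table.insert(tbl,setmetatable(fromCSV(line),mt))
    line=file:read("*l")
  end
  file:close()
  return tbl
end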
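To show how the rows behave once the metatable is attached, here is a minimal sketch that works on in-memory lines instead of a file. The splitter splitSimple and the sample data are made up for illustration; splitSimple does not handle quoted fields and is not the fromCSV used above. The sketch uses rawget inside __index, which is a small variation on the function above that avoids a second metamethod call when a row is shorter than the header.

```lua
-- Hypothetical, simplified splitter (no quoting support), for illustration only.
local function splitSimple(line)
  local fields = {}
  -- Append a trailing comma so every field, including the last, ends with one.
  for field in (line .. ","):gmatch("([^,]*),") do
    fields[#fields + 1] = field
  end
  return fields
end

-- Build the name -> position lookup from a made-up header line.
local header = splitSimple("name,city,age")
local lookup = {}
for i, name in ipairs(header) do
  lookup[name] = i
end

-- One metatable shared by every row.
local mt = {
  __index = function(row, key)
    local idx = lookup[key]
    if idx then
      return rawget(row, idx)
    end
    return nil
  end
}

-- Each row is just the array of fields with the shared metatable attached.
local rows = {}
for _, line in ipairs({ "Ann,Vienna,31", "Bob,Graz,45" }) do
  rows[#rows + 1] = setmetatable(splitSimple(line), mt)
end

print(rows[1].name, rows[2].city)  -- access by column name
print(rows[1][3])                  -- positional access still works
print(rows[1].nosuchfield)         -- unknown names yield nil
```

Note that the named fields are not stored in the row, so pairs(row) only iterates the numeric positions, and assigning row.name = "x" would create a new real key instead of changing position 1.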
Test:
Windows XP with LuaJIT 2.0
30 MB CSV file with 19 columns and 400,000 rows
Reading all rows into a table and then writing them all out again, indexing each field of each row by name.
Your approach:
7.54 seconds 400MB memory usage
With optimisations mentioned by David Favro:
7.27 seconds 398MB memory usage
Metatable solution:
5.9 seconds 177MB memory usage
So at least under LuaJIT 2.0 it's the fastest approach, and it also uses much less memory.
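If you want to repeat this kind of comparison yourself, a rough sketch of a measuring helper could look like the following. The names measure and buildRows are hypothetical, and buildRows is only a stand-in workload. It reports CPU time via os.clock and the growth of the Lua heap via collectgarbage("count"), which is not the same thing as the process memory usage quoted above, so the numbers will not be directly comparable.

```lua
-- Hypothetical helper: runs fn(...) and returns its result, the CPU seconds
-- used, and the Lua heap growth in KB that remains after a full collection.
local function measure(fn, ...)
  collectgarbage("collect")
  local mem0 = collectgarbage("count")   -- Lua heap size in KB
  local t0 = os.clock()                  -- CPU time in seconds
  local result = fn(...)
  local seconds = os.clock() - t0
  collectgarbage("collect")
  local kb = collectgarbage("count") - mem0
  return result, seconds, kb
end

-- Stand-in workload: builds n small rows. Replace with the CSV reader to test.
local function buildRows(n)
  local rows = {}
  for i = 1, n do
    rows[i] = { "a", "b", "c" }
  end
  return rows
end

local rows, seconds, kb = measure(buildRows, 100000)
print(string.format("%d rows, %.3f s, %.0f KB retained", #rows, seconds, kb))
```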
Regards,
Michael
PS:
That's the optimized function with the original approach I used: