- Subject: Re: Load large amount of data fast
- From: Sean Conner <sean@...>
- Date: Sun, 17 Oct 2010 06:13:24 -0400
It was thus said that the Great Alexander Gladysh once stated:
> Hi, list!
>
> Apologies for a lazy question; I have not done my own homework.
>
> I've got a large file (3M entries, 250 MB) with data.
> Each entry is one line with a small Lua table:
>
> { foo = 1; bar = 2; baz = 'text' };
>
> (Actually, there are two different entry formats, but that does not matter.)
>
> I need to load this data fast enough. (My original loader has been running
> on LJ2 for several hours and still has not finished.)
>
> So, if you know a better implementation than the ad-hoc, unoptimized one
> below, please share.
Assuming you could either generate the data as an actual Lua script or
massage it into one, I generated a file with 3M entries that looks like:
res = {}
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text1' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2' })
...
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4094' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4095' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4096' })
end
init()
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4097' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4098' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4099' })
...
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2999999' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text3000000' })
end
init()
Each init() holds up to 4,096 records and is called right after it is
defined, so each call inserts up to 4,096 records into res. I limited it to
4,096 because LuaJIT 2.0.0-beta2 (the version I currently have installed)
has an even lower limit on constants per function than Lua, and I found
(with my own huge script) that a limit of 4,096 works fine for both LuaJIT
and Lua.
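The same chunking idea can also be applied without writing an intermediate script, by compiling the original entries a batch at a time. This is a minimal sketch of a variant, not the method timed below: it assumes each input line holds exactly one table constructor, optionally followed by `;`, as in the sample entry quoted above, and `load_entries`, `run_batch` and `BATCH` are hypothetical names.

```lua
-- Sketch (hypothetical names): load a file whose lines each hold one Lua
-- table constructor, compiling BATCH entries at a time so each compiled
-- chunk stays the same size as an init() in the generated script.
local compile = loadstring or load  -- loadstring in Lua 5.1/LuaJIT, load in 5.2+

local BATCH = 4096  -- entries per compiled chunk

-- Compile one batch as `return { entry, entry, ... }` and run it.
local function run_batch(entries)
  local src = "return {\n" .. table.concat(entries, ",\n") .. "\n}"
  local fn = assert(compile(src, "=entries"))
  return fn()
end

-- Read `path` and return one array holding every entry in file order.
local function load_entries(path)
  local res, batch = {}, {}
  local function flush()
    for _, t in ipairs(run_batch(batch)) do res[#res + 1] = t end
    batch = {}
  end
  for line in io.lines(path) do
    -- Strip an optional trailing ';' plus trailing whitespace (e.g. '\r').
    local entry = line:match("^(.-);?%s*$")
    if entry:find("%S") then  -- skip blank lines
      batch[#batch + 1] = entry
      if #batch == BATCH then flush() end
    end
  end
  if #batch > 0 then flush() end
  return res
end
```

It would be called as `local res = load_entries("entries.txt")`, where the file name is only a placeholder. It avoids writing a second 250 MB file; whether it is faster than running a pre-generated script has not been measured here.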
I then ran lua and luajit over the generated file and timed the results:
[spc]lucy:/tmp/foo>time lua data1
real 0m32.067s
user 0m29.807s
sys 0m1.747s
[spc]lucy:/tmp/foo>time luajit data1
real 3m21.539s
user 0m25.572s
sys 0m18.276s
Yes, LuaJIT did worse than Lua in wall-clock time, although its user time
was lower. Some more data:
Computer: Intel(R) Pentium(R) D CPU 2.66GHz (dual core)
1 GB RAM (low, I know)
Linux 2.6.9
Lua: 5.1.4 with all patches applied
LuaJIT: 2.0.0-beta2
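The interesting part of the LuaJIT run is the gap between its real time and its CPU time. A small sketch that converts the `time` figures above into seconds and subtracts CPU time (user + sys) from wall-clock time; the numbers are copied from the two runs, and `seconds` is a hypothetical helper.

```lua
-- Sketch: split each run's wall-clock ("real") time into CPU time
-- (user + sys) and time not spent on the CPU. Figures are copied from
-- the `time` output above; `seconds` is a hypothetical helper.
local function seconds(s)
  -- Parse strings in the "3m21.539s" form into seconds.
  local m, sec = s:match("^(%d+)m([%d.]+)s$")
  assert(m, "unexpected time format: " .. s)
  return tonumber(m) * 60 + tonumber(sec)
end

local runs = {
  { name = "lua",    real = "0m32.067s", user = "0m29.807s", sys = "0m1.747s" },
  { name = "luajit", real = "3m21.539s", user = "0m25.572s", sys = "0m18.276s" },
}

for _, r in ipairs(runs) do
  local real = seconds(r.real)
  local cpu = seconds(r.user) + seconds(r.sys)
  r.off_cpu = real - cpu  -- time not accounted for as user or sys CPU time
  print(string.format("%-6s real %8.3f  cpu %7.3f  off-cpu %8.3f",
                      r.name, real, cpu, r.off_cpu))
end
```

For LuaJIT this leaves about 158 s of the roughly 201.5 s run not accounted for as CPU time, against about half a second for Lua. Most of the extra wall-clock time was therefore spent off the CPU rather than executing code. With 1 GB of RAM, paging is one plausible cause, though these numbers alone do not show it.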
Script to generate the data file:
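-- Write a Lua script that rebuilds `res` in chunks of at most MAX records,
-- each chunk in its own init() so no function exceeds the constant limit.
local MAX = 4096
local file = assert(io.open("data1", "w"))
file:write("res = {}\nfunction init()\n")
local cnt = 0  -- records written into the current init()
for i = 1, 3000000 do
  if cnt == MAX then
    -- close the current chunk, call it, and start a new one
    file:write("end\ninit()\nfunction init()\n")
    cnt = 0
  end
  file:write("table.insert(res,{ foo = 1 , bar = 2 , baz = 'text" .. i .. "'})\n")
  cnt = cnt + 1
end
file:write("end\ninit()\n")
file:close()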
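To check the chunk boundaries at a small scale before writing all 3M records, the generator can be wrapped in a function that takes the output path, record count and chunk size. This is a sketch; `generate` and its parameters are hypothetical, and the record line it writes is the same one used above.

```lua
-- Sketch: the generator as a function (hypothetical name and parameters),
-- so it can be run with a tiny record count and chunk size and then
-- loaded back to confirm every record arrives.
local function generate(path, n, max)
  local file = assert(io.open(path, "w"))
  file:write("res = {}\nfunction init()\n")
  local cnt = 0  -- records written into the current init()
  for i = 1, n do
    if cnt == max then
      file:write("end\ninit()\nfunction init()\n")
      cnt = 0
    end
    file:write("table.insert(res,{ foo = 1 , bar = 2 , baz = 'text" .. i .. "'})\n")
    cnt = cnt + 1
  end
  file:write("end\ninit()\n")
  file:close()
end

local path = os.tmpname()
generate(path, 10, 4)  -- three init() chunks: 4 + 4 + 2 records
dofile(path)           -- defines the global `res`, as data1 does
os.remove(path)
print(#res, res[1].baz, res[#res].baz)  -- prints 10, text1 and text10 (tab-separated)
```

With n = 8 and max = 4 the file holds exactly two full chunks and no empty third one, because a new init() is only opened when another record still has to be written.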
-spc