- Subject: Re: Load large amount of data fast
 
- From: Sean Conner <sean@...>
 
- Date: Sun, 17 Oct 2010 06:13:24 -0400
 
It was thus said that the Great Alexander Gladysh once stated:
> Hi, list!
> 
> Apologies for a lazy question, I have not done my own homework.
> 
> I've got a large file  (3M entries, 250 MB) with data.
> Each entry is one line with a small Lua table:
> 
> { foo = 1; bar = 2; baz = 'text' };
> 
> (Actually, there are two different entry formats, but that does not matter.)
> 
> I need to load this data fast enough. (Faster than several hours that
> my original loader runs on LJ2, and it still had not stopped.)
> 
> So, if you know a better implementation than the ad-hoc unoptimized one
> below, please share.
  Under the assumption that you could either generate the data as an actual
Lua script, or massage the data into a Lua script, I generated a file with
3M entries that looks like:
res = {}
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text1' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2' })
 ... 
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4094' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4095' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4096' })
end
init()
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4097' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4098' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4099' })
...
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2999999' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text3000000' })
end
init()
  Each init() that is defined inserts up to 4,096 records into res when it
is called.  I limited it to 4,096 because LuaJIT 2.0.0-beta2 (the version I
currently have installed) has an even lower limit of constants per function
than Lua, and I found (for my own huge script) that a limit of 4,096 works
fine for both LuaJIT and Lua.
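
  The "massage the data into a Lua script" step mentioned above could be
sketched roughly as follows.  This is only an illustration, not the code
used for the timings in this post: the function name wrap_entries, its
parameters, and the file names in the usage comment are assumptions, and it
presumes every input line holds exactly one table constructor followed by an
optional ';'.

```lua
-- Sketch: wrap one-table-per-line entries into chunked init() functions.
-- wrap_entries, MAX_PER_CHUNK and the file names below are hypothetical.
local MAX_PER_CHUNK = 4096

-- lines: iterator function returning one entry per call (e.g. io.lines(path))
-- write: function that receives each piece of output as a string
-- max:   optional number of records per init() function
local function wrap_entries(lines, write, max)
  max = max or MAX_PER_CHUNK
  local cnt = 0
  write("res = {}\nfunction init()\n")
  for line in lines do
    -- drop the trailing ';' and whitespace of the original entry format
    local entry = line:gsub("[;%s]+$", "")
    if entry ~= "" then
      -- start a new init() once the current one holds max records
      if cnt == max then
        write("end\ninit()\nfunction init()\n")
        cnt = 0
      end
      write("table.insert(res," .. entry .. ")\n")
      cnt = cnt + 1
    end
  end
  write("end\ninit()\n")
end

-- Example use (placeholder file names):
--   local out = assert(io.open("data1", "w"))
--   wrap_entries(io.lines("data"), function(s) out:write(s) end)
--   out:close()
```

  Passing the writer as a function keeps the sketch independent of where
the output goes (a file, a pipe, or a string buffer for testing).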
  I then ran lua and luajit over this file and timed the results:
[spc]lucy:/tmp/foo>time lua data1
real    0m32.067s
user    0m29.807s
sys     0m1.747s
[spc]lucy:/tmp/foo>time luajit data1
real    3m21.539s
user    0m25.572s
sys     0m18.276s
  Yes, LuaJIT did worse than Lua in wall-clock time (3m21s versus 32s), even
though its user time was slightly lower.  Some more data:
Computer:	Intel(R) Pentium(R) D  CPU 2.66GHz (dual core)
		1G RAM (low I know)
		Linux 2.6.9
Lua:		5.1.4 with all patches applied
LuaJIT:		2.0.0-beta2
Script to generate data file:
	local MAX = 4096	-- records per init() function
	local file = assert(io.open("data1","w"))
	file:write("res = {}\nfunction init()\n")
	local cnt = 0
	for i = 1,3000000 do
	  -- start a new init() once the current one holds MAX records
	  if cnt == MAX then
	    file:write("end\ninit()\nfunction init()\n")
	    cnt = 0
	  end

	  file:write("table.insert(res,{ foo = 1 , bar = 2 , baz = 'text" .. i .. "'})\n")
	  cnt = cnt + 1
	end
	file:write("end\ninit()\n")
	file:close()
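
  The timings above come from the shell's time command, which lumps
parsing and execution together.  One possible way to separate the two from
inside Lua is sketched below; time_load is a hypothetical helper, not
something used for the numbers in this post.  Note that os.clock() reports
CPU time, so it will not show wall-clock delays such as time spent waiting
on the system.

```lua
-- Sketch: time the compile and run phases of a data script separately.
-- time_load is a hypothetical helper; os.clock() measures CPU time only.
local function time_load(path)
  local t0 = os.clock()
  local chunk = assert(loadfile(path))  -- parse and compile, do not run
  local t1 = os.clock()
  chunk()                               -- run: fills the global table res
  local t2 = os.clock()
  return t1 - t0, t2 - t1               -- compile seconds, run seconds
end

-- Example use:
--   local compile_s, run_s = time_load("data1")
--   print(("compile %.2fs, run %.2fs, %d records"):format(compile_s, run_s, #res))
```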
  -spc