It was thus said that the Great Alexander Gladysh once stated:
> Hi, list!
> 
> Apologies for a lazy question, I have not done my own homework.
> 
> I've got a large file  (3M entries, 250 MB) with data.
> Each entry is one line with a small Lua table:
> 
> { foo = 1; bar = 2; baz = 'text' };
> 
> (Actually, there are two different entry formats, but that does not matter.)
> 
> I need to load this data fast enough. (Faster than several hours that
> my original loader runs on LJ2, and it still had not stopped.)
> 
> So, if you know a better implementation than the ad-hoc unoptimized one below,
> please share.

  Under the assumption that you could either generate the data as an actual
Lua script, or massage the data into a Lua script, I generated a file with
3M entries that looks like:

res = {}
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text1' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2' })

 ... 

table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4094' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4095' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4096' })
end
init()
function init()
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4097' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4098' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text4099' })

...

table.insert(res,{ foo = 1 , bar = 2 , baz = 'text2999999' })
table.insert(res,{ foo = 1 , bar = 2 , baz = 'text3000000' })
end
init()

  Each definition of init() inserts up to 4,096 records into res when it is
called.  I limited it to 4,096 because LuaJIT 2.0.0-beta2 (the version I
currently have installed) has an even lower limit on constants per function
than Lua, and I found (for my own huge script) that a limit of 4,096 works
fine for both LuaJIT and Lua.
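
  If the data already exists in the one-table-per-line form from the
original question, the massaging step might look roughly like the sketch
below.  The name convert, its parameters, and the assumption that every
entry sits on its own line with an optional trailing ';' are illustrative
and not taken from the original loader.  The demo at the end runs on three
in-memory lines with a chunk size of 2 rather than on a real file.

```lua
-- Sketch: turn lines like "{ foo = 1; bar = 2; baz = 'text' };" into a
-- chunked Lua script.  `convert` and its parameters are hypothetical names.
--   lines : iterator function returning one input line per call (nil at end)
--   write : function called with each piece of output text
--   max   : records per generated init() function (defaults to 4096)
local function convert(lines, write, max)
  max = max or 4096
  local cnt = 0
  write("res = {}\nfunction init()\n")
  for line in lines do
    -- Strip leading whitespace and any trailing whitespace or ';'.
    local entry = line:gsub("^%s+", ""):gsub("[%s;]+$", "")
    if entry ~= "" then
      if cnt == max then
        -- Close and run the current init(), then start a new one.
        write("end\ninit()\nfunction init()\n")
        cnt = 0
      end
      write("table.insert(res," .. entry .. ")\n")
      cnt = cnt + 1
    end
  end
  write("end\ninit()\n")
end

-- Small in-memory demo: three entries, two per init().
local input = {
  "{ foo = 1; bar = 2; baz = 'a' };",
  "{ foo = 3; bar = 4; baz = 'b' };",
  "{ foo = 5; bar = 6; baz = 'c' };",
}
local pos = 0
local function next_line()
  pos = pos + 1
  return input[pos]
end

local out = {}
convert(next_line, function(s) out[#out + 1] = s end, 2)
local script = table.concat(out)

-- loadstring exists in Lua 5.1 and LuaJIT; later versions use load.
local loadchunk = loadstring or load
assert(loadchunk(script))()   -- defines the global res and fills it
```

  For a real file, io.lines on the input and a write function wrapping
file:write on the output would take the place of the in-memory pieces.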

  I then ran lua and luajit over this file and timed the results:

[spc]lucy:/tmp/foo>time lua data1

real    0m32.067s
user    0m29.807s
sys     0m1.747s
[spc]lucy:/tmp/foo>time luajit data1

real    3m21.539s
user    0m25.572s
sys     0m18.276s

  Yes, LuaJIT did worse than Lua in wall-clock time (3m21s versus 32s), even
though its user CPU time was slightly lower.  Some more data:

Computer:	Intel(R) Pentium(R) D  CPU 2.66GHz (dual core)
		1 GB RAM (low, I know)
		Linux 2.6.9

Lua:		5.1.4 with all patches applied
LuaJIT:		2.0.0-beta2
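
  The timings above come from the shell's time command.  A rough in-process
equivalent, useful when the load happens inside a larger program, could look
like the sketch below.  The name timed_load is illustrative, and the tiny
temporary file only stands in for the real data file.  os.time has
one-second resolution and os.clock reports CPU time, so this is coarser than
time(1).

```lua
-- Sketch: time loading a Lua data script from inside Lua.
-- `timed_load` is a hypothetical helper, not part of any library.
local function timed_load(path)
  local wall0, cpu0 = os.time(), os.clock()
  dofile(path)                                  -- compile and run the script
  local wall = os.difftime(os.time(), wall0)    -- elapsed seconds (1 s steps)
  local cpu  = os.clock() - cpu0                -- CPU seconds of this process
  return wall, cpu
end

-- Tiny stand-in for the real data file, written to a temporary path.
local path = os.tmpname()
local f = assert(io.open(path, "w"))
f:write("res = {}\n")
f:write("table.insert(res,{ foo = 1 , bar = 2 , baz = 'text1' })\n")
f:close()

local wall, cpu = timed_load(path)
os.remove(path)
print(("wall %.0fs, cpu %.2fs, %d entries"):format(wall, cpu, #res))
```

  A large gap between the wall and CPU figures, as in the LuaJIT run above,
means the process spent most of its time waiting rather than computing.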

Script to generate data file:

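	-- Number of table.insert() calls per generated init() function.
	local MAX = 4096

	local file = assert(io.open("data1","w"))

	file:write("res = {}\nfunction init()\n")
	local cnt = 0

	for i = 1,3000000 do
	  -- Start a new init() once the current one holds MAX records.
	  if cnt == MAX then
	    file:write("end\ninit()\nfunction init()\n")
	    cnt = 0
	  end

	  file:write("table.insert(res,{ foo = 1 , bar = 2 , baz = 'text" .. i .. "' })\n")
	  cnt = cnt + 1
	end

	-- Close and run the last (possibly partial) init().
	file:write("end\ninit()\n")
	file:close()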

  -spc