hi,

On Thu, Apr 21, 2011 at 12:56 AM, Geoff Leyland
<geoff_leyland@fastmail.fm> wrote:

> How about having encode do this:
> function encode(data)
>  local out = { n=1}
>  encode_table(data,  out)
>  return table.concat(out)
> end
> and encode_data contains lines like:
>  encode_table(v, out)

Yes, this helped a bit, but it is still not enough:

function encode(data)
	local t = type(data)
	if t == 'table' then -- list(array) or hash
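		-- A list has exactly the keys 1..n. pairs() does not guarantee
		-- any order, so count the keys and then probe 1..n directly.
		local n = 0
		for _ in pairs(data) do
			n = n + 1
		end
		local list = true
		for i = 1, n do
			if data[i] == nil then
				list = false
				break
			end
		end
		local out = {}
		if list then
			out[1] = 'l'
			for i = 1, n do -- walk the list in index order
				table.insert(out,encode(data[i]))
			end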
		else -- hash
			out[1] = 'd'
			for k, v in pairs(data) do
				table.insert(out,encode(k))
				table.insert(out,encode(v))
			end
		end
		table.insert(out,'e')
		return table.concat(out,'')
	elseif t == 'string' then
		return table.concat({#data,data},':')
	elseif t == 'number' then
		-- we need to convert scientific notation to decimal
		return table.concat({'i',misc.to_dec_string(data),'e'},'')
	elseif t == 'nil' then -- extension of benc
		return 'n'
	elseif t == 'boolean' then -- extension of benc
		if data then
			return 't'
		else
			return 'f'
		end
	end
end

Compared with the previous version, which used the .. (concatenation)
operator, I got the following results for varying input sizes (times
given in seconds):

base:
#data  #time(s)
1K      0.00011491775512695
10K     0.012901067733765
100K    0.23692297935486
1M      92.385051012039

tuned:
#data  #time(s)
1K      0.00011587142944336
10K     0.0037820339202881
100K    0.12719702720642
1M      51.690559864044

This is still too slow for input sizes >=1M. The ideal would be to go
under 1 second.
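
One likely reason the tuned version is still slow: every nesting level
builds its own table and calls table.concat, so the string returned by an
inner call is copied again by each enclosing call. A payload of n bytes
nested d levels deep is therefore copied roughly d times. What follows is
only a minimal sketch of the single-buffer idea from the quoted message,
not a measured result. The names is_list and encode_into are made up for
this sketch, and string.format('%.0f', ...) is an assumed stand-in for
misc.to_dec_string (it rounds non-integers).

```lua
-- Sketch: one shared output buffer for the whole encode, concatenated once.
-- is_list and encode_into are hypothetical names, not from the original code.

local function is_list(t)
  -- Keys are exactly 1..n when there are n keys and t[1..n] are all set.
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  for i = 1, n do
    if t[i] == nil then return false end
  end
  return true
end

local function encode_into(data, out)
  local t = type(data)
  if t == 'table' then
    if is_list(data) then
      out[#out + 1] = 'l'
      for i = 1, #data do
        encode_into(data[i], out) -- appends to the shared buffer
      end
    else -- hash
      out[#out + 1] = 'd'
      for k, v in pairs(data) do
        encode_into(k, out)
        encode_into(v, out)
      end
    end
    out[#out + 1] = 'e'
  elseif t == 'string' then
    out[#out + 1] = #data
    out[#out + 1] = ':'
    out[#out + 1] = data
  elseif t == 'number' then
    out[#out + 1] = 'i'
    -- Assumption: stand-in for misc.to_dec_string; avoids scientific notation.
    out[#out + 1] = string.format('%.0f', data)
    out[#out + 1] = 'e'
  elseif t == 'nil' then -- extension of benc
    out[#out + 1] = 'n'
  elseif t == 'boolean' then -- extension of benc
    out[#out + 1] = data and 't' or 'f'
  end
end

local function encode(data)
  local out = {}
  encode_into(data, out)
  return table.concat(out) -- the only concatenation in the whole encode
end
```

With this shape the payload string is stored in the buffer once and copied
once by the final table.concat, whatever the nesting depth.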

The benchmark code is as simple as :

local data_sizes={1000,10000,100000,1000000}
for _,v in ipairs(data_sizes) do -- ipairs runs the sizes in order
	local gen= misc.gen_string(v)
	for i=1,(v/100) do
		gen={gen} --deep nesting: v/100 levels around one string
	end
	local start=os.clock()
	local enc_data=test_encode(gen)
	print((v/1000).."K", os.clock()-start)
end

misc.gen_string(v) simply generates a string consisting of the character
'a' repeated v times.
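
For anyone who wants to reproduce the benchmark without the misc module,
a stand-in with the same described behaviour could look like the sketch
below. The name gen_string is reused here only for illustration; the real
misc.gen_string may be implemented differently.

```lua
-- Hypothetical stand-in for misc.gen_string: the character 'a' repeated n times.
local function gen_string(n)
  return string.rep('a', n)
end

print(#gen_string(1000)) --> 1000
```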