I wrote a function to get boundary offsets for further processing from
huge files (tens of GB).

Whilst testing with a file generated from /dev/urandom with a few
boundary markers sprinkled in, I was very surprised to see that a ternary
(the "and ... or" idiom) is significantly slower than a function call
with an if statement inside.

When I switch the two "local chunk = ..." lines so that the "readChunk"
function is used instead, the process is significantly faster.

This goes against my initial assumption that a function call would add
overhead and that a ternary is pretty much a short form of an if
statement. Could anybody explain why I am seeing this?
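
To separate the cost of the expression itself from the file I/O, a
micro-benchmark along the following lines might help. This is only a
sketch: the names pick, viaTernary, viaFunction, timeIt and the iteration
count N are made up for illustration and are not part of stream.lua, and
the timings will vary between machines and Lua versions.

```lua
-- Hypothetical micro-benchmark; none of these names appear in stream.lua.
local N = 2000000

-- Function-call form: an if statement wrapped in a function.
local function pick(flag)
	if flag then
		return "a"
	else
		return "b"
	end
end

-- Inline form: the "and ... or" idiom.
local function viaTernary()
	local count = 0
	for i = 1, N do
		local flag = i % 2 == 0
		local v = flag and "a" or "b"
		if v == "a" then count = count + 1 end
	end
	return count
end

local function viaFunction()
	local count = 0
	for i = 1, N do
		local v = pick(i % 2 == 0)
		if v == "a" then count = count + 1 end
	end
	return count
end

-- Prints the label, the CPU time used and the result of f.
local function timeIt(label, f)
	local start = os.clock()
	local result = f()
	print(label, os.clock() - start, result)
	return result
end

timeIt("and/or", viaTernary)
timeIt("function", viaFunction)
```

If the two timings here are close, the difference seen with stream.lua
is more likely to come from what each variant does with the data than
from the expression form itself.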


Results on my machine:

> # Using ternary
> time lua ./stream.lua
12
4294967329
8589934646

real	0m5,074s
user	0m3,463s
sys	0m1,599s

> # Using function
> time lua ./stream.lua
12
4294967329
8589934646

real	0m3,739s
user	0m2,744s
sys	0m0,984s


local chunkSize = 32 * 1024

local readChunk = function(file, lastChunk)
	if #lastChunk < chunkSize then
		return file:read(chunkSize)
	else
		return ""
	end
end

local x = function(file, pattern)
	local lastChunk = ""
	local offset = 0

	return function()
		while lastChunk do
			--local chunk = readChunk(file, lastChunk)
			local chunk = #lastChunk > chunkSize and "" or file:read(chunkSize)
			local data = lastChunk .. (chunk or "")
			local first, last = data:find(pattern, 1, true)
			if first then
				local ret = offset + first - 1
				offset = offset + last
				lastChunk = data:sub(last + 1)
				return ret
			else
				offset = offset + #lastChunk
				lastChunk = chunk
			end
		end
	end
end

-- Open in binary mode and fail with a clear message if the file is missing.
local y = assert(io.open("lala", "rb"))

for start in x(y, "boundary") do
	print(start)
end

y:close()
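
One thing that may be worth checking is that the two variants do not test
the same condition: readChunk reads only when #lastChunk < chunkSize,
whereas the inline expression skips the read only when
#lastChunk > chunkSize. They disagree when the length is exactly
chunkSize, which is the usual case after a full read without a match.
The sketch below shows this with a stand-in file object; newFakeFile and
inlineChunk are hypothetical names introduced for the demonstration.

```lua
local chunkSize = 32 * 1024

-- Hypothetical stand-in for a file handle: counts calls to read().
local function newFakeFile()
	local f = { reads = 0 }
	function f:read(n)
		self.reads = self.reads + 1
		return string.rep("x", n)
	end
	return f
end

-- Same logic as readChunk in the script.
local function readChunk(file, lastChunk)
	if #lastChunk < chunkSize then
		return file:read(chunkSize)
	else
		return ""
	end
end

-- Same logic as the inline "and ... or" expression in the script.
local function inlineChunk(file, lastChunk)
	return #lastChunk > chunkSize and "" or file:read(chunkSize)
end

-- A previous chunk of exactly chunkSize bytes.
local full = string.rep("x", chunkSize)

local a, b = newFakeFile(), newFakeFile()
local viaFunction = readChunk(a, full)
local viaInline = inlineChunk(b, full)

print("readChunk:   reads =", a.reads, "returned bytes =", #viaFunction)
print("inline expr: reads =", b.reads, "returned bytes =", #viaInline)
```

With a previous chunk of exactly chunkSize bytes, readChunk returns ""
without touching the file, while the inline expression performs a read.
In the loop this means the inline variant joins two full chunks into a
64 KiB string on nearly every iteration, whereas the function variant
rarely joins two non-empty chunks, which may account for part of the
timing difference. It also means the function variant can miss a
boundary that straddles two chunks, so the matching output above may be
specific to this test file.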