[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Interning strings considered harmful (somewhat)
- From: Florian Weimer <fw@...>
- Date: Sun, 25 Oct 2009 13:43:41 +0100
The second part of the following benchmark shows distinctly non-linear
performance: Each pass of the first part runs under one second on my
machine, but the passes in the second part take 4, 14, and 32 seconds,
respectively.
I'm a bit worried by this, and the impact on programs which process
data from untrusted inputs. I haven't got a really good idea what can
be done about this. In similar cases, people have used Jenkins'
lookup3.c hash function with a random seed. Perhaps it is sufficient
to drop the skipping from Lua's hash function and use a random seed
stored in the Lua state, but the internal mixing of the current hash
function seems to be rather weak.
local function intersperse(s, ch)
local t = {string.byte(s, 1, #s)}
for i=1,#t do
t[i] = string.char(t[i])
end
return table.concat(t, ch)
end
local function magic(s, i)
return s .. intersperse(tostring(i), '-') .. s
end
local function f(s, count)
local t = {}
for i=10000,10000 + count - 1 do
local s1 = magic(s, i)
t[#t + 1] = s1
end
end
local function bench(f, ...)
local stime = os.time()
f(...)
local etime = os.time()
return etime - stime
end
-- part 1
print(bench(f, "@@@@@@@@@@@@@@@@", 10000))
print(bench(f, "@@@@@@@@@@@@@@@@", 20000))
print(bench(f, "@@@@@@@@@@@@@@@@", 30000))
-- part 2
print(bench(f, "@@@@@@@@@@@@@@@@@", 10000))
print(bench(f, "@@@@@@@@@@@@@@@@@", 20000))
print(bench(f, "@@@@@@@@@@@@@@@@@", 30000))