[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Interning strings considered harmful (somewhat)
- From: Florian Weimer <fw@...>
- Date: Fri, 20 Nov 2009 21:04:27 +0100
* Alex Davies:
> Florian Weimer wrote:
>> I went ahead and replaced the hash function with Jenkins' lookup3.c.
>> The impact on microbenchmarks was rather mixed. For reading text
>> files line by line, with somewhat regular content, there was a small
>> speed-up (despite line lengths generally above 32 bytes). Fasta was
>> slower, k-nucleotide was faster. Reading large, mostly random strings
>> will be significantly slower, but for quite regular strings,
>> lookup3.c's better mixing seems to pay off.
>
> A nice surprise that you noticed speed up. I have to ask though, from
> a security point of view - is there any point in replacing one
> non-cryptographic hash with another?
A cryptographic hash should be collision-resistant. We do not
actually need this property. It's also difficult how to achieve it if
the hash only outputs 32 bits, of which even fewer are actually used.
We only need that it's hard to create collisions which do not depend
on the initial state. That is a fairly weak requirement, and
lookup3.c seems to fullfil it.
As an additional precaution, the order in which you iterate over
string keys in a table should not directly correspond to the value of
the hash for the global string table, otherwise state information
might leak. I haven't implemented that yet (there are various options
for that, all of them just adding a bit of constant overhead when a
new string is allocated).
By the way, I discovered that delayed hashing collides with the
requirement of a non-moving garbage collector. (Technically, you
could only pin objects which are referenced directly from the stack
because that's what the spec says, but reading this list, I got the
impression that the spec is often interpreted as saying that the
garbage collector must be non-moving.) That's why I didn't try to
implement this.