Re: Interning strings considered harmful (somewhat)

Subject: Re: Interning strings considered harmful (somewhat)
From: Mike Zraly &lt;mzraly@ ... &gt;
Date: Wed, 4 Nov 2009 07:16:16 -0500

On Wed, Nov 4, 2009 at 1:37 AM, Florian Weimer <fw@deneb.enyo.de> wrote:

Doug Baskins (the Judy array inventor) suggests to use a relatively
weak hash to split strings into buckets and then store the strings in
each bucket in a trie, using Judy arrays. I don't know if Judy arrays
actually pay off their complexity (the hashing destroys locality, so
it's hard to imagine how the'd be a win).

I think judy arrays are intended to be fairly cache-friendly, though they assume a particular cache line size (64 bits?).

It might also be interesting to look into the HAT trie and other cache-friendly data structures. A number of interesting papers on HAT tries can be found at their author's homepage, http://www.naskitis.com/. Not sure how well HAT tries handle small sets of strings, but the burst tries they are based on supposedly do quite well -- see http://crpit.com/confpapers/CRPITV4Heinz.pdf.