[Date Prev][Date Next][Thread Prev][Thread Next]
- Subject: Re: Interning strings considered harmful (somewhat)
- From: Mike Zraly <mzraly@...>
- Date: Wed, 4 Nov 2009 07:16:16 -0500
On Wed, Nov 4, 2009 at 1:37 AM, Florian Weimer <email@example.com
Doug Baskins (the Judy array inventor) suggests to use a relatively
weak hash to split strings into buckets and then store the strings in
each bucket in a trie, using Judy arrays. I don't know if Judy arrays
actually pay off their complexity (the hashing destroys locality, so
it's hard to imagine how the'd be a win).
I think judy arrays are intended to be fairly cache-friendly, though they assume a particular cache line size (64 bits?).
It might also be interesting to look into the HAT trie and other cache-friendly data structures. A number of interesting papers on HAT tries can be found at their author's homepage, http://www.naskitis.com/
. Not sure how well HAT tries handle small sets of strings, but the burst tries they are based on supposedly do quite well -- see http://crpit.com/confpapers/CRPITV4Heinz.pdf