[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: Hash Table Collisions (n.runs-SA-2011.004)
- From: Mark Hamburg <mark@...>
- Date: Sun, 1 Jan 2012 08:08:48 -0800
On Dec 31, 2011, at 7:33 PM, Tom N Harris wrote:
> On 12/31/2011 09:27 PM, Mark Hamburg wrote:
>>
>> I am increasingly convinced that the best defense is to fail string interning when the buckets get too long. This obviously could reject some non-malicious cases, but those shouldn't be that common.
>>
>
> What are the disadvantages to this approach? The VM would not be able to assume inequality if two pointers are different. Another option is to have a slower but more robust algorithm that is activated when the faster hash is insufficient. That would add a test to every table lookup and require a flag bit. Or store the hash function as a pointer which adds 4~8 bytes to the table.
>
> In my oversimplified experiments with this, I found that the excess strings will eventually be removed by the GC. Aggressive collection makes it difficult to fill the string table with collisions. Though that still incurs a performance penalty, and doesn't help if the strings are being kept alive somehow. If the strings are keys in a regular table, such as decoded POST form fields, then you still have a problem on that table, even if you drop them from the internal string table.
In my version of this, at some point string interning failures would start behaving like memory allocation failures — i.e., we simply would fail to generate the new string and would fail the operation attempting to generate it. So, pointer equality would still work for testing string equality.
Mark