On 12/31/2011 09:27 PM, Mark Hamburg wrote:
> I am increasingly convinced that the best defense is to fail string interning when the buckets get too long. This obviously could reject some non-malicious cases, but those shouldn't be that common.
What are the disadvantages of this approach? The VM could no longer assume that two strings are unequal just because their pointers differ. Another option is a slower but more robust hash that is activated only when the fast one proves insufficient; that would add a test to every table lookup and require a flag bit. Or the hash function could be stored as a pointer, which adds 4-8 bytes to the table.
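For concreteness, here is a minimal sketch of the bail-out idea: a chained string table whose intern function simply refuses once a bucket passes a fixed length, leaving the caller to fall back to an ordinary, non-interned string. The names, the hash, and the threshold are all made up for illustration; this is not Lua's actual lstring.c.

/* Sketch only: fail interning when a bucket gets too long. */
#include <stdlib.h>
#include <string.h>

#define NBUCKETS        1024
#define MAX_BUCKET_LEN  32          /* arbitrary cutoff for this sketch */

typedef struct Interned {
    struct Interned *next;
    unsigned hash;
    char *str;
} Interned;

static Interned *buckets[NBUCKETS];

static unsigned hash_string(const char *s) {
    unsigned h = 5381;              /* djb2; stand-in for Lua's string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Returns the canonical copy, or NULL when the bucket is already "too long",
 * telling the caller to fall back to a plain, non-interned string.  That is
 * exactly the case where the VM can no longer treat pointer inequality as
 * string inequality. */
static Interned *intern(const char *s) {
    unsigned h = hash_string(s);
    Interned **bucket = &buckets[h % NBUCKETS], *e;
    int len = 0;
    for (e = *bucket; e != NULL; e = e->next, len++)
        if (e->hash == h && strcmp(e->str, s) == 0)
            return e;               /* already interned */
    if (len >= MAX_BUCKET_LEN)
        return NULL;                /* refuse: bucket looks like an attack */
    e = malloc(sizeof *e);
    e->hash = h;
    e->str = malloc(strlen(s) + 1);
    strcpy(e->str, s);
    e->next = *bucket;
    *bucket = e;
    return e;
}

int main(void) {
    return intern("content-length") != NULL ? 0 : 1;
}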
In my oversimplified experiments with this, I found that the excess strings are eventually removed by the GC; aggressive collection makes it difficult to fill the string table with collisions. That still incurs a performance penalty, though, and it doesn't help if the strings are being kept alive somehow. If the strings are keys in a regular table, such as decoded POST form fields, then you still have a problem on that table even if you drop them from the internal string table.
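To put a rough number on that: once n colliding keys sit in a single bucket of any chained hash table, whether the interner's or the application's, insert i has to scan the i entries already in the chain, so the total work is about n*(n-1)/2 comparisons. A toy calculation (the key count is arbitrary):

#include <stdio.h>

int main(void) {
    /* n colliding keys in one bucket: insert i scans the i entries already
     * chained there, so the total is n*(n-1)/2 string comparisons. */
    long n = 50000;                     /* arbitrary number of POST fields */
    long comparisons = n * (n - 1) / 2;
    printf("%ld colliding keys -> %ld comparisons\n", n, comparisons);
    return 0;
}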
--
- tom
telliamed@whoopdedo.org