I had a look at the latest source code at
https://github.com/lua/lua
First question: it would make a lot of sense to manage very short strings the same way we manage doubles and int64 values. Why has this solution been discarded? (The source and the executable would grow, of course, but benchmarks done in 2007 seem to show an advantage.)
Second question: how big is the advantage given by the string cache? (It seems to be a simple hash table that can hold a few items per bucket; overflow items are inserted in the string hash table.)
Third question: for long strings (longer than 40 chars) the hash function takes into account a minimum of 20 and a maximum of 32 chars (I ran a quick simulation), which means that there can be a lot of collisions, whether unintentional or malicious. Is there anything that can be done to mitigate the worst case when such long strings are used as keys in tables?
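For reference, here is a minimal sketch of the kind of simulation I mean. It assumes the step between sampled bytes is (length >> 5) + 1; the constant 5 and the names HASH_LIMIT and sampled_bytes are my own placeholders for illustration, not identifiers quoted from the source.

```c
#include <stddef.h>

/* Assumed constant: the step grows by one for every 2^5 = 32 bytes of length. */
#define HASH_LIMIT 5

/* Count how many bytes a length-dependent sampling loop of this shape
   would actually read for a string of length l. */
size_t sampled_bytes(size_t l) {
  size_t step = (l >> HASH_LIMIT) + 1;  /* distance between sampled bytes */
  size_t n = 0;
  for (; l >= step; l -= step)          /* walk backwards from the end */
    n++;
  return n;
}
```

Under these assumptions a 31-byte string is read in full, while sampled_bytes(40) is 20, so half of the bytes of a 40-char key never influence the hash.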
Fourth question: how does the seed in the hash help? For the point above it does not: collisions on long strings will happen all the same, or am I overlooking something? What is the purpose of the seed, then?
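To make the concern concrete, here is a hedged sketch of a seeded sampling hash. The mixing step, the constant 5, and the names sampling_hash and collide_under_seed are assumptions chosen for illustration, not a copy of the real function. Two 40-byte keys that differ only at a position the loop skips end up with the same hash whatever the seed is.

```c
#include <stddef.h>
#include <string.h>

/* Assumed constant: one extra step for every 32 bytes of length. */
#define HASH_LIMIT 5

/* Seeded hash that reads only every step-th byte, starting from the end. */
unsigned int sampling_hash(const char *str, size_t l, unsigned int seed) {
  unsigned int h = seed ^ (unsigned int)l;
  size_t step = (l >> HASH_LIMIT) + 1;
  for (; l >= step; l -= step)
    h ^= ((h << 5) + (h >> 2) + (unsigned char)str[l - 1]);
  return h;
}

/* Returns 1 if two distinct 40-byte keys hash identically under this seed.
   With length 40 the step is 2, so only odd indices (39, 37, ..., 1) are
   read and index 0 is never looked at. */
int collide_under_seed(unsigned int seed) {
  char a[40], b[40];
  memset(a, 'x', sizeof a);
  memcpy(b, a, sizeof b);
  b[0] = 'y';  /* differ only at a skipped position */
  return sampling_hash(a, 40, seed) == sampling_hash(b, 40, seed);
}
```

In this sketch collide_under_seed returns 1 for every seed, because the seed only changes the starting value of h and not which bytes are read.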
Andrea