question on lstring and string management

Subject: question on lstring and string management
From: Andrea <andrea.l.vitali@ ... >
Date: Tue, 19 May 2020 12:05:41 -0700

I had a look at the latest source code at https://github.com/lua/lua

I also had a look at an old post by Mike Pall: http://lua-users.org/wiki/FastStringPatch

First question: it makes a lot of sense to manage very short strings the way we manage doubles/int64s (i.e. stored directly in the value, without a separate heap object). Why was this solution discarded? (The source and the executable grow, of course, but the benchmarks done in 2007 seem to show an advantage.)
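A minimal sketch of the idea, assuming a hypothetical 64-bit payload in which a string of at most 7 bytes is packed together with its length (this is not claimed to be the layout used by the FastStringPatch or by any Lua version). Such a value needs no allocation and no interning, and two short strings are equal exactly when their packed integers are equal, just like two numbers. `MAX_INLINE`, `pack_short` and `unpack_short` are illustrative names only.

```python
MAX_INLINE = 7  # hypothetical: 7 payload bytes + 1 length byte = 64 bits


def pack_short(s: bytes) -> int:
    """Pack a short byte string into one 64-bit integer.

    The length goes in the top byte and the bytes themselves (little-endian)
    in the low 56 bits, so strings that differ only by trailing NUL bytes
    still get different values.
    """
    if len(s) > MAX_INLINE:
        raise ValueError("string too long to store inline")
    return (len(s) << 56) | int.from_bytes(s, "little")


def unpack_short(v: int) -> bytes:
    """Recover the original bytes from a packed value."""
    n = v >> 56                      # stored length
    payload = v & ((1 << 56) - 1)    # low 56 bits hold the characters
    return payload.to_bytes(n, "little")


if __name__ == "__main__":
    k = pack_short(b"key")
    print(k == pack_short(b"key"))   # equality is a plain integer comparison
    print(unpack_short(k))
```

With this kind of encoding, comparing or hashing a short string is an integer operation; the cost is an extra representation that every string-handling path has to recognize, which is presumably part of the source and executable growth mentioned above.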

Second question: how big is the advantage given by the string cache? (It seems to be a simple hash table that can hold a few items per bucket; overflow items are inserted into the string hash table.)
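A sketch of how such a cache can work, loosely modeled on my reading of `luaS_new` in `lstring.c`: the bucket is chosen from the address of the C string (not its contents), each bucket holds a few recently used strings, a hit is confirmed by comparing contents, and on a miss the oldest entry is dropped and the slow path (hashing and interning) runs. `StringCache`, `get` and `intern` are hypothetical names, and the default sizes (53 buckets, 2 entries each) are assumed to match `STRCACHE_N`/`STRCACHE_M`. The hit/miss counters are one way to measure the advantage on a given workload.

```python
from typing import Callable, List, Optional


class StringCache:
    """Pointer-keyed cache in front of a slower string interner (sketch)."""

    def __init__(self, intern: Callable[[str], str],
                 n_buckets: int = 53, ways: int = 2) -> None:
        self.intern = intern  # slow path: hash the contents, find/insert in the string table
        self.n_buckets = n_buckets
        self.buckets: List[List[Optional[str]]] = [
            [None] * ways for _ in range(n_buckets)
        ]
        self.hits = 0
        self.misses = 0

    def get(self, addr: int, text: str) -> str:
        """Return the interned string for `text`, whose C copy lives at `addr`."""
        bucket = self.buckets[addr % self.n_buckets]  # hash the pointer, not the contents
        for entry in bucket:
            if entry is not None and entry == text:   # contents must match (like strcmp)
                self.hits += 1
                return entry
        self.misses += 1
        new = self.intern(text)
        bucket.pop()          # drop the oldest entry (last slot)
        bucket.insert(0, new) # newest entry goes first
        return new
```

Because the key is the pointer, as I understand it the cache pays off mainly when the same C string (for example a literal passed repeatedly to `lua_pushstring` or `lua_getfield`) is converted again; the same contents at a different address usually land in a different bucket and take the slow path.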

Third question: for long strings (longer than 40 characters), the hash function takes into account at least 20 and at most 32 characters (I ran a quick simulation), which means there can be a lot of collisions, unintentional or malicious. Is there anything that can be done to mitigate the worst case when such long strings are used as table keys?

Fourth question: how does the seed in the hash help? It does not help with the point above: collisions between long strings will happen all the same, or am I overlooking something? What, then, is the purpose of the seed?
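The simulation behind the third question, and the doubt in the fourth, can be reproduced with a sketch of a step-sampled hash. It is modeled on `luaS_hash` as it appears in Lua 5.3 (`step = (l >> LUAI_HASHLIMIT) + 1`, with `LUAI_HASHLIMIT` assumed to be 5); the sources linked above may differ in detail, and `sampled_hash`/`sampled_indices` are illustrative names. With this formula I count between 20 and 31 bytes read for every length above 40. Two strings of the same length that differ only at indices that are never read produce identical hashes for every seed, since the seed never interacts with the skipped bytes.

```python
from typing import List

LUAI_HASHLIMIT = 5     # assumed default; longer strings are sampled, not fully read
MASK32 = 0xFFFFFFFF    # emulate C's 32-bit unsigned int arithmetic


def sampled_hash(s: bytes, seed: int) -> int:
    """Step-sampled string hash modeled on Lua 5.3's luaS_hash."""
    l = len(s)
    h = (seed ^ l) & MASK32
    step = (l >> LUAI_HASHLIMIT) + 1
    while l >= step:  # walk backwards from the last byte, `step` bytes at a time
        h ^= ((h << 5) + (h >> 2) + s[l - 1]) & MASK32
        l -= step
    return h


def sampled_indices(length: int) -> List[int]:
    """Indices of the bytes that sampled_hash actually reads."""
    step = (length >> LUAI_HASHLIMIT) + 1
    return list(range(length - 1, step - 2, -step))


if __name__ == "__main__":
    for n in (40, 64, 100, 1000):
        print(n, len(sampled_indices(n)))  # 20, 21, 25, 31 bytes read
    a = b"x" * 64
    b = a[:1] + b"y" + a[2:]  # differs from `a` only at index 1, which is never read
    print(all(sampled_hash(a, seed) == sampled_hash(b, seed)
              for seed in range(1000)))  # True
```

By contrast, when every byte is read (lengths up to 31, where the step is 1), changing the seed does change the result: for `b"a"`, seeds 0 and 1 give 128 and 97.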

Andrea

Andrea Vitali