I've noticed on the wiki a new article SpeedingUpStrings, which tackles improving loading strings through the C API. (http://lua-users.org/wiki/SpeedingUpStrings)
I don't believe there's discussion pages on the wiki (unfortunately), but I wanted to share the attempt I made at speeding up strings a while ago. I mentioned it briefly at the time on icynorth, but I don't think that forum is read, so I thought I'd annoy everyone on the list..
It was quite an unnecessary optimization for my own project, but served as a learning experience for me as to how Lua works. I noticed that iterating over the characters in a string using string.sub required the whole string creation fiasco of hashing, copying, etc. It bothered my that a single character went through the same process as a multi megabyte string, so I aimed to improve this specific case by implementing a shortstr type.
Basically, Shortstrs are held entirely inside the TValue. On an x86 using doubles, they consist of a length byte, and then 7 data bytes. The last byte must be a #0, so effectively strings of up to 6 characters are allowed. So essentially you store the whole string in what is normally just a pointer to a string + kludge. (By using some careful tt selection, taking in to account the endianness of the architecture, you could extend this to 7 characters but far too ugly a hack to be worthwhile).
Accordingly, I found the number of gc collectable strings in my program to drop from hundreds to tens, and after making the changes to the string.sub inline of LuaJit received a several fold speedup. That was the goal of course, but I'm doubtful it would ever translate to measurable performance gains in a real world project. (I believe the GC cost of strings to be unmeasurable compared to the number of tables/closures etc)
Particular implementation details:
• LUA_TSHORTSTR (LUA_TSTRING | (1 << 15))
allows checking for a string by just comparing the low byte. Test for a shortstr by checking the high byte. Also, by modifying iscollectable to check a signed short instead of an signed int, GC will not get confused.
• Zero out all unnecessary bytes
allows a simple long long compare to see if two shortstrs are equal
• Left the parser as is.
before dumping a constant to a functions constant table though, convert strings to shortstrs as necessary.
coded in asm, takes 7 ops and produces less collisions then current string hash.
• Change *TStrings in lua_State etc, to full TValues.