On 03.05.2014 at 14:09, Coroutines wrote:
On Sat, May 3, 2014 at 4:26 AM, Philipp Janda <siffiejoe@gmx.net> wrote:
We already have long strings and short strings in Lua 5.2. What about an
unhashed "very short" string (7 bytes plus NUL byte) that lives directly in
a TValue?

Firstly, I want to say that I think your proposal is an interesting
tangent worth pursuing -- but please note that it does not supersede
anything related to the first proposal I made: `?` -> string.byte('?')
compile-time syntax sugar

Yes, I intentionally didn't say anything about that.


I like your proposal, but I feel (without benchmarking) that comparing
2 integers would be quicker than first finding out if a short string
is long enough to cast the comparison to 2 integers/doubles
(word/dword comparison).

We will have a 64-bit integer type in Lua 5.3, but even if we fill the remaining bytes of the very short string with NULs, using integer compares for character arrays is non-portable. The libc can get away with it, but Lua probably shouldn't try. It could still be an unrolled `memcmp` or `strcmp`.
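
To make the point above concrete, here is a minimal sketch in C. The type `VShortStr` and the `vss_*` helpers are hypothetical names invented for illustration and are not part of Lua. The sketch also assumes the string contains no embedded NUL bytes, which real Lua strings may have.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical very short string: at most 7 bytes, NUL-padded to 8.
   Assumes no embedded NUL bytes, which real Lua strings may contain. */
typedef struct VShortStr { char s[8]; } VShortStr;

static VShortStr vss_make(const char *src) {
  VShortStr v;
  size_t len = strlen(src);
  assert(len <= 7);               /* caller must pass a short enough string */
  memset(v.s, 0, sizeof v.s);     /* NUL padding makes all 8 bytes defined */
  memcpy(v.s, src, len);
  return v;
}

/* Equality over the full padded buffer; a compiler may unroll this. */
static int vss_eq(VShortStr a, VShortStr b) {
  return memcmp(a.s, b.s, sizeof a.s) == 0;
}

/* Byte-wise ordering, independent of the machine's byte order. */
static int vss_cmp(VShortStr a, VShortStr b) {
  return memcmp(a.s, b.s, sizeof a.s);
}

/* Reinterpret the bytes as an integer via memcpy.
   The numeric value depends on endianness, so use it only for ==. */
static uint64_t vss_as_u64(VShortStr a) {
  uint64_t u;
  memcpy(&u, a.s, sizeof u);
  return u;
}
```

One reason integer compares are not portable for ordering: on a little-endian machine "ab" loads as 0x6261 and "b" as 0x62, so an integer `<` would sort "b" before "ab", the opposite of what `memcmp` reports. For equality the integer view agrees with `memcmp` on any byte order, because the padding is fully defined.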

Most libc strcmp() implementations still do byte-by-byte
comparisons, which would be slower than comparing 4 or 8 bytes between
two lua_Numbers.  Sidenote: Let's make it 16-byte short strings with
long-long comparisons - possible on x86_64 anyway... :(

That would make every `TValue` larger. At the moment the payload is 64 bits (a double or a 64-bit integer). We should probably also take into account implementations that use floats/32-bit integers, where very short strings can only be 3 bytes + NUL ...
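
The size argument can be checked with a rough sketch. The unions below are simplified, hypothetical stand-ins for the value payload (the real one also holds pointers and other members), and the sizes assume a platform with an 8-byte `double` and a 4-byte `float`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the value payload; the real union also
   holds pointers and other members, omitted here. */
typedef union Payload64  { double n; int64_t i; char vs[8];  } Payload64;
typedef union Payload128 { double n; int64_t i; char vs[16]; } Payload128;
typedef union Payload32  { float  n; int32_t i; char vs[4];  } Payload32;

/* Usable characters once one byte is reserved for the terminating NUL. */
enum {
  VS_CAP64  = sizeof(((Payload64  *)0)->vs) - 1,
  VS_CAP128 = sizeof(((Payload128 *)0)->vs) - 1,
  VS_CAP32  = sizeof(((Payload32  *)0)->vs) - 1
};

/* A 16-byte inline string enlarges the payload of every value,
   not only the payload of strings. */
static void check_payload_growth(void) {
  assert(sizeof(Payload128) > sizeof(Payload64));
}
```

Under these assumptions the 8-byte payload leaves room for 7 characters, the 4-byte payload for 3, and the 16-byte variant doubles the payload of every value.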


It should get rid of the hashing overhead for single character
strings (not sure how much hashing there is for single-byte strings), but
not the call overhead of string.byte, though ...

Between two "short strings" it would be a strcmp()?  If it were
between a short string and a long string you would still have to hash
the short string for the comparison.

For operators other than `==` and `~=` you would still need `memcmp`, but for those two you can stop right after you realize that the string types differ.
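
A sketch of that shortcut, assuming the invariant that every string of at most 7 bytes is always stored in the very short form. `StrVal`, `StrKind` and the `str_*` functions are hypothetical names for illustration, and `strcmp` stands in for whatever ordering comparison Lua would really use.

```c
#include <assert.h>
#include <string.h>

typedef enum { K_VSHORT, K_HEAP } StrKind;

/* Hypothetical string value: inline bytes or a pointer to heap data. */
typedef struct StrVal {
  StrKind kind;
  union { char vs[8]; const char *ptr; } u;
} StrVal;

/* Invariant: anything that fits in 7 bytes is always stored inline. */
static StrVal str_make(const char *s) {
  StrVal v;
  size_t len;
  assert(s != NULL);
  len = strlen(s);
  if (len <= 7) {
    v.kind = K_VSHORT;
    memset(v.u.vs, 0, sizeof v.u.vs);
    memcpy(v.u.vs, s, len);
  } else {
    v.kind = K_HEAP;
    v.u.ptr = s;                  /* borrowed pointer, for the sketch only */
  }
  return v;
}

static const char *str_data(const StrVal *v) {
  return v->kind == K_VSHORT ? v->u.vs : v->u.ptr;
}

/* == and ~= : different kinds imply different lengths, so stop early. */
static int str_eq(StrVal a, StrVal b) {
  if (a.kind != b.kind) return 0;
  if (a.kind == K_VSHORT)
    return memcmp(a.u.vs, b.u.vs, sizeof a.u.vs) == 0;
  return strcmp(a.u.ptr, b.u.ptr) == 0;
}

/* < and friends still have to look at the bytes of both operands. */
static int str_lt(StrVal a, StrVal b) {
  return strcmp(str_data(&a), str_data(&b)) < 0;
}
```

The early exit in `str_eq` is only valid because of the invariant in `str_make`. If a 3-byte string could also exist in heap form, two equal strings could carry different kinds.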

But I found another gotcha: the C API. `lua_tostring` would return a pointer to a Lua stack element. I'm not sure the guarantees Lua gives for the validity of character pointers are strong enough for this proposal.
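
The problem can be shown with a mock that does not use the real Lua API. `MockStack`, `mock_setstring` and `mock_tostring` are made-up names, and the stack is a fixed array so that the sketch stays well-defined C.

```c
#include <assert.h>
#include <string.h>

/* Mock stack whose slots hold very short strings inline. */
typedef struct MockSlot { char vs[8]; } MockSlot;
typedef struct MockStack { MockSlot slots[4]; } MockStack;

static void mock_setstring(MockStack *st, int idx, const char *s) {
  size_t len = strlen(s);
  assert(idx >= 0 && idx < 4 && len <= 7);
  memset(st->slots[idx].vs, 0, sizeof st->slots[idx].vs);
  memcpy(st->slots[idx].vs, s, len);
}

/* Like the proposal: the result points into the stack slot itself. */
static const char *mock_tostring(MockStack *st, int idx) {
  return st->slots[idx].vs;
}

/* Returns 1 if a pointer taken earlier still reads "abc" after the
   slot has been overwritten, 0 otherwise. */
static int pointer_still_reads_abc(void) {
  MockStack st;
  const char *p;
  mock_setstring(&st, 0, "abc");
  p = mock_tostring(&st, 0);
  mock_setstring(&st, 0, "xyz");  /* slot reused for another value */
  return strcmp(p, "abc") == 0;
}
```

Here the old pointer silently reads the new contents. In real Lua the stack can also be reallocated when it grows, which would leave such a pointer dangling, whereas a pointer to a separately allocated string does not move with the stack.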

Philipp