lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


Having started Lua at version 5.1,
I expect(ed) all strings to be interned,
guaranteeing me O(1) string comparisons
and O(1) table access no matter the keys
(save hash collisions).

However Lua 5.3 introduced a distinction between
"short" strings (which are interned) and "long" strings (which aren't
interned).
The distinction of whether a string is "short" or "long"
is made based on an arbitrary length limit; Lua users have no control.
This means code can be arbitrarily slower -
and arbitrarily more memory inefficient - compared to 5.2.
The only benefit is that string creation is faster, since no hashing is
required.
This constant factor doesn't affect the O(n) time complexity of string
creation however,
and comes at the significant cost of making string comparison -
and thus implicitly table indexing - potentially O(n) in the size of the
compared strings.

This is all technically an "implementation detail",
but it is important for programmers to know about
for them to be able to write efficient code,
especially as the "expected" behavior
(which was "documented" in various unofficial places)
changed since version 5.2.

I suggest making it explicit in the reference manual
that no guarantee regarding string interning / comparison performance is
made,
and perhaps even noting that PUC doesn't do it for long strings since 5.3.

The second "implementation detail" is interior table memory management:
Lua never shrinks tables; it only ever grows hash and array part.
This makes sense to minimize runtime since the allocated space may be
needed again later on -
in fact this is a pattern mentioned in the Lua performance gems - but
can lead to memory leaks.
In practice this mostly appears to be a non-issue,
but programmers need to be aware of it to not unknowingly introduce
memory leaks.
I suggest adding a note to the reference manual (under the GC section)
that tables are not necessarily shrunken by GC (which is also how PUC &
JIT behave).

Note that this behavior also varies across (scripting) languages:
Python shrinks lists (but not dicts);
JavaScript shrinks objects (but not arrays);
Go shrinks both maps and reclaims memory of underlying arrays when using
slices.

A consequence of Lua never shrinking tables is that since tables aren't
"tightly" packed,
pairs is not guaranteed to be linear time;
the unused table slots it has to skip may slow it down arbitrarily
depending on table size.

Ultimately, I think that it should also be reconsidered whether to
shrink tables.
Amortized O(1) newindex runtime doesn't need to be sacrificed;
for example, what about shrinking tables once the load factor drops to
less than one third?

- Lars