On 16/07/2020 21:07, Mason Bogue wrote:
> We are all generally aware of the issues with the length operator on
> tables containing nil. There was an idea to fix this in 5.4 by
> creating a special t[x] = undef syntax, which was considered too ugly.
> Here I propose an alternative:
>
> #t = 5 -- sets the length of the array portion of t to 5
>
> Essentially we allow the length operator to be used in an lvalue (we
> can already invoke functions in lvalues, so this is not
> unprecedented). If I understand the implementation correctly, the
> length will not be altered unless the table is reindexed, which only
> happens if we make a lot of insertions to the hash-map. If this is
> still too volatile, there could be a flag similar to `isrealasize(t)`
> that doesn't allow the size to shrink if it was set this way. If
> necessary for semantic consistency, doing this could also set t[5] =
> false when otherwise t[5] = nil.
>
> This also speeds up the creation of large arrays. Previously you had to do this:
>
> t = {}
> for i = 1, 10000 do t[i] = false end
>
> During that loop the array would be reallocated several times. With
> length assignment (`#t = 10000`) you get *one* reindex. Much nicer.

Really, I can't understand why the simplest, easiest solution has not been implemented yet: simply add a new function to the table library for those efficiency-aware programs where preallocating space for a table is paramount.

The function is already there in the C API (lua_createtable), so it can't possibly be considered "adding bloat": it would take just a few more bytes to add the corresponding table library function as an interface to it.

All the other proposals are really "hacks" that complicate matters, adding conceptual overhead to the language and its syntax for a corner case. This corner case is, IMO, important enough to warrant an added `table.create` function, but it is not important enough to mess with the syntax (how many times would a user need to write #T=maxlen ?). Not to speak of the interactions with other mechanisms and syntax, as pointed out by others in this thread.

Your proposal, moreover, will allocate array slots /after/ the table has been created, so it is not as optimal as it could be.

Please understand that I'm not "ranting" against you. I know there really is a need sometimes to allocate tables efficiently (I've been there and I'd like to be able to do it in pure Lua). But the real (elegant, no-bloat, easy, readable) solution is the one I described, and it "screams" to be implemented.

Why the Lua team hasn't done it yet (while struggling with yet another GC implementation), I can't really understand. IIRC years ago they replied that it would lead to "bad" programming practice, since it leaks an implementation detail (array part vs. hash part). But really I don't agree with that. That leak is already there in the whole "sequence vs. non-sequence" thing and in the whole table library.
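To make the "sequence vs. non-sequence" point concrete, here is a small sketch in plain Lua. The helper `is_border` is a hypothetical name, not a library function; it only restates the reference manual's definition of a border for small tables (it ignores the `math.maxinteger` edge case).

```lua
-- True when n is a border of t in the sense of the reference manual:
-- t[n] is present (or n is 0) and t[n + 1] is absent.
local function is_border(t, n)
  return (n == 0 or t[n] ~= nil) and t[n + 1] == nil
end

local seq   = { "a", "b", "c" }          -- a proper sequence: exactly one border
local holey = { "a", "b", nil, "d" }     -- not a sequence: borders at 2 and 4

print(#seq)                       -- a sequence has a single border, here 3
print(is_border(holey, #holey))   -- #holey is some border, but which one is unspecified
```

Which border `#holey` actually returns depends on how the table happens to be laid out internally, which is exactly the kind of detail that already shows through at the Lua level.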

Moreover, if one wants to be abstract and non-leaky, a hypothetical `table.create(narr, nhash)` can be defined at the Lua level as merely "hinting" to the Lua engine that it should efficiently create a table for a sequence of `narr` elements plus `nhash` other keys that the user means to add. This doesn't mention "array parts" and "hash parts" at all and uses concepts well defined at the Lua level. So it is non-leaky. The implementation would be free to implement table.create by mapping it to lua_createtable (the straightforward implementation) or to whatever future mechanism is needed for efficient allocation, should the underlying implementation of tables change.
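A minimal sketch of what "just a hint" means in practice, assuming the `table.create(narr, nhash)` signature proposed here. The local name `create` and the fallback are illustrative only: if the running Lua provides no `table.create`, a fallback that ignores both arguments is still a valid implementation, because the hints never change what the program can observe.

```lua
-- Hypothetical shim: use a native table.create if the running Lua has one,
-- otherwise fall back to a plain constructor. The arguments are only hints,
-- so ignoring them affects allocation cost, not behaviour.
local create = table.create or function(narr, nhash)
  return {}
end

local t = create(10000, 0)   -- hint: about 10000 sequence elements, no other keys
for i = 1, 10000 do
  t[i] = false
end
```

Code written against such a function stays correct whether or not the engine honours the hint, which is what keeps the abstraction from leaking.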

So we can have our cake and eat it too, in this case. Easy-peasy.

Another, more elegant long-term solution (maybe), would be to introduce a way to specify this kind of hint in any table ctor.

Just off the top of my head:

{:1000: ... --[[rest of the ctor, as usual]]} -- preallocate 1000 array slots


{:1000,2000:...} -- preallocate 1000 array and 2000 hash slots


{:nil,2000:...} -- preallocate 2000 hash slots

I said "preallocate", so this /is/ an implementation detail leaking. If you care, substitute it with "hinting" as I said above.

The syntax could be defined by stating that the engine would do its best to optimize whatever allocation is needed to satisfy those parameters (without mentioning hash/array parts).


That (or similar) syntax could be easily extended to other "hinting" or table properties that could, in the future, lead to other optimizations or useful features.

e.g.:

{:"const": ....} -- table that cannot be changed
{:"noresize":...} -- table whose number of entries cannot change

Heck, maybe some could be used to automatically give a table a metatable with suitably defined metamethods (just brainstorming here), as in:

{:"weak":...}
{:"ephemeron":...}


Cheers!

-- Lorenzo