

Hi!

On 29.03.2018 at 21:12, Petri Häkkinen wrote:
On 29 Mar 2018, at 9.29, Philipp Janda <siffiejoe@gmx.net> wrote:

Couldn't you just add the length-updating code to normal tables? I mean we already pay the extra space for the length field.

I don't think we are understanding each other. Let me try again:

I suspect so too, because I mostly agree with you. I just don't like the idea of adding a separate array sub-type. And I think it's completely unnecessary (except as a type hint for some serialization libraries).

To sum up what I meant:
- Add an array size field to all tables like you did in your patch.
- Don't introduce a new array sub-type.
- Add the array size update code to the table key insertion function (I believe there is already a separate code path for integer inserts).
- Array literal syntax is optional (I wouldn't add it). But table literals should set the array size field to the largest integer key.
- Add a `table.resize()` function like you did, and modify the other table functions to update the array size field. (You probably also need some new C API functions for setting the array size.)


Lua tables are just key-value pairs with some optimizations. In a sparse table, like the one I showed in my example, there is nothing between the key-values, not even nils. This is the fundamental reason for the issue with holes and why # can't be implemented in constant time.
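For reference, here is the border ambiguity in stock Lua (nothing assumed beyond the reference manual): `#` on a table with a hole returns *some* border, not a unique length.

    local t = { 1, 2, 3, 4 }
    t[3] = nil             -- punch a hole into the sequence
    local n = #t
    -- any n with t[n] ~= nil and t[n+1] == nil is a valid border,
    -- so n may legitimately be 2 or 4 here, depending on internals
    assert(n == 2 or n == 4)
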

I have a different view. Tables are mappings from arbitrary keys to corresponding values (a default value most of the time). If you have a finite, ordered set of keys, a Lua table gives you a finite, ordered set of values indexable by those keys. In Lua this set of keys is by convention the set of integers starting from 1 up to some maximum integer. The difficulty is agreeing on this maximum integer. You have multiple options for that:

1. You could have a manually managed length for each table (e.g. the ".n" hack or your `table.resize()`).
2. You could observe the user of the table (e.g. use the largest assigned integer key).
3. You could scan all key-values that are different from the default value (e.g. the current Lua sequence concept, or a linear scan like `ipairs()`). Most of the time you are only interested in the non-default values anyway.
4. Maybe more?
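Option 3 as a linear scan looks roughly like this in plain Lua (`seqlen` is just a made-up helper name for illustration):

    -- scan from index 1 until the first default value (nil), like ipairs()
    local function seqlen(t)
      local n = 0
      while t[n + 1] ~= nil do
        n = n + 1
      end
      return n
    end

    print(seqlen({ "a", "b", "c" }))   --> 3
    print(seqlen({ "a", nil, "c" }))   --> 1  (the scan stops at the hole)
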

All of those approaches have advantages and disadvantages. AFAICS, your approach is a combination of 1. and 2.: You grow depending on usage pattern (assignment out of bounds), you shrink on explicit request (`table.resize()` or `table.remove()`). I think it's a nice mix, and your benchmarks have shown that the performance and space concerns (still looking for that old post -- we had a lot of discussions about table length over the years) are mostly unfounded.


When you remove a key-value pair from the table, you would have to do some sort of scan of the key-values to know the new length. You can do this when you insert/remove key-values, or you can do it lazily in the length operator. The latter is usually more efficient, and that's what Lua does. When doing an insert/remove on a table, the table data structure does not know whether you are removing the last element, or at which indices the other values are stored.

If you remove one element, the length decreases by one, as you wrote yourself below. No scan necessary.


Arrays are different. They are dense and have all key-values between 1 and array size (even nils).

Arrays don't have to be dense. For sparse arrays it might make sense to store only non-nil elements. Arrays must "have" all key-values between 1 and array size, but they don't necessarily have to store them.

What do you think about the following "array"?

    local a = setmetatable( {}, {
      __index = function( _, k ) return tostring( k ) end
    } )
    table.resize( a, 10 )  --> array of "1" to "10"

When you push a new value, the array size always grows by one. When you insert a new key-value the array size will always be extended up to the new index (so array size becomes the index of the added value). When you remove a key-value, you always decrease the size by one. When you overwrite an existing value, the size does not change. Simple and efficient.
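Those rules can be mimicked today in plain Lua with an explicit size field; this is only a sketch of the semantics described above (`Array`, `push` etc. are made-up names, not a proposed API):

    local Array = {}

    function Array.new(...)            -- size starts at the literal's length
      return { n = select("#", ...), ... }
    end

    function Array.push(a, v)          -- push: size always grows by one
      a.n = a.n + 1
      a[a.n] = v
    end

    function Array.set(a, i, v)        -- insert at index: size extends up to i
      a[i] = v
      if i > a.n then a.n = i end      -- overwriting below a.n changes nothing
    end

    function Array.remove(a)           -- remove last: size decreases by one
      local v = a[a.n]
      a[a.n] = nil
      a.n = a.n - 1
      return v
    end
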

Completely agree.


To get the speed benefits and to fix the hole issue, you have to break the rules of the game, i.e. the semantics need to change. Coming up with a new working semantic for tables, as proved by all the discussions over the years on this list, is hard (I would say impossible) and can break existing code in subtle ways. Luckily, arrays have just the right semantics for the job.

I think there will be broken code no matter which solution to the array problem we choose. In this particular case: nilling the last sequence element won't shrink the table anymore, and iterating over that table will then produce an unexpected nil value.
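That breakage can be simulated in stock Lua by tracking the size in a separate variable, the way the proposal would track it in the table (the explicit `n` below stands in for the array size field):

    local a = { 1, 2, 3 }
    local n = 3            -- stands in for the proposed array size field
    a[3] = nil             -- nilling the last element no longer shrinks anything
    local hit_nil = false
    for i = 1, n do        -- iterating over the array size...
      if a[i] == nil then hit_nil = true end
    end
    assert(hit_nil)        -- ...now visits index 3 and yields an unexpected nil
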


Petri


Philipp