Re: Suggestion : Built-in tuple type , was : packed structures

You can already use the IEEE 754 64-bit double type and take the unused bits of NaN values to store ANY other Lua type, including small strings (6 bytes or less) packed directly inside the value, without storing a reference to an external buffer. A NaN only needs a fixed value in the exponent field (all ones) and a single non-zero bit in the mantissa. Lua does not distinguish signaling NaNs from quiet NaNs, nor one NaN payload from another, so every real NaN can be reduced to a single bit pattern; the sign bit and the remaining mantissa bits are then free to carry a type tag and a payload.

With that, you can store all types natively, and you still have enough bits in the mantissa to store a reference when one is needed. Converting a pointer to a reference is simple arithmetic, and it gets even simpler when allocated buffers have a small alignment constraint: the least significant bits of their addresses are zero, so non-zero bits in that position can also be used to distinguish other datatypes.

No shift or rotation is needed; masking alone is sufficient, and it can be done very efficiently.
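
To make the idea concrete, here is a minimal sketch of such a NaN-boxed value in C. It is not Lua's actual TValue: the names (Boxed, box_number, TAG_POINTER, ...) and the exact bit layout are my own assumptions, and it assumes that pointers fit in 48 bits, which holds for user-space addresses on common x86-64 and AArch64 systems but is not guaranteed by C. It only covers numbers and pointers; other types would take one of the remaining tag values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Boxed;   /* hypothetical name, not Lua's TValue */

#define HIGH13_MASK  UINT64_C(0xFFF8000000000000) /* sign + exponent + quiet bit */
#define HIGH16_MASK  UINT64_C(0xFFFF000000000000) /* the above + 3 tag bits */
#define PAYLOAD_MASK UINT64_C(0x0000FFFFFFFFFFFF) /* low 48 bits */
#define CANON_NAN    UINT64_C(0x7FF8000000000000) /* the only real NaN we keep */
#define TAG_POINTER  UINT64_C(0xFFF9000000000000) /* one of 8 possible tags */

/* Numbers are stored as their own bit pattern; every NaN is collapsed
   to CANON_NAN so that the 0xFFF8.. prefix is free for boxed values. */
static Boxed box_number(double d) {
    Boxed v;
    if (d != d) return CANON_NAN;
    memcpy(&v, &d, sizeof v);          /* bit copy, no numeric conversion */
    return v;
}

/* Anything without the full 13-bit prefix is an ordinary double. */
static int is_number(Boxed v) {
    return (v & HIGH13_MASK) != HIGH13_MASK;
}

static double unbox_number(Boxed v) {
    double d;
    assert(is_number(v));
    memcpy(&d, &v, sizeof d);
    return d;
}

static Boxed box_pointer(void *p) {
    uint64_t bits = (uint64_t)(uintptr_t)p;
    assert((bits & ~PAYLOAD_MASK) == 0);   /* assumption: 48-bit addresses */
    return TAG_POINTER | bits;
}

static int is_pointer(Boxed v) {
    return (v & HIGH16_MASK) == TAG_POINTER;
}

static void *unbox_pointer(Boxed v) {
    assert(is_pointer(v));
    return (void *)(uintptr_t)(v & PAYLOAD_MASK);
}
```

Every type test here is one AND and one compare, with no shift. The tag constants are written already in their final bit position for that reason.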

Overall we get huge performance boosts from better data locality (less pressure on the memory caches) and better data alignment. You'll see immediately that Lua uses a lot of very small objects and that most of them fit in 64 bits without any access to another memory area, so they can stay in native 64-bit registers (the C/C++ compiler will optimize for that). You'll also get a significant boost from the dramatic reduction in calls to the memory allocator and from less work for the garbage collector.

Using memory pools (one per block size) in the allocator also improves the general speed.
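
For readers who have not seen the technique, a pool per size class can be sketched roughly as follows. This is only an illustration under my own assumptions (the names Pool, pool_alloc, pool_for and the four 16-byte size classes are hypothetical), not code from Lua or from any particular allocator, and it never returns memory to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free block stores the link to the next free block in its own memory. */
typedef struct FreeNode { struct FreeNode *next; } FreeNode;

typedef struct Pool {
    size_t block_size;    /* size of every block served by this pool */
    FreeNode *free_list;  /* recycled blocks, ready for reuse */
} Pool;

enum { CLASS_STEP = 16, NUM_CLASSES = 4 };  /* classes: 16, 32, 48, 64 bytes */

static void pool_init(Pool *p, size_t block_size) {
    /* a block must be large enough to hold the free-list link */
    p->block_size = block_size < sizeof(FreeNode) ? sizeof(FreeNode) : block_size;
    p->free_list = NULL;
}

static void pools_init(Pool pools[NUM_CLASSES]) {
    for (int i = 0; i < NUM_CLASSES; i++)
        pool_init(&pools[i], (size_t)(i + 1) * CLASS_STEP);
}

/* Pick the smallest class that fits; NULL means "not pooled, use malloc". */
static Pool *pool_for(Pool pools[NUM_CLASSES], size_t size) {
    size_t idx = (size + CLASS_STEP - 1) / CLASS_STEP;
    if (size == 0 || idx > NUM_CLASSES) return NULL;
    return &pools[idx - 1];
}

static void *pool_alloc(Pool *p) {
    FreeNode *n = p->free_list;
    if (n != NULL) {               /* fast path: pop a recycled block */
        p->free_list = n->next;
        return n;
    }
    return malloc(p->block_size);  /* slow path: ask the system allocator */
}

static void pool_free(Pool *p, void *block) {
    FreeNode *n = block;           /* push the block back on the free list */
    n->next = p->free_list;
    p->free_list = n;
}
```

The gain comes from the fast path: reusing a block of the right size is a couple of pointer moves, with no search and no header bookkeeping.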

On Fri, Nov 5, 2021 at 15:27, Ranier Vilela <ranier.vf@gmail.com> wrote:

On Fri, Nov 5, 2021 at 05:32, 云风 Cloud Wu <cloudwu@gmail.com> wrote:
On Tue, Oct 12, 2021 at 12:40 AM, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:
>
> Hugo Gualandi came with the idea of using a packed structure to store
> Lua values. Intel CPUs (and it seems ARMs too) can work with unaligned
> data (or aligned with weaker boundaries) and, at least for some
> architectures, with very small (or even no) performance penalties.
>
> As a very fast check, I simply changed the following line in lobject.h:
>
> -typedef struct TValue {
> +typedef struct __attribute__((packed)) TValue {
> TValuefields;
> } TValue;
>
> This is valid in gcc and clang. (It gives one warning in ltable.c which
> for now I am ignoring. It is a trivial change to correct that: pass the
> second parameter of 'mainposition' by value instead of by reference.)
>
> I quickly tested that in two Intel i7. As expected, memory use
> by arrays is cut by almost half (9/16). Maybe unexpected, I did
> not see any relevant performance penalties at all. (In a few
> benchmarks, performance even improved, probably because there
> is less memory traffic.)
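
The 9/16 figure can be checked with a reduced stand-in for the structure. The sketch below does not use Lua's real definitions: DemoValue, PlainTValue and PackedTValue are hypothetical names, the union has fewer members than the real Value in lobject.h, and the sizes assume gcc or clang on a typical 64-bit ABI where double and pointers are 8 bytes with 8-byte alignment.

```c
#include <stddef.h>

/* Reduced stand-in for Lua's Value union (assumption: 8 bytes wide). */
typedef union DemoValue {
    void *p;
    long long i;
    double n;
} DemoValue;

/* Default layout: 8 bytes of value, 1 byte of tag, 7 bytes of padding. */
typedef struct PlainTValue {
    DemoValue value_;
    unsigned char tt_;
} PlainTValue;

/* Packed layout: the padding is dropped, so elements of an array
   are no longer aligned on 8-byte boundaries. */
typedef struct __attribute__((packed)) PackedTValue {
    DemoValue value_;
    unsigned char tt_;
} PackedTValue;

static size_t plain_array_bytes(size_t n)  { return n * sizeof(PlainTValue); }
static size_t packed_array_bytes(size_t n) { return n * sizeof(PackedTValue); }
```

Under those assumptions the two element sizes are 16 and 9 bytes, which is where the "almost half" comes from.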

I think the point is the types of values in the array.

In many use cases, the types are known in advance. For example, in a
3D game we always use a table of 4 numbers for a vector4 type.

I suggest that Lua add a built-in tuple type; it would be a special
kind of table. For vector4, we can write:

-- n : number
-- i : integer
-- b : boolean
-- s : string
local v = table.tuple "nnnn" -- initial: { 0,0,0,0 }
v[1] = 0
v[2] = 0
v[3] = 0
v[4] = 1

The string "nnnn" is the type annotation for the tuple, and it can
be stored in the internal object.

We can implement a library for this, but built-in support would have
better performance (compact memory layout and efficient
implementation).
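
On the C side, the compact layout could look something like the sketch below. This is only a guess at one possible representation, not a proposal for Lua's internals: Tuple, Slot, tuple_new and the accessors are hypothetical names, only the 'n' format is wired up, and an 's' slot would additionally need a reference that the garbage collector can trace.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One slot holds the raw value only; the type lives in the format string. */
typedef union Slot {
    double n;      /* 'n' */
    long long i;   /* 'i' */
    int b;         /* 'b' */
} Slot;

typedef struct Tuple {
    const char *format;  /* e.g. "nnnn"; not copied, caller keeps it alive */
    size_t len;          /* number of slots, equal to strlen(format) */
    Slot slots[];        /* flexible array member, no per-slot type tag */
} Tuple;

/* Allocate a tuple with every slot zeroed (0.0 for IEEE 754 doubles). */
static Tuple *tuple_new(const char *format) {
    size_t len = strlen(format);
    Tuple *t = calloc(1, sizeof *t + len * sizeof(Slot));
    if (t == NULL) return NULL;
    t->format = format;
    t->len = len;
    return t;
}

/* Store a number at a 1-based index; returns 0 if the index is out of
   range or the slot was not declared as 'n'. */
static int tuple_set_number(Tuple *t, size_t index, double value) {
    if (index < 1 || index > t->len || t->format[index - 1] != 'n') return 0;
    t->slots[index - 1].n = value;
    return 1;
}

static double tuple_get_number(const Tuple *t, size_t index) {
    assert(index >= 1 && index <= t->len && t->format[index - 1] == 'n');
    return t->slots[index - 1].n;
}
```

With this layout a vector4 costs one small header plus four 8-byte slots, and the type check on assignment is a single character comparison against the shared format string.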

What if we create a new TValue that only uses 7 bytes for the value and 1 byte for the type?

typedef struct TValue {
  lu_byte value_[7]; /* 7 bytes */
  lu_byte tt_;       /* 1 byte */
} TValue;

Total size: 8 bytes.
Big-endian and little-endian byte order would have to be taken into
account when storing and reading the value.
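
One way to sidestep the byte-order question is to fix the order of the 7 bytes explicitly instead of copying memory, roughly as in the sketch below. The names TValue7, store56 and load56 are hypothetical, and the sketch only round-trips integers that fit in 56 bits; a full 64-bit double or integer does not fit in 7 bytes without losing bits.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned char lu_byte;   /* same definition as in Lua's llimits.h */

typedef struct TValue7 {         /* hypothetical name for the proposed layout */
    lu_byte value_[7];           /* 7 bytes */
    lu_byte tt_;                 /* 1 byte */
} TValue7;

/* Store the low 56 bits of x, least significant byte first. The byte
   order is fixed by the shifts, not by the host CPU. */
static void store56(TValue7 *o, uint64_t x, lu_byte tag) {
    for (int k = 0; k < 7; k++)
        o->value_[k] = (lu_byte)(x >> (8 * k));
    o->tt_ = tag;
}

/* Rebuild the 56-bit payload in the same fixed order. */
static uint64_t load56(const TValue7 *o) {
    uint64_t x = 0;
    for (int k = 0; k < 7; k++)
        x |= (uint64_t)o->value_[k] << (8 * k);
    return x;
}

/* Sign-extend the 56-bit payload back to a 64-bit integer. */
static int64_t load56_signed(const TValue7 *o) {
    uint64_t x = load56(o);
    if (x & (UINT64_C(1) << 55))
        x |= UINT64_C(0xFF) << 56;
    return (int64_t)x;
}
```

Note that this trades the endianness problem for shifts on every access, and a compiler may or may not fold the loops into a single load or store.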

regards,
Ranier Vilela