You can already use the IEEE 64-bit double type to store ANY other Lua type by taking the unused bits of NaN values, including packing small string values (6 bytes or less) directly inside it, without storing a reference to an external buffer. A NaN only needs a fixed value in the exponent field (all ones) and a single non-zero bit in the mantissa. The sign bit is not significant, and since Lua does not distinguish quiet NaNs from signaling NaNs, all real NaNs can be reduced to a single mantissa value, which leaves every other bit pattern free.
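As a rough illustration of the small-string case, here is a minimal sketch in C. The layout is only an assumption made for this example (it is not what any Lua release does): the sign bit, the all-ones exponent and the quiet bit mark a boxed value, bits 48..50 hold the length plus one, and the low 48 bits hold up to 6 bytes. The names `Value`, `box_sstr`, `is_sstr`, `sstr_len` and `unbox_sstr` are hypothetical. Shifts are used here only to keep the byte packing independent of endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;  /* raw bits of an IEEE 754 double */

/* Assumed layout for this sketch, not an existing Lua format. */
#define BOXED     UINT64_C(0xFFF8000000000000) /* sign + all-ones exponent + quiet bit */
#define LEN_SHIFT 48                            /* bits 48..50 hold length + 1 */
#define LEN_MASK  UINT64_C(0x0007000000000000)
#define SSTR_MAX  6                             /* bytes that fit in bits 0..47 */

/* Pack up to 6 bytes; embedded zero bytes are allowed, as in Lua strings. */
static inline Value box_sstr(const char *s, size_t len) {
    assert(len <= SSTR_MAX);
    Value v = BOXED | ((Value)(len + 1) << LEN_SHIFT);
    for (size_t i = 0; i < len; i++)
        v |= (Value)(unsigned char)s[i] << (8 * i);
    return v;
}

/* A length field of zero is left free for other boxed types. */
static inline bool is_sstr(Value v) {
    return (v & BOXED) == BOXED && (v & LEN_MASK) != 0;
}

static inline size_t sstr_len(Value v) {
    return (size_t)((v & LEN_MASK) >> LEN_SHIFT) - 1;
}

/* Copy the bytes back out; returns the length. */
static inline size_t unbox_sstr(Value v, char out[SSTR_MAX]) {
    size_t len = sstr_len(v);
    for (size_t i = 0; i < len; i++)
        out[i] = (char)((v >> (8 * i)) & 0xFF);
    return len;
}
```

With this layout, two short strings are equal exactly when their 64-bit values are equal, so comparing them is a single integer comparison.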
With that, you can store all types natively, and you still have enough bits in the mantissa to store a reference when needed. Converting a pointer to a reference can be done arithmetically, and it is very simple when pointers to allocated buffers have even a small alignment constraint: their least significant bits are then zero, so non-zero bits there can also be used to distinguish other datatypes.
No shift or rotation is needed: masking alone is sufficient, and it can be done very efficiently.
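The pointer case can be sketched as follows, using masks only. This is a sketch under stated assumptions, not a definitive implementation: it assumes user-space addresses fit in 48 bits (typical of current x86-64 and AArch64 systems, but not guaranteed), that allocated objects are 8-byte aligned, and that every double enters through `box_number`, which collapses all NaNs to one canonical pattern. All names (`Value`, `box_number`, `box_ptr`, `box_tag`, the `TAG_*` constants) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Value;  /* raw bits of an IEEE 754 double */

/* Assumed layout for this sketch. */
#define BOXED     UINT64_C(0xFFF8000000000000) /* sign + all-ones exponent + quiet bit */
#define CANON_NAN UINT64_C(0x7FF8000000000000) /* the only NaN kept as a number */
#define PTR_MASK  UINT64_C(0x0000FFFFFFFFFFF8) /* assumed 48-bit, 8-byte-aligned address */
#define TAG_MASK  UINT64_C(0x0000000000000007) /* low bits freed by the alignment */

enum { TAG_PTR = 0, TAG_NIL = 1, TAG_FALSE = 2, TAG_TRUE = 3 };

/* Every NaN is reduced to one pattern so it cannot collide with boxed values. */
static inline Value box_number(double d) {
    Value v;
    if (d != d) return CANON_NAN;
    memcpy(&v, &d, sizeof v);
    return v;
}

static inline double unbox_number(Value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* One mask and one compare decide number vs. boxed. */
static inline bool is_number(Value v) { return (v & BOXED) != BOXED; }

/* Non-pointer constants live in the low bits that alignment leaves at zero. */
static inline Value box_tag(unsigned tag) { return BOXED | (tag & TAG_MASK); }

/* Only meaningful when is_number(v) is false. */
static inline unsigned tag_of(Value v) { return (unsigned)(v & TAG_MASK); }

static inline Value box_ptr(void *p) {
    uintptr_t a = (uintptr_t)p;
    assert(((Value)a & ~PTR_MASK) == 0);  /* checks both assumptions above */
    return BOXED | (Value)a;
}

static inline void *unbox_ptr(Value v) {
    return (void *)(uintptr_t)(v & PTR_MASK);
}
```

A boxed value whose low three bits are zero is a pointer; any other value in those bits selects one of the constants, so type dispatch needs an AND and a compare.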
Overall we get huge performance boosts from better data locality (less pressure on the memory caches) and better data alignment. You'll see immediately that Lua uses a lot of very small objects, and that most of them fit in 64 bits without any access to another memory area, so they can stay in native 64-bit registers (the C/C++ compiler will optimize them). You'll also get a significant boost from the dramatic reduction in calls to the memory allocator and from the reduced work for the garbage collector.
Using memory pools (segregated by size) in the allocator also improves the general speed.
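A size-segregated pool could look like the sketch below. It is only an illustration: the class sizes, the chunk size and the names (`Pool`, `pool_alloc`, `pool_free`, `class_of`) are hypothetical choices made for this example. The free function takes the block size, which fits the way Lua's `lua_Alloc` callback passes the old size of a block along with its pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CLASS_STEP   16  /* size classes: 16, 32, ..., 128 bytes (assumed) */
#define NUM_CLASSES  8
#define CHUNK_BLOCKS 64  /* blocks carved from each malloc'ed chunk */
#define POOL_MAX     (CLASS_STEP * NUM_CLASSES)

typedef struct FreeBlock { struct FreeBlock *next; } FreeBlock;
typedef struct { FreeBlock *free_list[NUM_CLASSES]; } Pool;

/* Index of the smallest class that can hold `size` bytes (size >= 1). */
static inline size_t class_of(size_t size) {
    return (size + CLASS_STEP - 1) / CLASS_STEP - 1;
}

static void *pool_alloc(Pool *p, size_t size) {
    if (size == 0 || size > POOL_MAX)
        return malloc(size);                 /* large blocks bypass the pools */
    size_t c = class_of(size);
    size_t bsize = (c + 1) * CLASS_STEP;
    if (p->free_list[c] == NULL) {
        /* Refill: one malloc serves CHUNK_BLOCKS future allocations.
           Chunks are never returned to the system in this sketch. */
        char *chunk = malloc(bsize * CHUNK_BLOCKS);
        if (chunk == NULL) return NULL;
        for (size_t i = 0; i < CHUNK_BLOCKS; i++) {
            FreeBlock *b = (FreeBlock *)(chunk + i * bsize);
            b->next = p->free_list[c];
            p->free_list[c] = b;
        }
    }
    FreeBlock *b = p->free_list[c];          /* pop the head of the free list */
    p->free_list[c] = b->next;
    return b;
}

/* `size` must be the size that was passed to pool_alloc for this block. */
static void pool_free(Pool *p, void *ptr, size_t size) {
    if (ptr == NULL) return;
    if (size == 0 || size > POOL_MAX) { free(ptr); return; }
    FreeBlock *b = ptr;
    size_t c = class_of(size);
    b->next = p->free_list[c];               /* push back for quick reuse */
    p->free_list[c] = b;
}
```

Allocation and release of small blocks become a couple of pointer moves, and blocks of the same class end up next to each other in memory.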