Re: Can a Lua implementation use 63-bit integers or even "big integers"?

Yes, 64-bit pointers are larger than anything Lua apps need. In fact, 64-bit OSes allow only part of a 64-bit pointer to be used for memory addressing, within a more limited user space (in virtual memory), and keep some high bits for special functions (e.g. protection of pages, prevention of modification of code, prevention of execution of modifiable data, protection of hardware space for memory-mapped I/O...).

So the user space in virtual memory will always be under the 64-bit limit. The Lua engine should protect these high bits and not use them for addressing (or infer their correct value for use in user space).

In that case, we have at least one bit available to distinguish memory pointers (Lua object references to tables, strings, userdata, lightuserdata, threads/coroutines, C functions...) from other use (e.g. integers up to 63 bits), and so we can avoid the allocation of a separate field in a data structure or the allocation of an extra register to hold the value type, and a single 64-bit register can be enough for most cases.
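As a rough illustration of the single-bit tag described above, here is a minimal C sketch. The names (value_t, TAG_INT, value_from_int, ...) are hypothetical and not taken from any Lua implementation, and the code assumes that user-space pointers never have bit 63 set and that the compiler uses two's complement with an arithmetic right shift for signed values (true of mainstream compilers, but implementation-defined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical value word: bit 63 is the tag.
 *   tag = 1 -> 63-bit signed integer stored in bits 0..62
 *   tag = 0 -> pointer (ASSUMPTION: user-space addresses never set bit 63) */
typedef uint64_t value_t;

#define TAG_INT   (UINT64_C(1) << 63)
#define INT63_MAX ((INT64_C(1) << 62) - 1)
#define INT63_MIN (-(INT64_C(1) << 62))

static inline int value_is_int(value_t v) { return (v & TAG_INT) != 0; }

static inline value_t value_from_int(int64_t n) {
    assert(n >= INT63_MIN && n <= INT63_MAX);   /* must fit in 63 bits */
    return (uint64_t)n | TAG_INT;
}

static inline int64_t value_to_int(value_t v) {
    /* Drop the tag with a left shift, then sign-extend bit 62 back into
     * bit 63 with an arithmetic right shift. */
    return (int64_t)(v << 1) >> 1;
}

static inline value_t value_from_ptr(void *p) {
    value_t v = (value_t)(uintptr_t)p;
    assert((v & TAG_INT) == 0);                 /* checks the assumption above */
    return v;
}

static inline void *value_to_ptr(value_t v) {
    return (void *)(uintptr_t)v;
}
```

With this layout the type test is a single bit test on one register, and no separate type field is needed for these two cases.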

If needed, we can also reserve additional high bits for other subtypes. For example, we could even store 64-bit doubles with no loss of precision or range when the exponent part is small enough that the sign bit can be moved from the most significant bit to a less significant bit: restoring the true 64-bit value then just requires a logical shift left by one bit followed by an arithmetic shift right by one bit, and this can preserve the NaNs and infinities. We can also distinguish other reference types (strings, userdata...), which don't need the full 64 bits either to address virtual memory in user space.

On Tue, Aug 31, 2021 at 16:38, Coda Highland <chighland@gmail.com> wrote:

I never said anything about using NaN tagging for integers. I was analogizing using the high bit to tag an integer versus other kinds of values to using NaN to tag a float versus other kinds of values. Don't read too much into it. ^^()

63-bit integers make perfect sense in this context. 64-bit pointers are capable of addressing far more address space than any modern system is capable of using. 64-bit integers are capable of representing numbers far in excess of most real-world applications. Using a single bit to distinguish between an integer and a pointer results in a negligible loss of utility for either one, assuming that you aren't using the integer to store arbitrary binary data instead of as an actual number, given that the alternative is to use a data structure that doesn't fit in a single machine register to distinguish between types.

/s/ Adam

On Tue, Aug 31, 2021 at 8:31 AM Philippe Verdy <verdyp@gmail.com> wrote:
Using NaN tagging for integers is stupid: this means setting the exponent part to the value used by NaN, then using the sign bit and mantissa field to store the integer value (except for 0, which uses the floating-point 0, with the exponent field set to the same value as the one used for "denormal" very small fractions). The difference with "denormal" floating points is that:
- NaN also requires keeping two bits in the mantissa: the signaling/non-signaling flag, which is the most significant bit of the mantissa, and one bit to tell a real NaN from a tagged integer. The sign bit is not a significant sign for NaN.
- with the NaN tagging, the implied exponent is 2^0 (so that it represents an integer), whereas the "denormal" small fractions have an implied exponent 2^-n (to represent very small fractions), where n=1074 for 64-bit doubles (a mantissa m stands for m x 2^-1074).

If you use IEEE 64-bit doubles, the exponent part is 11 bits, so you have 51 bits left (the sign bit plus 50 mantissa bits, after removing the signaling flag bit and keeping 1 bit to distinguish NaN from the representation of integers), so you can represent integers in [-2^50 ... -1] or [+1 ... +2^50]. If you want to add an integer zero distinct from floating-point 0, you have to drop one value from the previous representation and use basic binary integer arithmetic on the mantissa field in the inclusive range [-2^50 ... +2^50-1], but you need to take care of the 51-bit overflow (to avoid generating floating-point NaNs or denormal numbers and to not alter the exponent part). It is interesting as a storage method (loading is fast, you just have to mask 13 bits), but not for computing; storing is slow as it requires checking the range: if the 13 masked bits are neither all zeros nor all ones, you have an overflow, so you need to change the value:
- to a regular floating point: right-shift (with sign extension) the value until the masked bits are all zeros or all ones, then set the exponent according to the number of right shifts performed. "Small" 51-bit integers can be represented that way (to save storage memory), but it has a computing cost during stores. It is of no use within the virtual machine, where a 64-bit register is capable of storing either a floating point or an integer.
- or to another "large" representation of true 64-bit integers: this is the best model, but it requires a separate field to store the datatype. This is the approach used in fast Lua VMs that can afford an extra field to store the data type and other object types, where the whole object can be 128 bits: the second 64-bit part can be used for many other things than just datatype tracking, e.g. object marking for garbage collection or marks that the object is free and reallocatable. But if you want to save memory, arrays of integers should use native 64-bit integer arrays, and datatype tracking and flags should use another array (of bytes).
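The load/store asymmetry described above can be sketched in C. This is only one possible layout, assuming the standard IEEE 754 binary64 format (1 sign bit, 11 exponent bits, 52 mantissa bits): the exponent is all ones, the top two mantissa bits (quiet flag plus a hypothetical "integer" marker) are set, and the integer occupies the sign bit and the low 50 mantissa bits. All names (INT_TAG, box_int, ...) are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SIGN_BIT      (UINT64_C(1) << 63)
#define PAYLOAD       ((UINT64_C(1) << 50) - 1)   /* mantissa bits 0..49 */
/* Exponent all ones (bits 52..62) + quiet bit (51) + integer marker (50). */
#define INT_TAG       (UINT64_C(0x7FFC) << 48)
#define BOXED_INT_MAX ((INT64_C(1) << 50) - 1)
#define BOXED_INT_MIN (-(INT64_C(1) << 50))

/* The store-side range check: this is the extra cost paid on every store. */
static inline int fits_boxed_int(int64_t n) {
    return n >= BOXED_INT_MIN && n <= BOXED_INT_MAX;
}

static inline uint64_t box_int(int64_t n) {
    assert(fits_boxed_int(n));
    /* Keep the sign bit and the low 50 bits, overwrite the 13 bits between. */
    return ((uint64_t)n & (SIGN_BIT | PAYLOAD)) | INT_TAG;
}

static inline int is_boxed_int(uint64_t bits) {
    return (bits & INT_TAG) == INT_TAG;
}

/* The load side: mask out the 13 tag bits and sign-extend. */
static inline int64_t unbox_int(uint64_t bits) {
    uint64_t low = bits & PAYLOAD;
    if (bits & SIGN_BIT) low |= ~PAYLOAD;
    return (int64_t)low;
}

/* Ordinary doubles are stored with their own bit pattern. */
static inline uint64_t box_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

A caller would test fits_boxed_int() before every store and fall back to a double or to a wider representation when it fails, which is the cost being discussed.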
At runtime, inside the JIT-compiled native code, such a representation loses all its interest: you'll natively compile with separate instructions using registers as either integers or floating points (and often not in the same set of registers!). So this representation is only useful for a bytecode interpreter (before the code is JIT-compiled; note that JIT compilation may be delayed in the Lua VM, like in Java or dotNet or other VMs) or as a way to "compact" the representation of objects (provided you implement accessors for loads and stores) that are part of a large collection of objects.

In my opinion, this is overkill: small fixed-size objects (booleans, integers, floating points, and nil) are better handled by allocating them in pools of memory containing objects of a single type, which do not need any datatype marking: it is the buffer header used by the memory pool that can track this datatype and any other usage bits, e.g. for garbage collection. In registers of the code, the datatype is already known.
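To make the pool idea concrete, here is a minimal C sketch in which the type lives only in the pool header, not in each object. The way an object finds its header (pools aligned to their own size, so the header address is obtained by masking the object address) is an assumption of this example, as are all names (pool_header, pool_new, type_of, ...); it relies on C11 aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE 4096u   /* each pool is POOL_SIZE bytes, aligned to POOL_SIZE */

typedef enum { T_NIL, T_BOOLEAN, T_INTEGER, T_FLOAT } type_tag;

/* One header per pool: the datatype (and, in a real VM, GC bits) live here. */
typedef struct {
    type_tag type;
    size_t   used;        /* number of slots already handed out */
} pool_header;

/* Untagged 64-bit payload: the slot itself carries no type information. */
typedef union { int64_t i; double f; } slot;

#define POOL_SLOTS ((POOL_SIZE - sizeof(pool_header)) / sizeof(slot))

static inline pool_header *pool_new(type_tag type) {
    pool_header *p = aligned_alloc(POOL_SIZE, POOL_SIZE);
    if (p) { p->type = type; p->used = 0; }
    return p;
}

static inline slot *pool_alloc(pool_header *p) {
    if (p->used >= POOL_SLOTS) return NULL;   /* pool is full */
    slot *slots = (slot *)(p + 1);            /* slots follow the header */
    return &slots[p->used++];
}

/* Recover the type of any pooled object by rounding its address down
 * to the start of its pool. */
static inline type_tag type_of(const void *obj) {
    uintptr_t base = (uintptr_t)obj & ~(uintptr_t)(POOL_SIZE - 1);
    return ((const pool_header *)base)->type;
}
```

Every slot stays a plain 8-byte integer or double, and the per-object overhead is only the header cost shared by all slots of a pool.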

On Mon, Aug 30, 2021 at 21:41, Coda Highland <chighland@gmail.com> wrote:
On Sun, Aug 29, 2021 at 11:34 AM Flyer31 Test <flyer31@googlemail.com> wrote:
???There are no 63bit integers??? (or do you know a 63 bit processor?)

There are a number of languages (OCaml comes to mind) that use a 32-bit or 64-bit tagged representation for values. Integers can be represented by reserving one bit of the word (OCaml uses the least-significant bit) to indicate that it's an integer type, and the rest of the bits contain the numeric value. This is analogous to how LuaJIT (and briefly, at one point in history, Lua itself) used NaN tagging to represent other types of values inside of an otherwise-standard 64-bit double-precision floating-point number.

/s/ Adam