lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On 9/14/2016 4:42 PM, Miroslav Janíček wrote:
On 15 Sep 2016, at 0:15, Ross Berteig <Ross@CheshireEng.com> wrote:
I'm not sure there are practical versions of Lua where that table
could actually exist. It would have to be built with a small enough
integer type that a table with half of all possible integral keys could
fit within (virtual) memory.

Bah, that’s an implementation detail ;)

Well, I'm an Engineer, not a Mathematician, so implementation details are what pays my bills ;-)

But you’re of course right — we probably won’t see systems that
could  handle such tables for 64-bit integers in our lifetimes. However, I
think it’s perfectly possible to have such a table with 32-bit integers,
even today.

And although it would likely be foolish, I suspect with some effort one could build a Lua using 16-bit integers, even if the float subtype remained 32-bit. That might even make a kind of sense in some embedded system environment, although having enough RAM to comfortably *use* Lua usually implies also having a 32-bit CPU core.

 (As a side note: in my reading of the Lua Reference
  Manual I haven’t been able to conclusively discard the
  possibility that computing sequence length involves
  invoking the __index metamethod. I know it sounds silly,
  but I think that a Lua implementation that would do that
  would still be “legal” according to the manual. Now, in
  such an implementation, the __index metamethod

  function (t, k) if k > 0 then return k; else return nil; end end

  would do the trick for any integer width.)

The 5.3 reference at https://www.lua.org/manual/5.3/manual.html#3.4.7 does not say that only raw operations will be used when __len() is not provided, and nothing in section 2.4's description of __len or __index would seem to promise otherwise.

As I read it, your proposed __index would not affect #t. You would need to define __len for that, which makes sense because there really is no sensible meaning to #t in that case.


Dipping into the source code, we find that the stock implementation of the length operator is here:

  https://www.lua.org/source/5.3/lvm.c.html#luaV_objlen

which calls

  https://www.lua.org/source/5.3/ltable.c.html#luaH_getn

if __len() is not defined. luaH_getn() is part of the table implementation and directly peeks at its internals to do the binary search for a boundary value to return.


Now here's another implementation detail: luaH_getn() is declared to return `int`, and the part that searches the hash table works with a range i to j where i and j are declared `unsigned int` with the identified boundary index implicitly cast back to `int` on return.

The catch is not only that `unsigned int` is not the same range as `int`, but that it does not use `LUA_INTEGER` so the range of integer keys implemented for the length of a sequence is restricted to the range of `int` not whatever LUA_INTEGER is configured to be.

Also, there will be a gotcha waiting should #t exceed INT_MAX, which I think could cause #t to be negative. If so, I'd argue that is a bug. I have not attempted to create a test case, however.


But anyway, that wasn’t really the point. The point is that under
this  definition, it seems that there exists a sequence that does not
> have a length.

Only if there can be a largest integer key. And in practice, in every implementation, there does exist a largest integer n such that t[n], t[n-1], and t[n+1] are all distinct entries in the table t.

Then there are many values of i > n, where i == i+1 (and hence t[i] == t[i+1]) and the whole idea of a sequence falls apart. This is a natural result of the quiet and soft roll over from integers to floats in Lua 5.3, and always happened in earlier versions of Lua where numbers are always floats.

The practical answer is to simply note that there is an implementation defined longest possible sequence, and possibly give some guidance about what it might be. As I read the sources, that longest length is INT_MAX as currently implemented.


I think the fix could be quite simple, along the following lines:

 * if t[1] == nil, then 0 is a border;
 * let MAX be the maximum value for an integer, then MAX
   is a border if t[MAX] ~= nil;
 * for all positive integers i, if t[i] ~= nil and t[i+1] == nil,
   then i is a border.

Plus: a table is a sequence if and only if it has exactly one border.

That said, I'm not certain that the definition of sequence cares
about whether the keys are specifically integer type numbers. So the
number math.maxinteger will be 2^63-1, and 2^63 can be exactly
represented in 64-bit float. 2^63+1 cannot, and that is probably where
the fuzzy non-border happens.

I must admit I’m not exactly sure what you mean.....

I'm just pointing out that the implementation detail of there being special treatment for integral values is not necessarily important for the definition of a sequence. 2^63+1 cannot be a border because 2^63+2 is, in the fun world of floating point arithmetic, the very same key.

--
Ross Berteig                               Ross@CheshireEng.com
Cheshire Engineering Corp.           http://www.CheshireEng.com/
+1 626 303 1602