lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On 04/11/2019 14.05, Philippe Verdy wrote:
It just proves that Lua, when it defined tables to be working both
as associative array (hashed dictionnary) and as indexed arrays (vectors, i.e. "sequences" in Lua terminology) […]

You're putting too much weight on an implementation detail.  THERE ARE
NO ("indexed") ARRAYS.  Lua simply doesn't store the keys if they can
be recovered from the position in linear storage (putting stuff in a
"C array") and that happens to save some memory.  For funsies, look at

   t = { a=1, b=1, c=1, 1, 2, 3, 4 }
   t[5] = 5

Is t[5] part of the "array part"?  NOPE!  That'd allocate 64 bytes,
whereas storing key & value in the "hash part" doesn't need any
allocation.  And so off to the "hash part" it goes…

Is / should it be part of the "logical array"?  Probably, and so
`ipairs` will still find the 5 (by doing a linear traversal) and `#`'s
binary search will also identify the correct length.  A hypothetical
`nonipairs` couldn't easily skip it – doing that correctly would be
fairly complicated…  (Exact behavior depends on how many slots are
available in the array / hash part, the order of adding / removing
fields, etc.…)

Defining any meaningful notion of "length" that works on arbitrary
tables and is sufficiently fast is hard (impossible?).  The current
binary search approach works fine in 90%+ of the situations.  (Whenever
you have a contiguous ("nil-free") array, it just works / _is_ the
length.  When you want to collect something with `t[#t+1] = x`, it works
and you don't even need a nil-free array if order doesn't matter. Etc.)

If you need an actual array, manually track the length (and use __len to
return your "better" length) – that can be done efficiently.  IIRC,
there are lots of implementations, they're just not part of Lua's
standard library.  ("Batteries not included", as the saying goes.)

It just proves that Lua […] chose the wrong implementation approach.

Nope.  Lua chose simple over complexity explosion.  Of the three "basic
building blocks" (array, (multi-/)set, hash), hash tables make it
easiest to implement all others efficiently.  (Due to Lua implementation
details, a table used as an array "upgrades gracefully" under the hood
and is just as fast / memory-efficient as a "real" array.)  Unlike most
other languages, Lua only has one data structure – hash tables.  (Unless
you count nested closures[1] or other "Turing tar pit" constructions as
"data structures"…)  If you need other data structures, use a library
(or just write the stuff inline – most of it is trivial).

If you look at other "modern" languages (i.e. design wasn't constrained
by 80s hardware), they probably (still) have a native array type.  But
because implementing a (even very simple) hash type is hard, they
probably also have that.  Oh and probably also tuples, because they're…
not arrays, for some reason.  And before you even notice, you've got
everything but the kitchen sink… (no wait actually there are *two* over
here!)

...

AFAICT, the Lua authors decided that not having a native array type was
less pain than unleashing this hydra.  I concur.

For now, the #t operator is completely broken. Tables have to be segregated in applications between those used as integer-indexed sequences (almost only data), and those using hashed dictionnaries (notably all objects for their properties).

I assume you'd not mix lots of different kinds of data in a single
array?  Then why'd you want to do that with tables?  (Of course,
against better judgement, you can use some manual splitting / magic
indexing on something like a C array… but it's mostly quite painful.
While doing that with tables is less painful, that still doesn't make
it a good idea.)

(Or do you mean that you have multiple "logical types" that share the
same basic representation (tables) and you have to tell them apart?  If
so, I don't understand the problem.  Trying to get the length of a
non-array is just as meaningless as adding centimeters and inches
without conversion, and neither gets you a warning / error unless you
add your own checks…)

Something is severely wrong in Lua, compared to what Javascript and PHP do, with similar constraints: the underlying implementation of tables should NEVER affect what #t returns, and the API was not consistantly written. The result in applications is unpredictable (and why tables in Lua are so slow and memory-inefficient compared
to Javascript or PHP, even without their modern JIT compilers like v8
for Javascript or Zend for PHP, or modern implementations of Perl
or Python?)

LOL.

Some JavaScript implementations use non-portable ugly pointer / nan
hacks to compress the memory representation.  Lua intends to be portable
and therefore can't do that.  Other than that, I'm not aware of any
obvious inefficiencies in Lua's implementation of tables.

Do you have any data supporting your claims?

Tables are so universally used in Lua that this part should be urgently refactored for efficiency, stability, and predictability.

I'm not aware of any major problems.

There's the minor quirk that nan cannot be used as a key (because it
compares non-equal to itself and at the same time there are billions of
different possible nan values, and thus it couldn't be found again
unless you add a dedicated searcher to the table implementation).

The only other thing I can think of is the problem with inserting stuff
into a table while iterating over its pairs()… but that's just as broken
everywhere else.  At least you can safely remove fields during
traversal… unlike in Python.  (Also, Python has a `del` *statement*…
seriously?!)

Without it, Lua will remain a "toy", used in niche projects, it will not compete in the same playground: Python rocks! Javascript rocks too! Lua does not. Various projects that initially started with Lua integration have renounced and disabled this feature and prefered using other scripting engines (most frequently now they use Javascript/ECMAscript, often though node.js and jQuery).

JS is a steaming mess. ([2] is just the tip of the iceberg.)  Python is
huge and accumulating more and more historical baggage.  The nice thing
about these languages is that lots of people are using them, which means
there's lots of ready-made code which (due to the size of the standard
library) is usually mostly compatible and can be glommed together with
relatively little pain.  If you like that style of working, this can be
quite comfortable in spite of language shortcomings.

Lua, OTOH, does not have much of a standard library.  For most larger
projects, you'll first have to grow a language[3].  The great thing
about Lua is that you can do that in a day or two (including wrapping /
replacing the parts of the standard library that you need).  In most
other languages, the size of the required parts of the standard library
(plus usually constraints on what you can redefine) make that much
harder / impossible.  If you prefer this way of working, Lua is *way*
better than the other languages you mentioned.

May be one way to revive Lua would be to have an implementation in Javascript (I'm sure it will perform better and more safely than the current implementation in C/C++, and Javascript engines are now much easier to integrate and offer better integration with many more libraries, including with other languages like Python, Perl, SQL, PHP)...

[…*snip*…]

Surely you must be joking…

-- nobody

[1] Functions (or closures) "are" data structures:

    function K( x )  return function()  return x  end  end
    function cons( x, y )
       return function( b )  if b then return x else return y end end
    end
    function fst( c )  return c( true )  end
    function snd( c )  return c( false )  end
    --car, cdr = fst, snd
    empty = K( nil )
    function isEmpty( xs )  return xs == empty  end
    do
       local function go( n, x, ... )
          if n < 1 then  return empty  end
          return cons( x, go( n-1, ... ) )
       end
       function list( ... )
          local n = select( '#', ... )
          return go( n, ... )
       end
    end
    function fold( f, z, xs )
       if isEmpty( xs ) then  return z  end
       return f( fst( xs ), fold( f, z, snd( xs ) ) )
    end
    function map( f, xs )
       local function step( x, xs )  return cons( f( x ), xs )  end
       return fold( step, empty, xs )
    end

    -- and then you can go
    function concat( a, b ) return a.." "..b end
    fold( concat, "", map( string.upper, list( "foo", "bar", "baz" ) ) )
    --> FOO BAR BAZ


[2] http://algorithmicassertions.com/visualization/2014/03/27/Better-JS-Equality-Table.html


[3] https://youtu.be/_ahvzDzKdB0

(At some point during the talk, he goes on and on about number types…
the same applies to data structures.  And in general, I think that
Lua gets _really_ close to what he's describing… there's a bunch of
details[4] that get in the way of really taking this to eleven, but
Lua's implementation is small and (mostly) easy to understand & modify –
it's feasible to patch it yourself.  That's not something I'd claim of
the $FooMonkey JS implementations or CPython etc.)


[4]  There's slightly too much focus on what things "are" / how they are
represented, not how they behave.  (And so you can't mix values of
different "primitive types" into a "logical type", or else __eq refuses
to work.  Instead you'll have to wrap things…)  All the control
structures react to only two values (nil/false) or, equivalently,
there's no falsy value that has an identity / that you can associate
data with.  (And so you have to re-build that functionality using custom
functions, and things turn into Lisp…)  Comparisons immediately coerce
to a boolean.  (And so you can't throw dummy variables (with
AST-building __add/…-metamethods) into a plain Lua function to extract a
trace / AST to (re)compile for a different target, instead you'll have
to Lisp a bit when writing the function…)  Stuff like that…

All of these produce "hard failures" – when you need that stuff and
build it in vanilla Lua (as functions), accidentally using == / if / <=
/ … is practically guaranteed to be a bug.  But each of these can be
patched in a couple dozen lines, and then things just work.