• Subject: Things not to add to Lua (slices) [was: table insertion]
• From: Mike Pall <mikelu-0411@...>
• Date: Fri, 5 Nov 2004 02:48:34 +0100

```Hi,

David Given wrote:
> For dealing with segments of arrays, has anyone considered using some kind
> of slice notation?

Well, this discussion belongs to a different thread. I feel guilty
because I added that table.move() remark ... :-)

While I think there is a need for good and fast support for the 'length'
and the 'append' functionality, I do *not* think that slice extraction
or slice assignment is a worthwile construct to have in the core language
(having it in the library is a different issue).

[More things to avoid, below.]

Since you mentioned Python, I can back this up with statistics:

Some time ago I measured the VM opcode distribution for the huge library
that ships with Python (which amounts to roughly one million opcodes!).
Here are the static probabilities of the relevant opcodes:

0.634%  BINARY_SUBSCR    y = o[n]
1.558%  STORE_SUBSCR     o[n] = x
0.033%  DELETE_SUBSCR    del o[n]   (this is o[n] = nil in Lua)

0.014%  SLICE+0          y = o[:]   (shallow copy)
0.090%  SLICE+1          y = o[n:]
0.089%  SLICE+2          y = o[:m]
0.046%  SLICE+3          y = o[n:m]

0.003%  STORE_SLICE+0    o[:] = x   (replace contents)
0.003%  STORE_SLICE+1    o[n:] = x
0.002%  STORE_SLICE+2    o[:m] = x
0.003%  STORE_SLICE+3    o[n:m] = x

0.002%  DELETE_SLICE+0   del o[:]   (delete contents)
0.001%  DELETE_SLICE+1   del o[n:]
0.001%  DELETE_SLICE+2   del o[:m]
0.002%  DELETE_SLICE+3   del o[n:m]

As you can see it's an absolute waste to add VM opcodes for slices. The
dynamic (runtime frequency) statistics are even worse. And this is in
spite of the fact that Python programs need these operations a lot more
than Lua programs (which has to do with different API styles).

But ... you basically need a new VM opcode if you add a new syntax
(unless you want to repeat the _G.__pow() ugliness). It all boils down to:
Then why bother adding a new syntax if it is so uncommon?

Note that 'length' and a generic 'append' are not uncommon. And they
can be provided with just a little bit of syntactic sugar (but no new
syntax and no new metamethods) as I have shown in my previous mail.
IMHO of course.

Other lessons learned from Python:

- obj.method(x) is split up into two parts:

1. obj.method is resolved with LOAD_ATTR and returns a new binding.
2. The binding is called with CALL_FUNCTION.

The separate binding step is the real performance killer. It has proven
to be very hard to optimize because obj.method alone also has a syntactic
meaning (it returns the binding). The Python VM cannot optimize this
at all. Have a look at the hoops that Psyco has to go through for this.

Lesson #1: Do not provide syntax or semantics that in turn force
you to dynamically create heap-allocated method bindings.

Note: This is not meant as a snide remark about the __methcall discussion.
The binding step can be avoided in most cases at the cost of some
extra API functions (cf. Mark's postings).

- Python has no special syntax for getting the length. There is only a
global function len() that calls different functions for every object
type (indirectly resolved from the type object).

Since global functions can be reassigned dynamically it has proven to
be hard to optimize this. Psyco has to go to some lengths to catch all
modifications to the globals (ugh).

Lesson #2: Tie the length functionality to the object and not vice versa.
Provide special syntactic sugar only if it can be optimized
easily later on (n = t[] qualifies).

- There are subtle and hard to remember differences between all of the
core Python types when it comes to appending, extending, adding or
concatenating one or more objects to a container type. Things get worse
with derived types. I won't go into the details -- look at the Python
quickref chart and get worried.

Lesson #3: Be consistent with basic operations on containers. Choose
only semantics that are applicable to many different container
types (__index and __newindex qualify, most others do not).

- Python passes around arguments and multiple return values with
heap-allocated tuples. There are some tricky optimizations in the
Python VM to shortcut argument passing, but it works only under
special circumstances and not when calling C functions (uh oh).
Passing tuples as return values is not optimized at all.

Yes, you got it: function calls are expensive in Python.

Obviously this is a serious performance killer. Lua does not need this
since it has clever semantics for multiple returns from functions
and offers an (almost) zero-copy stack API for calling C functions.
The new varargs syntax is yet another step in the right direction
(avoiding a heap-allocated argument table).

Lesson #4. Please keep that in mind when the recurring 'we need tuples
for Lua' thread comes around again.

Bye,
Mike

```

• Follow-Ups:
• References: