Hi,

I'd like to ask for clarification about some Lua language
semantics. Both about intended (but as of now unspecified)
semantics and about the reliance on implementation semantics by
application developers.

Lua is not a side-effect free language and has some special
semantics for table accesses (__index/__newindex metamethods, even
for globals). So there is a need to define the evaluation order
for expressions in parallel assignments, function calls and table
constructors. Assignment order is relevant for parallel
assignments, too.

[This is only about the order of the execution of any side-effects
and not about the order of the expression operations themselves,
which _is_ fully specified by operator precedence rules and
parentheses.]

The Lua 5.1 manual remains pretty much silent on these topics. It
doesn't even say that evaluation and assignment order are
undefined (which would be a useful statement, too).

This is not just an esoteric issue for language lawyers. Since Lua
is used as a data definition language (DDL) and can be easily
subverted into a domain specific language (DSL), it has practical
relevance, too (see below).


The current _implementation_ (as of Lua 5.1.1) uses a strict
left-to-right evaluation order (while respecting operator
precedence and parentheses), and a right-to-left assignment order
(maybe surprising for some).

E.g. 'f(a+b,c*d)' generates the following bytecodes:

  GETGLOBAL  0 -1    ; f
  GETGLOBAL  1 -2    ; a
  GETGLOBAL  2 -3    ; b
  ADD        1 1 2
  GETGLOBAL  2 -4    ; c
  GETGLOBAL  3 -5    ; d
  MUL        2 2 3
  CALL       0 3 1

And 'x,y,z = a,b,c' generates the following bytecodes:

  GETGLOBAL  0 -4    ; a  <== left-to-right evaluation order
  GETGLOBAL  1 -5    ; b
  GETGLOBAL  2 -6    ; c
  SETGLOBAL  2 -3    ; z  <== but inverse assignment order
  SETGLOBAL  1 -2    ; y
  SETGLOBAL  0 -1    ; x

Order matters since each of the *GLOBAL opcodes may in turn call
a metamethod, which need not be side-effect free. Similar
examples can be made for generic table accesses or calls to
closures which modify upvalues.
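
A minimal sketch of how the order can be observed from Lua itself,
using a generic table access instead of globals. The names 'log',
'values' and 'env' are made up for illustration. The proxy table
stays empty, so every read and write goes through a metamethod and
is recorded. If the order described above carries over to table
fields, the reads should show up as a, b, c and the writes as
z, y, x, but nothing in the manual promises that:

```lua
-- Hypothetical logging proxy: every field read or write is recorded.
local log = {}
local values = { a = 1, b = 2, c = 3 }

local env = setmetatable({}, {
  -- Called for every read, because the proxy has no fields of its own.
  __index = function(_, key)
    log[#log + 1] = "get " .. key
    return values[key]
  end,
  -- Called for every write; the value goes into the backing table.
  __newindex = function(_, key, value)
    log[#log + 1] = "set " .. key
    values[key] = value
  end,
})

-- Parallel assignment with side effects on both sides.
env.x, env.y, env.z = env.a, env.b, env.c

-- Print the observed order of the side effects.
for _, entry in ipairs(log) do print(entry) end
```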

Try to guess what this prints: :-)
  local x = 1
  local function f() local t = x; x = t+1; return t; end
  x,x,x = f(),f(),f()
  print(x)


Recently I've encountered two examples where I would need a
definitive statement about the language semantics (because these
are supposed to be stable) and not just the implementation
semantics (which may change):

1. I played with the idea of a DSL for processing data streams and
got it down to:

  while not eof do
    local cmd, id, name, tag = byte, uint, str(byte), ushort
    if cmd == PROTO1_CMD_INIT then
      ...
      uint, strz = PROTO2_CMD_INIT, real_name or name
    elseif ...
    end
  end

This works by catching reads and writes to globals and then in
turn doing stream reads or writes. E.g. when the global 'byte' is
accessed, it reads one byte from the input stream, converts it to
an unsigned number and returns it. Similarly, when the global
'uint' is written to, the passed number is written as an unsigned
int to the output stream: magic globals with side-effects.

Very concise syntax and very flexible. You can easily mix stream
parsing and stream generation with logical decisions. Mix that
with automatic yielding when a stream blocks and you've got a nice
framework for network programming.
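
A rough sketch of how such magic globals could be wired up, not
the actual code behind the DSL above. Everything here is an
assumption made for illustration: the helper names
'load_with_env' and 'make_stream_env', the big-endian encoding of
'ushort', and the use of a string as the input stream. The sketch
does only one stream access per statement, so it does not depend
on any particular evaluation or assignment order:

```lua
-- Load a chunk whose globals are redirected to 'env'.
local function load_with_env(src, env)
  if setfenv then                               -- Lua 5.1 / LuaJIT
    local chunk = assert(loadstring(src))
    setfenv(chunk, env)
    return chunk
  end
  return assert(load(src, "chunk", "t", env))   -- Lua 5.2 and later
end

-- Build an environment where reading a global consumes input and
-- writing a global produces output (hypothetical helper).
local function make_stream_env(input)
  local pos, out = 1, {}
  local function read_byte()
    local b = assert(input:byte(pos), "unexpected end of stream")
    pos = pos + 1
    return b
  end
  local readers = {
    byte = read_byte,
    -- Big-endian 16 bit value (an assumption, not from the post).
    ushort = function()
      local hi = read_byte()
      return hi * 256 + read_byte()
    end,
  }
  local writers = {
    byte = function(n) out[#out + 1] = string.char(n % 256) end,
    ushort = function(n)
      out[#out + 1] = string.char(math.floor(n / 256) % 256, n % 256)
    end,
  }
  local env = setmetatable({}, {
    __index = function(_, name)
      local reader = readers[name]
      if reader then return reader() end
    end,
    __newindex = function(_, name, value)
      local writer = assert(writers[name], "not a stream field: " .. tostring(name))
      writer(value)
    end,
  })
  return env, function() return table.concat(out) end
end

local env, output = make_stream_env("\1\2\3")
local chunk = load_with_env([[
  local cmd = byte      -- reads one byte from the input
  local tag = ushort    -- reads two more bytes
  byte = cmd + 1        -- writes one byte to the output
  ushort = tag          -- writes two bytes to the output
  return cmd, tag
]], env)
local cmd, tag = chunk()
print(cmd, tag, #output())
```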

Alas ... it doesn't work. It relies on a consistent left-to-right
evaluation _and_ assignment order. The former would be relying on
unspecified (apart from the source) implementation semantics and
the latter just doesn't match the implementation semantics.

Too bad. :-(


2. An optimizing compiler may be able to gain some performance by
reordering expression evaluation:

The obvious candidate for this would be a (hypothetical) future
version of LuaJIT. The current version very strictly follows the
order of bytecodes and doesn't reorder anything which might have
side-effects.

But it would be advantageous to reorder e.g. the evaluation order
for the CALL opcode. Moving the resolution of the called function
closer to the CALL allows some shortcuts.

Alas, with (say): 'local y = math.sin(x)'

  GETGLOBAL  0 -1    ; math
  GETTABLE   0 0 -2  ; "sin"
  GETGLOBAL  1 -3    ; x
  CALL       0 2 2

... one cannot reorder the GETGLOBAL+GETTABLE for 'math.sin'
closer to the CALL because the access to the global 'x' might
change the globals table itself or the math table. The same
problem arises when reordering upvalue accesses or even accesses
to locals (pointed to by open upvalues).
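
A small sketch of the hazard, simplified to plain tables so that
it runs unchanged on different Lua versions. The tables 'mathlib'
and 'globals' are stand-ins for the real math table and the
globals table, and are assumptions for illustration only. Reading
'x' replaces 'sin' as a side effect. An implementation that
resolves the callee first (as in the bytecode above) calls the
original function, while one that moved the lookup closer to the
call would pick up the replacement:

```lua
local calls = {}

-- Stand-in for the math table (hypothetical).
local mathlib = {
  sin = function(v) calls[#calls + 1] = "original"; return v end,
}

-- Stand-in for the globals table: reading 'x' has a side effect.
local globals = setmetatable({}, {
  __index = function(_, name)
    if name == "x" then
      mathlib.sin = function(v) calls[#calls + 1] = "replaced"; return -v end
      return 1
    end
  end,
})

-- Which function runs depends on whether 'mathlib.sin' is looked up
-- before or after 'globals.x' is evaluated.
local y = mathlib.sin(globals.x)
print(calls[1], y)
```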

Parallel assignment has similar issues. But here the situation is
worse, because 'local x,y = a,b' generates the same bytecode as
'local x = a; local y = b'. Whereas the former could be reordered
(if evaluation order was explicitly undefined), the latter may not
be reordered. Except that LuaJIT can't tell these cases apart at
the bytecode level. :-/


To summarize:

I'd like to have a definitive statement from the Lua authors
whether the Lua language as of version 5.x has a defined
evaluation and assignment order (with respect to side-effects) or
whether it is explicitly undefined. Also whether there is any
guarantee about the semantics of future versions.

The former would be nice for DDLs and DSLs (but only if the
assignment order is reversed). The latter would be nice for
optimizing compilers and offers more freedom for future
implementations.


OTOH assuming future implementations take the liberty to exploit
the 'undefined' aspect (e.g. LuaJIT reordering expressions with
side-effects), how much would break?

I guess that most developers coming from a C/C++ background have
been cautious about relying on a particular argument evaluation
order for function calls, mainly because the C/C++ standards
explicitly leave it unspecified. But it might be just the
opposite when you come from a Java background, since Java
explicitly guarantees a strictly defined evaluation order.

I'm more worried about the table constructor which is heavily used
in DDLs and DSLs. It looks deceptively like a declarative
statement, but it really isn't. I'm not so sure there are no
assumptions about the evaluation order in everyone's code ...

How often have you used: io.write(f(), f())  -or-  { f(), f() } ?
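
For illustration, a sketch of the kind of code in question and of
a rewrite that does not depend on evaluation order. 'next_id' is
a made-up function with a side effect. The first table gets the
ids 1 to 3 in whatever order the constructor evaluates its
fields. The second one sequences the calls with separate
statements, which execute in order:

```lua
local counter = 0
local function next_id()
  counter = counter + 1
  return counter
end

-- Order-dependent: which element gets which id is up to the
-- evaluation order of the table constructor.
local implicit = { next_id(), next_id(), next_id() }

-- Order-independent: one side-effecting call per statement.
local first = next_id()
local second = next_id()
local third = next_id()
local explicit = { first, second, third }

print(implicit[1], implicit[2], implicit[3])
print(explicit[1], explicit[2], explicit[3])
```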

Any feedback is welcome!

Bye,
     Mike