Hi,

Christian Müller wrote:
> 1. I read that it is a register vm. How do you solve argument passing?

The Lua 5 VM employs a sliding register window on top of a stack. Frames
(named CallInfo aka 'ci' in the source) occupy different (overlapping)
ranges on the stack. Successive frames are positioned exactly over the
passed arguments (luaD_precall). The compiler ensures that there are no
live variables above the arguments of a call. Return values need to be
copied down (with truncate/extend) to the slot holding the function object
(luaD_poscall). This is because the compiler has no idea how many values
another function may return -- only how many need to be stored.
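
To make the copy-down step more concrete, here is a rough sketch in C. It
is not the actual luaD_poscall code: Slot and copy_results are made-up
names, the slot type is reduced to a number plus a nil flag, and the
"return all values" case is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a stack slot (real Lua uses tagged values). */
typedef struct {
    int is_nil;
    double n;
} Slot;

/*
 * Copy return values down to the slot that held the function object.
 *   func   - slot of the called function; results start here
 *   first  - first value the callee actually returned (above func)
 *   nret   - number of values the callee returned
 *   wanted - number of values the caller needs to store
 * Returns the new top, one past the last stored result.
 */
static Slot *copy_results(Slot *func, const Slot *first, int nret, int wanted)
{
    Slot *res = func;
    int i;

    assert(nret >= 0 && wanted >= 0);

    /* Copy what both sides agree on; surplus values are dropped (truncate). */
    for (i = 0; i < wanted && i < nret; i++)
        *res++ = first[i];

    /* Fill the slots the callee did not supply with nil (extend). */
    for (; i < wanted; i++) {
        res->is_nil = 1;
        res->n = 0.0;
        res++;
    }
    return res;
}
```

The source values always sit above the destination, so a simple forward
copy is safe even though both ranges live on the same stack.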


Example:

function f2(arg1, arg2, ..., argN)
  local local1, local2, ...
  ...
  return ret1, ret2, ..., retO
end

function f1(arg1, arg2, ..., argM)
  local local1, local2, ...
  ...
  local ret1, ret2, ..., retP = f2(arg1, arg2, ..., argN)
  ...
end


Simplified stack diagram:

stack
  |
  | time: >>>> call >>>>>>>>>>>>>>>>>> call ~~~~~~~~~~~~~~~~~~ return >>>>>
  |
  |   ciX.func-> f1      f1      f1                                 f1
  |   ciX.base-> arg1    arg1    arg1                               arg1
  |              arg2    arg2    arg2                               arg2
  |              ...     ...     ...                                ...
  |              argM    argM    argM                               argM
  |   ciX.topC->         local1  local1                             local1
  |                      local2  local2                             local2
  |                      local3  local3                             local3
  |                      ...     ...                                ...
  |                              f2      ciY.func-> f2      f2      ret1
  |                              arg1    ciY.base-> arg1    arg1    ret2
  |                              arg2               arg2    arg2    ...
  |                              ...                ...     ...     retP
  |                              argN               argN    argN
  |   ciX.topL-> ------  ------  ------  ciY.topC-> local1  local1
  |                                                 local2  local2
  |                                                 ...     ...
  |                                                         ret1
  |                                                         ret2
  |                                                         ...
  |                                                         retO
  |                                      ciY.topL-> ------  ------
  V


Note that there is only a single 'top' for each frame:

For Lua functions the top (tagged topL in the diagram) is set to the base
plus the maximum number of slots used. The compiler knows this and stores
it in the function prototype. The top pointer is used only temporarily
for handling variable-length argument and return value lists.

For C functions the top (tagged topC in the diagram) is initially set to
the base plus the number of passed arguments. C functions can access their
part of the stack via Lua API calls which in turn change the stack top.
C functions return an integer that indicates the number of return values
relative to the stack top.
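
The frame layout described above boils down to a few additions. The
following is only a sketch under my own naming (Frame, enter_lua, enter_c
and first_result do not exist in the Lua source), and it uses stack
indices instead of pointers to keep it short:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame record; the real thing is CallInfo. */
typedef struct {
    size_t func;  /* slot holding the function object */
    size_t base;  /* first argument, i.e. first register of the frame */
    size_t top;   /* one past the last slot of the frame */
} Frame;

/* Lua function: top = base + frame size computed by the compiler. */
static Frame enter_lua(size_t func, size_t maxstacksize)
{
    Frame f = { func, func + 1, func + 1 + maxstacksize };
    return f;
}

/* C function: top initially sits right after the passed arguments. */
static Frame enter_c(size_t func, size_t nargs)
{
    Frame f = { func, func + 1, func + 1 + nargs };
    return f;
}

/* A C function returning n values leaves them in the n slots below top. */
static size_t first_result(size_t top, size_t nresults)
{
    assert(nresults <= top);
    return top - nresults;
}
```

In both cases the base lands directly on the first argument, which is why
no arguments need to be copied on a call.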

In reality things are a bit more complex due to overlapped locals, block
scopes, varargs, coroutines and a few other things. But this should get
you the basic idea.

> Do you deviate from the typical register based architecture in 
> that case to save memory traffic?

I think the architecture is pretty unique as far as VMs go. Some CPUs
have sliding register windows, but this gets quite complicated since they
need to spill/fill registers to/from the stack. A VM can of course use an
unbounded (reallocated) stack on the heap.

> 2. As far as I learned you do instruction encoding close to hardware 
> architectures. Therefore you always have to decode the opcode in contrast to 
> the JVM where opcode and arguments are stored in several independent bytes. 
> Is opcode decoding cheap (one might forgive my poor knowledge of C operator 
> performance;-)?

All instructions are 32 bits wide. The current layout as of Lua 5.1work4 is:

BBBBBBBB BCCCCCCC CCAAAAAA AAOOOOOO   ABC format
BBBBBBBB BBBBBBBB BBAAAAAA AAOOOOOO   ABx format
sBBBBBBB BBBBBBBB BBAAAAAA AAOOOOOO   AsBx format

Fetching a 32 bit value once from memory and then extracting the bits to
other registers is cheaper than doing single-byte fetches for variable
length operands. Byte alignment does not matter at all (word alignment does).
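
To give an idea of how cheap the decoding is, here is a sketch of the
field extraction for the ABC and ABx layouts shown above. The function
names are mine, not the macros from the Lua source, and the signed AsBx
variant is left out. Each field costs one shift and one mask:

```c
#include <stdint.h>

/* Field sizes and positions, counted from the low bit, as in the layout:
 * 6-bit opcode, 8-bit A, 9-bit C, 9-bit B. */
enum {
    SIZE_OP = 6,
    SIZE_A  = 8,
    SIZE_C  = 9,
    SIZE_B  = 9,
    POS_OP  = 0,
    POS_A   = POS_OP + SIZE_OP,  /* bit 6  */
    POS_C   = POS_A + SIZE_A,    /* bit 14 */
    POS_B   = POS_C + SIZE_C     /* bit 23 */
};

/* Extract 'size' bits starting at bit 'pos'. */
static uint32_t field(uint32_t insn, int pos, int size)
{
    return (insn >> pos) & ((UINT32_C(1) << size) - 1);
}

static uint32_t get_op(uint32_t i) { return field(i, POS_OP, SIZE_OP); }
static uint32_t get_a(uint32_t i)  { return field(i, POS_A, SIZE_A); }
static uint32_t get_b(uint32_t i)  { return field(i, POS_B, SIZE_B); }
static uint32_t get_c(uint32_t i)  { return field(i, POS_C, SIZE_C); }

/* Bx spans the combined B and C fields (18 bits). */
static uint32_t get_bx(uint32_t i) { return field(i, POS_C, SIZE_C + SIZE_B); }

/* Helper for building an ABC instruction, handy for experiments. */
static uint32_t make_abc(uint32_t op, uint32_t a, uint32_t b, uint32_t c)
{
    return (op << POS_OP) | (a << POS_A) | (b << POS_B) | (c << POS_C);
}
```

Since the positions and sizes are compile-time constants, a C compiler
will typically turn each accessor into a shift plus an AND (or a single
bit-field extract instruction where the CPU has one).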

Memory bandwidth is usually not an issue for VM instructions since there
is so much else going on for each instruction. It's much more important
to keep the execution units busy by avoiding interlocks caused by memory
fetches. Tuning the code to make it easy for the compiler to generate
good code is another issue (the Lua authors have done quite a bit of
tuning in some important spots).

Bye,
     Mike