next_inactive up previous


The Lua Virtual Machine

Zed A. Shaw

1. Introduction

1.1 How It All Started

I was interested one day in trying to write an interpreter for the R programming language that would be small and embedable. I needed this to improve the usability of a program I was writing called Obversive. As I started to write the parser, I found that I needed a decent virtual machine. I considered many different ones-Java, Python, Parrot, Lightning-but none of them were small enough and many were just too hard to work with. I knew that Lua had a really tight and simple virtual machine, and after finding a nice assembler called Luas (by DE) I figured I would use it.

As I worked with the LVM, I found that there needed to be a good description of the instructions and how it worked. That's where this document came from.

1.2 What You Should Learn

You should learn how to write assembler on the LVM and get introduced to using the LVM to write your own tiny embedded languages.

Usually you should just use Lua as the language, since it is already designed and works well.

Many times though Lua is not appropriate as a language or you may just like playing with it.

In that case, this document is the tool for you.

1.3 Why Lua?

Well, it's small and fast and easy to understand with a very versatile implementation. The source code is tiny and there's lots of support. Also it's completely open.

But, the major reason is that it works extremely well with C.

1.4 Book Organization

First we cover a crash course in Lua. You should probably read the Lua Manual if you've never used Lua.

Next we take a look at the design of the Lua Virtual Machine and how it is used to process Lua and possibly how to use it for other languages.

Then we get into using Luas and writing assembly language level programs for the LVM.

Finally, we use Luas and other tools to build a simple calculator.

At the end of the book is a complete set of descriptions for all of the LVM instructions.

1.5 Contacting The Author

You can reach me at zedshaw@zedshaw.com. No Spam Please.

2. The Lua Language

2.1 Introduction

Lua is a nifty little language that looks something like Pascal and friends. It is easy to work with, has a consistent syntax, and provides facilities for functional or object-oriented styles of programming.

2.2 For Language Nuts Only

The description I give of Lua is really targetted at language freaks like me. I don't really get into specifics or give many examples. I mostly just cover the different elements of the language in a very succinct way so you can get an idea of how the language is designed. See the Lua Manual for more detailed instructions.

2.3 Data

Data in Lua is rediculously simple yet still quite powerful. Lua has one data type: the Table. The table structure is basically similar to XML, SGML, BibText or other ``attributed trees'' where each node has attributes and children.

2.4 Control

Lua has the basic control structures of if, else, while, for, and foreach.

2.5 Functions

Functions are first-class in Lua and you can write functional style code without going blind from Lots of Infuriating Stupid Parenthesis, no matter what Scheme the parentheses are in. You can create full closures in the functions, declare them pretty much anywhere, and mix them with Lua's object-oriented facilities.

2.6 Objects

Objects in Lua are just Tables with functions attached to them through a special syntax. There is also a metatable mechanism for turning objects into full types, and operator overloading is available. Lua does OO in a way that is similar to Python where Objects are created from examples rather than defined classes as in C++ or Java.

2.7 Other Stuff

Lua also supports other things such as foreign function calls and garbage collection.

2.8 The Lua C API

Understanding the Lua C API helps understand the LVM since you need to work with the LVM's stack in pretty much the same way as you would with Lua's assembler instructions.

2.9 Conclusion

Understanding the Lua language helps when you try to learn the LVM's assembler op-codes. It also helps when you try to target a language at the LVM since you'll understand what might be possible or not possible. Since Lua has functional and object-oriented features, it should be possible to implement nearly every language currently thought of by people.

3. The Lua Virtual Machine

3.1 Introduction

3.2 Stacks Done ``Wrong''

3.3 Tables And Stacks

3.4 Closures And Stacks

3.5 Functions And Stacks

3.6 Overview Of LVM Instructions

3.7 Some Sample Lua to LVM Assembler

3.8 The LVM/C API Connection

3.9 Conclusion

4. The Luas LVM Assembler

4.1 Introduction

4.2 Getting and Installing Luas

4.3 Assembling and Disassembling

4.4 Luas Source Code Structure

4.5 The Instruction Set

4.6 Declaring Data

4.7 Implementing Functions & Closures

4.8 Working With Tables

4.9 Functional Programming

4.10 Object Orient Programming

4.11 When In Doubt, Ask Lua

4.12 Some Small Samples

4.13 Conclusion

5. Writing Programs with LVM and Luas

5.1 Introduction

5.2 A Simple Calculator

5.3 Initial Steps

5.4 Testing

5.5 Adding Input/Output Processing

5.6 Adding Conversion

5.7 Final Features

5.8 A Functional Approach

5.9 An Object-Oriented Approach

5.10 Conclusion

6. Implementing Alternative Languages

6.1 Introduction

6.2 The Tools: RE2C, Lemon, and c2lib

6.3 A Better Calculator

6.4 Implementing Scanning

6.5 Implementing Parsing

6.6 Implementing Syntax Analysis

6.7 Implementing Code Generation

6.8 Conclusion

7. Detailed Luas Instructions

7.1 Introduction

7.2 Functions

The Function instructions are involved in calling functions which were defined within the luas source. Depending on how the function is defined, you'll use each of the instructions. The most common used instructions are call and return, but closure, tailcall, getupval and setupval are all used in special situations.

closure $a n
Used to setup calls to functions defined within the current function.
call a b c
Calls the function in a with b registers on used on the stack and returning c registers on the stack (be sure to include the function in a in the number or registers given in b and c).
tailcall a b c
Works like a call, but rather than create a new stack, it just reuses the current function's stack. Used in functional programming for doing efficient functional processing of lists.
return a (b | $top)
Returns from a function with register a as the return start and keeping b registers, or $top to keep them all.
getupval $a b
Gets up-values for function calls.
setupval $a b
Sets up-values for function calls.

7.2.1 closure $a n

Places the function n (numbered from the top of the current function) into register a allowing you to call it. Basically, you start declaring .FUNCTION blocks at the top of your main function (or any function really). Each function is numbered starting at the top and going down to the last one inside your current function. Then, when you want to call it, you do:

closure $0 1

call 0 1 1

Which shows the second function (n=1) being put in register $0 and getting called with 1 stack and 1 return stack (just enough to hold the function on it).

The following Lua program:

function test(a)
print(a)
end

test("hello")

becomes this luas code:

.FUNCTION 0 2 0
.STRINGS

"test"

"hello"

.END_STRINGS

.FUNCTION 1 3 0
.STRINGS

"print"

.END_STRINGS

GETGLOBAL $1 @0 

MOVE $2 $0 

CALL 1 2 1 

RETURN 0 1 

.END_FUNCTION

CLOSURE 0 0 

SETGLOBAL $0 @0 

GETGLOBAL $0 @0 

LOADK $1 @1 

CALL 0 2 1 

RETURN 0 1 

.END_FUNCTION
You can see that the line CLOSURE 0 0 is putting the first function (n=0 or test) into register 0. At this point you could just do CALL 0 2 1 (call function in $0 with a stack of 2 to hold the function and one parameter, and return stack of size 1 to hold just the function). Instead, luac puts the function at $0 into the global spot @0, then gets it from @0 back into the $0 register (why does it do that?).

7.2.2 call a b c

calls the function in register a with b registers on the stack and expecting c registers on the stack when it returns, including the function itself. This means that, if you have a function in $0, and a parameter in $1, and you expect a single return value, then you would do call 0 2 2 to say that you want to "call the function in 0, with the function ($0) and a parameter ($1), and expect the function ($0) and a return value ($1)." That would match this Lua code:

function test(b)
b = b + 1

return b

end

a = 5

a = test(a)

Where test is loaded into $0, a loaded into $1, and the function is called with call 0 2 2 resulting in the function being in $0 and the return value (b) is left in $1. At the end of this code you would have a = 6.

See the example for closure for details on how it is used in practice.

7.2.3 tailcall a b c

This is part of Lua 5.0's new tailcall functionality. A tailcall is part of the functional programming way of doing things where you setup a list of stuff and then recursively call a function on it to process it. Normally, your function stack would start to grow insanely if you had a large list of data and you'd end up using double if not more memory to process the list. This because each element is processed with a function call with calls another function call with the next element and so on until the whole thing is processed. At the end of the stream of function calls, you have a huge stack of functions that are all waiting for a return value.

With tailcalls, the compiler recognizes that many times you are using such recursive function calls to process a list the same way you would with a regular for loop. Rather than allocate a new stack for each tailcall, the LVM just re-uses the stack from each call. This is possible only when the recursive function call is the last thing that happens before the return from the function, since there is no need to use the stack further and the next function can use it. Take this contrived example:

function tail_count(a, i, j)
if i == 0 then return j end

return tail_count(a, i - 1, j + 1)

end

function head_count(a, i, j)

if i == 0 then return j end

j = head_count(a, i - 1, j + 1)

return j

end

tc = tail_count(sample, table.getn(arg), 0)

hc = head_count(sample, table.getn(arg), 0)

print(tc)

print(hc)

When you call this with:

lua luac.out 1 2 3 4 5 6 7
It prints out:

7

7

Which shows that both tail_count and head_count do the same thing. The difference is that in tail_count I have:

return tail_count(a, i - 1, j+1)
But in head_count I have:

j = head_count(a, i - 1, j+1)

return j

What happens is that Lua detects that tail_count is calling tail_count right before a return, and figures out it can do a tail call to re-use the stack space-effectively calling tail_count in place. This means that tail_count can work on tables of any size without increasing the memory more than what is needed for one call to tail_call. This basically simulates what you'd get from a while or forloop. The assembler for tail_count is then:

.FUNCTION 3 7 0
.NUMBERS

0

.END_NUMBERS

.STRINGS

"tail_count"

.END_STRINGS

.NUMBERS

1

.END_NUMBERS

EQ $1 @0 False

JMP 1 

RETURN 2 2 

GETGLOBAL $3 @1 

MOVE $4 $0 

SUB $5 $1 @2 

ADD $6 $2 @2 

TAILCALL 3 4 0 

RETURN 3 0 

RETURN 0 1 

.END_FUNCTION
With head_count things are very different as the result of recursively calling head_count is assigned to j, so Lua must keep the stack around for each recursive call in order to maintain j. At the end of all the head_count recursive calls, lua basically just does a bunch of returns and adds, but during all the calls it has to keep them in memory on the stack so that it doesn't lose the value of j at each step. Here's the assembler for head_count:

.FUNCTION 3 7 0
.NUMBERS

0

.END_NUMBERS

.STRINGS

"head_count"

.END_STRINGS

.NUMBERS

1

.END_NUMBERS

EQ $1 @0 False

JMP 1 

RETURN 2 2 

GETGLOBAL $3 @1 

MOVE $4 $0 

SUB $5 $1 @2 

ADD $6 $2 @2 

CALL 3 4 2 

MOVE $2 $3 

RETURN 2 2 

RETURN 0 1 

.END_FUNCTION
You can see that, rather than TAILCALL, Lua uses a regular call, which will create another stack and start at the top of this function again.

The interesting thing about looking at the assembly language for this you can sort of see why a tail call works. If we treat the of the function as the top of a forloop, and the CALL or TAILCALL instruction as FORLOOP instruction at the bottom, then you can see the parallel between the two in this case: CALL is basically just an inneficient way to jump to the top of the function body and loop again since a new stack is needed each time to avoid losing the value of j through each pass. With TAILCALL, the function is re-run, but a new stack is not created since j is not needed during the body of the call.

Now, that's how tailcall works, but it's not clear how it is used. I believe you use it just like you would the call instruction, but that it re-uses the current function's stack space.

7.2.4 return a (b | $top)

This is how you end a function and it must be present. It returns from the function with return values starting in register a with b registers (including the function) in the return. So, if you have no return values, then b=1. If you have 1 return value, then b=2, and so on. Remember that you need to include the function in the return count (b) otherwise things will get bad (i.e. b should not equal 0).

You can also say $top which will return everything from 0 to the top of the stack. I imagine this is intended for having variable return values, but the problem with this is that you must declare the function's stack size when you create the function. This means that you cannot dynamically allocate a random size stack and return it, even though you have the $top notation. It is proably just a convenience notation for saying ``from a to the end of the stack that this function declared it needed.''

7.2.5 getupval $a b

I don't really understand this one, and I can't get luac to generate any code for it. It is supposed to "move the upbalue at position b into register a (b is the position in the upvalues list; it can also be the upvalue name, as it has been declared in upvalues list).

7.2.6 setupval $a b

Again, another instruction I don't fully understand and can't get luac to generate. It is supposed to "move value in the register a into the upvalue at position b (b is the position in the upvalues list; it can also be the upvalue name, as it has been declared in upvalues list).

7.3 Stack/Register Manipulation

These instructions manipulate the different elements currently on the stack. Elements of the stack are sometimes (confusingly) called registers when they are used in instructions. These instructions also work very closely with the Table Manipulation instructions to move data between tables and the stack.

move $a $b
Moves the contents of register $b into register $a.
loadk $a b
Loads a constant referred to by constant index b into register $a.
loadbool $a b c
Loads a boolean value into $a from b and skips the next instruction depending on how c is set.
loadnil a b
Loads the registers indexed from a to b with nil.
getglobal $a b
Loads register $a with the value of the global named after constant b.
setglobal $a b
Sets the global referred to by constant b with the value of register $a.
concat $a b c
Concatenates the registers on the stack from b to c and puts the results in register $a.

7.3.1 move $a $b

This moves whatever is in b to a and a or b can be any combination of a register, or local variable (which are really just registers). Remember that the grammar for move instructions in assembler are always backwards, you don't read them as "move from $a to $b" but backwards as "move from $b to $a". It is actually more like saying "fill $a with $b".

7.3.2 loadk $a b

Loads a constant referred to by b into the register $a. The b constant can be a global reference (@0, @1, etc.) or a constant typed directly like 234234 and "hello".

The Lua code:

a = 1234

a = "asdfasdf"

Becomes the following luas code:

.FUNCTION 0 2 0
.STRINGS

"a"

.END_STRINGS

.NUMBERS

1234

.END_NUMBERS

.STRINGS

"asdfasdf"

.END_STRINGS

LOADK $0 @1 

SETGLOBAL $0 @0 

LOADK $0 @2 

SETGLOBAL $0 @0 

RETURN 0 1 

.END_FUNCTION
This code basically loads 1234 (@1) into $0, then sets $0 into the global table into @0 ("a") thus effectively setting "a"=1234. It then does it for the "asdfasdf" (@2) variable and loads it into the global table at @0 setting variable "a".

7.3.3 loadbool $a b c

This moves the boolean value b into the register $a where b can be 1, 0, True or False. c is also a boolean that controls whether the next instruction is skipped or not. If c is true, then the next instruction is skipped, otherwise the next instruction is run. I'm not too sure what this instruction's purpose is and I can't seem to get luac to generate any code with loadbool in it.

7.3.4 loadnil a b

This loads the registers from number a to number b with nil. This is a way to clear the stack of results when it is necessary. The a and b parameters that specify a range of registers (from a to b).

7.3.5 getglobal $a b

This moves the content of global value b into $a. The b parameter can be the name of a global variable (like "print") or a @0 address in the constant table. You typically use this to get access to functions and packages that are loaded into the LVM for you to use like print. Here's an example that gets the print function and calls it with "hello":

getglobal $0 "print"

loadk $1 "hello"

call 0 2 1

7.3.6 setglobal $a b

This moves the content of register $a into global variable indicated by b which can be an address in the content table (@0, @1, ...) or a constant string indicating the name of the global ("print", "a"). Remember that this is not like the move instruction in that the value goes from a to b and not the other way around.

7.3.7 concat $a b c

Takes the registers in the range from b to c and concatenates them into register $a. Not really sure what can be in the b-c registers, and whether you can put anything in it, or just strings, or what.

7.4 Table Manipulation

Table manipulation instructions are used to work with data that is held in Lua Tables (pretty much the only datastructure used in Lua). The instructions work with the Stack/Register Instructions to allow the program to get data out of the tables, manipulate it on the stack, and then put data back into the registers.

newtable $a [b c]
Creates a table in register $a that is size b,c or 1,0 if [b c] is not given.
gettable $a $b c
Puts the value of position c in table $b into register $a. The c operand can be a constant or register with a position index in it.
settable $a $b c
Moves the contents of register $a to the table in $b at position c.
setlist a b
Sets the table in a with the values of registers a through c.
setlist0 a b
This seems to be exactly the same as setlist.

7.4.1 newtable $a [b c]

Creates a table in register $a of size b,c or 1,0 if [b c] is not given (b c is optional). This is used along with gettable and settable to manipulate the native table data structure within Lua. All manipulations of the Lua tables are done by moving things into and out of the table via registers.

7.4.2 gettable $a $b c

Put the value of position c in table $b into register $a, where c can be a register with a position or a constant from the constant table. This function is how you would get things out of a table, and you would also use it to navigate sub-tables. To go into a sub-table, you would gettable it from the super-table, and then use that register to manipulate the sub-table.

7.4.3 settable $a $b c

This is the analog to gettable and it moves the contents of register $a to the table in $b at position c.

The following Lua code:

t = {}

t["hello"] = 1

a = t["hello"]

Translates into this luas code:

.FUNCTION 0 2 0
.STRINGS

"t"

"hello"

.END_STRINGS

.NUMBERS

1

.END_NUMBERS

.STRINGS

"a"

.END_STRINGS

NEWTABLE $0 0 0 

SETGLOBAL $0 @0 

GETGLOBAL $0 @0 

SETTABLE $0 @1 @2 

GETGLOBAL $0 @0 

GETTABLE $0 $0 @1 

SETGLOBAL $0 @3 

RETURN 0 1 

.END_FUNCTION
The first thing that this code does is create a table in the $0 register and then set it to the global variable @0 ("t"). It then gets the table from @0 and puts it back into $0 (what a waste as it's already in $0) and then uses the settable instruction to set the @1 ("hello") index to the @2 (1) value.

It then gets the table at @0 (again!) and moves the contents of the $0 table, position @1 ("hello") into register $0. Then it takes register $0 (which holds t["hello"]) into global @3 ("a").

7.4.4 setlist a n

This sets the table in $a with values of registers from a+1 to a+n. This call is rather confusing since it is not clear what the n parameter really means. The purpose of this instruction seems to be to fill in a table with a bunch of register values really quick, and it seems like this instruction and setlist0 are the same thing (at least, they are the same switch case in lvm.c). Maybe someone else could explain this better.

7.4.5 setlist0 a n

This also sets the table in $a with values of registers from a+1 to a+n. It looks like it is just the same as setlist above.

7.5 Math

The math instructions are responsible for basic arithmetic in the LVM (add, divide, etc.). They are pretty straight forward to work with, and their descriptions are so short that I've chosen to skip the overview.

7.5.1 add $a $b c

Add the register $b and register $c or constant table @c and put the results in $a. This is pretty straight-forward stuff and all of the other arithmetic operators work the same way.

7.5.2 sub $a $b c

This works just like add and do what their names say: subtract, multiply, divide, power function.

7.5.3 mul $a $b c

This works just like add and do what their names say: subtract, multiply, divide, power function.

7.5.4 div $a $b c

This works just like add and do what their names say: subtract, multiply, divide, power function.

7.5.5 pow $a $b c

This works just like add and do what their names say: subtract, multiply, divide, power function.

7.5.6 unm $a $b

Basically puts the negative value of $a into $b.

7.5.7 not $a $b $c

Takes the negation of $c and $b and puts the result in $a. The instructions in syntax.html might have an error since it has only not $a $b in the instructions but mentions a c variable.

7.6 Comparison

Comparison instructions work closely with the Flow Control instructions to control branching and looping in the program. They are pretty easy to use, but become slightly complicated when combined with jmp, forloop, tforloop, and others to manipulate the program's instruction flow. Their descriptions are very short, so I've chosen to skip the overview.

7.6.1 eq $a b

Tests whether register $a is equal to whatever is in b which can be a register ($b), a constant address (@b), a numeric constant (1234), or a string ("hello"). If the two are not equal, then it skips the next instruction which is usually a jmp or some other movement instruction. It would be used in an if statement or such.

7.6.2 lt $a b

Works the same as eq, but means less than rather than equal.

7.6.3 gt $a b

Works the same as eq, but means less than rather than equal.

7.6.4 le $a b

Works the same as eq, but means less than rather than equal.

7.6.5 ge $a b

Works the same as eq, but means less than rather than equal.

7.6.6 test $a $b c

The instructions in syntax.html is ``test result is the value of register b and if it is true, register a receives the value of register b.'' Which is kind of confusing, and doesn't even mention the c parameter. After looking at how it is used in the test .las files, and reading the source to the LVM, it looks like the usage is: ``Test compares $b to constant c, and if they are not equal then $a is assigned the value of $b, otherwise the next instruction is skipped (usually a jmp).'' In actuality, it gets a bit more complex than even that. After testing the instruction, I found that this is still not true. The comment for the instruction in lopcodes.h says:

``A B C if (R(B) <=> C) then R(A) := R(B) else pc++''
But I did several tests with b and c set to different values and did not get this as a result. If the above description is true, then I should have a truth table like so:

  b=1 b=0
c=1

I wrote a small test program to verify this, it and produced the following truth table:

  b=1 b=0
c=1

As you can see, this is not working as described since, if $a=$b when b!=c, then condition #1 and #3 would both be $a=$b and #2 and #4 would both be pc++. I haven't delved deeper into the code, but it looks like this instruction is all over the map with the comparison and the documentation so far does not make it clear.

7.7 Flow Control

Flow Control is how you get different parts of a Lua program to execute depending on the results of comparisons. The flow control instructions available in the LVM are limited to jumping (jmp), and looping (either over a range of integers or over a range of table elements).

jmp (label | delta)
Goes to the instruction indicated by a label or by a certain delta distance forward/backward.
forloop a (label | delta)
Loops over a section of code based on the state of three registers starting at a. The loop will go to either a label or delta instructions forward/backward.
tforprep a (label | delta)
Sets up for a tforloop in preparation for looping over the elements of a table.
tforloop a (label | delta)
For loop that is used to iterate through the elements of a table.

7.7.1 jmp (label | delta)

This instruction is used with test, eq, le, lt, ge, and gt to perform branching (like if in Lua). It typically follows one of the above mentioned instructions and indicates a location to go to either by a label or by a certain number of instructions to skip (delta).

7.7.2 forloop a (label | delta)

The forloop instruction is used to loop through a series of indexes indicated by three parameters starting at register a. The registers should be setup as $a holds the index, $(a+1) is the max for the index, and $(a+2) is a step. The label or delta component is the beginning of the forloop and usually goes backward to a point where the body of the forloop is in the instruction set. It is common for these kinds of loops to be created "inside out", where you setup the registers for the loop, jump to the instruction with the forloop, and then the forloop jumps backwards to the instruction right under the top jump. This is because, when the forloop reaches its limit, it simply continues on to the next instruction. If you had the forloop instruction at the top of the body, then the body would run one last time which would probably be incorrect. Here's an example to make it concrete:

.FUNCTION 0 7 0
; setup the for loop to start at 0, go to 10, step by 1

loadk $0 0

loadk $1 10

loadk $2 1

; must jump to the forloop instruction

jmp 5

; get print to print out index and stuff

getglobal $3 "print"

move $4 $0 ; put the current for loop index in $4

move $5 $1 ; and the max in $5

move $6 $2 ; and the step in $6

call 3 4 1 ; print index max step

; jump back 6 instructions to the "getglobal" and iterate

forloop 0 -6 

; forloop is done we now continue

return 0 1

.END_FUNCTION
When this luas program is run, it will print three columns of numbers: index max and step or $0 $1 $2. You can see that I did this loop inside-out by:

  1. Using loadk to setup $0=0 (the start), $1=10 (the max) and $2=1 (the step).
  2. Then jmp 5 jumps to the forloop down at the bottom, skipping over the set of instructions (you could have also used a label).
  3. The forloop 0 -6 says that I want $0 to be index register, $1 to be the max, and $2 to be the step (like we setup as described in #1). The -6 part says that, if the index < the max ($0 < $1), then jump backwards 6 instructions (which puts us at the "getglobal" line right after "jmp 5").
  4. Since I jumped to this forloop, none of the other instructions ran so the forloop can now set everything up and jump to the getglobal line. At that line (getglobal $3 "print"), I load the print function, move the $0,$1,$2 registers into $4,$5,$6 and call print. I do this so I can print out the index ($0), max ($1), and step ($2), without disturbing the registers that the forloop is using.
  5. When the call instruction runs, the forloop instruction is hit. It increments $0, checks whether to continue, and if so jumps back -6 to the getglobal again (which we could probably eliminate really).
  6. After the forloop line is hit enough that $0>$1, the forloop instruction "drop out" or stops running and the next instruction is run, which is just a return.
Now, what happens if I wanted to write this forloop in a way that made more sense for a forloop: with the forloop instruction at the top and the body under-neath. It turns out that it's possible, but gets a little messy since you need to duplicate the test that the forloop is already doing and to check whether you should jump back to the forloop or continue. It's such a mess that you might as well simulate the forloop yourself using register incrementing, jumping, and eq instructions. By putting the forloop at the bottom, you'll avoid duplicating the forloop's work and things will be nice and clean, if maybe a little confusing at first.

7.7.3 tforprep a (label | delta)

This instruction is used to start of a table for loop, which is "for i in blah..." in Lua. It is similar to forloop, but the actual looping is broken into tforprep and tforloop. The a parameter indicates the table ($a), the current index ($a+1), the key ($a+2), the value ($a+3). The label or delta are where the companion tforloop instruction reside. This is basically like combining the jmp and loadk instructions from the previous one, the difference being that you must load a table into $a. See tforloop for an example.

7.7.4 tforloop a (label | delta)

This works with tforprep to loop over a table and get all of the key=value pairs. You create loops of this kind the same way you do forloops (inside-out), but you use a tforprep to set them up rather than a jmp instruction. The a register coincides with the same registers as the tforprep instruction: $a is the table, $(a+1) is the internal index, $(a+2) is the key, and $(a+3) is the value. The weird thing is that this doesn't behave the way you would expect. Rather, the tforprep takes the arguments and sets up the following situation:

$a = lua_next (a C API function)

$(a+1) = the table

$(a+2) = the key

The lua_next function takes two parameters: the table and the current key and it returns the key and value. So, after the first call the elements are:

$a = lua_next (a C API function)

$(a+1) = the table

$(a+2) = the key

$(a+3) = the value

The tforprep then jumps to the tforloop instruction (that's what the second parameter is for) which then calls the next function with something like:

call $a 3 4
The next function is at $a, with $(a+1) and $(a+2) (table,key) as the parameters, and then next returns table,key,value in the stack again. Finally, tforloop either goes to the next instruction (which is usually a jmp back to the top of the loop body) or it skips the next instruction if (a+2) == nil (if the key is nil). The lopcodes.h file is actually wrong as it says:

``OP_TFORLOOP,/* A C R(A+2), ... ,R(A+2+C) := R(A)(R(A+1), R(A+2));

if R(A+2) = nil then pc++ */''
Which would mean that when $(a+2) doesn't equal nil, then the next instruction is skipped. This also says that the C constant is so you can have a variable range of registers (the R(A+2+C) part), but if tforprep can't handle these, then what's the point of that? I haven't been able to figure out why this is allowed.

Here's an example that puts it all together:

.FUNCTION 0 9 0
; get the get args function and load the command line args as a table

getglobal $0 "arg" ; get the command line arguments

; $0 now has the arguments table, we can loop through it and print it

tforprep 0 6 ; $0=table, $1=index, $2=key, $3=value

getglobal $4 "print" ; setup to print each value

move $5 $0

move $6 $1

move $7 $2

move $8 $3

call 4 5 1 ; print them out

tforloop 0 1 ; continue the loop

jmp -8

return 0 1

.END_FUNCTION
This program basically gets the "arg" variable which has the command line arguments. It then sets up a forloop to process them with tforprep 0 6 (the 0 means register 0 and the 6 means jump 6 ahead to the tforloop instruction). Between the tforprep and tforloop we have a simple "print out the registers" block of code. The tforloop 0 1 then says to run what tforprep setup and then run the next instruction. The next instruction is a jmp that goes back to the top of the loop (not the tforprep though). If you were to do "jmp -9", you would jump to the tforprep, which would set $(a+2)=nil, and then jump to tforloop, which would catch this and skip the next instruction-effectively doing nothing. Finally, when the key ($(a+2)) is nil, the tforloop drops out of the loop by skipping the next instruction (the jmp -8). That ends the program.

When you run this program with some arguments (let's say "hello this is some stuff to print"), then you get this output:

function: 0x673d0 table: 0x6a920 1 hello

function: 0x673d0 table: 0x6a920 2 this

function: 0x673d0 table: 0x6a920 3 is

function: 0x673d0 table: 0x6a920 4 some

function: 0x673d0 table: 0x6a920 5 stuff

function: 0x673d0 table: 0x6a920 6 to

function: 0x673d0 table: 0x6a920 7 print

function: 0x673d0 table: 0x6a920 -1 ../lua-5.0/bin/lua

function: 0x673d0 table: 0x6a920 0 mytest.las.luac

function: 0x673d0 table: 0x6a920 n 7

Which shows tforprep moving things around so that $a=lua_next(), $(a+1)=arg, $(a+2)=key, and $(a+3)=value (after the next call).

About this document ...

The Lua Virtual Machine

This document was generated using the LaTeX2HTML translator Version 2002 (1.62)

Copyright © 1993, 1994, 1995, 1996, Nikos Drakos, Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999, Ross Moore, Mathematics Department, Macquarie University, Sydney.

The command line arguments were:
latex2html -no_subdir -split 0 -show_section_numbers /tmp/lyx_tmpdir557gv50Dq/lyx_tmpbuf0/luas_lvm_instructions.tex

The translation was initiated by Zed Shaw on 2003-06-09


next_inactive up previous
Zed Shaw 2003-06-09