Greetings everyone. It has been a while since I last posted, though I have
been reading and keeping up with the list. Unfortunately, due to family
medical issues and other work projects, I have not had anywhere near as much
time to work on Lunia as I would have liked. The good news is that I have
recently had some time to work on Lunia and I almost have another patch
available, which will address one of the limitations of my previous
switch/case patch: constants.

I have created a full preprocessor for Lunia which supports both stand-alone
file importing and importing via require, along with constants,
enumerations (standard and bitfield), macros and support for inline
"functions". Macros are fully nestable and work much like their C
counterparts, though there are some differences. With the preprocessor
you may also @define names and optionally give them values; these names may
then be used in @if/@elseif/@else statements to conditionally compile source
code. Additionally, instead of creating bytecode, the preprocessor can
save out the processed source code (in tidy or compacted form),
which can then be compiled in vanilla Lua.

I have mentioned the Token Storage system before, but to quickly recap: it
allows you to parse the current token stream into a buffer and then play
back the buffers (which live on a separate token stack) at any point. This
is the basis for the CEMI (constants, enumerations, macros and inline)
patch, and works very well so far.

I am also in the process of adding a new data type to Lunia, the Index,
which is essentially a struct-like data type. Preliminary results are very
good with the performance of the patch. I will have more information, and a
patch which includes both the Index data type along with the Token Storage
system and Preprocessor patches soon.

However, I wanted to ask some advice from those "in the know" about the
layout of items in the Lua bytecode, especially since I have expanded the
bytecode from 32 to 64 bits to accommodate the Index data type.

The current bytecode layout for vanilla Lua is as follows:

|  Mode  | Vanilla Lua 32-bit Instruction |
|  iABC  |   B    |   C    |   A   |  OP  |
|        |   9b   |   9b   |   8b  |  6b  |
|  iABx  |       Bx        |   A   |  OP  |
|  iAsBx |       18b       |   8b  |  6b  |
|  iAx   |           Ax            |  OP  |
|        |           26b           |  6b  |
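To make the vanilla table concrete, here is a minimal sketch of packing and unpacking an iABC instruction. It assumes the table reads from the high bits on the left to the low bits on the right, so OP occupies bits 0-5. The constant names are modelled loosely on those in lopcodes.h; the helper functions put, get, encode_abc and decode_abc are hypothetical names chosen for illustration.

```python
# Vanilla layout from the table, listed here from the low bits upward.
SIZE_OP, SIZE_A, SIZE_C, SIZE_B = 6, 8, 9, 9
POS_OP = 0
POS_A = POS_OP + SIZE_OP   # bit 6
POS_C = POS_A + SIZE_A     # bit 14
POS_B = POS_C + SIZE_C     # bit 23


def put(value, pos, size):
    """Place value into a field of `size` bits starting at bit `pos`."""
    if not 0 <= value < (1 << size):
        raise ValueError(f"{value} does not fit in {size} bits")
    return value << pos


def get(instr, pos, size):
    """Extract the field of `size` bits starting at bit `pos`."""
    return (instr >> pos) & ((1 << size) - 1)


def encode_abc(op, a, b, c):
    """Pack an iABC instruction into a 32-bit integer."""
    return (put(op, POS_OP, SIZE_OP) | put(a, POS_A, SIZE_A)
            | put(c, POS_C, SIZE_C) | put(b, POS_B, SIZE_B))


def decode_abc(instr):
    """Return (op, a, b, c) from a packed iABC instruction."""
    return (get(instr, POS_OP, SIZE_OP), get(instr, POS_A, SIZE_A),
            get(instr, POS_B, SIZE_B), get(instr, POS_C, SIZE_C))
```

Each field costs one shift and one mask to extract, and those are the operations that a change in field order would affect.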

For my 64-bit'isation of the bytecode I have expanded it as follows:

|  Mode  |         High 32-bits    ~    Lunia    ~    Low 32-bits          |
|  iABC  |    A    |     B     |    C     |   Ci   |  Bi   |   Ai   |  OP  |
|        |   10b   |    11b    |   11b    |   8b   |  8b   |   8b   |  8b  |
|  iABx  |    A    |  NA  |    Bx (Hi)    ~     Bx (Low)   |   Ai   |  OP  |
|        |   10b   |  6b  |     16b     (32b)       16b    |   8b   |  8b  |
| iABCx  |    A    |     B     | Cx (Hi)  ~     Cx (Low)   |   Ai   |  OP  |
| iABsCx |   10b   |    11b    |   11b  (27b)       16b    |   8b   |  8b  |
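The 64-bit iABC row can be sketched the same way. This is only an illustration and assumes, as with the vanilla table, that the leftmost column holds the highest bits, so OP sits in bits 0-7 and A in bits 54-63. The names FIELDS_IABC, layout, encode and decode are hypothetical.

```python
# Assumed iABC layout, listed from the low bits upward: (name, width in bits).
FIELDS_IABC = [("OP", 8), ("Ai", 8), ("Bi", 8), ("Ci", 8),
               ("C", 11), ("B", 11), ("A", 10)]


def layout(fields):
    """Map each field name to (bit position, width), first field at bit 0."""
    pos, table = 0, {}
    for name, size in fields:
        table[name] = (pos, size)
        pos += size
    if pos != 64:
        raise ValueError(f"fields cover {pos} bits, expected 64")
    return table


def encode(table, **values):
    """Pack the named field values into one 64-bit integer."""
    instr = 0
    for name, value in values.items():
        pos, size = table[name]
        if not 0 <= value < (1 << size):
            raise ValueError(f"{name}={value} does not fit in {size} bits")
        instr |= value << pos
    return instr


def decode(table, instr):
    """Unpack a 64-bit integer into a dict of field values."""
    return {name: (instr >> pos) & ((1 << size) - 1)
            for name, (pos, size) in table.items()}


IABC = layout(FIELDS_IABC)
```

Because the layout is just an ordered list, a different field order can be tried by reordering FIELDS_IABC without touching the encode and decode helpers.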

As you can see, the Opcode field has been expanded from 6 bits to 8 bits to
accommodate more than 64 opcodes. 256 opcodes may be overkill, so I was
thinking I could make the opcode field 7 bits, which would leave the last bit
free as some sort of flag bit.
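A minimal sketch of the 7-bit variant follows. It assumes the flag would take the top bit of the existing 8-bit OP field; where the flag lives and what it means are not decided above, and the function names are hypothetical.

```python
OPCODE_BITS = 7
OPCODE_MASK = (1 << OPCODE_BITS) - 1   # 0x7F, room for 128 opcodes


def split_op_byte(op_byte):
    """Split the 8-bit OP field into (opcode, flag)."""
    return op_byte & OPCODE_MASK, (op_byte >> OPCODE_BITS) & 1


def join_op_byte(opcode, flag):
    """Combine a 7-bit opcode and a 1-bit flag into the 8-bit OP field."""
    if not 0 <= opcode <= OPCODE_MASK or flag not in (0, 1):
        raise ValueError("opcode must fit in 7 bits and flag must be 0 or 1")
    return (flag << OPCODE_BITS) | opcode
```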

Additionally, the ABC registers have all been expanded by 2 bits, thus
quadrupling the original range of values available. To support the Index
data type I have also added three new index fields, Ai, Bi and Ci, which I will
explain in detail below. I have also changed the 4 available opcode modes;
iAx is no longer required as iABx replaces it, and I moved the iABx/iAsBx
into iABCx and iABsCx.

The Index data type works by defining a @struct (preprocessor directive),
which is essentially a list of variables and, optionally, their data types.
(Oh, I forgot to mention: my next patch also includes support for optional
type locking.) I modified the parser so it is now aware of the items on
the left-hand side of an assignment operation; this was required so the
parser knows when an Index data type is being used.

Once an Index is defined, the individual elements may be accessed using the
standard dot notation, and new Indexes may be created with a constructor. One
difference from the table constructor is that each key must be a name present
in the definition of the struct, followed by a colon, then the value to assign.

@struct Player
  uid as Integer,
  name as String,
  level as Integer,
  xpos as Float,
  ypos as Float,
  zpos as Float

local p as Player = { name: "Tess T'ing", level: 3 }
p.uid = 12345
p.xpos, p.ypos, p.zpos = 1.23, 4.56, 7.89

When compiled, the Ai field would be set to '1' for "p.uid = 12345", which
means that the value '12345' would be stored in the first element of the Index
'p', instead of simply in the local variable 'p'. The same applies to xpos,
ypos and zpos, for which the Ai field would be set to 4, 5 and 6 respectively.
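The slot numbers above follow directly from the declaration order of the @struct. As a minimal sketch, assuming slots are numbered from 1 in declaration order (which matches the uid and xpos/ypos/zpos values quoted), the mapping a compiler would need looks like this. The names PLAYER_FIELDS and slot_table are hypothetical.

```python
# Field names of the Player struct, in declaration order.
PLAYER_FIELDS = ["uid", "name", "level", "xpos", "ypos", "zpos"]


def slot_table(field_names):
    """Map each field name to its 1-based slot, i.e. the value placed in Ai."""
    return {name: slot for slot, name in enumerate(field_names, start=1)}


slots = slot_table(PLAYER_FIELDS)
for field in ("uid", "xpos", "ypos", "zpos"):
    print(field, slots[field])   # uid 1, xpos 4, ypos 5, zpos 6
```

With 1-based slots, an Ai of 0 would be left free to mean a plain register access, though that is an inference rather than something stated here.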

The Bi/Ci fields are for indexing the B/C values. However, if the opcode
mode is not iABC then only Ai is available, as the Bi/Ci bits are instead
used to extend the B/C field values.
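For the iABCx/iABsCx row, the 11 high bits and 16 low bits of Cx are adjacent, so Cx can be treated as one 27-bit field sitting just above OP and Ai. The sketch below assumes that bit position, and also assumes the signed sCx form uses the same bias (excess-K) encoding that vanilla Lua uses for sBx; neither detail is confirmed above, and the function names are hypothetical.

```python
SIZE_CX = 27
POS_CX = 16                       # above OP (8 bits) and Ai (8 bits)
MAXARG_CX = (1 << SIZE_CX) - 1
MAXARG_SCX = MAXARG_CX >> 1       # bias, mirroring vanilla Lua's sBx


def set_cx(instr, cx):
    """Return instr with its 27-bit Cx field replaced by cx."""
    if not 0 <= cx <= MAXARG_CX:
        raise ValueError(f"{cx} does not fit in {SIZE_CX} bits")
    mask = MAXARG_CX << POS_CX
    return (instr & ~mask) | (cx << POS_CX)


def get_cx(instr):
    """Extract the unsigned 27-bit Cx field."""
    return (instr >> POS_CX) & MAXARG_CX


def set_scx(instr, scx):
    """Store a signed value by adding the bias before packing."""
    return set_cx(instr, scx + MAXARG_SCX)


def get_scx(instr):
    """Recover the signed value by subtracting the bias."""
    return get_cx(instr) - MAXARG_SCX
```

Under these assumptions sCx ranges from -MAXARG_SCX to MAXARG_SCX + 1, which is ample for jump offsets.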

Additionally, Index data types may be nested within each other, and there is
a pseudo-index system which allows for accessing data 4 levels deep in a
single instruction. If data needs to be accessed further down, the last Index
will be extracted to a register and indexing will continue, including
pseudo-indexing if needed. While this optimisation provides a relatively
insignificant speed increase, it can dramatically reduce the number of
instructions, and thus file size, required to access data in nested Indexes.
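To illustrate the instruction-count saving, here is a rough sketch that splits a nested access path into groups of at most four levels, one group per instruction. It only models the counting: how the four levels are actually encoded in an instruction is not described above, and plan_access is a hypothetical name.

```python
MAX_DEPTH = 4   # levels reachable in a single instruction, as stated above


def plan_access(path):
    """Split a list of nested field names into per-instruction groups."""
    return [path[i:i + MAX_DEPTH] for i in range(0, len(path), MAX_DEPTH)]


# A six-level access needs two instructions instead of six.
steps = plan_access(["world", "zone", "party", "leader", "pos", "x"])
print(len(steps))   # 2
```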

My request for advice relates to the order of the fields in the expanded
64-bit Instruction. I recall reading somewhere about Lua changing the order
from ABC to ACB for performance reasons, though for the life of me I have
been unable to find that reference. I am hoping someone else remembers where
that was, or some way to find it... I have tried a number of searches on the
list and have come up empty-handed.

I understand the performance boost from changing ABC to ACB has to do with
hardware-level optimisations, but that is not a level of optimisation I am
terribly familiar with. I am wondering how changing from ABC to ACB results
in a performance gain and, given that it does, whether there is any way I can
arrange my 64-bit opcode to be more efficient than my first attempt.

If so, how can I improve it? Hopefully with an explanation! :)