[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Questions about opcodes used in LPEG's vm implementation
- From: lua.greatwolf@...
- Date: Fri, 13 Sep 2013 14:01:19 -0700
Hi all,
I'm attempting to add JIT functionality(Will be using DynASM to help with this) to the reference
LPeg 0.12 implementation but don't really have any prior experience doing this so not sure how far
I'll get on this project. Maybe I'll learn something at the end.
As a starting point I'm currently studying the implementation for vm parsing machine in lpvm.c.
Additionally, I'm using the published peg documentation here by Roberto:
http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
Anyways, onto my questions.
I'm looking at the 'ISet' and 'ITestSet' opcodes but I'm not really understanding how it checks
whether the current character is part of the set or not. Here's a code except for 'ISet':
case ISet: {
int c = (byte)*s;
if (testchar((p+1)->buff, c) && s < e)
{ p += CHARSETINSTSIZE; s++; }
"testchar" is a macro defined in lptypes.h:143 as:
#define testchar(st,c) (((int)(st)[((c) >> 3)] & (1 << ((c) & 7))))
Now according to the peg.pdf:
... Sets are represented as bit sets, with one bit for
each possible value of a character. Each such instruction
uses 256 extra bits, or 16 bytes, to represent its set.
There's probably a mistake in there because 16 bytes is actually 128-bits. Also LPeg had a rewrite
since this doc was published so I don't know if all the info here is still accurate.
So my question is, when there's a "ISet", "ITestSet" or possibly "ISpan" opcode in the compiled
pattern, what would the opcode listing for that actually look like? Is the set of characters being
checked for also encoded into this listing -- eg. similar to a line offset? If so, how many bytes
would that take?
Here's the union definition for opcode instructions in lpvm.h:
typedef union Instruction {
struct Inst {
byte code;
byte aux;
short key;
} i;
int offset;
byte buff[1];
} Instruction;
So from this I can infer that each opcode instruction entry in the listing can be either i(an actual
opcode instruction), an offset(for the opcode above it) or a byte buffer with 1 element in it.
The two fields I'm unclear about is the purpose of "key" and "buff". "key" doesn't seem to be used
during the interpreting phase so I'm guessing maybe it's used during pattern construction? In which
case I can probably ignore that?
Now the "buff" field is being used above in "testchar". Why is buff an array of byte with one
element in it? Even if say this decayed into a pointer so it takes 4 bytes I still can't see how it
fits in. Let say the pattern had a really big set. How would that translate into the opcode and what
would the listing look like?
Can anyone shed some light and help clear this up?
Thanks!