lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Hi all,

I'm attempting to add JIT functionality(Will be using DynASM to help with this) to the reference LPeg 0.12 implementation but don't really have any prior experience doing this so not sure how far I'll get on this project. Maybe I'll learn something at the end.

As a starting point I'm currently studying the implementation for vm parsing machine in lpvm.c. Additionally, I'm using the published peg documentation here by Roberto:

Anyways, onto my questions.

I'm looking at the 'ISet' and 'ITestSet' opcodes but I'm not really understanding how it checks whether the current character is part of the set or not. Here's a code except for 'ISet':

      case ISet: {
        int c = (byte)*s;
        if (testchar((p+1)->buff, c) && s < e)
          { p += CHARSETINSTSIZE; s++; }

"testchar" is a macro defined in lptypes.h:143 as:

    #define testchar(st,c)    (((int)(st)[((c) >> 3)] & (1 << ((c) & 7))))

Now according to the peg.pdf:

... Sets are represented as bit sets, with one bit for each possible value of a character. Each such instruction
uses 256 extra bits, or 16 bytes, to represent its set.

There's probably a mistake in there because 16 bytes is actually 128-bits. Also LPeg had a rewrite since this doc was published so I don't know if all the info here is still accurate.

So my question is, when there's a "ISet", "ITestSet" or possibly "ISpan" opcode in the compiled pattern, what would the opcode listing for that actually look like? Is the set of characters being checked for also encoded into this listing -- eg. similar to a line offset? If so, how many bytes would that take?

Here's the union definition for opcode instructions in lpvm.h:

  typedef union Instruction {
    struct Inst {
      byte code;
      byte aux;
      short key;
    } i;
    int offset;
    byte buff[1];
  } Instruction;

So from this I can infer that each opcode instruction entry in the listing can be either i(an actual opcode instruction), an offset(for the opcode above it) or a byte buffer with 1 element in it.

The two fields I'm unclear about is the purpose of "key" and "buff". "key" doesn't seem to be used during the interpreting phase so I'm guessing maybe it's used during pattern construction? In which case I can probably ignore that?

Now the "buff" field is being used above in "testchar". Why is buff an array of byte with one element in it? Even if say this decayed into a pointer so it takes 4 bytes I still can't see how it fits in. Let say the pattern had a really big set. How would that translate into the opcode and what would the listing look like?

Can anyone shed some light and help clear this up?