Questions about opcodes used in LPEG's vm implementation

lua-l archive


Subject: Questions about opcodes used in LPEG's vm implementation
From: lua.greatwolf@...
Date: Fri, 13 Sep 2013 14:01:19 -0700

Hi all,

I'm attempting to add JIT functionality (I'll be using DynASM to help with this) to the reference LPeg 0.12 implementation, but I don't have any prior experience doing this, so I'm not sure how far I'll get on this project. Maybe I'll learn something at the end.

As a starting point I'm currently studying the implementation of the parsing machine (the VM) in lpvm.c. Additionally, I'm using the PEG documentation published here by Roberto:


http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

Anyways, onto my questions.

I'm looking at the 'ISet' and 'ITestSet' opcodes, but I don't really understand how they check whether the current character is part of the set or not. Here's a code excerpt for 'ISet':


      case ISet: {
        int c = (byte)*s;
        if (testchar((p+1)->buff, c) && s < e)
          { p += CHARSETINSTSIZE; s++; }

"testchar" is a macro defined in lptypes.h:143 as:

    #define testchar(st,c)    (((int)(st)[((c) >> 3)] & (1 << ((c) & 7))))
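
To make the bit arithmetic in that macro concrete, here is a minimal sketch, not taken from LPeg: `(c) >> 3` selects the byte that holds the bit for character `c`, and `(c) & 7` selects the bit within that byte. The names `SET_BYTES`, `set_add` and `in_lowercase_set` are hypothetical helpers introduced only for this illustration, and the sketch assumes 8-bit bytes and an ASCII-compatible character set.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

typedef unsigned char byte;

/* Hypothetical name: one bit per possible byte value, packed 8 per byte. */
#define SET_BYTES ((UCHAR_MAX / 8) + 1)

/* Same macro as quoted from lptypes.h. */
#define testchar(st,c)    (((int)(st)[((c) >> 3)] & (1 << ((c) & 7))))

/* Hypothetical helper: mark character c as a member of the set. */
static void set_add(byte *st, int c) {
  assert(c >= 0 && c <= UCHAR_MAX);
  st[c >> 3] |= (byte)(1 << (c & 7));   /* byte index c/8, bit index c%8 */
}

/* Hypothetical helper: build the set [a-z], then ask whether c is in it. */
static int in_lowercase_set(int c) {
  byte set[SET_BYTES];
  int ch;
  assert(c >= 0 && c <= UCHAR_MAX);
  memset(set, 0, sizeof set);
  for (ch = 'a'; ch <= 'z'; ch++)
    set_add(set, ch);
  return testchar(set, c) != 0;
}
```

For example, in ASCII 'a' is 97, so its membership bit would live in byte 97 >> 3 = 12 at bit position 97 & 7 = 1.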

Now according to the peg.pdf:

... Sets are represented as bit sets, with one bit for each possible value of a character. Each such instruction
uses 256 extra bits, or 16 bytes, to represent its set.

There's probably a mistake in there, because 16 bytes is actually 128 bits (256 bits would be 32 bytes). Also, LPeg has had a rewrite since this doc was published, so I don't know if all the info here is still accurate.
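
The size arithmetic can be checked with a small sketch. The helper names `charset_bytes` and `charset_slots` are hypothetical, and the "one extra leading slot for the opcode" layout is an assumption suggested by the `p += CHARSETINSTSIZE` step in the excerpt, not something confirmed from the LPeg sources here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for a bit set with one bit per
   possible unsigned char value. */
static size_t charset_bytes(void) {
  size_t values = (size_t)UCHAR_MAX + 1;   /* 256 when bytes are 8 bits */
  assert(values % CHAR_BIT == 0);
  return values / CHAR_BIT;                /* 256 / 8 = 32 */
}

/* Hypothetical helper: number of slots of slot_size bytes needed to
   hold the set (rounded up), plus one assumed leading slot for the
   opcode itself. */
static size_t charset_slots(size_t slot_size) {
  assert(slot_size > 0);
  return (charset_bytes() + slot_size - 1) / slot_size + 1;
}
```

With 8-bit bytes this gives 32 bytes for the set; if each instruction slot is 4 bytes wide, `charset_slots(4)` comes to 9 slots in total.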

So my question is: when there's an "ISet", "ITestSet" or possibly "ISpan" opcode in the compiled pattern, what would the opcode listing for that actually look like? Is the set of characters being checked for also encoded into this listing (e.g., similar to a line offset)? If so, how many bytes would that take?


Here's the union definition for opcode instructions in lpvm.h:

  typedef union Instruction {
    struct Inst {
      byte code;
      byte aux;
      short key;
    } i;
    int offset;
    byte buff[1];
  } Instruction;

So from this I can infer that each entry in the listing can be either i (an actual opcode instruction), an offset (for the opcode above it) or a byte buffer with 1 element in it.

The two fields I'm unclear about are "key" and "buff". "key" doesn't seem to be used during the interpreting phase, so I'm guessing maybe it's used during pattern construction? In which case I can probably ignore it?

Now the "buff" field is being used above in "testchar". Why is buff an array of byte with one element in it? Even if, say, this decayed into a pointer so that it takes 4 bytes, I still can't see how it fits in. Let's say the pattern had a really big set. How would that translate into the opcode, and what would the listing look like?
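
One possible reading, offered as a sketch under assumptions rather than a description of what LPeg actually does: a one-element `buff` could serve as a byte-wise view whose address marks the start of the set data, with the remaining bytes spilling into the slots that follow the opcode slot. The names `DEMO_ISET`, `SET_BYTES`, `SET_SLOTS`, `set_bytes`, `demo_match_digit` and `demo_slots` are hypothetical. The sketch reaches the bytes through a cast of `p + 1` so that it stays within well-defined C; that pointer has the same address as `(p+1)->buff`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char byte;

/* Union as quoted from lpvm.h. */
typedef union Instruction {
  struct Inst {
    byte code;
    byte aux;
    short key;
  } i;
  int offset;
  byte buff[1];
} Instruction;

#define testchar(st,c)    (((int)(st)[((c) >> 3)] & (1 << ((c) & 7))))

/* Hypothetical names and values, for illustration only. */
enum { DEMO_ISET = 1 };
#define SET_BYTES  ((UCHAR_MAX / 8) + 1)
#define SET_SLOTS  ((SET_BYTES + sizeof(Instruction) - 1) / sizeof(Instruction))

/* Bytes of the slots that follow the opcode slot at p. */
static byte *set_bytes(Instruction *p) {
  byte *st = (byte *)(p + 1);
  assert(st == (p + 1)->buff);   /* same address as the buff member */
  return st;
}

/* Lay out "opcode slot + set slots" for the set [0-9] and test c. */
static int demo_match_digit(int c) {
  Instruction prog[1 + SET_SLOTS];
  byte *st;
  int ch;
  assert(c >= 0 && c <= UCHAR_MAX);
  memset(prog, 0, sizeof prog);
  prog[0].i.code = DEMO_ISET;          /* slot 0: the opcode */
  st = set_bytes(&prog[0]);            /* slots 1..SET_SLOTS: the bit set */
  for (ch = '0'; ch <= '9'; ch++)
    st[ch >> 3] |= (byte)(1 << (ch & 7));
  return testchar(set_bytes(&prog[0]), c) != 0;
}

/* Total slots one such instruction would occupy in this sketch. */
static size_t demo_slots(void) { return 1 + SET_SLOTS; }
```

Under this reading the size of the set in the pattern would not matter: every set would occupy the same fixed number of slots, because the bit set always has one bit per possible byte value.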


Can anyone shed some light and help clear this up?

Thanks!

Follow-Ups:
- Re: Questions about opcodes used in LPEG's vm implementation, Pierre-Yves Gérardy
