• Subject: Re: lpeg.U ?
• From: albertmcchan <albertmcchan@...>
• Date: Thu, 25 Jan 2018 20:15:24 -0500

```
On Jan 25, 2018, at 8:03 PM, Sean Conner <sean@conman.org> wrote:

>
>  It is at this point I'm going to ask what you are trying to accomplish
> here, because it's coming across as an XY problem [1].  I also saw this,
> which you posted on the forum linked to earlier [2]:
>
>    All of my code is for learning lpeg re, since many claimed that lpeg
>    can do everything Lua patterns can do.
>
>  I do believe that LPeg can do everything that patterns can do, but nothing
> in that statement says anything about speed or ease.
>
>  With that said ...
>
> It was thus said that the Great albertmcchan once stated:
>> pat = re.compile "{g <- .g / &'and' }"   -- lua pattern "(.*)and"
>> = lpeg.pcode( pat )                             -- using debug version of lpeg
>>
>> I noticed its pcode has a "behind 3" instruction to not consume the last 'and'
>
>  I didn't find this lpeg.pcode() function.  There is an lpeg.ptree()
> function, and the re expression above generates:
>
>    [1 = g  ]
>    capture kind: 'simple'  key: 0
>      grammar 1
>        rule n: 0  key: 1
>          choice
>            seq
>              any
>              call key: 1  (rule: 0)
>            and
>              seq
>                char 'a'
>                seq
>                  char 'n'
>                  char 'd'
>
>  I'm not sure what you mean by "behind 3" instruction.
>
>> There is an lpeg.B function to do look-behind, but how do I go back to it if B matched?
>>
>> Is there an lpeg.U(n) (to undo n characters) or something similar?
>
>  I think that lpeg.Cmt() can do something like that, but as I wrote in a
> previous message [3] to Jonathan Goble:
>
>>  I think you are thinking about LPeg with the wrong mindset---yes, you can
>> look for patterns in text with LPeg [1] but it's for *parsing*---pulling out
>> semantic information from text, rather than just patterns.  I've written a
>> lot of LPeg code [2], and not once have I needed a greedy, non-possessive
>> repetition to parse text.
>
>  Now, back to your message:
>
>> As an example of its usefulness, say # is the lpeg re syntax to undo 1 character.
>>
>> REDO the above re pattern, but without the backtrack stack overflow problem:
>> NOTE: I want to capture ALL except the LAST 'and'
>>
>> pat = re.compile "{ (g <- 'and' / . [^a]* g)+ ### }"
>>
>> Without UNDO, I have to do this (likely much slower):
>>
>> pat = re.compile( "(g <- 'and' / . [^a]* g)+ -> drop3", {drop3 = function(s) return s:sub(1,-4) end} )
>
>  If you are looking for a final "and" (which ends the input), then this
> works:
>
>    local lpeg = require "lpeg"
>    local P, R, C = lpeg.P, lpeg.R, lpeg.C
>
>    last_and = P"and" * P(-1)
>    char     = R("\0\96","b\255")^1
>                 + -last_and * P"a"
>    pat      = C((char)^0) * last_and
>
>    print(pat:match(string.rep("this and that land",400) .. "and"))
>
>  The weird production of 'char' is to burn through large sequences of
> characters that don't contain the letter 'a'.  It's probably faster than this
> version:
>
>    last_and = P"and" * P(-1)
>    pat      = C((P(1) - last_and)^0) * last_and
>
> but I did not bother to benchmark it.  Personally, I would probably use the
> simpler version above since it's easier to understand.  If it became an
> issue, then I might go with the version with 'char', and if that was still
> slow, then I would take stock of what I'm really trying to accomplish and
> adjust accordingly.
>
>  -spc (So, what is it you are really trying to do?)
>
> [1]    http://xyproblem.info/
>