- Subject: Re: lpeg.U ?
- From: albertmcchan <albertmcchan@...>
- Date: Thu, 25 Jan 2018 20:15:24 -0500
On Jan 25, 2018, at 8:03 PM, Sean Conner <sean@conman.org> wrote:
>
> It is at this point I'm going to ask what you are trying to accomplish
> here, because it's coming across as an XY problem [1].  I also saw this,
> which you posted on the forum linked to earlier [2]:
>
> All of my code is for learning lpeg re, since many claimed that lpeg
> can do everything Lua patterns can do.
>
> I do believe that LPeg can do everything that patterns can do, but nothing
> in that statement says anything about speed or ease.
>
> With that said ...
>
> It was thus said that the Great albertmcchan once stated:
>> pat = re.compile "{g <- .g / &'and' }" -- lua pattern "(.*)and"
>> = lpeg.pcode( pat ) -- using debug version of lpeg
>>
>> I noticed its pcode has a "behind 3" instruction so that the last 'and' is not consumed
>
> I didn't find this lpeg.pcode() function. There is an lpeg.ptree()
> function, and the re expression above generates:
>
> [1 = g ]
> capture kind: 'simple' key: 0
>   grammar 1
>     rule n: 0 key: 1
>       choice
>         seq
>           any
>           call key: 1 (rule: 0)
>         and
>           seq
>             char 'a'
>             seq
>               char 'n'
>               char 'd'
>
> I'm not sure what you mean by "behind 3" instruction.
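For reference, a minimal plain-Lua sketch (string library only, no LPeg) of what the Lua pattern "(.*)and" from the comment on the re.compile line does: ".*" first takes the whole subject, then gives characters back from the end until "and" matches, so the capture is everything before the LAST "and", wherever it occurs. The helper name before_last_and is made up for illustration.

```lua
-- Hypothetical helper: capture everything before the LAST "and",
-- using the ordinary Lua pattern "(.*)and" (string library only).
local function before_last_and(s)
  -- ".*" is greedy: it grabs the whole string, then backtracks one
  -- character at a time until "and" matches. Returns nil if none.
  return s:match("(.*)and")
end

print(before_last_and("this and that land"))   --> this and that l
print(before_last_and("candy and candle"))     --> candy and c
print(before_last_and("no match here"))        --> nil

-- The test subject used later in this thread: the capture is exactly
-- the 400 repetitions, i.e. everything before the trailing "and".
local subject = string.rep("this and that land", 400) .. "and"
print(before_last_and(subject) == string.rep("this and that land", 400))  --> true
```

Note that the "and" does not have to end the subject for this pattern to match ("candy and candle" above).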
>
>> There is an lpeg.B function for look-behind, but how do I move back to that point if B matched?
>>
>> Is there an lpeg.U(n) (to undo n characters) or something similar?
>
> I think that lpeg.Cmt() can do something like that, but as I wrote in a
> previous message [3] to Jonathan Goble:
>
>> I think you are thinking about LPeg with the wrong mindset---yes, you can
>> look for patterns in text with LPeg [1] but it's for *parsing*---pulling out
>> semantic information from text, rather than just patterns. I've written a
>> lot of LPeg code [2], and not once have I needed a greedy, non-possessive
>> repetition to parse text.
>
> Now, back to your message:
>
>> As an example of its usefulness, suppose # were lpeg re syntax for "undo 1 character".
>>
>> Here is the above re pattern redone, without the backtrack stack overflow problem:
>> NOTE: I want to capture ALL except LAST 'and'
>>
>> pat = re.compile "{ (g <- 'and' / . [^a]* g)+ ### }"
>>
>> Without UNDO, I have to do this (likely much slower):
>>
>> pat = re.compile( "(g <- 'and' / . [^a]* g)+ -> drop3", {drop3 = function(s) return s:sub(1,-4) end} )
>
> If you are looking for a final "and" (which ends the input), then this
> works:
>
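> local lpeg = require "lpeg"
> local P, R, C = lpeg.P, lpeg.R, lpeg.C
>
> local last_and = P"and" * P(-1)      -- an "and" that ends the input
> local char = R("\0\96","b\255")^1    -- a run of bytes other than 'a'
>            + -last_and * P"a"        -- or an 'a' that doesn't start last_and
> local pat = C((char)^0) * last_and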
>
> print(pat:match(string.rep("this and that land",400) .. "and"))
>
> The weird production of 'char' is there to burn through long runs of
> characters that don't contain the letter 'a'.  It's probably faster than
> this version:
>
> local lpeg = require "lpeg"
> local P, C = lpeg.P, lpeg.C
> local last_and = P"and" * P(-1)
> local pat = C((P(1) - last_and)^0) * last_and
>
> but I did not bother to benchmark it.  Personally, I would probably use
> this simpler version since it's easier to understand.  If speed became an
> issue, I might switch to the version with 'char', and if that was still
> too slow, I would take stock of what I'm really trying to accomplish and
> adjust accordingly.
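One thing worth noticing: last_and = P"and" * P(-1) answers a narrower question than the Lua pattern "(.*)and". It only accepts an "and" that ends the subject, and in that case the whole LPeg pattern boils down to a suffix check: succeed with everything but the last three characters, or fail. A minimal plain-Lua sketch of that reading next to the unanchored one (before_final_and is a hypothetical name; no LPeg is used):

```lua
-- Hypothetical helper: what C((char)^0) * last_and captures, i.e.
-- everything before an "and" that ENDS the subject, or nil otherwise.
local function before_final_and(s)
  if s:sub(-3) == "and" then
    return s:sub(1, -4)   -- drop the trailing "and"
  end
  return nil              -- the LPeg pattern would fail to match here
end

-- When "and" ends the input, both readings agree ...
local s1 = string.rep("this and that land", 3) .. "and"
print(before_final_and(s1) == s1:match("(.*)and"))  --> true

-- ... but when it does not, only the unanchored Lua pattern matches.
local s2 = "this and that land or more"
print(before_final_and(s2))   --> nil
print(s2:match("(.*)and"))    --> this and that l
```

Which of the two is wanted depends on the actual input, which is one more reason to ask what the real task is.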
>
> -spc (So, what is it you are really trying to do?)
>
> [1] http://xyproblem.info/
>
> [2] http://www.gammon.com.au/forum/?id=14149&reply=31#reply31
>
> [3] http://lua-users.org/lists/lua-l/2017-10/msg00143.html
>