I've got an awkward grammar that requires me to reach back to a previously parsed element and check what it is (a newline or not) before I can accept the match. Because of the way I need to implement things, the newline is actually captured by a previous match, and its capture should be a NewlineChunk object. That object isn't available to me in the next match, so I decided to try Cmt and Cb to get at it. My function is called correctly, but after I return true (or i), LPeg re-evaluates the same text and hands me a different back reference to compare. That comparison fails and sends me down the wrong path.

Some examples will clear this up I hope. I need to implement what amounts to an if statement, with very specific rules about what whitespace I must strip. Specifically if I have

'one$if(foo)$ two \n$endif$\n three'

The newlines before and after the $endif$ have to be stripped. The resulting text would be 'one two three'.

If I have

'one$if(foo)$ two $endif$\n three'

the resulting text would be 'one two\n three'. Note that I'm not supposed to chew the newline after $endif$, as it isn't on a line by itself.
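The newline rule from the two examples above can be sketched in plain Lua, independent of LPeg. This is only an illustration under assumptions: the function name strip_endif_newlines is hypothetical, only "\n" is treated as a newline, and the $if$/$endif$ markers are left in place so that just the newline handling is visible.

```lua
-- Hypothetical helper: drop the newlines around an $endif$ that sits on a
-- line by itself. An $endif$ that shares its line with other text does not
-- match the pattern, so the newline after it is left alone.
local function strip_endif_newlines(text)
  -- "%$" escapes the dollar signs in the Lua pattern; the outer parentheses
  -- discard gsub's second return value (the substitution count).
  return (text:gsub("\n%$endif%$\n", "$endif$"))
end

print(strip_endif_newlines("one$if(foo)$ two \n$endif$\n three"))
print(strip_endif_newlines("one$if(foo)$ two $endif$\n three"))
```

In the first call both newlines around $endif$ are removed; in the second the input comes back unchanged.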

The 'two' bit can obviously be more than just text, and in particular it can be more $if(..)$ blah $endif$ statements, meaning I need to treat blah as an arbitrary series of chunks to be parsed. Because of the way I need to handle newlines, they are treated as a separate flyweight object.

So in the first case where I have to chew that newline object, I can't just do (ignoring captures, etc.):

EndifExpr = s.NEWLINE * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE

That won't work here, because the first newline has already been captured while parsing the chunks. So the rule I came up with is (sorry for any bad wrapping):

 EndifExpr = Cmt(Cb(1) * ExprStart * s.ENDIF * ExprEnd * s.NEWLINE,
                 function(s,i,a)
                     print('checking if we need to kill a newline in \'' .. s ..
                           '\' at position ' .. i)
                     if a.isA then
                         if a:isA(NewlineChunk) then
                             print('really need to kill newline')
                             return i, "kill"
                         else
                             print('not a NewlineChunk class')
                             return false
                         end
                     else
                         print('not a ST class (not expected)', a)
                         return false
                     end
                 end) +
             Cs((ExprStart * C(s.ENDIF) * ExprEnd) / "dontkill"),

Basically I want to see "kill" or "dontkill" to decide what to do with the last chunk in my table of chunks. When the overall match is determined, a function is called to create the if chunk, which is where I make my decision.
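The idea behind this rule can be sketched in isolation. This is only an illustration under assumptions, not the grammar from this post: it uses the named form Cg(patt, name) / Cb(name) documented for current LPeg releases rather than the Cb(1) call above, plain strings stand in for the NewlineChunk objects, and the rule names Text, Newline and Endif are hypothetical.

```lua
local lpeg = require("lpeg")  -- assumes LPeg is installed
local P, S, C, Cc, Cg, Cb, Cmt = lpeg.P, lpeg.S, lpeg.C, lpeg.Cc, lpeg.Cg, lpeg.Cb, lpeg.Cmt

local NEWLINE = S("\r\n")

-- Every chunk is stored in a named group "prev", so a later Cb("prev")
-- sees whichever chunk was matched most recently.
local Text    = Cg(C((1 - NEWLINE - P("$"))^1), "prev")
local Newline = Cg(C(NEWLINE), "prev")

-- Match-time check: accept this branch only if the previous chunk was a
-- newline; otherwise fall through to the "dontkill" alternative.
local Endif = Cmt(Cb("prev") * P("$endif$") * NEWLINE,
                  function(subject, pos, prev)
                    if prev == "\n" or prev == "\r" then
                      return pos, "kill"
                    end
                    return false
                  end)
            + P("$endif$") * Cc("dontkill")

local patt = (Text + Newline)^0 * Endif

print(patt:match("two\n$endif$\n three"))
print(patt:match("two $endif$\n three"))
```

Named groups produce no values of their own, so each match returns only the "kill" or "dontkill" marker.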

And yes, this match does actually execute. The problem for me is that it executes twice. Before you ask, if I replace 'return i, "kill"' with 'return i+1, "kill"' in an attempt to advance the match position, I still see this same behavior. The following is debug output from LPEG:

|| s: |$endif$
||  three| stck: 9 c: 14  195: choice -> 198 (0)
|| s: |$endif$
||  three| stck: 10 c: 14  196: call -> 69
|| s: |$endif$
||  three| stck: 11 c: 14  69: choice -> 90 (0)
|| s: |$endif$
||  three| stck: 12 c: 14  70: opencapture runtime(n = 0)  (off = 8)
|| s: |$endif$
||  three| stck: 12 c: 15  71: emptycapture backref(n = 0)  (off = 1)
|| s: |$endif$
||  three| stck: 12 c: 16  72: call -> 309
|| s: |$endif$
||  three| stck: 13 c: 16  309: set [(24)]
|| s: |endif$
||  three| stck: 13 c: 16  318: ret
|| s: |endif$
||  three| stck: 12 c: 16  73: char 'e'
|| s: |ndif$
||  three| stck: 12 c: 16  74: char 'n'
|| s: |dif$
||  three| stck: 12 c: 16  75: char 'd'
|| s: |if$
||  three| stck: 12 c: 16  76: char 'i'
|| s: |f$
||  three| stck: 12 c: 16  77: char 'f'
|| s: |$
||  three| stck: 12 c: 16  78: call -> 307
|| s: |$
||  three| stck: 13 c: 16  307: char '$'
|| s: |
||  three| stck: 13 c: 16  308: ret
|| s: |
||  three| stck: 12 c: 16  79: set [(0a)(0d)]
|| s: | three| stck: 12 c: 16  88: closeruntime close(n = 0)  (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
||  three' at position 27
|| really need to kill newline
|| s: | three| stck: 12 c: 16  89: commit -> 102
|| s: | three| stck: 11 c: 16  102: ret
|| s: | three| stck: 10 c: 16  197: failtwice
|| s: |$endif$
||  three| stck: 7 c: 14  435: closecapture close(n = 0)  (off = 0)
|| s: |$endif$
||  three| stck: 7 c: 15  436: call -> 69
|| s: |$endif$
||  three| stck: 8 c: 15  69: choice -> 90 (0)
|| s: |$endif$
||  three| stck: 9 c: 15  70: opencapture runtime(n = 0)  (off = 8)
|| s: |$endif$
||  three| stck: 9 c: 16  71: emptycapture backref(n = 0)  (off = 1)
|| s: |$endif$
||  three| stck: 9 c: 17  72: call -> 309
|| s: |$endif$
||  three| stck: 10 c: 17  309: set [(24)]
|| s: |endif$
||  three| stck: 10 c: 17  318: ret
|| s: |endif$
||  three| stck: 9 c: 17  73: char 'e'
|| s: |ndif$
||  three| stck: 9 c: 17  74: char 'n'
|| s: |dif$
||  three| stck: 9 c: 17  75: char 'd'
|| s: |if$
||  three| stck: 9 c: 17  76: char 'i'
|| s: |f$
||  three| stck: 9 c: 17  77: char 'f'
|| s: |$
||  three| stck: 9 c: 17  78: call -> 307
|| s: |$
||  three| stck: 10 c: 17  307: char '$'
|| s: |
||  three| stck: 10 c: 17  308: ret
|| s: |
||  three| stck: 9 c: 17  79: set [(0a)(0d)]
|| s: | three| stck: 9 c: 17  88: closeruntime close(n = 0)  (off = 0)
|| checking if we need to kill a newline in 'one$if(foo)$ two
|| $endif$
||  three' at position 27
|| not a ST class (not expected)        table: 0x809ad88

So for some reason, even though my function reported that the match succeeded, the match appears to have failed and the function is called again. I'm not sure which captured element is being provided on the second call, unless it's the entire table of chunks.
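A minimal sketch, assuming a current LPeg release, of one situation in which a match-time function runs more than once at the same position: the pattern is first tried inside a not-predicate and then matched for real. The failtwice instruction in the trace would be consistent with such a predicate, but the enclosing grammar is not shown here, so that reading is a guess. The names Endif, body and calls are hypothetical.

```lua
local lpeg = require("lpeg")  -- assumes LPeg is installed
local P, Cmt = lpeg.P, lpeg.Cmt

local calls = 0

-- Match-time capture: the function runs as soon as "$endif$" matches,
-- even when the surrounding pattern goes on to fail or backtrack.
local Endif = Cmt(P("$endif$"), function(subject, pos)
  calls = calls + 1
  return pos
end)

-- Consume characters until an Endif is ahead. The lookahead -Endif has to
-- run Endif (and therefore the function) to learn that it matches.
local body = (-Endif * P(1))^0

-- The real match then runs Endif again at the same position.
local patt = body * Endif

local stop = patt:match("ab$endif$")
print(stop, calls)
```

Here the function is invoked once for the lookahead and once for the actual match. In a grammar where captures are closed between the two attempts, a back reference could plausibly resolve to a different value on the second call.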

I'm stumped. Anyone have any pointers as to why my Cmt capture isn't working as I expected?

--
Glenn McAllister     <glenn@somanetworks.com>      +1 416 348 1594
SOMA Networks, Inc.  http://www.somanetworks.com/  +1 416 977 1414