[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
- Subject: Re: [LPeg] intermittent stack exhaustion
- From: Sebastian Cato <seb.cato@...>
- Date: Tue, 13 Sep 2016 10:15:39 +0200
>> If I understand things correctly (and I may not), it seems that the
>> "Q" rule references itself instead of "P" (call key: 2 instead of call
>> key: 1 in the failing tree above). This leads to infinite
>> recursion/stack exhaustion during compilation.
>
> This is correct. The 'key' there is the index in the ktable (shown
> in the beginning of the tree) of the key of the rule being called.
> In that table, "2 = P", which means that the rule is calling "P"
> (which is correct).
Ah, so the tree itself is correct?
> [1 = P 2 = P 3 = Line ]
> grammar 3
> rule n: 0 key: 3
> seq
> char 'x'
> capture cap: 5 key: 0 n: 0
> call key: 1
> rule n: 1 key: 2
> char 'a'
> rule n: 2 key: 0
> call key: 2
rule 0 says "char 'x' followed by the capture of rule ktable[1]
rule 1 says "char 'a'"
rule 2 says "rule ktable[2]"
and rules are zero indexed and the ktable is indexed from 1. Am I
understanding it correctly?
I thought before that rule n: 2 [...] call key: 2 was a circular call,
but if it's in fact calling ktable[2] then that doesn't hold up
anymore. The stack trace also suggests that the problem happens when
generating code for the capture. For the test case above:
[...]
#261648 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261649 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261650 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261651 0x00007fd414f8dc6e in codecapture (compst=0x7ffc92cf02f0,
tree=0x556503112b14, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:720
#261652 0x00007fd414f8e67b in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112b14, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:907
#261653 0x00007fd414f8e303 in codegrammar (compst=0x7ffc92cf02f0,
grammar=0x556503112af4) at lpcode.c:851
#261654 0x00007fd414f8e6ae in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112af4, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:909
#261655 0x00007fd414f8e9c2 in compile (L=0x556503109018,
p=0x556503112ae8) at lpcode.c:979
[...]
Another (semi-/unrelated) thing. In lpcode.c:codegrammar, code
generation is performed on all rules. I saw that lptree:verifygrammar
skips unreferenced rules. Is it safe to skip code generation on
unreferenced rules too? e.g.,
lpeg = require"lpeg"
lpeg.P{
lpeg.V(2),
lpeg.P"a",
lpeg.P"z" + (1-lpeg.P"q")*lpeg.V(2),
}:pcode()
generates:
[1 = 2 2 = 2 3 = 1 ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: testchar 'z'-> 14
11: any
12: ret
13: any
14: set [(00-70)(72-ff)]
23: jmp -> 7
25: ret
26: end
with a patch to lpcode.c:codegrammar [1], it generates:
[1 = 2 2 = 2 3 = 1 ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: ret
10: end
It's probably possible to skip the extra 'ret' too, but there was an
assertion on that rules should end with IRet in lpcode.c:correctcalls
which I didn't want to touch. test.lua passed when I ran the patched
version against it.
[1]:
--- lpeg-1.0.0/lpcode.c 2015-09-28 19:40:46.000000000 +0200
+++ lpeg-1.0.0_branch/lpcode.c 2016-09-13 09:20:07.209067309 +0200
@@ -847,7 +847,9 @@
jumptohere(compst, firstcall);
for (rule = sib1(grammar); rule->tag == TRule; rule = sib2(rule)) {
positions[rulenumber++] = gethere(compst); /* save rule position */
- codegen(compst, sib1(rule), 0, NOINST, fullset); /* code rule */
+ if (rule->key > 0) { /* only generate code for referenced rules */
+ codegen(compst, sib1(rule), 0, NOINST, fullset); /* code rule */
+ }
addinstruction(compst, IRet, 0);
}
assert(rule->tag == TTrue);
On Mon, Sep 12, 2016 at 8:25 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>> I also reduced the test case further.
>>
>>
>> #!/usr/bin/env lua
>> lpeg = require "lpeg"
>>
>> p = lpeg.P{
>> "Line";
>> P = lpeg.P"a",
>> Q = lpeg.V"P",
>> Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
>> }
>>
>> p:ptree()
>> p:match("xx")
>>
>> [...]
>>
>> I think the rule ordering is different because of string (key) hash
>> randomization.
>
> We can simplify a little more that grammar, using numeric indices:
>
> p = lpeg.P{
> (lpeg.P"x" * lpeg.C(1-lpeg.V(2))),
> lpeg.P"a",
> lpeg.V(2),
> }
>
> The use of numeric indices avoids the string hash randomization; as
> a result, at least in my machine, we get a consistent seg. fault in
> every run.
>
> -- Roberto
>