lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


>> If I understand things correctly (and I may not), it seems that the
>> "Q" rule references itself instead of "P" (call key: 2 instead of call
>> key: 1 in the failing tree above). This  leads to infinite
>> recursion/stack exhaustion during compilation.
>
> This is correct. The 'key' there is the index in the ktable (shown
> in the beginning of the tree) of the key of the rule being called.
> In that table, "2 = P", which means that the rule is calling "P"
> (which is correct).

Ah, so the tree itself is correct?

> [1 = P  2 = P  3 = Line  ]
> grammar 3
>   rule n: 0  key: 3
>     seq
>       char 'x'
>       capture cap: 5  key: 0  n: 0
>         call key: 1
>   rule n: 1  key: 2
>     char 'a'
>   rule n: 2  key: 0
>     call key: 2

rule 0 says "char 'x' followed by the capture of rule ktable[1]
rule 1 says "char 'a'"
rule 2 says "rule ktable[2]"

and rules are zero indexed and the ktable is indexed from 1. Am I
understanding it correctly?

I thought before that rule n: 2 [...] call key: 2 was a circular call,
but if it's in fact calling ktable[2] then that doesn't hold up
anymore. The stack trace also suggests that the problem happens when
generating code for the capture. For the test case above:

[...]
#261648 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261649 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261650 0x00007fd414f8c92c in hascaptures (tree=0x556503112b34) at lpcode.c:144
#261651 0x00007fd414f8dc6e in codecapture (compst=0x7ffc92cf02f0,
tree=0x556503112b14, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:720
#261652 0x00007fd414f8e67b in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112b14, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:907
#261653 0x00007fd414f8e303 in codegrammar (compst=0x7ffc92cf02f0,
grammar=0x556503112af4) at lpcode.c:851
#261654 0x00007fd414f8e6ae in codegen (compst=0x7ffc92cf02f0,
tree=0x556503112af4, opt=0, tt=-1, fl=0x7fd414f8fd60 <fullset_>) at
lpcode.c:909
#261655 0x00007fd414f8e9c2 in compile (L=0x556503109018,
p=0x556503112ae8) at lpcode.c:979
[...]

Another (semi-/unrelated) thing. In lpcode.c:codegrammar, code
generation is performed on all rules. I saw that lptree:verifygrammar
skips unreferenced rules. Is it safe to skip code generation on
unreferenced rules too? e.g.,

lpeg = require"lpeg"

lpeg.P{
  lpeg.V(2),
  lpeg.P"a",
  lpeg.P"z" + (1-lpeg.P"q")*lpeg.V(2),
}:pcode()

generates:

[1 = 2  2 = 2  3 = 1  ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: testchar 'z'-> 14
11: any
12: ret
13: any
14: set [(00-70)(72-ff)]
23: jmp -> 7
25: ret
26: end

with a patch to lpcode.c:codegrammar [1], it generates:

[1 = 2  2 = 2  3 = 1  ]
00: call -> 7
02: end
03: any
04: jmp -> 7
06: ret
07: char 'a'
08: ret
09: ret
10: end

It's probably possible to skip the extra 'ret' too, but there was an
assertion on that rules should end with IRet in lpcode.c:correctcalls
which I didn't want to touch. test.lua passed when I ran the patched
version against it.

[1]:
--- lpeg-1.0.0/lpcode.c    2015-09-28 19:40:46.000000000 +0200
+++ lpeg-1.0.0_branch/lpcode.c    2016-09-13 09:20:07.209067309 +0200
@@ -847,7 +847,9 @@
   jumptohere(compst, firstcall);
   for (rule = sib1(grammar); rule->tag == TRule; rule = sib2(rule)) {
     positions[rulenumber++] = gethere(compst);  /* save rule position */
-    codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    if (rule->key > 0) { /* only generate code for referenced rules */
+      codegen(compst, sib1(rule), 0, NOINST, fullset);  /* code rule */
+    }
     addinstruction(compst, IRet, 0);
   }
   assert(rule->tag == TTrue);

On Mon, Sep 12, 2016 at 8:25 PM, Roberto Ierusalimschy
<roberto@inf.puc-rio.br> wrote:
>> I also reduced the test case further.
>>
>>
>> #!/usr/bin/env lua
>> lpeg = require "lpeg"
>>
>> p = lpeg.P{
>>   "Line";
>>   P = lpeg.P"a",
>>   Q = lpeg.V"P",
>>   Line = lpeg.P"x" * lpeg.C(lpeg.V"P"),
>> }
>>
>> p:ptree()
>> p:match("xx")
>>
>> [...]
>>
>> I think the rule ordering is different because of string (key) hash
>> randomization.
>
> We can simplify a little more that grammar, using numeric indices:
>
> p = lpeg.P{
>   (lpeg.P"x" * lpeg.C(1-lpeg.V(2))),
>   lpeg.P"a",
>   lpeg.V(2),
> }
>
> The use of numeric indices avoids the string hash randomization; as
> a result, at least in my machine, we get a consistent seg. fault in
> every run.
>
> -- Roberto
>