Re: LPeg indent/whitespace based grammar

lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Subject: Re: LPeg indent/whitespace based grammar
From: Matthias Dörfelt <lists@...>
Date: Fri, 7 Jul 2017 18:34:03 -0700

Hi Duncan,

I just tried your implementation and I can’t really get it to work. My main question is where current_indent inside match_same_indent comes from. I tried some different things based on guessing without success.

Thanks,
Matthias

> On Jul 7, 2017, at 07:38, Duncan Cross <duncan.cross@gmail.com> wrote:
> 
> On Fri, Jul 7, 2017 at 5:23 AM, Matthias Dörfelt <lists@mokafolio.de> wrote:
>> While my implementation works, I feel like I am circumventing a lot of the
>> actual capturing mechanisms of LPeg in order to deal with back-referencing
>> to build the hierarchy. I’d like to get some feedback on weather there are
>> any better ways to approach this kind of grammar using LPeg. I am new to it
>> in general so I might be missing something obvious here.
> 
> Here's how I would approach it.
> 
> First, some helper patterns:
> 
> 
> --
> local initial_indent = lpeg.Cg(lpeg.P(''), 'indent')
> 
> local match_same_indent = lpeg.Cmt(
>  lpeg.Cb('indent'),
>  function(s, p, indent)
>    if s:sub(p, p + #indent - 1) ~= current_indent then
>      return false
>    end
>    return p + #indent
>  end)
> 
> local capture_deeper_indent = lpeg.Cg(
>  match_same_indent * lpeg.S' \t'^1,
>  'indent')
> --
> 
> 
> initial_indent captures an empty string and stores it as the named
> group "indent".
> 
> match_same_indent checks that the immediate next set of characters is
> identical to the current contextual value of the named group "indent".
> 
> capture_deeper_indent requires that the current indent matches at this
> position, and is immediately followed by at least one more whitespace
> character. If it succeeds, the full new indent is stored as a new
> named group capture for "indent".
> 
> To create the actual hierarchy document grammar, I would turn to the
> "re" module, passing in the helper patterns via the "defs" table
> parameter:
> 
> 
> --
> local hierarchy = re.compile([[
> 
>  hierarchy <- %initial_indent {|
>    {:children: {| (<first_root> <next_root>*)? |} :}
>  |} !.
> 
>  first_root <- <entry>
>  next_root <- %nl <entry>
> 
>  entry <- {|
>    {:name: <name> :}
>    {:children: {| (<first_child> <next_child>*)? |} :}
>  |}
> 
>  first_child <- %nl %capture_deeper_indent <entry>
>  next_child <- %nl %match_same_indent <entry>
> 
>  name <- (!%nl .)+
> 
> ]], {
>  initial_indent = initial_indent;
>  match_same_indent = match_same_indent;
>  capture_deeper_indent = capture_deeper_indent;
> })
> --
> 
> 
> To explain the elements of this grammar in English:
> 
> <hierarchy>:
>  - capture the empty string as an initial base value for "indent"
>  - create a table with a "children" sub-table
>  - populate "children" with either no values, or <first_root>
> followed by zero or more <next_root>s
>  - make sure there is no more input
> 
> <first_root>: same as <entry>
> 
> <next_root>: newline character followed by <entry>
> 
> <entry>: a table containing
>  - a <name> captured as field "name"
>  - a sub-table "children" containing either no values, or
> <first_child> followed by zero or more <next_child>ren
> 
> <first_child>:
>  - newline character
>  - capture a new, deeper indent than the current contextual indent
>  - <entry>
> 
> <next_child>:
>  - newline character
>  - match the current contextual indent
>  - <entry>
> 
> <name>: one or more non-newline characters
> 
> 
> A couple of quirks to note:
> 
> 1. This implementation does not enforce indentation in sets of four
> spaces, but instead any arbitrary set of spaces and tabs. Hopefully it
> should not be too difficult to modify capture_deeper_indent to be
> strict about this, if you want.
> 
> 2. The "indent" value ends up retained as a field inside "children"
> tables in the final result. For example:
> 
> --
> print(hierarchy:match('a\n    b').children[1].children.indent)
> --
> 
> ....will print four spaces. If this is an undesirable side-effect, a
> function capture could be used to remove this field once the
> "children" table is fully captured.
> 
> 
> -Duncan
> 
>

Follow-Ups:
- Re: LPeg indent/whitespace based grammar, Duncan Cross

References:
- LPeg indent/whitespace based grammar, Matthias Dörfelt
- Re: LPeg indent/whitespace based grammar, Matthias Dörfelt
- Re: LPeg indent/whitespace based grammar, Duncan Cross

Prev by Date: Re: LPeg indent/whitespace based grammar
Next by Date: Re: LPeg indent/whitespace based grammar
Previous by thread: Re: LPeg indent/whitespace based grammar
Next by thread: Re: LPeg indent/whitespace based grammar
Index(es):
- Date
- Thread