- Subject: Re: LPEG: captures
- From: Parke <parke.nexus@...>
- Date: Thu, 28 May 2015 06:25:34 -0700
On Thu, May 28, 2015 at 12:52 AM, Alexander Mashin
<alex.mashin@gmail.com> wrote:
> This is the input:
> Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])
>
> This is the desired output:
>
> table {
> full = Perhaps, [[Name::Peter|Simon]], or [[Name::Paul]], so they say (see
> [[:Apocrypha]])
> items = table {
> 1 = Peter,
> 2 = Paul
> }
> separator = ,
> }
The above would be very tricky to build with a single match. "Peter" and
"Paul" would have to be captured inside two different named captures
(full and items). Additionally, Peter and Paul would have to be appended
to the same named capture (items), even though they occur at different
locations in the input.
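One way to sidestep that problem is to skip the single match entirely and run two patterns over the same input. This is offered as an assumption, not as the approach this reply goes on to propose. In the first pass, a substitution capture `{~ ~}` rebuilds the whole string and inserts `Name::` into each link. In the second pass, a table capture `{| |}` collects only the link targets. The sketch assumes LPeg and its `re` module are installed. The helper name `parse_links` is hypothetical, and the `separator` field from the request is left out.

```lua
-- Assumption: LPeg and its 're' module are available via require.
local re = require("re")

-- Pass 1: the substitution capture copies the input unchanged, except that
-- inside every link whose target does not start with ':' the empty match
-- right after '[[' is replaced by the string capture 'Name::'.
local full_pat = re.compile([==[
  full <- {~ ( link / . )* ~}
  link <- '[[' !':' ( '' -> 'Name::' ) ( !']]' . )+ ']]'
]==])

-- Pass 2: the table capture keeps only the link target
-- (the text after '[[' and before '|' or ']]').
local items_pat = re.compile([==[
  items <- {| ( link / . )* |}
  link  <- '[[' !':' { ( !']]' !'|' . )+ } ( !']]' . )* ']]'
]==])

-- Hypothetical helper: runs both passes and combines the results.
local function parse_links(s)
  return { full = full_pat:match(s), items = items_pat:match(s) }
end

local s = 'Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])'
local r = parse_links(s)
print(r.full)
print(table.concat(r.items, ', '))
```

The cost is scanning the input twice. In exchange, no capture has to appear in two places at once.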
Will the following work for you?
local grammar = [==[
  wikitext  <- {| ( link / separator / text )* |}
  link      <- {|
                 {:t: '' -> 'link' :}
                 { '[[' }
                 !':'
                 '' -> 'Name::'
                 { ( !']]' !'|' . )+ }
                 { '|' ( !']]' . )* / }
                 { ']]' }
               |}
  separator <- {|
                 {:t: '' -> 'separator' :}
                 { [,;*#] } { %s* }
               |}
  text      <- {|
                 {:t: '' -> 'text' :}
                 { ( !link !separator . )+ }
               |}
]==]
local s = 'Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])'
local parser = require ( 're' ).compile ( grammar )
local t = parser : match ( s )
print ( s )
print ()
-- one row per piece: index, kind (field t), and the piece's captures joined
-- (ipairs guarantees the pieces are visited in input order)
for k,v in ipairs ( t ) do
  print ( string.format ( '%d %-20s %s', k, v.t, table.concat ( v ) ) )
end
print ()
-- for links: v[3] is the target, v[4] the optional '|label' part
for k,v in ipairs ( t ) do
  if v.t == 'link' then
    print ( string.format ( '%d %s %d %-5s %s',
      k, 'link', #v, v[3], v[4] ) )
  end
end
---
The above will output:
Perhaps, [[Peter|Simon]], or [[Paul]], so they say (see [[:Apocrypha]])

1 text                 Perhaps
2 separator            ,
3 link                 [[Name::Peter|Simon]]
4 separator            ,
5 text                 or
6 link                 [[Name::Paul]]
7 separator            ,
8 text                 so they say (see [[:Apocrypha]])

3 link 5 Peter |Simon
6 link 5 Paul
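If the flat list is not enough, it can be folded after matching into the shape requested at the top of the thread. The sketch below assumes `t` is the table returned by `parser:match(s)` above, with each piece tagged by field `t`. The helper name `to_record` is hypothetical. The sample `t` is written out by hand to mirror the printed result, so the snippet runs without LPeg. It also treats the first separator's punctuation as the `separator` field, which is a guess at what was intended.

```lua
-- Hypothetical helper: folds the parse result (an array of tagged pieces)
-- into { full = ..., items = {...}, separator = ... }.
local function to_record(t)
  local parts, items, separator = {}, {}, nil
  for _, v in ipairs(t) do
    parts[#parts + 1] = table.concat(v)  -- rebuild the (rewritten) text
    if v.t == 'link' then
      items[#items + 1] = v[3]           -- v[3] holds the link target
    elseif v.t == 'separator' and separator == nil then
      separator = v[1]                   -- v[1] is the punctuation, v[2] the spaces
    end
  end
  return { full = table.concat(parts), items = items, separator = separator }
end

-- Sample data shaped like the match result above (assumption: copied by hand).
local t = {
  { t = 'text', 'Perhaps' },
  { t = 'separator', ',', ' ' },
  { t = 'link', '[[', 'Name::', 'Peter', '|Simon', ']]' },
  { t = 'separator', ',', ' ' },
  { t = 'text', 'or ' },
  { t = 'link', '[[', 'Name::', 'Paul', '', ']]' },
  { t = 'separator', ',', ' ' },
  { t = 'text', 'so they say (see [[:Apocrypha]])' },
}

local r = to_record(t)
print(r.full)
print(table.concat(r.items, ', '), r.separator)
```

Because every piece keeps all of its matched text as captures, `full` is simply the concatenation of the pieces. No second pass over the input is needed.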