- Subject: Re: Profiling LPEG
- From: Sean Conner <sean@...>
- Date: Wed, 26 Apr 2023 15:44:43 -0400
It was thus said that the Great Roberto Ierusalimschy once stated:
> > Thanks for the confirmation. I'm going to have to rethink my approach.
>
> If that indeed is the culprit, you can still build a grammar in LPeg.
> You can build it piecemeal if you want:
>
> + local G = {}
> ...
> - local P = <some pattern>
> + G.P = <some pattern>
> + P = lpeg.V"P"
> ...
> + G[1] = final_pattern
> + final_pattern = lpeg.P(G)
I'm a bit unsure how to approach this though. I'm looking at RFC-5322:
Internet Message Format. It contains a bunch of rules that are used in
many other RFCs. Just as an example, the RFC defines the following rules:

    quoted-pair   = ("\" (VCHAR / WSP))
    FWS           = ([*WSP CRLF] 1*WSP) ; folding text
    ctext         = ... ; ASCII minus ( ) \
    ccontent      = ctext / quoted-pair / comment
    comment       = "(" *([FWS] ccontent) [FWS] ")"
    CFWS          = (1*([FWS] comment) [FWS]) / FWS
    atext         = ... ; ASCII plus some others
    atom          = [CFWS] 1*atext [CFWS]
    dot-atom-text = 1*atext *("." 1*atext)
    dot-atom      = [CFWS] dot-atom-text [CFWS]
    qtext         = ... ; ASCII minus " \
    qcontent      = qtext / quoted-pair
    quoted-string = [CFWS] DQUOTE *([FWS] qcontent) [FWS] DQUOTE
    word          = atom / quoted-string
    phrase        = 1*word
    unstructured  = (*([FWS] VCHAR) *WSP)

RFC-7208 reuses 'dot-atom', 'quoted-string', 'comment', 'CFWS' and 'FWS'
(along with 'CRLF', 'ALPHA', 'DIGIT' and 'SP' from RFC-5234, and
'Local-part', 'Domain' and 'Mailbox' from RFC-5321). RFC-5536 reuses 'FWS',
'comment', 'CFWS', 'atext', 'dot-atom-text' and 'phrase' (along with some
others not mentioned) from RFC-5322 (along with rules from RFC-2045,
RFC-3986, and RFC-5234). Using LPEG grammars just doesn't work with what I
was trying to do. At least, I don't think so.
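
For reference, the piecemeal style quoted above would look roughly like this
for the 'comment' and 'CFWS' rules. This is only a sketch under some
assumptions: the table name G follows the quoted example, rule names use
underscores so they work as Lua field names, the 'ctext' range is filled in
from RFC-5322 with the obsolete forms left out, and the terminals are plain
patterns rather than rules. It says nothing about how such a table would be
shared between modules for different RFCs.

```lua
local lpeg = require "lpeg"
local P, R, S, V = lpeg.P, lpeg.R, lpeg.S, lpeg.V

-- Terminals from RFC-5234; no recursion, so plain patterns are enough.
local WSP   = S" \t"
local CRLF  = P"\r\n"
local VCHAR = R"!~"

-- One table collects the rules; rules refer to each other through lpeg.V,
-- which is what allows 'comment' to be recursive.
local G = {}

G.quoted_pair = P"\\" * (VCHAR + WSP)
G.FWS         = (WSP^0 * CRLF)^-1 * WSP^1
-- Assumption: printable ASCII minus ( ) \ as in RFC-5322, no obs-ctext.
G.ctext       = R("!'", "*[", "]~")
G.ccontent    = V"ctext" + V"quoted_pair" + V"comment"
G.comment     = P"(" * (V"FWS"^-1 * V"ccontent")^0 * V"FWS"^-1 * P")"
G.CFWS        = (V"FWS"^-1 * V"comment")^1 * V"FWS"^-1 + V"FWS"

-- The initial rule is named last, then the table is compiled once.
G[1] = "CFWS"
local CFWS = P(G)

print(CFWS:match("(a (nested) comment) x"))
print(CFWS:match("(unclosed"))
```

The first call should return the position just past the trailing space and
the second should return nil, since the comment is never closed.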
-spc