- Subject: Re: Elegant design for creating error messages in LPEG parser
- From: Sean Conner <sean@...>
- Date: Mon, 8 Apr 2019 04:33:36 -0400
It was thus said that the Great joy mondal once stated:
> Hi Sean,
>
> It seems everything lpeg.Cf can do can be done with lpeg.Cmt.
I think everything with LPEG can be done with lpeg.Cmt() but I haven't
proved that to myself yet (it's a conjecture). But I don't use lpeg.Cmt()
all that much actually.
> Under what circumstances would using lpeg.Cmt INSTEAD of lpeg.Cf be
> considered a severe design failure ?
I don't know. Here's one recent instance where I've used lpeg.Cf():

  local cutf8 = ... -- LPEG code to parse a single UTF-8 character
  local nc    = ... -- LPEG code of certain UTF-8 characters to not count
  local cnt   = lpeg.Cf(
    lpeg.Cc(0) * (nc + cutf8 * lpeg.Cc(1))^0,
    function(acc,n) return acc + n end -- add each captured 1 to the total
  )

So I start by capturing a 0 (which is the accumulated value), then cycle
through the string. A "character" matching cutf8 then returns a 1, which is
added to the running accumulator; otherwise, the "character" is ignored and
we skip to the next character.
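
For anyone who wants to run this, here is a minimal sketch with the two
elided patterns filled in. Both definitions are assumptions made for
illustration only: cutf8 is a simplified UTF-8 matcher (no check for
overlong or surrogate sequences), and nc is taken to be the combining
diacritics U+0300 to U+036F. The real patterns are not shown in this
message.

```lua
local lpeg = require "lpeg"

-- Assumption: simplified UTF-8 character; the original cutf8 is elided.
local cont  = lpeg.R"\128\191" -- continuation byte
local cutf8 = lpeg.R"\0\127"
            + lpeg.R"\194\223" * cont
            + lpeg.R"\224\239" * cont * cont
            + lpeg.R"\240\244" * cont * cont * cont

-- Assumption: do not count combining diacritics U+0300..U+036F
-- (bytes CC 80 .. CD AF); the original nc is elided.
local nc = lpeg.P"\204" * cont
         + lpeg.P"\205" * lpeg.R"\128\175"

local cnt = lpeg.Cf(
  lpeg.Cc(0) * (nc + cutf8 * lpeg.Cc(1))^0,
  function(acc,n) return acc + n end
)

print(cnt:match("abc"))                -- 3
print(cnt:match("e\204\129t\195\169")) -- 3: e, (U+0301 skipped), t, U+00E9
```

Because nc is tried before cutf8 in the ordered choice, a skipped
character never produces a capture and so never reaches the fold function.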
I've also used lpeg.Cf() in code that parses the format string given to
os.date(), such as:

  "%A, %d %B %Y @ %H:%M:%S"

and returns another LPEG expression that will parse a string of that format,
like:

  "Monday, 02 July 2018 @ 16:02:48"

into a Lua table. Code:

  https://github.com/spc476/LPeg-Parsers/blob/master/strftime.lua

I use lpeg.Cf() when accumulating a result into something *other* than a
table. To answer your question, when would using lpeg.Cmt() over lpeg.Cf()
be a design failure? When you aren't taking advantage of the match time
capability of lpeg.Cmt(). I gave an example earlier of:
  dec_octet = lpeg.Cmt(lpeg.R"09"^1,function(_,pos,capture)
    local v = tonumber(capture)
    if v < 256 then
      return pos,v
    end
  end)
But some other examples can be found in the strftime.lua module referenced
above, such as:
  function chkrange(min,max)
    return function(_,position,capture)
      local val = tonumber(capture)
      if val < min or val > max then
        return false
      else
        return position,val
      end
    end
  end

  dday = Cmt(digit * digit, chkrange(1, 31))
I could do this stuff with patterns only, but the expressions can become
unwieldy and hard to follow. This makes it pretty easy to see what's being
checked. And the reason I'm doing it this way with lpeg.Cmt() and not like:
  dday = (digit * digit)
       / function(capture)
           local val = tonumber(capture)
           if val >= 1 and val <= 31 then
             return val
           end
         end
is that the function-capture version will *NOT* fail to match a day of
"45"; it still matches and simply produces no value, which is *NOT* what I
want. I want the pattern to fail. I'm using lpeg.Cmt() because I'd rather
not write a complicated LPEG expression to parse numbers from 1 to 31 (or
01 to 31).
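
To see the difference concretely, here is a minimal sketch that runs both
versions against the same input. The definition of digit is an assumption
(it isn't shown in this message), and the names dday_cmt and dday_fn are
only for illustration.

```lua
local lpeg = require "lpeg"

local digit = lpeg.R"09" -- assumption: a single ASCII digit

local function chkrange(min,max)
  return function(_,position,capture)
    local val = tonumber(capture)
    if val < min or val > max then
      return false        -- makes the whole match fail
    else
      return position,val -- succeed at this position, producing val
    end
  end
end

-- Match-time check: an out-of-range value fails the pattern.
local dday_cmt = lpeg.Cmt(digit * digit, chkrange(1,31))

-- Function capture: runs only after the match has already succeeded.
local dday_fn = (digit * digit) / function(capture)
  local val = tonumber(capture)
  if val >= 1 and val <= 31 then
    return val
  end
end

print(dday_cmt:match("07")) -- 7
print(dday_cmt:match("45")) -- nil: the pattern fails
print(dday_fn:match("07"))  -- 7
print(dday_fn:match("45"))  -- 3: still matches, but with no captured value
```

In the last case match() reports the position after the two digits,
because the function returned nothing and so there is no capture to hand
back.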
> I tried using lpeg.Cf recursively and it's quite convoluted.
>
> For parsing things like this:
>
> [[[]]]
What, exactly, are you trying to parse there? Something like:

  [ 1 2 3 [ 4 5 6 [ 7 8 9 ] 10 11 ] 12 13 ] --?

> It's quite a bit easier with Cmt since I can create an empty table ( state )
> at the start of the loop. With Cf you are not sure if you are at the
> start, middle or end of the loop.
I rarely use lpeg.Cf() to fold captures into a table. I'd rather use
lpeg.Ct() with a capture pattern inside:
  list = lpeg.P"[" * lpeg.Ct((SPACE * C(number))^0) * lpeg.P"]"
If it's recursive, then an LPEG grammar would be appropriate:
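  list = lpeg.P {
    'list',
    -- data is repeated so a list can hold any number of items, and SPACE
    -- is allowed before the closing bracket
    list = lpeg.P'[' * lpeg.Ct(lpeg.V"data"^0) * SPACE * lpeg.P']',
    data = SPACE * (C(number) + lpeg.V"list")
  }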
This would be enough to parse the example I'm asking about (given an
appropriate definition for SPACE and number).
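
As a minimal runnable sketch of the grammar approach: the definitions of
SPACE (optional whitespace) and number (a run of digits) are assumptions,
since neither is given in this message. In this sketch the data rule is
repeated and whitespace is allowed before the closing bracket so that the
spaced-out example parses.

```lua
local lpeg = require "lpeg"

-- Assumptions: SPACE is optional whitespace, number is a run of digits.
local SPACE  = lpeg.S" \t\r\n"^0
local number = lpeg.R"09"^1

local list = lpeg.P {
  'list',
  list = lpeg.P'[' * lpeg.Ct(lpeg.V"data"^0) * SPACE * lpeg.P']',
  data = SPACE * (lpeg.C(number) + lpeg.V"list"),
}

local t = list:match("[ 1 2 3 [ 4 5 6 [ 7 8 9 ] 10 11 ] 12 13 ]")
print(t[1], t[4][1], t[4][4][3], t[5]) -- 1  4  9  12 (captured as strings)

local e = list:match("[[[]]]")
print(type(e[1][1]), #e[1][1])         -- table  0
```

Each nested list gets its own table from lpeg.Ct(), so there is no state
to track by hand about where in the loop the parser is.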
> I had a look at the moon-script code base ( written using LPEG ) and there
> seems no usage of lpeg.Cf.
>
> What I am trying to find is the minimum number of functions that is needed
> for LPEG.
>
> Up until now I haven't found a function that cannot be used uniquely in a
> given situation, so I am quite curious to be proven wrong.
I don't understand what you mean here.
-spc