- Subject: Re: [LPeg] How can I signal parsing errors with LPeg?
- From: Sean Conner <sean@...>
- Date: Sat, 30 Jul 2016 20:45:34 -0400
It was thus said that the Great Soni L. once stated:
>
> Well... I did expect you to at least read the comments in the code:
It helps to be more explicit.
> + lpeg.P("\\") * lpeg.P(function(_, i)
> error("Unknown backslash escape at position " .. i) -- this error() is what I wanna get rid of.
> end)
First off, lpeg.P(function) and lpeg.Cmt(lpeg.P(true),function) are
functionally the same. And in the code I presented:
  + lpeg.P"\\" * lpeg.Cmt(lpeg.Carg(1) * lpeg.C(1),
      function(subject,pos,e,c)
        table.insert(e,"Bad escape " .. c)
        return nil
      end)
I did generate a "dynamic error string" as you wanted. Okay, it wasn't the
same verbiage, but that's a simple change:
  + lpeg.P"\\" * lpeg.Cmt(lpeg.Carg(1) * lpeg.C(1),
      function(subject,pos,e,c)
        table.insert(
          e,
          string.format(
            "Unknown backslash escape '\\%s' at position %d",
            c,
            pos
          ))
        return nil
      end)
(I did you one better by being even more dynamic! But I digress)
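Put together as a complete chunk, the pattern might be used as follows. This is a sketch, assuming LPeg is installed and loadable with `require "lpeg"`; the set of valid escapes and the names `body` and `check_escapes` are hypothetical. The error table is handed to match() as an extra argument after the starting position, and that is the value lpeg.Carg(1) picks up inside the match-time capture.

```lua
local lpeg = require "lpeg"

-- Hypothetical set of valid escapes for this sketch: \n \t \\ \"
local valid_escape = lpeg.P"\\" * lpeg.S'nt\\"'

-- The error branch: lpeg.Carg(1) supplies the table passed as the first
-- extra argument to match(); the function records a message in it and
-- returns nil, so this alternative fails.
local bad_escape = lpeg.P"\\" * lpeg.Cmt(lpeg.Carg(1) * lpeg.C(1),
  function(subject, pos, e, c)
    -- pos is the position *after* the escape character, hence pos - 1
    table.insert(e, string.format(
      "Unknown backslash escape '\\%s' at position %d", c, pos - 1))
    return nil
  end)

-- Any mix of plain characters and escapes, up to the end of the subject.
body = (valid_escape + bad_escape + (lpeg.P(1) - lpeg.P"\\"))^0 * -1

-- Hypothetical helper: returns the match result plus the error table.
function check_escapes(s)
  local errors = {}
  local result = body:match(s, 1, errors)  -- errors becomes lpeg.Carg(1)
  return result, errors
end

local r, errs = check_escapes("a\\qb")
print(r)        --> nil
print(errs[1])  --> Unknown backslash escape '\q' at position 3
```

The match still fails as a whole, but the message survives because it was written into a table the caller owns, not returned from the function.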
The whole reason for lpeg.Carg() is to preserve the error message.
lpeg.Cmt() is a "match-time capture", and a match returning nil doesn't mean
the expression is aborted; it just means that particular part of the
expression fails. For example:
  pat = lpeg.P"a" * (lpeg.P"b" + lpeg.P"c")
  x = pat:match "ac"
"a" matches, so we get to the second half of the pat expression. The first
part, lpeg.P"b", fails and returns nil. But since we have an alternative
(the '+' operator), we check the next alternative and get a non-nil result
(a match), so the entire expression matches. The following:
  x = pat:match "ad"
fails entirely. The first part matches ('a') but the second half fails on
both alternatives (it's neither 'b' nor 'c'), so that subexpression returns
nil, and since there isn't another alternative, the expression as a whole
fails. I'm going into detail here for a reason. Now, let's change the
expression a bit:
  pat = lpeg.P"a"
      * (
          lpeg.Cmt(lpeg.P(1),function(subject,position,capture)
              if capture == 'b' then
                return position
              else
                return nil,"invalid input " .. capture
              end
            end
          )
          + lpeg.P"c"
        )
Yes, it's a bit harder to read, but I replaced lpeg.P"b" with a match-time
capture that, if the character isn't 'b', returns nil along with a supposed
error message. If you run:
  x = pat:match "ac"
you won't see the error message. Heck, if you run
  x = pat:match "ad"
you won't see the error message (nor will you for pat:match"a"). That's
because LPeg appears to ignore any results past the first nil. It seems to
contradict this bit of the LPeg documentation:
    Any extra values returned by the function become the values produced
    by the capture.
but hey, sometimes you have to do empirical tests, go with how the code
reacts, and maybe send a message to the LPeg authors asking them to clarify
the issue.
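One such empirical test, as a sketch assuming LPeg is installed and loadable with `require "lpeg"`. It runs the plain pattern from above and a variant of the match-time capture version (the names `pat_plain` and `pat_extra` are made up here) whose function also returns an extra value on success. In this test the extra value does show up as a capture when the function succeeds, while a failed match produces no captures at all, so the would-be error string is simply dropped.

```lua
local lpeg = require "lpeg"

-- The first pattern: no captures, so match() returns the position just
-- past the match, or nil if it fails.
pat_plain = lpeg.P"a" * (lpeg.P"b" + lpeg.P"c")

-- Variant whose function returns an extra value on success and on failure.
pat_extra = lpeg.P"a" * (
  lpeg.Cmt(lpeg.P(1), function(subject, position, capture)
    if capture == 'b' then
      return position, "saw " .. capture      -- success: extra value becomes a capture
    else
      return nil, "invalid input " .. capture -- failure: this value is never seen
    end
  end)
  + lpeg.P"c"
)

print(pat_plain:match "ac")  --> 3
print(pat_plain:match "ad")  --> nil
print(pat_extra:match "ab")  --> saw b
print(pat_extra:match "ac")  --> 3
print(pat_extra:match "ad")  --> nil
```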
In any case, that's why I went with lpeg.Carg(), to preserve the error
message. Yes, it sucks that you can't do what you want with LPeg. Are
there better ways? Perhaps. It's something you'll have to experiment with.
Perhaps change LPeg if the authors don't see fit to add the functionality
you want (to be fair, error handling in parsers has traditionally been
poor; for a real horror show, try lex and yacc).
Could I have better presented the choice I made? Yes. Could you have
been a bit less terse (and coming across as demanding)? Yes. We're both at
fault here.
-spc (Just remember that any code presented as a possible solution to your
problem will have to be modified to fit in with your current
solution)