lpeg.cut? (Re: Elegant design for creating error messages in LPEG parser)

lua-l archive


Subject: lpeg.cut? (Re: Elegant design for creating error messages in LPEG parser)
From: nobody <nobody+lua-list@...>
Date: Thu, 4 Apr 2019 00:28:20 +0200

On 03/04/2019 20.56, Sean Conner wrote:

   Any failed pattern would return nil,position, but as long as there are
alternatives, it won't matter.  But on a failure, at least you get the
offset into the string being parsed where the error was.

   That's about the minimum I could see for LPEG.  How about it Roberto?

would that just be to record line number as you go along using Carg(1) and
just print out the line where your parser fails ?

I feel like the minimum viable error would handle unknown unknowns without
being completely useless like nil, while keeping parser code simple. (I do
not want to end up in a situation where the code is 50% error handling.)


   For the majority of my LPEG programs, I've been able to get away with
parsed vs. failed, as there wasn't much I could do about a failed parse
(especially when parsing SIP messages---log the failure, drop it and move on
to the next message).  But having a position of failure would be nice.


## background

So far, almost every single time I used LPEG, I spent upwards of an hour (sometimes 6+ hours) on debugging. If you have a grammar and "only" have to translate it to LPEG, identifying a problem is usually manageable. But if you're trying to incrementally reconstruct a grammar from a bunch of known samples, this is really really painful. With very large or otherwise hard to inspect files (binary etc.), if the 1234th repetition of some structure has an extra field, the only way I know to identify the problem is to do lots of match time print()ing…
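For readers who have not seen the match-time print()ing approach, here is a minimal sketch of what it can look like. The `trace` helper and the `log` table are hypothetical names made up for this illustration; only `lpeg.P` with a function argument (a match-time function that succeeds without consuming input when it returns `true`) is actual LPEG API.

```lua
local lpeg = require "lpeg"

-- Hypothetical helper (not part of LPEG): wrap `patt` so that every attempt
-- and every success is recorded together with the subject position.
local log = {}
local function trace(name, patt)
  local enter = lpeg.P(function(_, i)
    log[#log + 1] = ("enter %s at %d"):format(name, i)
    return true   -- succeed without consuming input
  end)
  local leave = lpeg.P(function(_, i)
    log[#log + 1] = ("leave %s at %d"):format(name, i)
    return true
  end)
  return enter * patt * leave
end

local id   = trace("id",  lpeg.P "id=")
local ref  = trace("ref", lpeg.P "ref=")
local attr = id + ref

local pos = attr:match("ref=42")
-- log now holds: "enter id at 1", "enter ref at 1", "leave ref at 5"
for _, line in ipairs(log) do print(line) end
```

An "enter" without a matching "leave" is a branch that failed. On a large input this produces a lot of output, which is exactly the pain described above.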

As far as I can tell, part of the problem is that all branches are tried recursively – i.e. match failures at any point are expected and don't mean there's actually a problem, and so there's no hard information available that could be printed after all branches failed.
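One partial workaround, sketched here under assumptions, is to record the farthest subject position that any branch reached and report that after the overall match fails. The `mark` pattern, the `farthest` variable and the toy `entity` / `message` rules are hypothetical and only serve the illustration. The position is a heuristic: it says where parsing stalled, not which rule was responsible.

```lua
local lpeg = require "lpeg"

-- Highest position seen so far; must be reset to 0 before each match.
local farthest = 0

-- Zero-width pattern that remembers the farthest position any branch reached.
local mark = lpeg.P(function(_, i)
  if i > farthest then farthest = i end
  return true   -- succeed without consuming input
end)

local WS = lpeg.S " \t\n"
local entity  = lpeg.P "<entity"  * mark * WS^1 * mark * "id="   * mark
local message = lpeg.P "<message" * mark * WS^1 * mark * "text=" * mark
local tag = entity + message

local input  = "<entity ref=7>"
local result = tag:match(input)
if not result then
  -- Show the position where parsing stalled and the next few bytes.
  print(("failed near position %d: %q"):format(
    farthest, input:sub(farthest, farthest + 3)))
end
```

Here the match fails with `farthest` at 9, which is the start of `ref=` in the input. Sprinkling `mark` through a real grammar is tedious, which supports the point that LPEG itself has no such information to offer.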

My observation is that very often, there are points in the grammar / pattern, where trying alternatives is known to be useless. (If there was a match for '<entity ' and now 'id=' is expected but 'ref=' is found, it doesn't make sense to backtrack and try '<message ' etc. – they certainly won't match – but LPEG doesn't know that.)



## proposal / question

Would it make sense to add a way to tell LPEG "do not backtrack past this point" – e.g. by 'lpeg.cut( )'? (I'm taking the name from Prolog – maybe there's a better name?) With cuts, there *would* be hard known information that could be printed: The position in the input when LPEG attempted to backtrack over the cut.

Going a step further, `lpeg.cut( [name] )` could (maybe) be used to produce something like a stack trace? (I haven't looked at the LPEG internals, don't know how hard/easy this would be.)


With cut, I could have

  entity = lpeg.P "<entity" * WS^1 * lpeg.cut( "tag:entity" ) * "id=" …

and then LPEG has enough information to tell me that in 'tag:entity' at position 12345 (in 'tag:state' at position 123 in 'tag:savegame' at position 10) no alternative matched, and by using the position I can grab the next couple of lexemes (or bytes) from the file, and then I know that there was 'ref=' instead of 'id=' and debugging would be *so* much easier.
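A rough approximation of this behaviour with today's LPEG, as a sketch only: since `lpeg.cut` does not exist, the hypothetical `expect` helper below raises a Lua error from a match-time function as soon as the required pattern fails, which aborts the whole match instead of backtracking. The toy grammar and the four-byte preview are assumptions for the example. It reports only the innermost name, not the nested 'tag:state' / 'tag:savegame' chain described above.

```lua
local lpeg = require "lpeg"

-- Hypothetical stand-in for the proposed cut (not part of LPEG):
-- if `patt` fails here, abort the whole match instead of backtracking.
local function expect(name, patt)
  return patt + lpeg.P(function(subject, i)
    -- Level 0 keeps the message free of source-location prefixes.
    error(("%s: no alternative matched at position %d, found %q")
          :format(name, i, subject:sub(i, i + 3)), 0)
  end)
end

local WS = lpeg.S " \t\n"
local entity  = lpeg.P "<entity"  * WS^1 * expect("tag:entity",  lpeg.P "id=")
local message = lpeg.P "<message" * WS^1 * expect("tag:message", lpeg.P "text=")
local tag = entity + message

-- pcall turns the raised error into an ordinary return value.
local ok, err = pcall(lpeg.match, tag, "<entity ref=7>")
print(ok, err)
-- prints: false   tag:entity: no alternative matched at position 9, found "ref="
```

The `<message` alternative is never tried once `<entity` plus whitespace has matched, which is the intended effect of the cut. The cost is that the error escapes through `pcall` rather than being an ordinary match failure, so it cannot be used where an enclosing choice is still supposed to recover.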

At least that's the dream… Would that actually work? And is this sufficiently compatible with LPEG's internals? Or is that maybe possible already?


-- nobody

Follow-Ups:
- Re: lpeg.cut? (Re: Elegant design for creating error messages in LPEG parser), joy mondal
- Re: lpeg.cut? (Re: Elegant design for creating error messages in LPEG parser), Philipp Janda
- Re: lpeg.cut? (Re: Elegant design for creating error messages in LPEG parser), Sérgio Medeiros

References:
- Elegant design for creating error messages in LPEG parser, joy mondal
- Re: Elegant design for creating error messages in LPEG parser, Hugo Musso Gualandi
- Re: Elegant design for creating error messages in LPEG parser, joy mondal
- Re: Elegant design for creating error messages in LPEG parser, Sean Conner
