On 03/04/2019 20.56, Sean Conner wrote:
   Any failed pattern would return nil,position, but as long as there are
alternatives, it won't matter.  But on a failure, at least you get the
offset into the string being parsed where the error was.

   That's about the minimum I could see for LPEG.  How about it Roberto?

Would that just mean recording the line number as you go along using
Carg(1), and printing the line where your parser fails?

I feel like the minimum viable error would handle unknown unknowns without
being completely useless like nil, while keeping parser code simple. (I do
not want to end up in a situation where the code is 50% error handling.)

   For the majority of my LPEG programs, I've been able to get away with
parsed vs. failed, as there wasn't much I could do about a failed parse
(especially when parsing SIP messages---log the failure, drop it and move on
to the next message).  But having a position of failure would be nice.
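A failure position can be approximated with stock LPeg by remembering the furthest input position at which any token was attempted. The sketch below is one way to do that, not something LPeg provides: `T`, `match_or_position` and the toy list grammar are hypothetical names made up for illustration. It relies only on `lpeg.P(function)`, which runs the function at match time with the subject and the current position.

```lua
local lpeg = require "lpeg"
local P, R, S = lpeg.P, lpeg.R, lpeg.S

-- Furthest position at which a tracked token was attempted (hypothetical state).
local furthest = 0

-- Hypothetical helper: note the current position, then try `patt`.
local function T(patt)
  return P(function(_, i)
    if i > furthest then furthest = i end
    return true            -- succeed without consuming input
  end) * patt
end

-- Hypothetical wrapper: result on success, nil plus furthest position on failure.
local function match_or_position(grammar, subject)
  furthest = 0
  local result = grammar:match(subject)
  if result then return result end
  return nil, furthest
end

-- Toy grammar (an assumption): comma-separated numbers up to end of input.
local ws     = S " \t\n"^0
local number = T(R "09"^1) * ws
local list   = ws * number * (T(P ",") * ws * number)^0 * T(P(-1))

print(match_or_position(list, "1, 2, 3"))    -- 8
print(match_or_position(list, "1, 2, x3"))   -- nil   7
```

The reported position is a heuristic: it is where the parser got furthest before giving up, which is usually, but not necessarily, where the input is wrong.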

## background

So far, almost every single time I used LPEG, I spent upwards of an hour (sometimes 6+ hours) on debugging. If you have a grammar and "only" have to translate it to LPEG, identifying a problem is usually manageable. But if you're trying to incrementally reconstruct a grammar from a bunch of known samples, this is really, really painful. With very large or otherwise hard-to-inspect files (binary etc.), if the 1234th repetition of some structure has an extra field, the only way I know to identify the problem is to do lots of match-time print()ing…
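For reference, the match-time printing mentioned above can be wrapped in a small helper so it does not clutter the grammar. This is only a sketch: `trace`, the `log` table and the two toy patterns are hypothetical, and the helper collects messages in a table instead of calling print() directly.

```lua
local lpeg = require "lpeg"

-- Hypothetical helper: record every attempt to match `patt` and every success.
local function trace(name, patt, log)
  local enter = lpeg.P(function(_, i)
    log[#log + 1] = ("try %s at %d"):format(name, i)
    return true
  end)
  local leave = lpeg.P(function(_, i)
    log[#log + 1] = ("ok  %s up to %d"):format(name, i)
    return true
  end)
  return enter * patt * leave
end

-- Toy patterns (assumptions): a key followed by '=', then a numeric value.
local log   = {}
local field = trace("field", lpeg.R "az"^1 * "=", log)
local value = trace("value", lpeg.R "09"^1, log)
local pair  = field * value

pair:match("id=x")                 -- fails: 'x' is not a digit
print(table.concat(log, "\n"))
-- try field at 1
-- ok  field up to 4
-- try value at 4
```

The last "try" without a matching "ok" shows where the match died, but on large inputs the log grows quickly, which is exactly the pain described above.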

As far as I can tell, part of the problem is that all branches are tried recursively – i.e. match failures at any point are expected and don't mean there's actually a problem, and so there's no hard information available that could be printed after all branches failed.

My observation is that very often, there are points in the grammar / pattern, where trying alternatives is known to be useless. (If there was a match for '<entity ' and now 'id=' is expected but 'ref=' is found, it doesn't make sense to backtrack and try '<message ' etc. – they certainly won't match – but LPEG doesn't know that.)


## proposal / question

Would it make sense to add a way to tell LPEG "do not backtrack past this point" – e.g. by 'lpeg.cut( )'? (I'm taking the name from Prolog – maybe there's a better name?) With cuts, there *would* be hard known information that could be printed: The position in the input when LPEG attempted to backtrack over the cut.

Going a step further, `lpeg.cut( [name] )` could (maybe) be used to produce something like a stack trace? (I haven't looked at the LPEG internals, don't know how hard/easy this would be.)

With cut, I could have

  entity = lpeg.P "<entity" * WS^1 * lpeg.cut( "tag:entity" ) * "id=" …

and then LPEG has enough information to tell me that in 'tag:entity' at position 12345 (in 'tag:state' at position 123 in 'tag:savegame' at position 10) no alternative matched. Using that position, I can grab the next couple of lexemes (or bytes) from the file, see that there was 'ref=' instead of 'id=', and debugging would be *so* much easier.
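As far as I know there is no `lpeg.cut` in LPeg, but part of this can be approximated with existing primitives: give the pattern that follows the commit point a last alternative that raises an error from a match-time function. The sketch below assumes a hypothetical `expect` helper, made-up labels and a simplified tag syntax. It reports only the innermost label and position, not the nested trace described above.

```lua
local lpeg = require "lpeg"
local P, R, S = lpeg.P, lpeg.R, lpeg.S

-- Hypothetical helper (not part of LPeg): match `patt`, or abort the whole
-- match with an error value carrying a label, the position and some context.
local function expect(patt, label)
  return patt + P(function(subject, i)
    error({ label = label, pos = i, found = subject:sub(i, i + 9) }, 0)
  end)
end

-- Simplified tag syntax (an assumption made for this example).
local WS      = S " \t\n"
local name    = R("az", "AZ", "09")^1
local entity  = P "<entity"  * WS^1 * expect(P "id=",   "tag:entity")  * name * ">"
local message = P "<message" * WS^1 * expect(P "from=", "tag:message") * name * ">"
local tag     = entity + message

-- Hypothetical wrapper: turn the raised error back into return values.
local function parse(subject)
  local ok, res = pcall(lpeg.match, tag, subject)
  if ok then return res end
  if type(res) == "table" then
    return nil, res.label, res.pos, res.found
  end
  error(res, 0)   -- some other error: re-raise it
end

print(parse("<entity id=foo>"))     -- 16
print(parse("<message from=bob>"))  -- 19
print(parse("<entity ref=foo>"))    -- nil   tag:entity   9   ref=foo>
```

Unlike a real cut, this aborts the entire match, so `expect` must not be used inside a pattern that is legitimately allowed to fail (an optional part, a lookahead, or a repetition that should simply stop).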


At least that's the dream… Would that actually work? And is this sufficiently compatible with LPEG's internals? Or is that maybe possible already?

-- nobody