- Subject: Re: Yieldable LPEG Parser
- From: William Ahern <william@...>
- Date: Wed, 1 Feb 2012 10:52:54 -0800
On Wed, Feb 01, 2012 at 03:51:41PM -0200, Roberto Ierusalimschy wrote:
> > [Hmm, that brings to mind another question: How much of the input
> > string does it have to accumulate in memory? Can it start discarding
> > earlier portions of it at some point? If not, it wouldn't be so useful
> > for the use I have in mind: parsing really big files without needing to
> > have them entirely in memory.]
>
> That is an important point.
I've considered the ability to discard early, but that was when I wanted to
write a PEG engine from scratch. I wrote the above patch yesterday
afternoon, and I haven't yet investigated how easy it would be to do
something like that. The most obvious problem is captures. The capture code
would need to be substantially refactored, because it currently requires the
origin address. The patch basically converts all the (const char *) members
to size_t offsets, but each offset is still relative to the origin.
getsidx() and getsptr() both take the current origin address to derive an
index and an address, respectively.
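To illustrate why origin-relative offsets matter here, the following is a minimal sketch, not the patch's actual code. The helper names idx_of() and ptr_at() and their signatures are assumptions made for illustration; they only mirror the idea behind getsidx() and getsptr() as described above. The point is that a saved size_t offset stays valid when the input buffer grows and moves, whereas a saved (const char *) would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helpers: names and signatures are illustrative only. */
static size_t idx_of(const char *origin, const char *p) {
    return (size_t)(p - origin);      /* pointer -> offset from origin */
}

static const char *ptr_at(const char *origin, size_t idx) {
    return origin + idx;              /* offset -> pointer in current buffer */
}

/* Append a NUL-terminated chunk to a growable buffer.
   realloc() may move the buffer, which invalidates old pointers. */
static char *append_chunk(char *buf, size_t *len, const char *chunk) {
    size_t n = strlen(chunk);
    char *nbuf = realloc(buf, *len + n + 1);
    if (nbuf == NULL) { free(buf); return NULL; }
    memcpy(nbuf + *len, chunk, n + 1);
    *len += n;
    return nbuf;
}

/* Save a position as an offset, grow the buffer, then resolve the
   offset against the new origin. Returns the character found there,
   or -1 on allocation failure. */
static int offset_survives_realloc(void) {
    size_t len = 0;
    char *buf = append_chunk(NULL, &len, "key=");
    if (buf == NULL) return -1;
    size_t mark = idx_of(buf, buf + 3);          /* position of '=' */
    buf = append_chunk(buf, &len, "value and more input");
    if (buf == NULL) return -1;
    int c = *ptr_at(buf, mark);                  /* resolve against new origin */
    free(buf);
    return c;
}
```

Every access goes through the current origin, so the parser can yield, receive more input, and resume without fixing up stored positions.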
I suppose there'd need to be an additional offset that records how much
has been discarded, and which is subtracted from the origin-relative offset.
Alternatively, string captures could be taken prospectively. I suppose the
latter would be preferable, actually.
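Both options can be sketched roughly as follows. This is only an illustration under stated assumptions: the Window struct and the win_ptr(), win_discard() and capture_copy() functions are hypothetical and do not exist in LPeg or in the patch. Taking a capture "prospectively" is read here as copying the captured text out while it is still in memory, which is one possible interpretation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sliding window over the input (not an LPeg type). */
typedef struct {
    char  *data;       /* bytes still held in memory */
    size_t len;        /* number of bytes in data */
    size_t discarded;  /* bytes already dropped from the front */
} Window;

/* Map an absolute offset (relative to the original origin) to a pointer,
   subtracting the discarded count. NULL if the byte is no longer held. */
static const char *win_ptr(const Window *w, size_t abs) {
    if (abs < w->discarded || abs - w->discarded > w->len) return NULL;
    return w->data + (abs - w->discarded);
}

/* Drop everything before absolute offset `upto`. */
static void win_discard(Window *w, size_t upto) {
    if (upto <= w->discarded) return;
    size_t n = upto - w->discarded;
    if (n > w->len) n = w->len;
    memmove(w->data, w->data + n, w->len - n);
    w->len -= n;
    w->discarded += n;
}

/* "Prospective" capture: copy [start, end) out before it can be discarded.
   The caller frees the result. */
static char *capture_copy(const Window *w, size_t start, size_t end) {
    const char *p = win_ptr(w, start);
    if (p == NULL || end < start || end - w->discarded > w->len) return NULL;
    char *s = malloc(end - start + 1);
    if (s == NULL) return NULL;
    memcpy(s, p, end - start);
    s[end - start] = '\0';
    return s;
}

/* Returns 1 if the copied capture survives a discard and absolute
   offsets past the discard point still resolve correctly. */
static int window_demo(void) {
    char storage[] = "alpha beta gamma";
    Window w = { storage, strlen(storage), 0 };
    char *cap = capture_copy(&w, 6, 10);   /* copy "beta" early */
    win_discard(&w, 11);                   /* drop "alpha beta " */
    int ok = cap != NULL && strcmp(cap, "beta") == 0
          && win_ptr(&w, 6) == NULL        /* discarded region is gone */
          && w.len == 5 && *win_ptr(&w, 11) == 'g';
    free(cap);
    return ok;
}
```

With copied captures, nothing refers back into the discarded region, which is presumably why that route looks simpler than adjusting every stored offset.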
> Also, do you have benchmarks comparing your patch to standard LPeg?
None so far. I'm going to clean things up some today, then I can benchmark.