- Subject: Re: Yieldable LPEG Parser
- From: William Ahern <william@...>
- Date: Wed, 1 Feb 2012 19:13:41 -0800
On Thu, Feb 02, 2012 at 11:50:34AM +0900, Miles Bader wrote:
> Tony Finch <email@example.com> writes:
> > Miles Bader <firstname.lastname@example.org> wrote:
> >> [Hmm, that brings to mind another question: How much of the input
> >> string does it have to accumulate in memory? Can it start discarding
> >> earlier portions of it at some point? If not, it wouldn't be so useful
> >> for the use I have in mind: parsing really big files without needing to
> >> have them entirely in memory.]
> > lpeg doesn't bother trying to discard unneeded string prefixes. In theory
> > it can only discard the prefix of a string that is not covered by any
> > captures and which has no alternation backtracking points in it.
> I suspect that in many cases, backtracking doesn't cover much of the
> file, or the grammar can be arranged so that this is the case...
> [I don't know how easy the implementation makes _detecting_ when parts
> of the text are discardable though...]
> As has been discussed elsewhere on this thread, the capture issue could
> probably be sorted out, e.g. by lazy conversion of capture contents into
> real Lua strings.
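To make Tony's rule concrete, here is a minimal sketch in Python rather than LPeg's C internals. It assumes a hypothetical matcher that records three things: its current offset, the offsets saved by pending choice (backtrack) points, and the start offsets of captures not yet turned into strings. `MatchState` and `discardable_prefix` are illustrative names only, not anything LPeg provides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchState:
    """Hypothetical snapshot of a backtracking matcher (not LPeg's real state)."""
    position: int  # current input offset
    # offsets the matcher may rewind to if a pending alternative fails
    backtrack_positions: List[int] = field(default_factory=list)
    # start offsets of captures still referring to the raw input
    capture_starts: List[int] = field(default_factory=list)


def discardable_prefix(state: MatchState) -> int:
    """Length of the input prefix that neither a pending choice point nor an
    unconverted capture can still reach, i.e. the bytes that could be dropped."""
    return min([state.position, *state.backtrack_positions, *state.capture_starts])
```

With a choice point saved at offset 40, the prefix stays pinned there even though the matcher has advanced to 100. That is why arranging the grammar so choice points don't span much of the file matters.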
Perhaps the engine could ask the buffer object for a window onto the input. If
the object can't provide the window (because it has already discarded that
data), then the match bails out. But also add a new match-time capture type,
similar to lpeg.Cmt, which immediately internalizes string captures (copies
them into real Lua strings) and then executes a function that could be used
to discard data in the buffer.
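A rough sketch of that proposal, written in Python for illustration. Everything here is hypothetical: `StreamBuffer`, `window`, `discard` and `commit_captures` are invented names, and LPeg has no such buffer interface. The buffer keeps only a suffix of the input. `window` returns `None` for data that has already been dropped, which is the signal for the engine to bail. `commit_captures` plays the role of the proposed match-time capture: it copies pending capture spans out first, and only then lets the buffer discard the prefix.

```python
from typing import List, Optional, Tuple


class StreamBuffer:
    """Hypothetical buffer holding only a suffix of a larger input."""

    def __init__(self) -> None:
        self._base = 0            # absolute offset of self._data[0]
        self._data = bytearray()  # bytes still held in memory

    def feed(self, chunk: bytes) -> None:
        """Append newly read input."""
        self._data += chunk

    def window(self, offset: int, length: int) -> Optional[bytes]:
        """Return input[offset:offset+length]; None if part of it was discarded.
        The result may be shorter than `length` if that input isn't fed yet."""
        if offset < self._base:
            return None  # already discarded: the matcher has to bail out
        start = offset - self._base
        return bytes(self._data[start:start + length])

    def discard(self, upto: int) -> None:
        """Drop every byte before absolute offset `upto`."""
        upto = min(upto, self._base + len(self._data))
        if upto > self._base:
            del self._data[:upto - self._base]
            self._base = upto


def commit_captures(buf: StreamBuffer, spans: List[Tuple[int, int]], pos: int) -> List[bytes]:
    """Hypothetical match-time hook: internalize (offset, length) capture spans
    as independent byte strings, then discard input before `pos`. Like
    lpeg.Cmt, it is the grammar author's job to call this only where no
    pending choice point can rewind past `pos`."""
    values = []
    for offset, length in spans:
        data = buf.window(offset, length)
        if data is None:
            raise LookupError(f"capture at offset {offset} was already discarded")
        values.append(data)
    buf.discard(pos)
    return values
```

For example, after feeding `b"hello world"`, committing the span `(0, 5)` at position 6 yields `[b"hello"]`. Afterwards `window(0, 5)` returns `None`, while `window(6, 5)` still returns `b"world"`.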