- Subject: Re: Yieldable LPEG Parser
- From: Miles Bader <miles@...>
- Date: Wed, 01 Feb 2012 15:39:21 +0900
William Ahern <william@25thandClement.com> writes:
> Below is a _preliminary_ patch to make lpeg yieldable. It Works For Me(tm),
> but I haven't hammered it yet.
> Instead of passing a string to lpeg.match, you can pass an object (userdata
> or table). The object must have a "tovector" method, which returns a tuple:
> [light]userdata or string, length, end-of-string indicator. The method is
> called whenever the engine gets to the end of the current vector and the
> end-of-string indicator is still false. It calls tovector again and expects
> either a vector that is at least one byte longer or the end-of-string
> indicator set. The source memory address can be different.
> Crucially, the tovector method can yield. (See checkvector() and
> growvector() in the patched file for how tovector is called.)
> This allows parsing a non-blocking source (raw socket, http body, etc)
> entirely online using a dynamic buffer.
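For concreteness, here is a rough sketch in plain Lua of the kind of source object the quoted description implies. It is only an illustration: `new_source`, `feed`, `finish`, and `consume` are hypothetical names, the loop in `consume` merely stands in for the patched engine, and the real calling convention is whatever checkvector() and growvector() do in the patch.

```lua
-- Hypothetical source object following the described tovector contract:
-- it returns (string, length, end-of-string flag) and may yield while waiting.
local function new_source()
  local src = { buf = "", eof = false, seen = 0 }

  function src:feed(chunk) self.buf = self.buf .. chunk end
  function src:finish() self.eof = true end

  function src:tovector()
    -- Yield until the vector is longer than the one last returned, or EOF is set.
    while #self.buf <= self.seen and not self.eof do
      coroutine.yield("need more input")
    end
    self.seen = #self.buf
    return self.buf, #self.buf, self.eof
  end

  return src
end

-- Stand-in for the engine (assumption): ask for a longer vector until EOF.
local function consume(src)
  local s, len, eos
  repeat
    s, len, eos = src:tovector()
  until eos
  return s, len
end

local src = new_source()
local co = coroutine.wrap(function() return consume(src) end)

local first = co()        -- no data yet, so tovector yields
src:feed("hello ")
local second = co()       -- vector grew but EOF not set, so it yields again
src:feed("world")
src:finish()
local text, len = co()    -- EOF set: the stand-in engine finishes
```

The point of the sketch is only that the yield happens inside tovector, so the caller resumes the coroutine whenever more bytes arrive.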
From an interface point of view, wouldn't it make more sense to use the
same protocol used by "load()" etc.? I mean a "callable" (function /
lambda / table-etc-with-__call-metatable-entry) that is called to
return the next portion of the input [or something else for EOF].
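To make the suggestion concrete, a minimal sketch of such a reader in plain Lua. `chunk_reader` and `read_all` are hypothetical helpers, not part of lpeg or of the patch; the only real API used is load(), which accepts a function that returns successive string pieces and treats nil or an empty string as end of input.

```lua
-- Build a reader in the style load() accepts: each call returns the next
-- piece of input as a string, and nil (or "") signals end of input.
local function chunk_reader(chunks)
  local i = 0
  return function()
    i = i + 1
    return chunks[i]   -- nil once the list is exhausted
  end
end

-- Hypothetical consumer: drains any callable that follows that protocol.
local function read_all(reader)
  local parts = {}
  while true do
    local piece = reader()
    if piece == nil or piece == "" then break end
    parts[#parts + 1] = piece
  end
  return table.concat(parts)
end

local text = read_all(chunk_reader{ "return ", "1 + ", "2" })

-- The same kind of reader can be handed to load() itself.
local f = assert(load(chunk_reader{ "return ", "1 + ", "2" }))
local value = f()
```

A reader like this could itself yield while waiting for a socket, so the callable form would not obviously lose the non-blocking property.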
[Hmm, that brings to mind another question: How much of the input
string does it have to accumulate in memory? Can it start discarding
earlier portions of it at some point? If not, it wouldn't help much
with the use I have in mind: parsing really big files without needing to
have them entirely in memory.]
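As an illustration of what "discarding earlier portions" might look like, here is a sketch of a sliding-window buffer over a load()-style reader. Everything in it is hypothetical (`new_window`, `byte`, `discard`), and it assumes the parser can name a position before which it will never backtrack, which nothing in the description of the patch promises.

```lua
-- Hypothetical sliding window: keeps only the input from `base + 1` onward.
local function new_window(reader)
  local w = { base = 0, buf = "", done = false }  -- base = bytes discarded so far

  -- Return the byte at absolute (1-based) position pos, pulling more input
  -- from the reader as needed; returns nothing once pos is past end of input.
  function w:byte(pos)
    assert(pos > self.base, "position already discarded")
    while pos > self.base + #self.buf and not self.done do
      local piece = reader()
      if piece == nil or piece == "" then
        self.done = true
      else
        self.buf = self.buf .. piece
      end
    end
    return self.buf:byte(pos - self.base)
  end

  -- Drop everything before absolute position pos (assumes no backtracking
  -- to an earlier position will ever be requested).
  function w:discard(pos)
    local n = pos - 1 - self.base
    if n > 0 then
      self.buf = self.buf:sub(n + 1)
      self.base = self.base + n
    end
  end

  return w
end

-- Small demonstration with an in-memory reader.
local pieces, i = { "abc", "def" }, 0
local w = new_window(function() i = i + 1; return pieces[i] end)

local e = w:byte(5)       -- forces both pieces to be read
w:discard(4)              -- "abc" is no longer needed
local d = w:byte(4)       -- still reachable after the discard
```

With something like this, memory use is bounded by how far back the grammar can backtrack rather than by the size of the file.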
Everywhere is walking distance if you have the time. -- Steven Wright