- Subject: Re: Yieldable LPEG Parser
- From: William Ahern <william@...>
- Date: Wed, 1 Feb 2012 17:15:30 -0800
On Wed, Feb 01, 2012 at 05:05:24PM -0800, William Ahern wrote:
> On Wed, Feb 01, 2012 at 04:27:16PM -0800, William Ahern wrote:
> > On Wed, Feb 01, 2012 at 03:51:41PM -0200, Roberto Ierusalimschy wrote:
> > > > [Hmm, that brings to mind another question: How much of the input
> > > > string does it have to accumulate in memory? Can it start discarding
> > > > earlier portions of it at some point? If not, it wouldn't be so useful
> > > > for the use I have in mind: parsing really big files without needing to
> > > > have them entirely in memory.]
> > >
> > > That is an important point. Also, do you have benchmarks comparing your
> > > patch to standard LPeg?
> > >
> >
> > Attached is my LPeg JSON library used for testing.
>
> Same library, but parsing a single 5.1MB JSON file:
>
Parsing the same 5.1MB JSON file, but yielding every 512 bytes. It's about
0.03 seconds slower.
% for I in 1 2 3; do time ./rfc-index.lua < /tmp/rfc-index.json; done
lpeg 0.10 (yieldable)
./rfc-index.lua < /tmp/rfc-index.json  2.25s user 0.46s system 99% cpu 2.725 total
lpeg 0.10 (yieldable)
./rfc-index.lua < /tmp/rfc-index.json  2.25s user 0.47s system 99% cpu 2.732 total
lpeg 0.10 (yieldable)
./rfc-index.lua < /tmp/rfc-index.json  2.25s user 0.47s system 99% cpu 2.739 total
Here's the yielding rfc-index.lua script:

#!/tmp/build/bin/lua5.2

local which = ...
local json = require(which or "json")

local input = io.stdin:read("*a")

local buffer = {
	len = 0,

	-- Called by the patched parser to refresh its view of the input.
	-- Returns the string, the number of bytes currently available, and
	-- whether that is everything. When yieldable, it yields first so
	-- the driver loop regains control, then exposes another 512 bytes
	-- per resume.
	tovector = function(self, yieldable)
		if yieldable then
			coroutine.yield()
		end

		if self.len < #input then
			self.len = math.min(self.len + 512, #input)
		end

		return input, self.len, (self.len == #input)
	end
}

local done = false

local step = coroutine.wrap(function()
	local result = json.decode(buffer)
	done = true
end)

repeat
	step()
until done
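The same driver pattern could read the file incrementally instead of
slurping it with read("*a") up front, which speaks to the earlier question
about big files. A minimal sketch, assuming the tovector buffer protocol
from my patch as used above (note this still accumulates the whole string
in buffer.data; actually discarding consumed input would need support in
the patch itself, since LPeg may backtrack):

	#!/tmp/build/bin/lua5.2
	-- Hypothetical streaming driver: read stdin 4KB at a time and feed
	-- the text accumulated so far to the parser via tovector.
	local json = require"json"

	local buffer = {
		data = "",   -- bytes read so far
		eof  = false,

		tovector = function(self, yieldable)
			if yieldable then
				coroutine.yield() -- hand control back to the read loop
			end
			return self.data, #self.data, self.eof
		end
	}

	local done, result = false

	local step = coroutine.wrap(function()
		result = json.decode(buffer)
		done = true
	end)

	repeat
		local chunk = io.stdin:read(4096)
		if chunk then
			buffer.data = buffer.data .. chunk
		else
			buffer.eof = true
		end
		step()
	until done

Each step() resumes the parser, which yields from inside tovector once it
has seen the new bytes, so memory for the parse state stays bounded even
though the input string itself does not.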