lua-l archive



On Mon, Apr 11, 2011 at 9:36 PM, Josh Haberman <jhaberman@gmail.com> wrote:
> Suppose I had written a super-fast parser that can parse webserver
> access.log entries through an iterator-like interface:
>
> for entry in ParseLogEntriesAtTheSpeedOfLight(io.stdin) do
>  print(entry.ip_address)
>  print(entry.url)
> end
>
> There are a lot of text strings in my "entry" structure, and the app
> will likely not use them all, so I can save a lot of work by not actually
> copying the strings into a Lua string object until they are accessed.

Assuming you've benchmarked that this is actually a performance problem!

> Until then I can just keep references into my input buffer, and copy
> out of the input buffer on-demand only if the field ("ip_address") is
> used.
> The question is how to expose an API like this in a way that makes
> this limitation clear and makes crashes impossible.  My best shot so
> far is:
>
> entry = SpeedOfLightEntryParser(io.stdin)
> while entry:next() do
>  -- It's clear now that entry:next() is a destructive operation
>  -- and that attempting to save "entry" for later is pointless.
>  print(entry.ip_address)
> end
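To make the proposal concrete, here is a minimal pure-Lua sketch of the cursor-style API quoted above. It is only an illustration under stated assumptions: the line format ("<ip_address> <url>"), the `_spans` offset layout, and this implementation of `SpeedOfLightEntryParser` are all hypothetical, and a real parser would do this in C over a raw input buffer. For each line it records only field offsets; a field's string is built when the field is read, and `next()` overwrites the one shared entry object.

```lua
-- Hypothetical pure-Lua stand-in for SpeedOfLightEntryParser.
-- Assumes each log line looks like "<ip_address> <url>".
local Entry = {}

-- Position of each field's start offset inside self._spans.
local FIELDS = { ip_address = 1, url = 3 }

Entry.__index = function(self, key)
  local method = rawget(Entry, key)
  if method ~= nil then return method end
  local i = FIELDS[key]
  if i == nil or self._line == nil then return nil end
  local spans = self._spans
  -- The field's string is only built here, when the field is read.
  return string.sub(self._line, spans[i], spans[i + 1])
end

-- Destructive: overwrites the single entry object with the next line.
function Entry:next()
  local line = self._nextline()
  self._line = line
  if line == nil then return false end
  local sp = string.find(line, " ", 1, true) or (#line + 1)
  local spans = self._spans
  -- Only offsets are recorded; no field strings are created yet.
  spans[1], spans[2] = 1, sp - 1      -- ip_address
  spans[3], spans[4] = sp + 1, #line  -- url (empty if there is no space)
  return true
end

function SpeedOfLightEntryParser(file)
  return setmetatable({ _nextline = file:lines(), _spans = {} }, Entry)
end

-- Demo with an in-memory stand-in for io.stdin (anything with :lines()).
local fake = {
  lines = function()
    return ("10.0.0.1 /index.html\n10.0.0.2 /about\n"):gmatch("[^\n]+")
  end,
}
local entry = SpeedOfLightEntryParser(fake)
while entry:next() do
  print(entry.ip_address, entry.url)
end
```

Passing `io.stdin` instead of `fake` works the same way, since file handles have a `:lines()` method.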


If I had such an inconvenient API, the first thing I'd do would be to
wrap it in an iterator anyway:

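function niceandfast(file)
  local parser = SpeedOfLightEntryParser(file)
  -- The generic for calls iter(state, control); the state is the parser,
  -- so the returned entry is always that same, reused object.
  return function(entry)
    if entry:next() then
      return entry
    end
    return nil  -- ends the loop
  end, parser
end

for entry in niceandfast(io.stdin) do
  print(entry.ip_address)
end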

So I don't think that approach helps much: anyone can wrap it back into
an iterator, and then nothing stops them from holding on to an entry.
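The point about the wrapper is easy to check: once next() is hidden behind a for loop, a caller can save entries, and they all alias the one reused object. A minimal sketch, with a hypothetical stub standing in for the parser (it takes a table of URLs instead of a file):

```lua
-- Hypothetical stub parser: a single entry object that next() overwrites.
-- It takes a table of URLs instead of a file, to keep the demo small.
local function SpeedOfLightEntryParser(urls)
  local i = 0
  local entry = {}
  function entry:next()
    i = i + 1
    self.url = urls[i]
    return self.url ~= nil
  end
  return entry
end

-- The wrapper from the message, with the parser kept in a local.
local function niceandfast(file)
  local parser = SpeedOfLightEntryParser(file)
  return function(entry)
    if entry:next() then return entry end
    return nil
  end, parser
end

local saved = {}
for entry in niceandfast({ "/a", "/b", "/c" }) do
  saved[#saved + 1] = entry  -- looks harmless...
end
-- ...but every slot holds the same object, already advanced past the end.
print(#saved, saved[1] == saved[3], saved[1].url)  --> 3  true  nil
```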

> It's sad to have to give up the very nice iterator syntax, but this
> seems like the cleanest way to provide these semantics which
> can be implemented much more efficiently!
>
> Thoughts?

You don't want accesses to an old entry to "crash", but are you OK
with them returning nil, or raising an error?

Why not make each entry its own userdata? If the user keeps an entry
(say, in a global), it stays valid for as long as they keep it; if
they don't store it anywhere, it simply gets garbage collected.
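A rough pure-Lua sketch of the one-object-per-entry design: pure Lua cannot create a full userdata, so a plain table stands in for it here, and the line format ("<ip_address> <url>") and the `entries` function are hypothetical. Each iteration yields a brand-new object, so entries the user keeps stay intact, and the rest are ordinary garbage.

```lua
-- Sketch: every iteration yields a brand-new entry object. In C this would
-- be a full userdata; a plain table stands in for it here.
-- Hypothetical line format: "<ip_address> <url>".
local function entries(file)
  local nextline = file:lines()
  return function()
    local line = nextline()
    if line == nil then return nil end
    local ip, url = line:match("^(%S+)%s+(%S+)")
    return { ip_address = ip, url = url }
  end
end

-- In-memory stand-in for io.stdin (anything with a :lines() method).
local fake = {
  lines = function()
    return ("10.0.0.1 /a\n10.0.0.2 /b\n10.0.0.3 /c\n"):gmatch("[^\n]+")
  end,
}

local kept = {}
for entry in entries(fake) do
  if entry.url ~= "/b" then
    kept[#kept + 1] = entry  -- the user chooses to keep some entries
  end
  -- entries not stored anywhere become garbage once the loop moves on
end

for _, e in ipairs(kept) do print(e.ip_address, e.url) end
```

The trade-off is the one raised next: every entry now costs an allocation plus copies of its fields.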

If you did that, you might have to pay for copying the data into the
userdata, though. Assuming that matters, you could allocate the
userdata first and then do the network read directly into it, so there
would be no wasted copies.

Or, you could allocate a much smaller userdata of sizeof(void*) and
remember the last one you allocated. The void* would point to whatever
internal state you have that will be destroyed in the next iteration.
When the next iteration occurs, set the last userdata's pointer to
NULL so that it knows it's invalid, and allocate a new one pointing to
the new state. Any calls on the old userdata will then see that their
state is NULL, and can error or return nil.
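Here is a pure-Lua analogue of that last idea, as a sketch only: a small table plays the role of the sizeof(void*) userdata, its `_state` field plays the role of the pointer, and clearing that field plays the role of setting it to NULL. The names `Handle`, `parser`, and `_state` and the line format are hypothetical. This version raises an error on a stale handle; returning nil instead would be a one-line change in `__index`.

```lua
-- Pure-Lua analogue of the "tiny userdata holding a void*" idea.
-- Hypothetical line format: "<ip_address> <url>".
local Handle = {}
Handle.__index = function(h, key)
  local state = rawget(h, "_state")
  if state == nil then
    -- The pointer was cleared: this entry belongs to an earlier iteration.
    error("stale entry: the parser has moved past it", 2)
  end
  return state[key]
end

local function parser(file)
  local nextline = file:lines()
  local state = {}  -- the one internal state, overwritten on every step
  local last        -- the most recently returned handle
  return function()
    if last ~= nil then rawset(last, "_state", nil) end  -- the "NULL" step
    local line = nextline()
    if line == nil then last = nil; return nil end
    state.ip_address, state.url = line:match("^(%S+)%s+(%S+)")
    last = setmetatable({ _state = state }, Handle)
    return last
  end
end

-- In-memory stand-in for io.stdin (anything with a :lines() method).
local fake = {
  lines = function()
    return ("10.0.0.1 /a\n10.0.0.2 /b\n"):gmatch("[^\n]+")
  end,
}

local saved
for entry in parser(fake) do
  print(entry.ip_address, entry.url)  -- fine while the entry is current
  saved = saved or entry              -- keep the first one around
end
print(pcall(function() return saved.url end))  -- false, plus the error message
```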

Cheers,
Sam