


Suppose I had written a super-fast parser that can parse webserver
access.log entries through an iterator-like interface:

for entry in ParseLogEntriesAtTheSpeedOfLight(io.stdin) do
  print(entry.ip_address)
  print(entry.url)
end

There are a lot of text strings in my "entry" structure, and the app
will likely not use them all, so I can save a lot of work by not actually
copying the strings into a Lua string object until they are accessed.
Until then I can just keep references into my input buffer, and copy
out of the input buffer on-demand only if the field ("ip_address") is
used.
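To make the idea concrete, here is a minimal pure-Lua sketch of on-demand field copying. The names make_lazy_entry, buffer and spans are hypothetical, and a Lua string stands in for the input buffer; in a real C implementation the copy would presumably happen via something like lua_pushlstring() inside an __index metamethod.

```lua
-- Hypothetical sketch: "buffer" is the raw input and "spans" maps each field
-- name to the {first, last} byte positions the parser found for it.
local function make_lazy_entry(buffer, spans)
  return setmetatable({}, {
    __index = function(t, key)
      local span = spans[key]
      if not span then return nil end
      -- Copy out of the buffer only now, on first access.
      local value = string.sub(buffer, span[1], span[2])
      rawset(t, key, value)  -- cache it so later reads skip __index
      return value
    end,
  })
end

local line = "10.0.0.1 /index.html"
local entry = make_lazy_entry(line, {
  ip_address = {1, 8},
  url = {10, 20},
})
print(entry.ip_address)
print(entry.url)
```

Fields that are never read are never copied; fields that are read once are cached in the table itself.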

If I have the whole file mmap()'d, this works great, since I can
reconstruct any string at any time.  If I'm operating over a stream
of data like a network socket, though, it's a problem: I don't want
to keep all the data I've ever read buffered in memory.  In most
cases that's fine in practice, because the application will only need
to look at one record at a time:

for entry in ParseLogEntriesAtTheSpeedOfLight(io.stdin) do
  -- Pull anything you need from "entry" out now, because it
  -- will no longer be valid once we advance to the next one!
  print(entry.ip_address)
  print(entry.url)

  -- Storing the entry away for later: not allowed!
  some_global_var = entry
end

-- Will crash, source data buffer no longer available.
print(some_global_var.ip_address)

The question is how to expose an API like this in a way that makes
this limitation clear and makes crashes impossible.  My best shot so
far is:

entry = SpeedOfLightEntryParser(io.stdin)
while entry:next() do
  -- It's clear now that entry:next() is a destructive operation
  -- and that attempting to save "entry" for later is pointless.
  print(entry.ip_address)
end

-- Returns nil, since entry:next() returned false.
print(entry.ip_address)
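For illustration, a rough pure-Lua stand-in for this cursor-style interface might look like the following. It is only a sketch under assumptions: the parser takes a hypothetical "lines" function that returns one raw line per call (rather than a file handle), and the two-field line format is made up for the example.

```lua
-- Hypothetical stand-in for SpeedOfLightEntryParser. "lines" is assumed to
-- be a function returning the next raw line, or nil at end of input.
local function SpeedOfLightEntryParser(lines)
  local current = nil   -- the only record kept buffered
  local fields = {}     -- strings copied out of the current record
  local methods = {}

  function methods:next()
    current = lines()
    fields = {}         -- drop anything copied from the previous record
    return current ~= nil
  end

  return setmetatable({}, {
    __index = function(_, key)
      if methods[key] then return methods[key] end
      if current == nil then return nil end  -- before first / after last record
      if fields[key] == nil then
        -- Made-up format for the example: "<ip> <url>"
        fields.ip_address, fields.url = current:match("^(%S+)%s+(%S+)")
      end
      return fields[key]
    end,
  })
end

local input = { "10.0.0.1 /index.html", "10.0.0.2 /about.html" }
local i = 0
local entry = SpeedOfLightEntryParser(function()
  i = i + 1
  return input[i]
end)

local seen = {}
while entry:next() do
  seen[#seen + 1] = entry.ip_address .. " " .. entry.url
end
print(entry.ip_address)  -- nil: the cursor has moved past the last record
```

Because "entry" is a single cursor object rather than a fresh value per record, a saved reference can never point at stale data; it just reflects wherever the cursor currently is.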

It's sad to have to give up the very nice iterator syntax, but this
seems like the cleanest way to provide these semantics, and it can
be implemented much more efficiently!

Thoughts?
Josh