It was thus said that the Great Josh Haberman once stated:
> Suppose I had written a super-fast parser that can parse webserver
> access.log entries through an iterator-like interface:
> 
> for entry in ParseLogEntriesAtTheSpeedOfLight(io.stdin) do
>   print(entry.ip_address)
>   print(entry.url)
> end
> 
> There are a lot of text strings in my "entry" structure, and the app
> will likely not use them all, so I can save a lot of work by not actually
> copying the strings into a Lua string object until they are accessed.
> Until then I can just keep references into my input buffer, and copy
> out of the input buffer on-demand only if the field ("ip_address") is
> used.
> 
> The question is how to expose an API like this in a way that makes
> this limitation clear and makes crashes impossible.  My best shot so
> far is:
> 
> It's sad to have to give up the very nice iterator syntax, but this
> seems like the cleanest way to provide these semantics which
> can be implemented much more efficiently!
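
  For readers following along, the copy-on-access idea described in the
quote can be sketched in plain Lua with an __index metamethod.  This is only
an illustration of the access pattern, not the poster's implementation:
lazy_entry and the spans table are hypothetical names.  In pure Lua the line
is already a Lua string, so nothing is saved here; in a C binding the spans
would point into the input buffer, which is where the lifetime limitation
comes from.

```lua
-- Hypothetical lazy entry: keeps only byte offsets into the source line and
-- creates each field string the first time that field is read.
local function lazy_entry(line, spans)
  return setmetatable({}, {
    __index = function(t, key)
      local span = spans[key]
      if not span then return nil end
      local value = line:sub(span[1], span[2])  -- the copy happens here
      rawset(t, key, value)                     -- cache it for later reads
      return value
    end,
  })
end

local line = '192.0.2.7 - - [04/Feb/2010:10:00:00 -0500] ' ..
             '"GET /index.html HTTP/1.1" 200 512'

-- Locate the fields without extracting them.  The empty captures "()" in the
-- second pattern return positions rather than substrings.
local ip_s, ip_e = line:find("^%S+")
local _, _, url_s, url_after = line:find('"%S+ ()%S+()')

local entry = lazy_entry(line, {
  ip_address = { ip_s, ip_e },
  url        = { url_s, url_after - 1 },
})

print(entry.url)         -- extracted on this first access
print(entry.ip_address)
```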

  I wrote a parser for Apache log files in C a long time ago, and sometime
last year I wrote some Lua bindings for it.  It returns a regular Lua table
with regular Lua strings.  I just ran a quick test:

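package.cpath = "/home/spc/source/lua/webanal/?.so;" .. package.cpath
require "parseweblog"

-- fail with a clear message if the log file cannot be opened
local log   = assert(io.open("boston.conman.org","r"))
local total = 0
for line in log:lines() do
  local t = parseweblog(line)  -- regular Lua table with regular Lua strings
  total = total + t.bytes
end

print(total)
io.stdin:read()  -- wait for Enter; a convenient point to check the process size
log:close()
print(total)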

The log file I ran it over was 33M in size, and the largest the program grew
to was 2276k total (944k resident).  In fact, it stayed at 2276k for pretty
much the entire run of the program.
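
  The sizes above are process sizes as seen by the operating system.  As a
complementary check from inside a script, collectgarbage("count") reports
the size of the Lua heap in kilobytes.  The following is a minimal sketch
under assumptions: make_entry is a hypothetical stand-in for a parser that
returns a fresh table of strings per line, and peak_heap_kb is an
illustrative helper, not part of any library.

```lua
-- Hypothetical stand-in for a log parser: builds a fresh table of strings.
local function make_entry(i)
  return {
    ip_address = "10.0.0." .. (i % 256),
    url        = "/page/" .. i,
    bytes      = i % 1000,
  }
end

-- Calls fn(i) for i = 1..n, sampling the Lua heap every 1000 calls.
-- Returns the largest sampled heap size in kilobytes.
local function peak_heap_kb(n, fn)
  local peak = collectgarbage("count")
  for i = 1, n do
    fn(i)
    if i % 1000 == 0 then
      local kb = collectgarbage("count")
      if kb > peak then peak = kb end
    end
  end
  return peak
end

local total = 0
local peak = peak_heap_kb(200000, function(i)
  local t = make_entry(i)  -- becomes garbage as soon as this call returns
  total = total + t.bytes
end)

print(string.format("total=%d  peak Lua heap=%.0f KB", total, peak))
```

  Because each entry is unreachable after its iteration, the collector can
reclaim it, so the sampled peak should stay far below the combined size of
all entries ever created.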

  I also have a syslogd replacement written in C/Lua [1] where I throw
everything received via the syslog protocol into a Lua table (mostly
strings), and well ... 

root 28515 0.0 0.0 1868 900 ? S Feb04 0:21 /usr/local/sbin/syslogintr --ip --local --lua-path /home/spc/source/sysloginter/modules/?.lua
                     ^   ^
                     |   +-- resident size in K
                     +------ overall size in K

  It's been running for over two months (had to reboot the server back then)
and it's pretty much always this size regardless of the amount of traffic
going through the system.

  Also, you might find some previous work I've done with Lua of interest
[2][3].
  

> Thoughts?

  I'd return a regular Lua table with Lua strings and if the script runs out
of memory, well ... that's the programmer's problem, not yours.  Lua does a
nice job with the garbage collection.
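
  A minimal sketch of that approach, which also keeps the iterator syntax
from the original question.  Everything here is an assumption made for
illustration: parse_entry is a hypothetical pure-Lua stand-in for a C-backed
parser, log_entries is a hypothetical wrapper, and the field names follow the
question's example rather than any real binding.

```lua
-- Hypothetical stand-in for a C parser: turns one Common Log Format line
-- into a regular table of regular Lua strings and numbers.
local function parse_entry(line)
  local ip, timestamp, request, status, bytes =
    line:match('^(%S+) %S+ %S+ %[([^%]]+)%] "([^"]*)" (%d+) (%S+)')
  if not ip then return nil end  -- not a recognisable log line
  local method, url = request:match('^(%S+) (%S+)')
  return {
    ip_address = ip,
    timestamp  = timestamp,
    method     = method,
    url        = url,
    status     = tonumber(status),
    bytes      = tonumber(bytes) or 0,  -- "-" in the log means no body
  }
end

-- Returns an iterator over any object with a :lines() method.  Each value
-- is an independent table, so it stays valid after the next iteration.
local function log_entries(file)
  local next_line = file:lines()
  return function()
    for line in next_line do
      local entry = parse_entry(line)
      if entry then return entry end  -- silently skip malformed lines
    end
    return nil
  end
end

-- Usage, reading from standard input:
--   for entry in log_entries(io.stdin) do
--     print(entry.ip_address, entry.url)
--   end
```

  Entries the script keeps a reference to stay alive; the rest are ordinary
garbage, so there is no lifetime rule for the script author to remember.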

  One more data point: I have C code to parse email headers, for which I
again created some Lua bindings (again returning a Lua table filled with
strings).  A C program that parses (but otherwise does nothing with)
20,272 messages (my archives of this mailing list, actually) takes 3.628
seconds (5,587 per second), while the equivalent Lua code gets through the
same messages in 4.775 seconds (4,245 per second).  Slower, yes, but not
horribly so.
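
  The arithmetic behind those rates can be checked in a few lines.  The
message count and timings are the figures quoted above; the variable names
are only illustrative.

```lua
local messages    = 20272
local c_seconds   = 3.628
local lua_seconds = 4.775

local c_rate   = messages / c_seconds    -- messages per second, C version
local lua_rate = messages / lua_seconds  -- messages per second, Lua version
local slowdown = lua_seconds / c_seconds -- how many times longer Lua takes

print(string.format("C:   %d messages/second", math.floor(c_rate)))
print(string.format("Lua: %d messages/second", math.floor(lua_rate)))
print(string.format("Lua takes %.2f times as long as C", slowdown))
```

  The rates are truncated rather than rounded to match the figures quoted
above.  The Lua version takes roughly a third longer than the C version.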

  Another thought---document somewhere how your API works so the person
writing the script will know what to do.

  Okay, one more question before I go---who will be writing the scripts?  

  -spc

[1]	http://www.conman.org/software/syslogintr/

[2]	http://lua-users.org/lists/lua-l/2009-11/msg00463.html

[3]	The bug I reported has since been fixed.