
I read this article ( 2007/09/20/Wide-Finder) where Tim Bray explains a Ruby program that processes Apache logs with regexps. The follow-up articles show implementations in many different languages, but none in Lua.

As I just started learning Lua (first pass through Programming in Lua done), I thought I'd give it a go.

From the functional point of view I have succeeded so far. It's probably not pretty, but it works. The performance, however, is really bad in comparison with Tim's Ruby sample. I am wondering why that is and whether there is a better way to implement the solution.

The Ruby program mentioned in the article takes roughly 2 seconds [1] to process 100k lines on my OS X box [5]. At 5.6 seconds, my first implementation (wf.lua), run with Lua 5.1 [5], takes more than twice as long [2]. Here I use io.lines() to iterate over the lines and then string.match() to extract the relevant URL parts. In "Programming in Lua" I read that it is a good idea performance-wise to read files as a whole. So in wf2.lua I tried reading the file as a whole and then using string.gmatch() to iterate over the matches, but that was even worse. It took 17 seconds [3], and 15 of those precious seconds were spent reading the file. That sounds strange to me, because the other variant, wf.lua, which uses io.lines(), takes less time to do everything, including reading the file.
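
To make the two variants concrete, here is a minimal sketch of both approaches. It is not the actual wf.lua or wf2.lua: the pattern is an assumption modelled on the regexp in Tim's article, the function names are made up for illustration, and the sample lines are taken from [4] (one of them repeated) so that the sketch runs without the log file.

```lua
-- Assumed pattern, modelled on the regexp in Tim Bray's article.
-- It captures e.g. "2006/03/14/Saskatchewan" from a request line.
local PATTERN = "GET /ongoing/When/%d%d%dx/(%d%d%d%d/%d%d/%d%d/[^ .]+) "

-- Variant 1 (wf.lua style): walk a line iterator and call
-- string.match() once per line.
local function count_by_lines(lines)
  local counts = {}
  for line in lines do
    local key = string.match(line, PATTERN)
    if key then counts[key] = (counts[key] or 0) + 1 end
  end
  return counts
end

-- Variant 2 (wf2.lua style): take the whole input as one string and
-- let string.gmatch() iterate over all matches.
local function count_in_string(text)
  local counts = {}
  for key in string.gmatch(text, PATTERN) do
    counts[key] = (counts[key] or 0) + 1
  end
  return counts
end

-- Return the n most frequent keys, highest count first
-- (ties are broken alphabetically).
local function top(counts, n)
  local keys = {}
  for key in pairs(counts) do keys[#keys + 1] = key end
  table.sort(keys, function(a, b)
    if counts[a] ~= counts[b] then return counts[a] > counts[b] end
    return a < b
  end)
  local result = {}
  for i = 1, math.min(n, #keys) do
    result[i] = { key = keys[i], count = counts[keys[i]] }
  end
  return result
end

-- Small in-memory sample so the sketch is self-contained.
local sample = table.concat({
  '- - [01/Oct/2006:08:01:31 -0700] "GET /ongoing/When/200x/2006/03/14/Saskatchewan HTTP/1.0" 200 6287 "-" "-"',
  '- - [01/Oct/2006:08:01:32 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 304 - "-" "-"',
  '- - [01/Oct/2006:08:01:32 -0700] "GET /ongoing/When/200x/2005/03/11/WSInTheSpring HTTP/1.0" 200 10908 "-" "-"',
  '- - [01/Oct/2006:08:01:33 -0700] "GET /ongoing/When/200x/2006/03/14/Saskatchewan HTTP/1.0" 200 6287 "-" "-"',
}, "\n")

-- Print the result in the same "rank. key : count" shape as [2] and [3].
for i, entry in ipairs(top(count_in_string(sample), 10)) do
  print(i .. ". " .. entry.key .. " : " .. entry.count)
end
```

With the real log, variant 1 would be fed io.lines("o100k.ap") and variant 2 the result of reading the open file with f:read("*a"); the file name here is simply the one passed to the Ruby script in [1].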

Any ideas or hints on what I can do to make things faster and/or prettier?

Btw, the input file is 191 MB. You'll find a small sample below [4]. Just for the sake of completeness: the box I used for testing has 2 GB of RAM, and I also tried it with a smaller file. Even then, the wf2.lua version was still around four times slower than the Ruby version.


-- [1] --
localhost:~/l mkamp$ time ruby count.rb o100k.ap

8900: 2006/09/29/Dynamic-IDE!
600: 2005/07/27/Atomic-RSS!

real    0m1.937s
user    0m1.698s
sys     0m0.238s

-- [2] --
localhost:~/l mkamp$ time lua wf.lua
1. 2006/09/29/Dynamic-IDE : 8900
10. 2005/11/03/Cars-and-Office-Suites : 600

real    0m5.606s
user    0m5.375s
sys     0m0.231s

-- [3] --
localhost:~/l mkamp$ time lua wf2.lua
1. 2006/09/29/Dynamic-IDE : 8900
10. 2005/11/03/Cars-and-Office-Suites : 600

real    0m17.472s
user    0m9.363s
sys     0m8.104s

-- [4] --
- - [01/Oct/2006:08:01:29 -0700] "GET /ongoing/ongoing.rss HTTP/1.1" 301 327 "-" "CFNetwork/129.16"
- - [01/Oct/2006:08:01:29 -0700] "GET /ongoing/ongoing.rss HTTP/1.1" 301 327 "-" "CFNetwork/129.16"
- - [01/Oct/2006:08:01:29 -0700] "GET /ongoing/potd.png HTTP/1.1" 200 33496 ""; "Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418.8 (KHTML, like Gecko) Safari/419.3"
- - [01/Oct/2006:08:01:30 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 200 44877 "-" "CFNetwork/129.16"
- - [01/Oct/2006:08:01:30 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 200 44877 "-" "CFNetwork/129.16"
- - [01/Oct/2006:08:01:31 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 304 - "-" "NetNewsWire/2.1b33 (Mac OS X;"
- - [01/Oct/2006:08:01:31 -0700] "GET /ongoing/When/200x/2006/03/14/Saskatchewan HTTP/1.0" 200 6287 "-" "<a href='http://'> Forex Trading Network Organization </a>"
- - [01/Oct/2006:08:01:32 -0700] "GET /ongoing/ongoing.atom HTTP/1.1" 304 - "-" "Feedpath/1.0 (; 1 subscribers)"
- - [01/Oct/2006:08:01:32 -0700] "GET /ongoing/When/200x/2005/03/11/WSInTheSpring HTTP/1.0" 200 10908 "-" "msnbot/1.0 (+"

-- [5] --
localhost:~/l mkamp$ uname -a
Darwin localhost 8.10.1 Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386 i386 i386
localhost:~/l mkamp$ lua -v
Lua 5.1.2  Copyright (C) 1994-2007, PUC-Rio