
Miles Bader <miles@gnu.org> writes:
> Hmm, I suppose I should try to come up with a succinct example that
> illustrates this.

Ok, take the attached Lua program; it's intended to be run as an
executable shell-script (so if you try it, do "chmod +x" first).

It takes two arguments, MODE and FILENAME.  MODE should be "inside"
to do the outermost loop in LPeg, and "outside" to do the loop in Lua,
calling LPeg for each iteration.  FILENAME is the input file to use;
it should be a big file to get measurable results.

This is of course an extreme example, as the inner LPeg pattern only
matches a single character... :)  But it demonstrates the issue that
I've run into in real-world use with much less simplistic patterns.
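
To make the "outside" mode concrete, here is a minimal sketch of what a single call to the one-character pattern does. The `seen` table and the "abc" subject are made up for illustration; the pattern has the same shape as CHAR in the attached program. Because the function capture returns no values, match returns the position just after the matched character, and that is the value the Lua-side loop feeds back in on the next iteration.

```lua
local lpeg = require 'lpeg'

-- Hypothetical recorder, used only to show which character was matched.
local seen = {}
local function record (char)
   seen[#seen + 1] = char
end

-- Same shape as CHAR in the attached program: one character, with a
-- function capture that returns no values.
local ONE_CHAR = lpeg.P(1) / record

-- Match exactly one character, starting at position 2 of the subject.
local next_pos = ONE_CHAR:match ("abc", 2)

print (next_pos, seen[1])   -- next_pos is 3, seen[1] is "b"

-- At the end of the subject there is nothing left to match.
local at_end = ONE_CHAR:match ("abc", 4)
print (at_end)              -- nil
```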

Here's a test run:

   $ ls -l $snogr/images/dof18.exr
   -rw-r--r-- 1 miles miles 6761826 Dec 31  2011 /home/miles/src/snogray/images/dof18.exr

   $ /usr/bin/time ./lpeg-loop.lua outside $snogr/images/dof18.exr
   mode    outside
   char_sum        857014934
   5.50user 0.02system 0:05.53elapsed 99%CPU (0avgtext+0avgdata 18504maxresident)k
   0inputs+0outputs (0major+5673minor)pagefaults 0swaps

   $ /usr/bin/time ./lpeg-loop.lua inside $snogr/images/dof18.exr
   mode    inside
   char_sum        857014934
   2.67user 0.29system 0:02.97elapsed 99%CPU (0avgtext+0avgdata 213688maxresident)k
   0inputs+0outputs (0major+64827minor)pagefaults 0swaps


Note that the "inside LPeg" loop is:

  1. faster (I guess it avoids various LPeg invocation setup costs)

  2. uses more than eleven times as much memory (the "maxresident"
     value: 213688k vs. 18504k)... ><

It's (2) that's the real problem, of course, as my real-world tests
ended up exceeding system memory for inputs that worked OK with an
outside-of-LPeg loop.  As I mentioned, however, doing everything in
LPeg would make non-trivial grammars much easier to handle.
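
One possible workaround, which is not part of the attached program and has not been measured here, is to use a match-time capture (lpeg.Cmt) in place of the function capture. The LPeg manual describes match-time captures as being evaluated immediately when the match occurs, so the callback runs during the match and not after it. Whether this actually lowers peak memory for a given grammar is an assumption that would need testing. The names `accum_now`, `CHAR_NOW` and `FILE_NOW` are made up for this sketch.

```lua
local lpeg = require 'lpeg'

local char_sum = 0

-- Match-time callback: LPeg passes the whole subject and the position
-- just after the matched text, so the matched character is at pos - 1.
local function accum_now (subject, pos)
   char_sum = char_sum + subject:byte (pos - 1)
   return true   -- succeed, keeping the current position
end

local CHAR_NOW = lpeg.Cmt (lpeg.P(1), accum_now)
local FILE_NOW = CHAR_NOW^0

-- Tiny stand-in input; a real test would read a large file as in the
-- attached program.
local input = "abc"
local end_pos = FILE_NOW:match (input)

print ("end_pos", end_pos)     -- 4, one past the end of the input
print ("char_sum", char_sum)   -- 97 + 98 + 99 = 294
```

The trade-off is that a match-time callback runs even if the enclosing pattern later fails and backtracks, so any side effects (such as the running sum here) are only safe when the grammar cannot backtrack over them.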

Thanks,

-Miles

#!/usr/bin/lua

-- Calculate the sum of the characters in a file using LPeg.
--
-- One of two "modes" can be used:  "inside" does the loop in LPeg
-- itself, using a single pattern to cover the entire file; "outside"
-- does the loop outside of LPeg, calling LPeg for each character.
-- This is intended to demonstrate the performance differences of the
-- two approaches (using an inside-LPeg loop is faster, but uses much
-- more memory).

local lpeg = require 'lpeg'

local char_sum = 0

local function accum (char)
   char_sum = char_sum + string.byte (char)
end

local CHAR = lpeg.P(1) / accum

local FILE = CHAR^0

-- The parameter is named "str" to avoid shadowing the standard
-- "string" library.
local function loop_in_lpeg (str)
   if not FILE:match (str) then
      error ("match failed!")
   end
end

local function loop_outside_lpeg (str)
   local pos = 1
   -- Use <= so that the final character is processed too.
   while pos <= #str do
      pos = CHAR:match (str, pos)
      if not pos then
         error ("match failed!")
      end
   end
end

if #arg ~= 2 then
   io.stderr:write ("Usage: "..arg[0].." (inside|outside) FILENAME\n")
   os.exit (1)
end

local mode = arg[1]
local filename = arg[2]

-- Open in binary mode so the bytes are read unmodified on every platform.
local stream = io.open (filename, "rb")
if not stream then
   error ("cannot open file "..filename)
end

local contents = stream:read '*a'
stream:close ()

if mode == "inside" then
   loop_in_lpeg (contents)
elseif mode == "outside" then
   loop_outside_lpeg (contents)
else
   error ("unknown mode '"..mode.."'")
end

print ("mode", mode)
print ("char_sum", char_sum)

-- 
Saa, shall we dance?  (from a dance-class advertisement)