I've been working on some Lua bindings for Google's RE2 regex library
and I have most of the important functions working. At the moment I'm
focused on optimizing and reducing the overhead of the bindings.
I've been benchmarking the Lua bindings rigorously using the regexdna
benchmark from http://benchmarksgame.alioth.debian.org/u32/benchmark.php?test=regexdna&lang=all,
comparing them against a pure C++ implementation as well as the pyre2
bindings.
What I found is that my Lua bindings seem to have slightly higher
overhead than pyre2's bindings, at least when matching and counting
patterns, e.g. using RE2::FindAndConsume.
For example, when benchmarking with 1 iteration:
                  C++       pyre2      lua-re2
  Time (seconds)  ~1.401s   ~1.424s    ~1.51s
with 10 iterations:
                  C++        pyre2       lua-re2
  Time (seconds)  ~14.061s   ~14.2720s   ~15.2618s
What I'm trying to figure out is why lua-re2 is slower here.
Here are the minimal relevant code snippets I used to test:
-- regexdna_re2.lua
re2 = require 'lua-re2'

function printelapse(start)
    local elapse = os.clock() - start
    print(elapse .. 's')
end

-- omitted code
-- 'variants' is a table of regex dna patterns to match
-- 'seq' is a *really* long dna sequence string loaded from stdin

local start = os.clock()
for i = 1, 10 do
    for _, p in ipairs(variants) do
        io.write(string.format('%s %d\n', p, re2(p):countmatch(seq)))
    end
end
printelapse(start)
// lua-re2.cpp
// a temporary function in my binding code used in the benchmark above
static int re2_countmatch(lua_State *L)
{
    lua_settop(L, 2);
    RE2 *re2obj = luaRE2_checkobject(L, 1);
    StringPiece subject(luaL_checkstring(L, 2));

    int c = 0;
    while (RE2::FindAndConsume(&subject, *re2obj))
        ++c;

    lua_pushinteger(L, c);
    return 1;
}
# regexdna.py
from time import time
from sys import stdin, stdout
from re2 import sub, findall

def printelapse(start):
    elapse = time() - start
    print(str(elapse) + 's')

# omitted code

start = time()
for i in xrange(10):
    for f in variants:
        stdout.write(f.decode("utf-8") + ' ' +
                     str(len(findall(f.decode("utf-8"), seq))) + '\n')
printelapse(start)
// regexdna.cpp
#include "re2/re2.h"
#include <iostream>
#include <string>
#include <memory>
#include <stdio.h>
#include <time.h>

using namespace std;
using re2::StringPiece;

int main(void)
{
    // lots of omitted code

    clock_t start = clock();
    for (int it = 0; it < 10; ++it)
    {
        for (int i = 0; i < (int)(sizeof(pattern1) / sizeof(string)); ++i)
        {
            int count = 0;
            RE2 pat(pattern1[i]);
            StringPiece piece = str;
            while (RE2::FindAndConsume(&piece, pat)) ++count;
            printf("%s %d\n", pattern1[i].c_str(), count);
        }
    }
    double elapse = (clock() - start) * 1.0 / CLOCKS_PER_SEC;
    cout << elapse << "s\n";
}
After a lot of experimenting, it seems the slowdown happens when I call
luaL_checkstring(L, 2) on the long DNA 'seq' string passed in from the
Lua code. For whatever reason it seems to cost about a tenth of a
second on each iteration, and that really adds up over 10 iterations.
For instance, if I modify my re2_countmatch slightly to
use a static string for testing:
static int re2_countmatch(lua_State *L)
{
    lua_settop(L, 2);
    RE2 *re2obj = luaRE2_checkobject(L, 1);
    static const string overhead_test = luaL_checkstring(L, 2);
    StringPiece subject(overhead_test);

    int c = 0;
    while (RE2::FindAndConsume(&subject, *re2obj))
        ++c;

    lua_pushinteger(L, c);
    return 1;
}
This version only calls luaL_checkstring(L, 2) once, to initialize
overhead_test on the very first call of re2_countmatch. Obviously I
can't do this in a real implementation, since the 'haystack' string can
be anything. But running the benchmark again with all other parameters
equal, those same 10 iterations now complete in 14.1778 seconds.
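One idea I still want to try (untested so far): my guess is that the
per-call cost isn't luaL_checkstring itself but the StringPiece
constructor that takes a bare const char*, since it has to strlen the
entire subject on every call. Lua already knows the string's length, so
a variant that fetches it with luaL_checklstring and hands it to
StringPiece's (pointer, length) constructor would look roughly like
this:
// untested sketch: avoid the implicit strlen in StringPiece(const char*)
static int re2_countmatch(lua_State *L)
{
    lua_settop(L, 2);
    RE2 *re2obj = luaRE2_checkobject(L, 1);

    // luaL_checklstring returns the string plus its length, so
    // StringPiece never has to scan the whole subject itself
    size_t len = 0;
    const char *s = luaL_checklstring(L, 2, &len);
    StringPiece subject(s, (int)len);

    int c = 0;
    while (RE2::FindAndConsume(&subject, *re2obj))
        ++c;

    lua_pushinteger(L, c);
    return 1;
}
If the implicit strlen really is the culprit, that change should close
most of the gap, but I haven't measured it yet.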
Beyond that, does anyone have ideas on how to lower the overhead of the
Lua bindings so that they're comparable with pyre2?
Testing setup and tools used:
- MinGW gcc 4.6.3
- LuaJIT 2.0.2
- Python 2.7.3
- RE2 compiled from head with the MinGW toolchain above
- Windows 7 64-bit on an Intel Q6600 quad-core
Thank you for reading and sorry for the long post.