- Subject: Looking for ideas to lower binding overhead for lua-re2. (in plain-text)
- From: lua.greatwolf@...
- Date: Fri, 23 Aug 2013 17:42:30 -0700
Thank you for the feedback. Yes, in my actual code I am going to use
"luaL_checklstring". My "re_countmatch" is really just a throwaway
function that I'm using to help isolate possible performance
bottlenecks. Shaving off those last couple hundred milliseconds is
proving to be very difficult.
You should be using luaL_checklstring() here. You can pass a pointer _and_
the length to the StringPiece constructor:
    size_t n;  /* receives the string length */
    const char *p = luaL_checklstring(L, 2, &n);
    StringPiece subject(p, (int)n);
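To illustrate why passing the length matters, here is a small standalone
sketch. It uses std::string_view from the C++17 standard library as a
stand-in for StringPiece, an assumption made only so that the example
compiles without RE2, and the helper names are hypothetical. A view built
from the pointer alone has to scan for the terminating zero byte, which
takes time proportional to the string length and stops early at an embedded
zero. A view built from the pointer plus the length does neither.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Stand-in for StringPiece(p): the length is found by scanning for '\0'.
inline std::string_view view_from_pointer(const char *p) {
    assert(p != nullptr);  // a null pointer is not a valid C string
    return std::string_view(p);
}

// Stand-in for StringPiece(p, n): the length is supplied by the caller,
// so no scan is needed and embedded zero bytes are preserved.
inline std::string_view view_from_pointer_and_length(const char *p,
                                                     std::size_t n) {
    assert(p != nullptr);
    return std::string_view(p, n);
}

// Builds an 8-byte buffer that, like a Lua string, contains an embedded
// zero byte after the first four characters.
inline std::string make_subject_with_embedded_nul() {
    return std::string("GATT\0ACA", 8);
}
```

With the sample buffer above, the pointer-only view sees just the part
before the embedded zero, while the pointer-and-length view covers the
whole buffer.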
Here's a snippet from the RE2 bindings I've written. These routines
implement the gmatch iterator:
Did you happen to do any performance benchmarking and tuning on your
bindings to see how it compares with C/C++ usage or another language
binding to re2?
After experimenting, testing and observing, I came to the conclusion
that 'luaL_checkstring' and its variants like *_tostring, *_tolstring,
etc. are the probable cause of that last 100ms of overhead for long
strings. Here is what I found:
- The pyre2 binding is written in Cython. The conversion from a pystring
into a C-style string is done using 'PyObject_AsCharBuffer' from Python's
C API, which is roughly analogous to Lua's 'luaL_tolstring' functions.
- In both cases, when benchmarking the pattern match count, the Lua code
and the Python code have to make the same number of transitions into C
and back again, one for each call to "countmatch" or "findall" that
performs the match operation.
- I thought compiling the pattern might account for the difference. So
I precompiled all the patterns that were going to be used, outside the
loop. Timing only the loop that does the match count with the precompiled
re2 patterns made no measurable difference.
- I noticed pyre2 wasn't always returning a unique object address when
compiling a pattern. It looks like it's using some kind of memory pool
scheme where it reuses space from deallocated pattern objects. I thought
that creating a new re2 object for every pattern and calling
'lua_newuserdata' every time might account for the difference. So I
changed my 'luaRE2_newobject' to return the same userdata instead.
Testing again, this made no measurable difference.
- I originally used the iterator approach, like the one you presented in
your code, and benchmarked that. Here's a sample output of the benchmark.
Each line has the re2 dna pattern to match followed by the number of
matches found in the provided dna seq string from stdin.
Those numbers also show how many times the code had to go from Lua -> C
and back again, because 'luare2_nextmatch' had to be called that number
of times before finally returning nil. I found this to be almost 400ms
slower per iteration on my machine than simply constructing and
returning a table containing the captures.
- And finally, when I amortized the luaL_checkstring call by making it
static, like in my 're2_countmatch', or eliminated it completely, the
benchmark runtime dropped to ~1.417ms per iteration! So it seems that the
call to 'luaL_checkstring', at least when given a very long string,
accounts for the couple hundred milliseconds that I'm trying to shave off!
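The comparisons in the list above all come down to timing one variant of
a step against another. As a rough sketch of how such a measurement could
be set up in standard C++, the helper below averages the wall-clock time
of a callable over a number of iterations. The name
'average_ms_per_iteration' is hypothetical and is not part of my binding
or of RE2. This is only one possible way to take the measurement.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `step` `iterations` times and returns the
// average wall-clock time per iteration, in milliseconds.
template <typename Step>
double average_ms_per_iteration(Step step, std::size_t iterations) {
    assert(iterations > 0);  // avoid dividing by zero below
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        step();
    }
    const std::chrono::duration<double, std::milli> elapsed =
        clock::now() - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Timing the same loop once with the argument check inside the step and once
with it hoisted out gives two averages whose difference approximates the
cost of the check alone.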
I haven't tested this in any environment other than Windows, such as
Linux, so I cannot say whether these observations will hold there.
On another note, have you considered releasing your luare2 bindings to
the public? I'm also curious how 'luare2_captures' is defined.
'RE2::FindAndConsumeN' expects an array of RE2::Arg pointers. Is
'luare2_captures' a class that has an implicit conversion operator that
allows you to pass it in as if it was "RE2::Arg *"?