|
Thank you for the feedback. Yes in my actual code I am going to be using "luaL_checklstring". My "re_countmatch" is really just a throwaway function that I'm using to help isolate possible performance bottlenecks. I'm finding that trying to shave off those last couple 100ms is proving to be very difficult.You should be using luaL_checklstring() here. You can pass a pointer _and_ the length to the StringPiece constructor: size_t n; const char *p = luaL_checklstring(L, 2, &n); StringPiece subject(p, (int)n); Here's a snippet from the RE2 bindings I've written. These routines implement the gmatch iterator:...
Did you happen to do any performance benchmarking and tuning on your bindings to see how it compares with C/C++ usage or another language binding to re2?
I came to the conclusion that 'luaL_checkstring' and its variants like *_tostring, *_tolstring, etc. is the probable cause for that last 100ms of overhead for long strings after experimenting, testing and observing the following:
- The pyre2 binding's done with Cython. The conversion from a pystring into C-style string is done using 'PyObject_AsCharBuffer' from Python's C API which is almost analogous to lua's 'luaL_tolstring' functions. - In both cases, when benchmarking pattern match count, both lua code and python code has to make the same number of transitions into C and back again whenever they call "countmatch" or "findall" that performs the match operation. - I thought compiling the pattern might account for the difference. So I precompiled all the patterns that was going to be used outside the loop. Timing only the loop doing the match count with the precompiled re2 pattern actually made no measurable difference. - I noticed pyre2 wasn't always returning a unique object address when compiling a pattern. It looks like it's using some kind of memory pool scheme where it reuses space from dead deallocated pattern objects. I thought maybe creating a new re2 object for every pattern and hitting up 'lua_newuserdata' everytime might account for the difference. So I changed my 'luaRE2_newobject' to return the same userdata instead. Testing again, made no measurable difference. - I originally used the iterator approach like what you've presented in your code and benchmarked that. Here's a sample output of the benchmark. Each line has the re2 dna pattern to match followed by the number of matches found in the provided dna seq string from stdin.
agggtaaa|tttaccct 356 [cgt]gggtaaa|tttaccc[acg] 1250 a[act]ggtaaa|tttacc[agt]t 4252 ag[act]gtaaa|tttac[agt]ct 2894 agg[act]taaa|ttta[agt]cct 5435 aggg[acg]aaa|ttt[cgt]ccct 1537 agggt[cgt]aa|tt[acg]accct 1431 agggta[cgt]a|t[acg]taccct 1608 agggtaa[cgt]|[acg]ttaccct 2178Those numbers also meant how many times the code had to go from lua -> C and back again because 'luare2_nextmatch' had to be called that number of times before finally returning nil. I found this to be almost 400ms slower on my machine per iteration than if I simply constructed and returned a table containing the captures. - And finally, when I amortized the luaL_checkstring call by making it static like in my 're2_countmatch' or eliminated it completely then the benchmark runtime drops to ~1.417ms per iteration! So it seems like the call to 'luaL_checkstring', at least when given very long string, accounts for those couple 100ms that I'm trying to shave off!
I haven't tested this on another environment out of windows, like linux, so I cannot say whether these observations will hold.
On another note, have you considered releasing your luare2 bindings to the public? I'm also curious how 'luare2_captures' is defined? 'RE2::FindAndConsumeN' expects an array of RE2::Arg pointers. Is 'luare2_captures' a class that has an implicit conversion operator that allows you to pass it in as if it was "RE2::Arg *[]"?
Thanks