lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]

Hello list,

This email presents a work in progress optimizer for LPeg that can
speed up matching by several orders of magnitude in some cases.

Since some time I have a use case where I need to match a large number
of rather simple patterns against input strings and I tried to use LPeg.

Current implementation tries each alternative until it fails and then
tries the next one, ... This is quite inefficient when there is many
alternatives. However, it never was the initial goal of LPeg, which is
pretty fast in most cases. So I started a series of patches to optimize
that particular use case.

These optimizations simplifies LPeg internal tree by automatically
merging and reordering patterns when possible. This brings a massive
speedup. In ideal case, the matching time should drop from O(r*n) to
O(n) where 'r' is the number of rules and 'n' the length of the text
(like with Aho-Corasick algorithm). In practice it's not quite the case

Currently patches are directly integrated in my fork of LPeg, but if
there is enough interest, I may be able to extract the AST optimizer in
a separate library (not possible for VM-level patches).

Optimizations are not enabled by default: an additional function
`lpeg.optimize(patt) => patt` has been added. It transforms the LPeg
tree using the above rules.

This is currently a Work in Progress, only tested under Linux with GCC.
But it already brings substantial speed improvements (see benchmarks
below). Any feedback is welcome :)

LPeg test suite still pass with success (except the stack overflow
test, because the optimizer transform the call into tail call !). Also,
optimizer should not change pattern semantics, if it does it's a bug.

I think that optimization could go even further because currently the
optimizer is pretty conservative and some details of LPeg are still a
bit obscure for me. Also the optimizer itself needs some optimization.

I'm currently writing a series of blog articles explaining the
implementation choices and results.

Benchmarks (see [1] for download link)
Test conditions:
 * AMD Phenom(tm) II X4 940 Processor
 * 6GB DDR2 RAM (800 Mhz)
 * Linux 3.18.5-1-ARCH
 * LPEG is compiled with GCC 4.9.2 and "-O3 -march=native" options
 * All data is read into Lua strings before starting the timer
 * Timing (in seconds) is taken from CLOCK_THREAD_CPUTIME

Match a subset of easylist.txt AdBlock rules against an history for
160,000 URLs:

Start processing 159146 URLs against 29261 rules.
START	optimized
DONE	optimized	2.524193886
matches	5202
START	vanilla
DONE	vanilla	5684.350017815
matches	5202

That's roughly 2200 times faster! The difference is really huge, I
need to look at it closer. Also, the LPeg pattern is not complete
(rules with 'domain=' modifiers are skipped, DOM selectors too because
it matches only URLs)

Another one is searching 10,000 random English words in a 5MB text file
(compilation of Shakespeare books)

building 10kwords ... done in	4.288736881
optimize 10kwords ... done in	23.494381189
compile original 10k ... done in	2.146324808
original 10k ... done in	66.789134742
matches	original 10k	66355
compile optimized 10k ... done in	0.013979084
optimized 10k ... done in	0.119470768
matches	optimized 10k	66355

560 times faster! But we also see that optimize the pattern was *very*
long, that's because the pattern reordering is currently very slow. The
compile time, on the other hand is munch faster because the pattern
tree is munch smaller.

I also tested some more usual LPeg grammars (Moonscript compiler, Lua
AST generator, ...) but as you expect, the performance win is limited to
negligible. I think it could still perform a bit better, but as I
said, optimizer is currently conservative.