lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


On Feb 6, 2018, at 1:46 PM, Sean Conner <sean@conman.org> wrote:

> It was thus said that the Great albertmcchan once stated:
>> lpeg.P always match at beginning of string, maybe an efficient matching anywhere function ?
> 
>  You do realize that all LPeg functions take an optional starting position.  

but i dont know the position ! (it could be anywhere)

>> Example:
>> keywords = P'while' + 'repeat' + 'for'
>> 
>> to search keywords anywhere efficiently, skip all bad head chars
>> 
>> skip = 1 - S"wrf"    -- set of skipped chars
> 
>  Have you timed the above vs
> 
>    skip = P(1) - keywords
> 
>  -spc

Roberto wrote an article on this: http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

His article about searching words in the bible suggested skipping bad head
characters is 2x to 4x faster (similar to PCRE performance, see p25)

Using skip = 1 - keywords defeated the purpose of optimization

In other words, this is probably faster: P{ keywords + 1 * V(1) }

Both skip ideas had been benchmarked: page 27, figure 5
column 2: skip = P(1) - 'transparent'
column 3: skip = P(1) - 't'

skipping head chars is twice as fast (the benchmark is only for 1 word)
Multiple keywords search is going to be even faster.

I just thought this optimization stuff may be better done inside lpeg (in compiling phase)
Just wrap all these optimization in lpeg.M, and we can capture all keywords like this:

M( C(keywords) )^1