On Feb 6, 2018, at 1:46 PM, Sean Conner <> wrote:

> It was thus said that the Great albertmcchan once stated:
>> lpeg.P always matches at the beginning of the string; maybe add an efficient match-anywhere function?
>  You do realize that all LPeg functions take an optional starting position.  

But I don't know the position! (It could be anywhere.)
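
To illustrate (a minimal sketch, assuming the stock lpeg module; the
subject string is made up): the optional init argument only moves the
anchor, so the match must still begin exactly at that position.

```lua
local lpeg = require "lpeg"

local keywords = lpeg.P"while" + lpeg.P"repeat" + lpeg.P"for"
local subject = "x for"

-- Anchored at position 1: "x" does not start a keyword, so the match fails.
local at_start = keywords:match(subject)

-- Anchored at position 3: succeeds, but only because the caller
-- already knew where the keyword begins.
local at_three = keywords:match(subject, 3)

print(at_start, at_three)  --> nil 6
```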

>> Example:
>> keywords = P'while' + 'repeat' + 'for'
>> to search keywords anywhere efficiently, skip all bad head chars
>> skip = 1 - S"wrf"    -- set of skipped chars
>  Have you timed the above vs
>    skip = P(1) - keywords
>  -spc

Roberto wrote an article on this:

His article, which benchmarks searching for words in the Bible, suggests that
skipping bad head characters is 2x to 4x faster (similar to PCRE performance,
see p. 25).

Using skip = 1 - keywords defeats the purpose of the optimization, since
every keyword is still tried at every position.

In other words, this is probably faster: P{ keywords + 1 * V(1) }
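
For concreteness, here is a sketch of the two searchers side by side,
assuming the stock lpeg module. The names "anywhere" and "skipping" and
the subject strings are mine, purely for illustration. The second
grammar first jumps over every character that cannot start a keyword,
and falls back to stepping one character when a head character turns
out to be a false start.

```lua
local lpeg = require "lpeg"
local P, S, V = lpeg.P, lpeg.S, lpeg.V

local keywords = P"while" + P"repeat" + P"for"

-- Plain recursive search: try the keywords here, otherwise consume
-- one character and try again.
local anywhere = P{ keywords + 1 * V(1) }

-- Head-character skipping: a single set test per character until a
-- possible keyword start ('w', 'r' or 'f') is reached.
local skip = 1 - S"wrf"
local skipping = P{ skip^0 * (keywords + 1 * V(1)) }

local subject = "x = 1; repeat x = x + 1 until x > 9"

-- Both return the position just after the first keyword found.
print(anywhere:match(subject))   --> 14
print(skipping:match(subject))   --> 14

-- "far" begins with a head character but is not a keyword, so the
-- fallback branch (1 * V(1)) is needed to get past it.
print(skipping:match("far for")) --> 8
```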

Both skip ideas were benchmarked there (page 27, figure 5):
column 2: skip = P(1) - 'transparent'
column 3: skip = P(1) - 't'

Skipping head chars is twice as fast (the benchmark covers only one word).
A multiple-keyword search should gain even more.
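
Anyone who wants to check this on their own data could use a rough
timing harness like the one below. It is only a sketch: the filler
text, the round count, and the helper name "time_it" are my own
assumptions, and the numbers will vary with machine, LPeg version, and
input, so no timings are shown.

```lua
local lpeg = require "lpeg"
local P, S, V = lpeg.P, lpeg.S, lpeg.V

local keywords = P"while" + P"repeat" + P"for"

-- Two candidate definitions of "skip", as discussed in this thread.
local skip_full = P(1) - keywords  -- tries every keyword at every position
local skip_head = P(1) - S"wrf"    -- tests only a one-character set

local search_full = skip_full^0 * keywords
-- skip_head may stop at a head char that starts no keyword, so this
-- searcher needs a fallback that consumes one char and retries.
local search_head = P{ skip_head^0 * (keywords + 1 * V(1)) }

-- Made-up filler: contains 'f', 'w' and 'r' but none of the keywords.
local subject = string.rep("a fair wind rarely fails; ", 4000) .. "while"

-- Hypothetical helper: run a pattern several times and report CPU time.
local function time_it(patt, text, rounds)
  local t0 = os.clock()
  local pos
  for _ = 1, rounds do
    pos = patt:match(text)
  end
  return os.clock() - t0, pos
end

local t_full, pos_full = time_it(search_full, subject, 50)
local t_head, pos_head = time_it(search_head, subject, 50)

print(string.format("1 - keywords: %.3fs (end position %d)", t_full, pos_full))
print(string.format("1 - S'wrf':   %.3fs (end position %d)", t_head, pos_head))
```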

I just thought this optimization might be better done inside LPeg (in the compiling phase).
Wrap all these optimizations in a new lpeg.M, and we could capture all keywords like this:

M( C(keywords) )^1
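
Until something like that exists, a user-level approximation is
possible. The sketch below is hypothetical: LPeg has no M function, and
unlike the proposal above this version cannot derive the head
characters itself, so the caller must pass them in as a second
argument.

```lua
local lpeg = require "lpeg"
local P, S, C, V = lpeg.P, lpeg.S, lpeg.C, lpeg.V

-- Hypothetical helper, NOT part of LPeg.
-- `heads` must match every character that can start `patt`.
local function M(patt, heads)
  local skip = 1 - heads
  -- Jump over impossible start characters, then try patt; on a false
  -- start, consume one character and search again.
  return P{ skip^0 * (patt + 1 * V(1)) }
end

local keywords = P"while" + P"repeat" + P"for"
local all = M(C(keywords), S"wrf")^1

local subject = "for i = 1, 3 do repeat f() until x end while true"
local found = { all:match(subject) }

print(table.concat(found, " "))  --> for repeat while
```

The call "f()" and the 'r' in "true" are false starts; the fallback
branch steps past them without producing a capture.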