- Subject: Re: Proposal for lpeg
- From: albertmcchan <albertmcchan@...>
- Date: Tue, 6 Feb 2018 15:19:35 -0500
On Feb 6, 2018, at 1:46 PM, Sean Conner <sean@conman.org> wrote:
> It was thus said that the Great albertmcchan once stated:
>> lpeg.P always match at beginning of string, maybe an efficient matching anywhere function ?
>
> You do realize that all LPeg functions take an optional starting position.
But I don't know the position! (It could be anywhere.)
>> Example:
>> keywords = P'while' + 'repeat' + 'for'
>>
>> to search keywords anywhere efficiently, skip all bad head chars
>>
>> skip = 1 - S"wrf" -- set of skipped chars
>
> Have you timed the above vs
>
> skip = P(1) - keywords
>
> -spc
Roberto wrote an article on this: http://www.inf.puc-rio.br/~roberto/docs/peg.pdf
His benchmark of searching for words in the Bible suggests that skipping bad head
characters is 2x to 4x faster (similar to PCRE performance, see p25).
Using skip = 1 - keywords defeats the purpose of the optimization, since it
still tries the full keyword match at every position.
In other words, this is probably faster: P{ keywords + 1 * V(1) }
Both skip ideas were benchmarked: page 27, figure 5
column 2: skip = P(1) - 'transparent'
column 3: skip = P(1) - 't'
Skipping head chars is twice as fast (and that benchmark is for only 1 word).
A multiple-keyword search should gain even more.
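
To answer the "have you timed it" question for the keyword set itself, something
like the rough harness below could be used. It is only a sketch: the synthetic
subject string, the variant names, the time_it helper and the round count are
all my own assumptions, and the numbers will depend on the input and the LPeg
version. Each variant returns the position of the first keyword, so they can be
checked against each other before comparing times.

```lua
local lpeg = require "lpeg"
local P, S, V, Cp = lpeg.P, lpeg.S, lpeg.V, lpeg.Cp

local keywords = P"while" + P"repeat" + P"for"

-- Three ways to find the first keyword; each yields its start position.
local variants = {
  -- skip = 1 - keywords: tries the full keyword set at every position
  full_skip = (1 - keywords)^0 * Cp() * keywords,
  -- skip = 1 - S"wrf": only stops at characters that can start a keyword
  head_skip = P{ (1 - S"wrf")^0 * (Cp() * keywords + 1 * V(1)) },
  -- plain recursive search, no skip pattern at all
  grammar   = P{ Cp() * keywords + 1 * V(1) },
}

-- Synthetic subject (an assumption): filler without 'w', 'r' or 'f',
-- followed by a single keyword at the very end.
local subject = string.rep("abcde ghijk lmnop ", 50000) .. "while"

-- Hypothetical helper: run a pattern several times, return CPU time and result.
local function time_it(patt, subj, rounds)
  local t0 = os.clock()
  local pos
  for _ = 1, rounds do pos = patt:match(subj) end
  return os.clock() - t0, pos
end

for name, patt in pairs(variants) do
  local secs, pos = time_it(patt, subject, 10)
  print(string.format("%-10s %.3fs  keyword at %d", name, secs, pos))
end
```

The filler here never contains a head character, which is the best case for
head skipping; real text would need its own measurement.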
I just thought this kind of optimization might be better done inside lpeg (in the compiling phase).
Just wrap all these optimizations in lpeg.M, and we can capture all keywords like this:
M( C(keywords) )^1
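
Until something like that exists, the idea can be approximated in plain Lua.
The sketch below is only an illustration: lpeg.M is not part of LPeg, the local
function M is a hypothetical stand-in, and unlike the proposal it cannot derive
the head characters itself, so the caller passes them in as a second argument.

```lua
local lpeg = require "lpeg"
local P, S, V, C, Ct = lpeg.P, lpeg.S, lpeg.V, lpeg.C, lpeg.Ct

-- Hypothetical stand-in for the proposed lpeg.M (not an LPeg function).
-- patt  : the pattern to search for
-- heads : a pattern matching every character that can start patt
--         (supplied by hand here; the proposal would compute it)
local function M(patt, heads)
  local skip = (1 - heads)^0          -- run past characters that cannot start a match
  -- At a candidate head: try patt, otherwise step one character and search again.
  return P{ skip * (patt + 1 * V(1)) }
end

local keywords = P"while" + "repeat" + "for"

-- Collect every keyword found anywhere in the subject.
local find_all = Ct(M(C(keywords), S"wrf")^1)

local found = find_all:match("for i = 1, n do while f(i) do end end repeat")
print(table.concat(found, " "))  --> for while repeat
```

Note that the 'f' in f(i) is a false head: the keyword match fails there, the
search steps over that one character, and skipping resumes.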