On 14/03/16 05:26 PM, Egor Skriptunoff wrote:
On Mon, Mar 14, 2016 at 10:02 PM, Roberto Ierusalimschy <roberto@inf.puc-rio.br> wrote:

    > After I've upgraded Lua to 5.3.2, one of my scripts terminates with
    > "pattern too complex" error message.
    >
    > Probably, this is because of gmatch using non-optimal pattern
    > (having quadratic time complexity), which may require up to 2 sec
    > to complete.
    >
    > Of course, it is possible to rewrite that script to make its time
    > complexity linear (at the cost of extra LOC and more complex logic
    > of code).
    >
    > But there are two reasons for NOT rewriting it:
    > 1) I don't want to spend my time on rewriting my old script
    > because I'm quite happy with its current performance (2-3 seconds
    > is OK for me).
    > 2) I don't want to bring extra complexity to the script.
    > As of now, it is a one-liner regexp, and I'd like to keep it
    > as simple as it is.

    Can you show your regexp/subject?


I have code similar to this:

local pattern =
'id="post(%d+)".-class="Post Header".-<h2>(.-)</h2>.-(/forum/post%1%.htm#details)'
for id, title, link in main_forum_page:gmatch(pattern) do
  analyze_post(id, title, link)
end

Once in a while a post does not have a "View details" link (that is, the third capture does not match). In such rare cases non-linear behavior is observed due to the chain of four ".-" in the pattern. I prefer waiting 2 seconds at runtime to losing the simplicity of the code.

So this is a variant of the parsing-HTML-with-regex problem...
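
For what it's worth, here is a minimal sketch of one way to bound the backtracking without changing the pattern much: cut the page into one chunk per post first, then run the rest of the pattern inside each chunk only. A post without a "View details" link then fails after scanning its own chunk instead of the rest of the page. The name each_post is hypothetical, and the sketch assumes that every post begins with id="post<digits>" and that this marker never appears inside a post body; I have not tried it against the real forum page.

```lua
-- Hypothetical helper: iterates over (id, title, link) for every post
-- that has a "View details" link; posts without one are skipped.
local function each_post(page)
  -- One linear pass: record the offset and id of every post marker.
  local starts = {}
  for pos, id in page:gmatch('()id="post(%d+)"') do
    starts[#starts + 1] = { pos = pos, id = id }
  end

  local i = 0
  return function()
    while true do
      i = i + 1
      local s = starts[i]
      if not s then return nil end
      -- The chunk ends right before the next post marker (or at end of page).
      local nxt = starts[i + 1]
      local chunk = page:sub(s.pos, nxt and nxt.pos - 1 or #page)
      -- The id is digits only, so it is safe to splice into the pattern
      -- in place of the %1 back-reference.
      local title, link = chunk:match(
        'class="Post Header".-<h2>(.-)</h2>.-(/forum/post' .. s.id .. '%.htm#details)')
      if title then return s.id, title, link end
    end
  end
end
```

The calling loop would stay almost the same, e.g. "for id, title, link in each_post(main_forum_page) do analyze_post(id, title, link) end". The ".-" chain can still backtrack, but only within a single post.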

--
Disclaimer: these emails may be made public at any given time, with or without reason. If you don't agree with this, DO NOT REPLY.