lua-l archive



Hi - first post here!

I could really do with some advice on how to speed up a script I've written.

Basically, the script does a string.find on a large piece of text (roughly 500 words, sometimes more), looking for occurrences of any word from a list of over 40,000. At the moment, I loop over a table containing the 40,000 "check words" and perform a string.find for each one.
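For reference, here is a minimal sketch of that per-word scan. It assumes `checkForItem` is essentially a `string.find` call, since its real body isn't shown, and `countMatchesNaive` is a hypothetical name. Passing `true` as the fourth argument to `string.find` turns off Lua pattern matching. That skips pattern handling and also stops characters such as `.` or `-` in a list entry from being treated as magic. The cost is still one pass over the document per list entry, so roughly 40,000 scans of a ~500-word text.

```lua
-- Baseline: one literal search of the document per list entry.
-- countMatchesNaive is a hypothetical helper, not the poster's checkForItem.
local function countMatchesNaive(words, document)
  local found = {}
  for i = 1, #words do
    -- init = 1, plain = true: literal substring search, no pattern magic
    if string.find(document, words[i], 1, true) then
      found[#found + 1] = words[i]
    end
  end
  return found
end

local hits = countMatchesNaive({"cat", "dog", "a.b"}, "the cat sat on a.b")
for _, w in ipairs(hits) do print(w) end  --> cat, then a.b
```

Note that a plain find matches substrings, so "cat" is also reported for a document containing "concatenate".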

Something similar to:
===========================
local d = getTagValues() -- returns a 40,000-entry table of string values
for i = 1, #d do
  local m
  -- check whether the document contains this word from the table
  m, document = checkForItem(d[i], document)
  if m == true and ll == "FULL" then
    logln(os.date("%Y/%m/%d %H:%M:%S") .. ": Found item: " .. d[i])
  end
end
===========================
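One standard alternative, not something proposed in this thread, is to invert the loop. Build a Lua table keyed by the 40,000 words once, then walk the ~500 words of the document and do one table lookup per word. That turns roughly 40,000 scans of the text into about 500 hash lookups. The sketch assumes single-word entries, case-insensitive whole-word matching and ASCII text (`%w` is locale-dependent). `buildWordSet` and `findListedWords` are hypothetical names, and the original loop's habit of getting a modified `document` back from `checkForItem` is not reproduced.

```lua
-- Build the lookup set once (e.g. at startup), not once per document.
local function buildWordSet(words)
  local set = {}
  for i = 1, #words do
    set[string.lower(words[i])] = true
  end
  return set
end

-- Scan the document once; each candidate word costs one table lookup.
-- Assumption: a "word" is a run of alphanumeric characters (%w+).
local function findListedWords(wordSet, document)
  local found, seen = {}, {}
  for token in string.gmatch(string.lower(document), "%w+") do
    if wordSet[token] and not seen[token] then
      seen[token] = true
      found[#found + 1] = token
    end
  end
  return found
end

local set = buildWordSet({"Cat", "dog", "bird"})
local hits = findListedWords(set, "The dog chased the cat; the dog lost.")
print(table.concat(hits, ", "))  --> dog, cat
```

Unlike a plain `string.find`, this only matches whole words, so "cat" is not reported inside "concatenate". Whether that is the desired behaviour depends on what `checkForItem` is meant to do.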

This takes up to 15 seconds or more. That may sound fairly quick, but it is actually a bottleneck in an otherwise extremely fast system. Is there a quicker way to do what I'm doing here? Ideally it would do all 40k checks in a second; at the moment I would be grateful for 3 seconds!

I realise that if there isn't, I should perhaps consider a different approach altogether (which I am doing), but we've invested quite a lot in this script and it would be great if it could really perform.

Thanks for any help.

Nathan Trevivian
___________________________________________________________________________

Any views or opinions expressed in this message are those of the author and do not necessarily represent those of GateWest New Media Ltd.
____________________________________________________________________________
I think you could do some preprocessing, such as putting the words into
separate arrays according to their first letter, and then checking only
the words that have the same initial letter. This should reduce the
number of checks/iterations.
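A minimal sketch of this first-letter idea, assuming single-word entries, case-insensitive whole-word matching and ASCII text. The list is split once into buckets keyed by lowercase first character, and each document word is compared only against its own bucket. This still finds every list entry that appears, because an entry can only equal a document word that starts with the same letter. `buildBuckets` and `findWithBuckets` are hypothetical names.

```lua
-- Group list entries by their lowercase first character (done once).
local function buildBuckets(words)
  local buckets = {}
  for i = 1, #words do
    local w = string.lower(words[i])
    local first = string.sub(w, 1, 1)
    local b = buckets[first]
    if not b then
      b = {}
      buckets[first] = b
    end
    b[#b + 1] = w
  end
  return buckets
end

-- For each document word, compare only against entries in its bucket.
local function findWithBuckets(buckets, document)
  local found, seen = {}, {}
  for token in string.gmatch(string.lower(document), "%w+") do
    local b = buckets[string.sub(token, 1, 1)]
    if b then
      for j = 1, #b do
        if b[j] == token and not seen[token] then
          seen[token] = true
          found[#found + 1] = token
        end
      end
    end
  end
  return found
end

local buckets = buildBuckets({"apple", "avocado", "banana"})
local hits = findWithBuckets(buckets, "An apple and a banana")
print(table.concat(hits, ", "))  --> apple, banana
```

With 40,000 entries spread evenly over 26 letters, each bucket would still hold about 1,500 entries (real word lists are rarely that even). Keying a table by the whole word instead of the first letter takes the same idea further and makes each check a single lookup.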

Sérgio

Thanks, Sérgio.

Unfortunately, I need to check the document for occurrences of all words/phrases in the list.

Is there perhaps a more efficient loop I could be using?
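Because the list also contains phrases, one way to keep a single pass over the document is the following, offered as a suggestion rather than something from the thread. Store every entry, normalised to lowercase words joined by single spaces, in one table. Then test each run of 1 to N consecutive document words, where N is the longest entry length in words. With ~500 document words and entries of at most 4 words, that is at most about 2,000 table lookups instead of 40,000 text scans. The sketch assumes ASCII text and treats punctuation as a word separator in both the list and the document. `tokenize`, `buildPhraseSet` and `findPhrases` are hypothetical names.

```lua
-- Normalise text to a list of lowercase words (runs of %w characters).
local function tokenize(text)
  local words = {}
  for w in string.gmatch(string.lower(text), "%w+") do
    words[#words + 1] = w
  end
  return words
end

-- Build the phrase set once; also record the longest entry in words.
local function buildPhraseSet(entries)
  local set, maxLen = {}, 1
  for i = 1, #entries do
    local words = tokenize(entries[i])
    if #words > 0 then
      set[table.concat(words, " ")] = true
      if #words > maxLen then maxLen = #words end
    end
  end
  return set, maxLen
end

-- Check every run of 1..maxLen consecutive document words against the set.
local function findPhrases(set, maxLen, document)
  local words = tokenize(document)
  local found, seen = {}, {}
  for i = 1, #words do
    local candidate = words[i]
    for n = 1, maxLen do
      if n > 1 then
        if i + n - 1 > #words then break end
        candidate = candidate .. " " .. words[i + n - 1]
      end
      if set[candidate] and not seen[candidate] then
        seen[candidate] = true
        found[#found + 1] = candidate
      end
    end
  end
  return found
end

local set, maxLen = buildPhraseSet({"New York", "york", "big apple"})
local hits = findPhrases(set, maxLen, "I love New York and the Big Apple.")
print(table.concat(hits, ", "))  --> new york, york, big apple
```

Overlapping matches are reported separately, so a list that contains both "New York" and "york" gets both back for the same text.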

Once again, thanks for your help. Much appreciated.