Hi - first post here!
I could really do with some advice on how to speed up a script I've
written.
Basically, the script does a string.find on a large piece of text
(roughly 500 words, sometimes more) for occurrences of any word from a
list of over 40,000 words.
At the moment, I'm going through a table which contains the 40,000
"check words" and performing this string.find on each one.
Something similar to:
===========================
local d = getTagValues() -- returns a 40,000-entry table of string values
for i = 1, #d do
  -- check whether the document contains this word from the table
  local m
  m, document = checkForItem(d[i], document)
  if m and ll == "FULL" then
    logln(os.date("%Y/%m/%d %H:%M:%S") .. ": Found item: " .. d[i])
  end
end
===========================
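The body of `checkForItem` is not shown in the post, so the following is only a hypothetical sketch, assuming it is a thin wrapper around `string.find`. Passing `init = 1` and `plain = true` (the real third and fourth arguments of `string.find`) treats each check word as literal text rather than a Lua pattern. Note that every call rescans the whole document, so the loop's work grows with the number of check words multiplied by the length of the document, and a plain find also matches substrings.

```lua
-- Hypothetical sketch of checkForItem(): the post does not show its
-- body, so this assumes it is a thin wrapper around string.find.
local function checkForItem(word, document)
  -- init = 1 and plain = true make string.find treat `word` as literal
  -- text rather than a Lua pattern, so "." or "-" need no escaping.
  local found = string.find(document, word, 1, true) ~= nil
  -- The original returns the document as a second value; this sketch
  -- passes it through unchanged.
  return found, document
end

-- A plain find matches substrings too: "cat" is found in "concatenate".
local hit = checkForItem("cat", "concatenate") -- hit is true
```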
This takes up to 15 seconds, sometimes more. That may sound fairly
quick, but it is actually a bottleneck in an otherwise extremely
quick system.
Is there a quicker way to do what I'm doing here? Ideally it would
do all 40k checks in a second - at the mo I would be grateful for 3
seconds!
I realise that if there isn't then I should perhaps consider a
different methodology altogether - which I am doing, but we've
invested quite a lot in this script and it would be great if it
could really perform.
Thanks for any help.
Nathan Trevivian
___________________________________________________________________________
I think you could do some preprocessing, such as putting the words
into separate arrays according to their first letter, and then checking
only the words that share the same initial letter. This should reduce
the number of checks/iterations.
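A minimal sketch of the first-letter idea, under some stated assumptions: the document is split into words with the `%a+` pattern (ASCII letters only), comparisons are case-insensitive, and `buildBuckets` and `findMatches` are hypothetical names. Unlike a plain `string.find`, this only matches whole words, so its results can differ from the original loop.

```lua
-- Group the check words into buckets keyed by their (lowercased)
-- first letter, e.g. buckets["a"] = {"apple", ...}.
local function buildBuckets(words)
  local buckets = {}
  for _, w in ipairs(words) do
    local key = w:sub(1, 1):lower()
    local bucket = buckets[key]
    if not bucket then
      bucket = {}
      buckets[key] = bucket
    end
    bucket[#bucket + 1] = w:lower()
  end
  return buckets
end

-- Scan the document word by word, comparing each word only against
-- the bucket that shares its initial letter.
local function findMatches(document, buckets)
  local found, seen = {}, {}
  for word in document:gmatch("%a+") do
    word = word:lower()
    local bucket = buckets[word:sub(1, 1)]
    if bucket and not seen[word] then
      seen[word] = true -- each distinct document word is checked once
      for _, candidate in ipairs(bucket) do
        if candidate == word then
          found[#found + 1] = word
          break
        end
      end
    end
  end
  return found
end

local buckets = buildBuckets({"apple", "Banana", "cherry"})
local found = findMatches("An apple and a banana, then another apple.", buckets)
-- found holds "apple" and "banana", in order of first appearance
```

Going one step further, storing the words as keys of a Lua table (`set[word] = true`) would turn each bucket scan into a single hash lookup.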
Sérgio