- Subject: Re: feedback on chunk
- From: David Rio <driodeiros@...>
- Date: Tue, 19 Apr 2011 14:50:08 -0500
On Tue, Apr 19, 2011 at 2:10 PM, Doug Currie <doug.currie@gmail.com> wrote:
>
> On Apr 19, 2011, at 11:58 AM, David Rio Deiros wrote:
>
>> I was wondering if I could get some feedback on the following chunk of
>> lua code.
>
> How big is pl?
Very big. It can have millions of keys.
> Can you
> (a) replace 'N' with '.' in the strings in pl
Yes, I could use '.'.
> (b) turn your loop inside out and use gmatch [1] over read
> ?
> Something like:
>
> local function slide_over_read(read, pl)
> for patt, tbl in pairs(pl) do
> for w in read:gmatch(patt) do
> -- do something with the matched substring w and table tbl
> end
> end
> end
pl is very big. Before running that chunk, a big file is hashed into
a table (pl). Then we iterate over another file and run the chunk
above on each line (read). The approach you are suggesting would
take more time to compute.
The input file that fills pl can have between 1 and 3 million entries (keys).
The read file can have hundreds of millions of lines.
In my tests, pl has 1M entries and the read file has 4M lines.
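To make the trade-off concrete, here is a minimal sketch of the sliding-window style of lookup being described (the names, the fixed key length, and the callback are my own illustration, not the actual tool's code from the link below): instead of running gmatch once per pattern in pl, it slides a window over each read and does a single hash probe per position, so the cost per line is proportional to the line length rather than to the millions of keys in pl.

```lua
-- Hypothetical sketch, assuming all keys in pl share a fixed length klen.
-- pl maps probe strings to payload tables; on_hit receives each match.
local function slide_over_read(read, pl, klen, on_hit)
  for i = 1, #read - klen + 1 do
    local w = read:sub(i, i + klen - 1)
    local tbl = pl[w]           -- one hash lookup per window position
    if tbl then
      on_hit(w, tbl)            -- handle the matched substring and its table
    end
  end
end
```

With pl at ~1M keys, this does O(#read) lookups per line, whereas looping over pairs(pl) with gmatch does ~1M pattern scans per line.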
-drd
P.S: full tool's code: goo.gl/PhgGY