lua-l archive



Thanks for the reply, Liam,

On Tue, Apr 19, 2011 at 1:37 PM, liam mail <liam.list@googlemail.com> wrote:
>
>> My next step is to implement slide_over_read in C, but before that I
>> thought you guys might have some ideas on how to improve it.
>>
>> That chunk will be called millions of times; that's why I want to squeeze
>> out as many CPU cycles as possible.
>>
>> -drd
>>
> Are you using LuaJit or have you tried using it? Have you profiled the code?

Both. In production I will use LuaJIT for the speedup.
Yes, I profiled it. No surprises: these three statements are the ones taking
most of the CPU cycles:

    sub_read = read:sub(i, ps+i-1)
    nt_value = sub_read:sub(fs+1, fs+1)
    sub_read = sub_read:sub(1, fs) .. "N" .. sub_read:sub(fs+2)
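One possible way to trim these three statements, sketched under assumptions: i is the 1-based start of the window and the masked base sits at offset fs inside it, as in the code above. Indexing read directly skips the intermediate sub_read string, so each window needs three string.sub calls instead of four. The function names are hypothetical, and whether the saving is measurable should be checked with the profiler.

```lua
-- Hypothetical helper: build the masked key straight from `read`,
-- without first copying the whole window into sub_read.
-- Assumes i is the 1-based window start and the masked base is at offset fs.
local function masked_window(read, i, ps, fs)
  local center = i + fs                      -- absolute index of the masked base
  local nt_value = read:sub(center, center)  -- the base being replaced by "N"
  local key = read:sub(i, center - 1) .. "N" .. read:sub(center + 1, i + ps - 1)
  return key, nt_value
end

-- The original three statements, wrapped for comparison.
local function masked_window_orig(read, i, ps, fs)
  local sub_read = read:sub(i, ps + i - 1)
  local nt_value = sub_read:sub(fs + 1, fs + 1)
  sub_read = sub_read:sub(1, fs) .. "N" .. sub_read:sub(fs + 2)
  return sub_read, nt_value
end
```

For example, masked_window("ACGTACGTA", 3, 5, 2) returns "GTNCG" and "A", the same pair the original statements produce.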


> On 19 April 2011 16:58, David Rio Deiros <driodeiros@gmail.com> wrote:
>>  while ps + i <= #read+1 do -- while the window is within the size of the read
>>    sub_read = read:sub(i, ps+i-1)
>
> I do not understand why you do not just set ps to 8 and then increment ps
> later, instead of the addition in the while and then in the string.sub.

The idea is, given a string (read), to slice it and, for each slice, replace
the middle character with N and then check the pl table to see if the key
exists.
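A minimal sketch of that idea, written for clarity rather than speed (it builds a table of keys). It assumes ps == 2*fs + 1, so that offset fs + 1 is the middle of each window, that windows start at index 1, and that pl is a table keyed by masked sequence. masked_windows and probes_hit are hypothetical names.

```lua
-- Slide a window of ps bases along read and mask its middle base with "N".
-- Assumes ps == 2*fs + 1, so window position fs + 1 is the middle.
local function masked_windows(read, ps, fs)
  local keys = {}
  for i = 1, #read - ps + 1 do
    local window = read:sub(i, i + ps - 1)
    keys[#keys + 1] = window:sub(1, fs) .. "N" .. window:sub(fs + 2)
  end
  return keys
end

-- Return the masked windows that exist as keys in pl.
local function probes_hit(read, ps, fs, pl)
  local found = {}
  for _, key in ipairs(masked_windows(read, ps, fs)) do
    if pl[key] then
      found[#found + 1] = key
    end
  end
  return found
end
```

For instance, probes_hit("ACGTACGTA", 5, 2, {GTNCG = true}) returns {"GTNCG"}.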

Notice that most of the time there is no hit, so the statements inside the
conditional are rarely executed.

That's why I think I have to focus on the first three statements between the
while and the conditional.

> I
> also do not understand why you add one to the size of read and then check
> for less or equal, why not just drop the addition and check for less than?

You are right, I'll try it out.
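A small check of the loop bounds, assuming i starts at 1 (the initialization is not shown in the post). The original condition `ps + i <= #read+1` is equivalent to `ps + i < #read + 2`; simply dropping the +1 and switching to < gives `ps + i < #read`, which stops two windows early. A numeric for loop with the bound computed once avoids the per-iteration additions in the condition. The function names are hypothetical.

```lua
-- Original condition (assumes i starts at 1).
local function count_while(read, ps)
  local n, i = 0, 1
  while ps + i <= #read + 1 do
    n = n + 1
    i = i + 1
  end
  return n
end

-- Numeric for loop: the last valid start is #read - ps + 1,
-- so the window [i, i + ps - 1] never runs past the end of read.
local function count_for(read, ps)
  local n = 0
  for _ = 1, #read - ps + 1 do
    n = n + 1
  end
  return n
end

-- "Drop the +1 and use <": not equivalent, it skips the last two windows.
local function count_strict(read, ps)
  local n, i = 0, 1
  while ps + i < #read do
    n = n + 1
    i = i + 1
  end
  return n
end
```

With a 9-base read and ps = 5, the first two visit 5 windows while the strict version visits only 3.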

>>    nt_value = sub_read:sub(fs+1, fs+1)
>>    sub_read = sub_read:sub(1, fs) .. "N" .. sub_read:sub(fs+2)
>
>
> Shouldn't both nt_value and sub_read be locals? Is string.format quicker?
> Does concatenation create more temporary strings?

Good point, I'll try string.format also.
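A sketch comparing the two ways of building the key, with everything held in locals (locals live in VM registers, while globals need a table lookup on every access). In the reference Lua VM a chain such as a .. "N" .. b compiles to a single concatenation, so it produces one result string rather than a temporary per `..`; string.format has to parse its format string on every call. Which is faster under LuaJIT is best settled by measuring. key_concat, key_format and bench are hypothetical helpers, and bench is only a rough os.clock timer.

```lua
-- Masked key via one concatenation expression.
local function key_concat(read, i, ps, fs)
  local center = i + fs
  return read:sub(i, center - 1) .. "N" .. read:sub(center + 1, i + ps - 1)
end

-- Same key via string.format.
local function key_format(read, i, ps, fs)
  local center = i + fs
  return string.format("%sN%s", read:sub(i, center - 1), read:sub(center + 1, i + ps - 1))
end

-- Rough timer; results vary by machine and between Lua and LuaJIT.
-- The key lengths are summed so the work is not discarded.
local function bench(f, read, ps, fs, rounds)
  local acc = 0
  local last = #read - ps + 1
  local t0 = os.clock()
  for _ = 1, rounds do
    for i = 1, last do
      acc = acc + #f(read, i, ps, fs)
    end
  end
  return os.clock() - t0, acc
end
```

Usage: print(bench(key_concat, read, ps, fs, 100000)) and the same with key_format, on a representative read.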

>>    print("Trying: " .. sub_read)
>
> This line I would almost certainly remove if it is being called as much as
> you say, io is slow.

Yes, it is not there in production.

>>    if pl[sub_read] then -- We have a probe with that sequence
>>      if not pl[sub_read].hits then -- No previous hits
>>        pl[sub_read].hits = {A=0, C=0, G=0, T=0, N=0}
>>      end
>>      pl[sub_read].hits[nt_value] = pl[sub_read].hits[nt_value] + 1
>>      n_hits = n_hits + 1
>>    end
>
> Maybe cache pl[sub_read] as a local?

I'll try that too. Thanks.
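A sketch of the counting loop with pl[key] cached in a local, so the table is indexed once per window instead of repeatedly inside the hit branch. Because hits are rare, it also defers extracting the masked base until there is a hit. It assumes reads contain only uppercase A, C, G, T and N (any other character would index a missing counter, as in the original), that ps == 2*fs + 1, and that pl entries are tables. count_hits is a hypothetical name.

```lua
-- Count probe hits for one read, caching pl[key] and probe.hits in locals.
local function count_hits(read, ps, fs, pl)
  local n_hits = 0
  for i = 1, #read - ps + 1 do
    local center = i + fs
    local key = read:sub(i, center - 1) .. "N" .. read:sub(center + 1, i + ps - 1)
    local probe = pl[key]                 -- single lookup per window
    if probe then
      local hits = probe.hits
      if not hits then                    -- no previous hits: create counters
        hits = {A = 0, C = 0, G = 0, T = 0, N = 0}
        probe.hits = hits
      end
      local nt_value = read:sub(center, center)  -- only needed on a hit
      hits[nt_value] = hits[nt_value] + 1
      n_hits = n_hits + 1
    end
  end
  return n_hits
end
```

For example, with pl = {ACNTA = {}}, count_hits("ACGTACGTA", 5, 2, pl) returns 2 and leaves pl.ACNTA.hits.G at 2.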

-drd