On Wed, Mar 20, 2013 at 5:40 PM, Doug Currie <doug.currie@gmail.com> wrote:

On Mar 20, 2013, at 10:56 AM, Chris Datfung <chris.datfung@gmail.com> wrote:

> I want to generate all possible character-level n-gram combinations from a string, e.g. if the string is 'ab' and n is 2, then the function should return 'aa', 'ab', 'ba' and 'bb'. While I can hard-code two nested for loops to generate bigrams (as above), I would rather let n be dynamic so I can easily generate unigrams through pentagrams on the fly.
>
> I'm not sure how to generate dynamic nested for loops in Lua, or for that matter if recursion is a better approach. Can someone give me some advice on the best way to do this?


Hi Doug,

Thank you, this is exactly what I'm looking for. Can I trouble you to explain how the function works as well?

Thanks,
Chris
 
> function ngram (n, str)
>>     local i = 0
>>     local m = str:len()
>>     local last = string.rep(str:sub(m),n)
>>     local done = false
>>     while not done do
>>         local accu = i
>>         local tgram = {}
>>         for j = 1, n do
>>             local k = (accu % m) + 1
>>             accu = math.floor(accu / m)
>>             table.insert(tgram,str:sub(k,k))
>>         end
>>         local gram = table.concat(tgram)
>>         done = (gram == last)
>>         print(gram)
>>         i = i + 1
>>     end
>> end
> ngram(2,"ab")
aa
ba
ab
bb
> ngram(3,"ab")
aaa
baa
aba
bba
aab
bab
abb
bbb
> ngram(2,"abc")
aa
ba
ca
ab
bb
cb
ac
bc
cc
>
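
For readers following the thread, here is a hedged sketch of how the quoted function appears to work. It treats the counter i as an n-digit number in base m, where m is the length of the string. Each base-m digit, lowest first, selects one character, so counting i upward walks through every combination. The name ngrams_list and the choice to return a table are assumptions made for illustration and are not from the original post. This variant also counts up to m^n instead of comparing against the last expected gram, because that comparison would stop early when the final character also occurs earlier in the string (for example "aba").

```lua
-- Sketch only: ngrams_list is a hypothetical name, not from the original post.
-- Returns a table holding every length-n string over the characters of str.
function ngrams_list(n, str)
    local m = #str
    local grams = {}
    -- There are m^n combinations; i runs over all n-digit numbers in base m.
    for i = 0, m ^ n - 1 do
        local accu = i
        local tgram = {}
        for j = 1, n do
            local k = (accu % m) + 1      -- lowest base-m digit picks a character
            accu = math.floor(accu / m)   -- drop that digit, expose the next one
            tgram[j] = str:sub(k, k)
        end
        grams[#grams + 1] = table.concat(tgram)
    end
    return grams
end

print(table.concat(ngrams_list(2, "ab"), " "))  -- prints: aa ba ab bb
```

The order matches the session above because the first character of each gram comes from the lowest digit, which changes fastest. Returning a table rather than printing is only a convenience so that the caller can decide what to do with the grams.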
