On Mar 20, 2013, at 10:56 AM, Chris Datfung <chris.datfung@gmail.com> wrote:
> I want to parse a string for all possible character-level n-gram combinations, e.g. if the string is 'ab' and n is 2, then the function should return 'aa', 'ab', 'ba' and 'bb'. While I can hard-code two nested for loops to generate bigrams (like above), I would rather let n be dynamic so I can easily generate unigrams through pentagrams on the fly.
>
> I'm not sure how to generate dynamic nested for loops in Lua, or for that matter if recursion is a better approach. Can someone give me some advice on the best way to do this?
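> function ngram (n, str)
>>   local i = 0                              -- counter, read as an n-digit number in base m
>>   local m = str:len()
>>   local last = string.rep(str:sub(m),n)    -- final n-gram: last character repeated n times
>>   local done = false
>>   while not done do
>>     local accu = i
>>     local tgram = {}
>>     for j = 1, n do
>>       local k = (accu % m) + 1             -- lowest base-m digit selects a character
>>       accu = math.floor(accu / m)          -- shift to the next digit
>>       table.insert(tgram,str:sub(k,k))
>>     end
>>     local gram = table.concat(tgram)
>>     done = (gram == last)                  -- stops at the first gram equal to last
>>     print(gram)
>>     i = i + 1
>>   end
>> end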
> ngram(2,"ab")
aa
ba
ab
bb
> ngram(3,"ab")
aaa
baa
aba
bba
aab
bab
abb
bbb
> ngram(2,"abc")
aa
ba
ca
ab
bb
cb
ac
bc
cc
>
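
On the recursion question: a recursive version is also short. The following is only a minimal sketch, not a definitive implementation. The names ngrams and extend are made up for this example, and it returns the n-grams in a table rather than printing them. Each level of recursion plays the role of one nested for loop, so n can be anything.

```lua
-- Hypothetical helper (not from the original post): returns every n-gram
-- over the characters of str as a table of strings.
local function ngrams(n, str)
  local results = {}
  local prefix = {}            -- characters chosen so far, one slot per depth

  -- extend(depth) acts as the depth-th nested for loop.
  local function extend(depth)
    if depth > n then
      -- all n positions are filled: record the finished n-gram
      results[#results + 1] = table.concat(prefix)
      return
    end
    for k = 1, #str do
      prefix[depth] = str:sub(k, k)
      extend(depth + 1)
    end
  end

  extend(1)
  return results
end

for _, gram in ipairs(ngrams(2, "ab")) do
  print(gram)
end
```

Because the first position varies slowest here, ngrams(2, "ab") yields aa, ab, ba, bb, which is the order given in the question. The loop runs over every position of str rather than stopping at a sentinel value, so it does not depend on the characters of str being distinct.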