You can also do this recursively:
-- Generate n-grams for symbols with the specified length
function ngram(symbols, length, part)
  part = part or ""
  if length == 0 then print(part); return end
  for ix = 1, #symbols do
    ngram(symbols, length - 1, part .. symbols:sub(ix,ix))
  end
end

ngram("abc", 3)

--Tim

On Mar 20, 2013, at 8:49 AM, Chris Datfung <chris.datfung@gmail.com> wrote:

On Wed, Mar 20, 2013 at 5:40 PM, Doug Currie <doug.currie@gmail.com> wrote:
On Mar 20, 2013, at 10:56 AM, Chris Datfung <chris.datfung@gmail.com> wrote:
> I want to parse a string for all possible character-level n-gram combinations, e.g. if the string is 'ab' and n is 2, then the function should return 'aa', 'ab', 'ba' and 'bb'. While I can hard-code two nested for loops to generate bi-grams (like above), I'd rather let n be dynamic so I can easily generate uni- to penta-grams on the fly.
>
> I'm not sure how to generate dynamic nested for loops in Lua, or for that matter if recursion is a better approach. Can someone give me some advice on the best way to do this?
Hi Doug,
Thank you, this is exactly what I'm looking for. Can I trouble you to explain how the function works as well?

Thanks,
Chris
> function ngram (n, str)
>>   local i = 0
>>   local m = str:len()
>>   local last = string.rep(str:sub(m),n)
>>   local done = false
>>   while not done do
>>     local accu = i
>>     local tgram = {}
>>     for j = 1, n do
>>       local k = (accu % m) + 1
>>       accu = math.floor(accu / m)
>>       table.insert(tgram,str:sub(k,k))
>>     end
>>     local gram = table.concat(tgram)
>>     done = (gram == last)
>>     print(gram)
>>     i = i + 1
>>   end
>> end
> ngram(2,"ab")
aa
ba
ab
bb
> ngram(3,"ab")
aaa
baa
aba
bba
aab
bab
abb
bbb
> ngram(2,"abc")
aa
ba
ca
ab
bb
cb
ac
bc
cc
>
e
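
On the question of how the loop version works: as far as I can tell, it treats the counter i as an n-digit number in base m (m being the length of the string) and maps each digit to one character. The sketch below is only a variant written to make that visible, not the code posted above. The name ngrams, the returned table, and the explicit bound of m^n iterations are assumptions of mine; the original prints each gram and stops when it reaches the gram made of the last character repeated n times.

```lua
-- Hypothetical variant: collects the n-grams in a table instead of printing them.
local function ngrams(n, str)
  local m = #str
  local result = {}
  -- There are m^n n-grams, one for each integer i in 0 .. m^n - 1.
  for i = 0, m ^ n - 1 do
    local accu = i
    local tgram = {}
    for j = 1, n do
      local k = (accu % m) + 1     -- lowest base-m digit, shifted to a 1-based index
      accu = math.floor(accu / m)  -- drop that digit and move on to the next one
      tgram[j] = str:sub(k, k)
    end
    result[#result + 1] = table.concat(tgram)
  end
  return result
end

print(table.concat(ngrams(2, "ab"), " "))
```

Because the first character of each gram comes from the lowest digit, the grams come out in the same order as in the transcript (aa, ba, ab, bb). Counting up to m^n rather than comparing against a final gram also means the loop does not stop early when the last character of the string occurs more than once.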