lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


I recently needed the Levenshtein algorithm in Lua:

function levenshtein(s, t)
  local s, t = tostring(s), tostring(t)
  if type(s) == 'string' and type(t) == 'string' then
    local m, n, d = #s, #t, {}
    for i = 0, m do d[i] = { [0] = i } end
    for j = 1, n do d[0][j] = j end
    for i = 1, m do
      for j = 1, n do
        local cost = s:sub(i,i) == t:sub(j,j) and 0 or 1
        d[i][j] = math.min(d[i-1][j]+1, d[i][j-1]+1, d[i-1][j-1]+cost)
      end
    end
    return d[m][n]
  end
end

Very inefficient but it does the job:

  Lua 5.1.2  Copyright (C) 1994-2007 Lua.org, PUC-Rio
  > = levenshtein('referrer', 'referrer') -- zero distance
  0
  > = levenshtein('referrer', 'referer') -- distance of one character
  1
  > = levenshtein('random', 'strings') -- random big distance
  6

Petite Abeille wrote:

On Jan 4, 2008, at 5:36 PM, Aaron Brown wrote:

Jeff Wise wrote:

Is there a "fuzzy search" function/algorithm available
which would show these strings as "close to being equal"?

There's a technique called stemming that does at least some
of what you want:

http://en.wikipedia.org/wiki/Stemming

Another one of interest:

http://en.wikipedia.org/wiki/Levenshtein_distance