Hamming Numbers

lua-users home
wiki

Showing revision 2
Hamming Numbers are numbers whose only prime factors are 2, 3 and 5. They are named after [Richard Hamming] but became famous (or notorious) after [Edsger Dijkstra] posed the question of how to efficiently enumerate them in numeric order.

The problem is frequently trotted out to explain why "x Programming" is better (particularly for the values of x: functional and logic). An O(n) solution (where n is the size of the sequence of numbers) is possible if you ignore the cost of manipulating large numbers. Buried deep in an interesting [thread] on [Lambda the Ultimate] is an analysis which indicates that the cost might be O(n^(4/3)).

The Haskell solution to this problem is delightfully brief:

-- Merges two infinite lists
merge :: (Ord a) => [a] -> [a] -> [a]
merge (x:xs)(y:ys)
  | x == y    = x : merge xs ys
  | x <  y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Lazily produce the hamming sequence
hamming :: [Integer]
hamming 
  = 1 : merge (map (2*) hamming) (merge (map (3*) hamming) (map (5*) hamming))

although even shorter ones exist. The essence of the above solution is that Haskell evaluates lazily, so the infinite sequence hamming can be self-referential. That's possible to do in Lua, using a sort of promise. The code below implements just enough of a bignum implementation to be able to do the multiplications by 2, 3 and 5, and to print out the results. (The magic number 7305 results from the fact that Hamming number 7305 is 2^52; actually, a slightly larger limit could have been chosen. In practice, the bignum interface only adds about 33% to the cost of the computation so it could have been written as a pure bignum program; the interesting thing is that all of the code to implement Hamming is completely oblivious to the actual implementation of the numbers, provided that the implementation supports <, ==, and multiply-by-a-small-integer.

I checked the performance of the program by computing hamming(k) for values of k from 50,000 to 1,000,000; the performance does appear to be roughly O(n) in both space and time, although I made no attempt to optimise and the space consumption is somewhat porcine.

                             usec    RSS
       n     sec.       Mb     /n   kb/n   value
 -------     ----    -----   ----   ----   --------------------------------------------------------------------
   50000     2.71     3472   54.2   69.4   2379126835648701693226200858624
  100000     5.80     5048   58.0   50.5   290142196707511001929482240000000000000
  150000     9.05     6060   60.3   40.4   136551478823710021067381144334863695872000000
  200000    12.41     7176   62.0   35.9   4479571262811807241115438439905203543080960000000
  250000    15.71     8092   62.8   32.4   29168801439715713801701411811093381120000000000000000
  300000    19.09     8856   63.6   29.5   62864273786544799804687500000000000000000000000000000000
  350000    22.38     9608   63.9   27.5   60133943158606419031489628585123616917982648729600000000000
  400000    26.01    13276   65.0   33.2   30774090693237851027531250000000000000000000000000000000000000
  450000    29.66    14356   65.9   31.9   9544028161703913537712243143807801346335324481500000000000000000
  500000    33.49    14952   67.0   29.9   1962938367679548095642112423564462631020433036610484123229980468750
  550000    37.25    15824   67.7   28.8   286188649932207038880389529600000000000000000000000000000000000000000
  600000    41.06    16976   68.4   28.3   31118367413797724237140050340541118674764220486395061016082763671875000
  650000    44.69    17620   68.8   27.1   2624748786803793305341077210881955201024000000000000000000000000000000000
  700000    48.55    18508   69.4   26.4   177288702931066945536000000000000000000000000000000000000000000000000000000
  750000    52.38    19696   69.8   26.3   9844116074857088479400896102109753641824661421606444941755765750304761446400
  800000    56.15    20196   70.2   25.2   458936503790258814279745536000000000000000000000000000000000000000000000000000
  850000    60.03    20620   70.6   24.3   18286806033409508387421738007105886936187744140625000000000000000000000000000000
  900000    63.88    21772   71.0   24.2   632306706993546185913146473467350006103515625000000000000000000000000000000000000
  950000    68.05    22756   71.6   24.0   19221158232427643481897048859403926759149694825267200000000000000000000000000000000
 1000000    73.27    23084   73.3   23.1   519312780448388736089589843750000000000000000000000000000000000000000000000000000000

So here's the code:

local function Seq(t, func)
  local meta = {}
  function meta:__index(k)
    if k == "tail" then
      local rv = func()
      self.tail = rv
      setmetatable(self, nil)
      return rv
    end
  end
  return setmetatable(t, meta)
end

local function SMap(seq, func)
  return Seq({head = func(seq.head)},
             function() return SMap(seq.tail, func) end)
end

local function SMerge(seq1, seq2)
  local head1, head2 = seq1.head, seq2.head
  local step
  if head1 < head2 then
    function step() return SMerge(seq1.tail, seq2) end
  elseif head2 < head1 then
    function step() return SMerge(seq1, seq2.tail) end
    head1 = head2
  else
    function step() return SMerge(seq1.tail, seq2.tail) end
  end
  return Seq({head = head1}, step)
end

local function times(k)
  return function(x) return x * k end
end

function Hamming(seq)        
  local seq2, seq3, seq5 =
        SMap(seq, times(2)), SMap(seq, times(3)), SMap(seq, times(5))  
  local function step()
    return SMerge(seq2, SMerge(seq3, seq5))
  end
  return Seq(seq, function() return SMerge(seq2, SMerge(seq3, seq5)) end)
end

function hamming(k)
  local start = 1
  if k > 7305 then start = Bignum(start) end
  local h = Hamming({head = start})
  for i = 2, k do h = h.tail end
  return h.head
end

and here's the simple Bignum implementation (define this first if you want to test the above code):

do
  -- very limited bignum stuff; we only implement multiplication
  -- (by less than the base) and comparison.
  local base = 1e15
  local fmt = "%015.0f"
  local meta = {}
  function meta:__lt(other)
    if self.n ~= other.n then return self.n < other.n end
    for i = 1, self.n do
      if self[i] ~= other[i] then return self[i] < other[i] end
    end
  end
  function meta:__eq(other)
    if self.n == other.n then
      for i = 1, self.n do
        if self[i] ~= other[i] then return false end
      end
      return true
    end
  end
  function meta:__mul(k)
    -- If the base where a multiple of all possible multipliers, then
    -- we could figure out the length of the result directly from the
    -- first "digit". On the other hand, printing the numbers would be
    -- difficult. So we accept the occasional overflow.
    local offset = 0
    if self[1] * k >= base then offset = 1 end
    local carry = 0
    local result = {}
    local n = self.n
    for i = n, 1, -1 do
      local tmp = self[i] * k + carry
      local digit = tmp % base
      carry = (tmp - digit) / base
      result[offset + i] = digit
    end
    if carry ~= 0 then
      n = n + 1
      if offset == 0 then
        for i = n, 2, -1 do
          result[i] = result[i - 1]
        end
      end
      result[1] = carry
    end
    result.n = n
    return setmetatable(result, meta)
  end
  function meta:__tostring()
    local tmp = {}
    tmp[1] = ("%.0f"):format(self[1])
    for i = 2, self.n do
      tmp[i] = fmt:format(self[i])
    end
    return table.concat(tmp)
  end
  function Bignum(k)
    return setmetatable({k, n = 1}, meta)
  end
end

For reference, the time/space chart was created (except for the heading) with the following (which will probably only work on FreeBSD or similar):

for ((i=50000; $i<=1000000; i=$i+50000)); \
  do /usr/bin/time -apl -o hamming.log ../src/lua \
     -e "dofile 'hamming.lua'; print('result', $i, hamming($i))" >> hamming.log; \
done
egrep '(result|user|maximum)' hamming.log | \
  lua -e 'local a = io.read"*a"; \
          for n, res, time, size in \
               string.gfind(a, "(%d+)%s+(%d+)%D+([%d.]+)%s+(%d+)") do \
            print(string.format("%8i %8.2f %8i %6.1f %6.1f   %s", \
                     n, time, size, (time*1e6)/n, (size*1e3)/n, res)) \
          end' > hamming.res

RecentChanges · preferences
edit · history · current revision
Edited September 27, 2005 12:15 am GMT (diff)