On 3/22/2012 2:37 AM, Luiz Henrique de Figueiredo wrote:
Lua 5.2.1 (work1) is now available at
	http://www.lua.org/work/lua-5.2.1-work1.tar.gz
[snip]
Lua 5.2.1 introduces better handling of string collisions based on a
random seed. This work version is meant to let the community assess
the usefulness and the effectiveness of this experimental feature.
[snip]

A few more cheap data points: this test compares md5-like hex hash strings from two datasets to see which files have changed. The input data is the output of md5deep -rl on the linux-3.2 and linux-3.3 trees, generated on Cygwin.

32 byte - MD5 hex strings
40 byte - SHA1 hex strings
64 byte - SHA256 hex strings
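
For illustration only, here is a minimal sketch of how one input line splits into a hash and a path, using the same pattern as the test script in this post. The sample line and the "kinds" lookup table are made up for the example, and the "<hex digest>, whitespace, <path>" layout is assumed from how the script parses its input.

```lua
-- Made-up sample line in the assumed "<hex digest>  <path>" layout.
local line = "d41d8cd98f00b204e9800998ecf8427e  linux-3.2/README"

-- Same pattern as in test-same-files.lua: first run of non-space
-- characters is the hash, everything after the whitespace is the path.
local hash, fpath = string.match(line, "^(%S+)%s+(.+)$")

-- Hypothetical lookup: the digest kind follows from the hex string length.
local kinds = { [32] = "md5", [40] = "sha1", [64] = "sha256" }

print(kinds[#hash], #hash, fpath)
```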

The main loop is repeated 10 times to artificially run more string-string compares. The non-scientific results are:

same files (timing in sec, lower of 2 runs)
=========================================
dataset ->      md5     sha1    sha256
--------------------------------------
lua-5.1.5       0.658   0.670   0.715
lua-5.2.0       0.624   0.635   0.692
lua-5.2.1wk1
 shortlen=16    0.621   0.651   0.709
 shortlen=32    0.605   0.639   0.688
 shortlen=48    0.594   0.589   0.673
 shortlen=64    0.596   0.593   0.653
 shortlen=128   0.614   0.622   0.683
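
The post does not say how the timings were taken, so the following is only a sketch of one way to get a "lower of N runs" figure from inside Lua. It assumes os.clock() (CPU time) is an acceptable measure; best_of and compare_many are hypothetical names, not part of the original test.

```lua
-- Hypothetical helper: call fn(...) `runs` times and return the lowest
-- CPU time in seconds as reported by os.clock().
local function best_of(runs, fn, ...)
  local best = math.huge
  for _ = 1, runs do
    local t0 = os.clock()
    fn(...)
    local dt = os.clock() - t0
    if dt < best then best = dt end
  end
  return best
end

-- Example workload: n string-string equality tests, returning the
-- number of comparisons that found the strings equal.
local function compare_many(a, b, n)
  local equal = 0
  for _ = 1, n do
    if a == b then equal = equal + 1 end
  end
  return equal
end

-- Two 64-byte strings (the sha256 hex length) with identical content,
-- built by different expressions.
local a = string.rep("ab", 32)
local b = string.rep("ab", 16) .. string.rep("ab", 16)

print(string.format("%.3f sec", best_of(2, compare_many, a, b, 1000000)))
```

Taking the minimum rather than the mean is a simple way to reduce noise from other processes, which seems to be the intent of "lower of 2 runs".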

It can be seen that lua-5.2.1wk1 is fastest for the sha256 compares when the 64-byte sha256 strings are interned, that is, with LUA_MAXSHORTLEN=64. lua-5.2.1wk1 is also slightly faster than lua-5.2.0 in most rows here, probably because of the difference in processing loads for the initial lines of data from io.lines(). Of course, the string compares here are limited to strings of 32/40/64 bytes. If interning-at-first-compare is easy to implement, I'll try it and add a data point.
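
To look at lengths other than 32/40/64 without real digest files, synthetic hex strings could be generated along these lines. This is only a sketch: random_hex is a hypothetical helper, not part of the test above, and random strings will not reproduce the same/changed mix of a real kernel tree.

```lua
-- Hypothetical helper: build a lowercase hex string of n characters
-- (n is assumed to be even), two hex digits per random byte.
local function random_hex(n)
  local parts = {}
  for i = 1, n / 2 do
    parts[i] = string.format("%02x", math.random(0, 255))
  end
  return table.concat(parts)
end

-- One sample string per length of interest, on either side of the
-- shortlen values tried above.
for _, len in ipairs({16, 32, 40, 48, 64, 128}) do
  print(len, random_hex(len))
end
```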

test-same-files.lua
===================

-- usage: lua test-same-files.lua <hashes-A> <hashes-B>
local io = require "io"
local string = require "string"

-- dataset A: map from file path to hash
local setA = {}
for l in io.lines(arg[1]) do
  -- each line is "<hash><whitespace><path>"
  local hashA, fpathA = string.match(l, "^(%S+)%s+(.+)$")
  setA[fpathA] = hashA
end

-- dataset B: parallel arrays of hashes and file paths
local hashesB, fpathsB = {}, {}
for l in io.lines(arg[2]) do
  local h, fp = string.match(l, "^(%S+)%s+(.+)$")
  hashesB[#hashesB + 1] = h
  fpathsB[#fpathsB + 1] = fp
end

-- totals accumulate over all 10 passes, so they are 10x the real counts
local identicaln, changedn = 0, 0
for i = 1, 10 do -- repeated 10 times
  local identical, changed = {}, {}
  for j = 1, #fpathsB do
    local hashB = hashesB[j]
    local fpathB = fpathsB[j]
    local hashA = setA[fpathB]
    if hashA then -- only files present in both datasets are compared
      if hashA == hashB then -- the string-string compare being timed
        identical[#identical + 1] = fpathB
      else
        changed[#changed + 1] = fpathB
      end
    end
  end
  identicaln = identicaln + #identical
  changedn = changedn + #changed
end
print("Files identical: "..identicaln)
print("Files changed:   "..changedn)

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia