On 3/22/2012 2:37 AM, Luiz Henrique de Figueiredo wrote:
Lua 5.2.1 (work1) is now available at
	http://www.lua.org/work/lua-5.2.1-work1.tar.gz
[snip]
Lua 5.2.1 introduces better handling of string collisions based on a
random seed. This work version is meant to let the community assess
the usefulness and the effectiveness of this experimental feature.

The complete diffs from Lua 5.2.0 to 5.2.1 are available at
	http://www.lua.org/work/diffs-lua-5.2.0-lua-5.2.1-work1.txt
[snip]

It doesn't seem any slower... Here are a few cheap data points :-)

Ran a few very simple scripts processing md5-like data. Input data is output from md5deep -rl on a linux-3.3 tree on Cygwin:

linux33-md5.dat    2.39MB (32 byte MD5 hex strings)
linux33-sha1.dat   2.68MB (40 byte SHA1 hex strings)
linux33-sha256.dat 3.55MB (64 byte SHA256 hex strings)

Each file has 38,069 lines in the form of:
<hash>  <relative-path>

Tried two simple scripts:
(a) load data, parse, dump into two arrays
(b) load data, parse, use table lookup to find
    duplicate hash strings

The non-scientific results are:

load list (timing in sec, lower of 2 runs)
=========================================
dataset ->      md5     sha1    sha256
lua-5.1.5       0.259   0.258   0.286
lua-5.2.0       0.261   0.274   0.285
lua-5.2.1wk1
 shortlen=16    0.180   0.188   0.204
 shortlen=32    0.224   0.201   0.220
 shortlen=48    0.240   0.247   0.235
 shortlen=64    0.249   0.248   0.263
 shortlen=128   0.259   0.256   0.279

hash dupe (timing in sec, lower of 2 runs)
=========================================
dataset ->      md5     sha1    sha256
lua-5.1.5       0.236   0.247   0.265
lua-5.2.0       0.252   0.264   0.304
lua-5.2.1wk1
 shortlen=16    0.188   0.200   0.228
 shortlen=32    0.215   0.211   0.234
 shortlen=48    0.236   0.240   0.241
 shortlen=64    0.242   0.244   0.264
 shortlen=128   0.253   0.263   0.297

The 5.2.1 (work1) timings approach the 5.2.0 times as LUA_MAXSHORTLEN increases, but are never significantly slower.
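
For anyone wanting to repeat the "lower of 2 runs" measurement from inside Lua, here is a minimal sketch. It is not how the numbers above were taken (the post does not say); best_of and parse_lines are hypothetical names, the sample lines are made up, and os.clock reports CPU time rather than wall-clock time.

```lua
-- Hypothetical helper: call fn `runs` times, return the lowest CPU time seen.
local function best_of(runs, fn, ...)
  local best = math.huge
  for _ = 1, runs do
    local t0 = os.clock()
    fn(...)
    local dt = os.clock() - t0
    if dt < best then best = dt end
  end
  return best
end

-- Example workload: parse "<hash>  <relative-path>" lines held in memory,
-- using the same pattern as the test scripts.
local function parse_lines(lines)
  local hash, fpath = {}, {}
  for i = 1, #lines do
    local h, fp = string.match(lines[i], "^(%S+)%s+(.+)$")
    hash[#hash + 1] = h
    fpath[#fpath + 1] = fp
  end
  return #fpath
end

-- Made-up sample input (not real md5deep output).
local sample = {}
for i = 1, 1000 do
  sample[i] = string.format("%032x  dir/file%d.c", i, i)
end

local best = best_of(2, parse_lines, sample)
print(string.format("best of 2: %.3f s", best))
```

Taking the minimum rather than the mean is a cheap way to discount one-off disturbances such as a cold cache on the first run.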

System is an AMD Athlon 64 X2 5000+ (64 byte cache lines)
WinXP SP3 32-bit Cygwin gcc 4.5.3 lua "make generic"

test-load-list.lua
==================
local io = require "io"
local string = require "string"
local hash, fpath = {}, {}
for l in io.lines(arg[1]) do
  local h, fp = string.match(l, "^(%S+)%s+(.+)$")
  hash[#hash + 1] = h
  fpath[#fpath + 1] = fp
end
print("Items loaded: "..#fpath)

test-hash-dupe.lua
==================
local io = require "io"
local string = require "string"
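local hash = {}   -- hash string -> number of occurrences
local count = 0   -- lines read; #hash is always 0 for string keys
local dupe = 0
for l in io.lines(arg[1]) do
  local h, fp = string.match(l, "^(%S+)%s+(.+)$")
  count = count + 1
  if hash[h] then
    hash[h] = hash[h] + 1
    dupe = dupe + 1
  else
    hash[h] = 1
  end
end
print("Items loaded: "..count)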
print("Hash duplicates: "..dupe)

A bigger dataset would be better, but at least in the above all disk I/O gets cached in memory. I have not tried extreme string comparisons yet. I failed to dream up a short script that needs mind-boggling amounts of them...
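
One possible way to get a bigger dataset without a bigger source tree is to synthesize lines in the same "<hash>  <relative-path>" form. This is only a sketch: make_lines is a hypothetical helper, the hex strings are random rather than real digests, and the paths are invented, so the timings would not be directly comparable with the tables above.

```lua
-- Hypothetical generator: returns n lines of "<random-hex>  <fake-path>".
-- hexlen is the number of hex digits (32 for MD5-like, 40 for SHA1-like,
-- 64 for SHA256-like strings).
local function make_lines(n, hexlen, seed)
  math.randomseed(seed or 42)
  local digits = "0123456789abcdef"
  local lines = {}
  for i = 1, n do
    local buf = {}
    for j = 1, hexlen do
      local k = math.random(16)          -- 1..16, index into digits
      buf[j] = string.sub(digits, k, k)
    end
    lines[i] = table.concat(buf) .. "  fake/dir/file" .. i .. ".c"
  end
  return lines
end

-- Print a few MD5-like lines; redirect to a file to feed the test scripts.
local lines = make_lines(5, 32)
for _, l in ipairs(lines) do print(l) end
```

Raising n well past 38,069 and varying hexlen around LUA_MAXSHORTLEN would exercise both the short-string and long-string paths.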

--
Cheers,
Kein-Hong Man (esq.)
Kuala Lumpur, Malaysia