lua-users home
lua-l archive

On Wed, May 09, 2007 at 04:19:17PM +0200, Philippe Lhoste wrote:
> One (frequent?) use of MD5 is to compute a hash value for files and use 
> it for fast comparison (duplicates, is this file changed?, and so on).
> I suppose that for this use, it is still OK, at worst involving a binary 
> comparison to be sure (for some uses).

Aside: The Plan 9 OS has a storage system called "Venti", based on the
idea that each block of data in a stored file can be indexed by its
hash value.  This allows it to store only one copy of a block's data,
for any number of files or copies of files that contain that data.
It's intended for archival storage.  This design requires that every
distinct block of data, in every file in the store, has a distinct
hash value.  If two different blocks ever hashed to the same value,
only one of them could be kept under that index, so a collision
results in data loss.
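
To make the idea concrete, here is a minimal sketch of a store that
indexes blocks by hash.  It is not Venti's actual interface: the names
BlockStore, put_file and get_file, the in-memory dict, and the block
size are all assumptions made up for illustration.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each block is keyed by its SHA-1 digest.

    Hypothetical illustration only; not how Venti itself is implemented.
    """

    def __init__(self, block_size=8192):
        self.block_size = block_size  # assumed block size, chosen arbitrarily
        self.blocks = {}              # digest -> block bytes

    def put_file(self, data):
        """Split data into blocks, store each once, return the list of digests."""
        keys = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            key = hashlib.sha1(block).digest()
            # If the key is already present, the existing block is kept.
            # A different block with the same digest would be silently
            # dropped here, which is the data-loss case described above.
            self.blocks.setdefault(key, block)
            keys.append(key)
        return keys

    def get_file(self, keys):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[key] for key in keys)
```

Storing b"aaaabbbbaaaa" with block_size=4 keeps only two blocks, since
the first and third blocks are identical, and get_file on the returned
digests gives back the original bytes.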

    "Using the Sha1 hash function, the probability of a collision is
    less than 10^-20. Such a scenario seems sufficiently unlikely that
    we ignore it [...]"

http://plan9.bell-labs.com/sys/doc/venti.html
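
For a rough feel of where a figure like that can come from, the usual
birthday bound says that n blocks hashed to b bits collide with
probability at most n(n-1)/2 divided by 2^b.  The sketch below applies
it to an assumed store of 10^18 bytes in 8 KiB blocks; those sizes and
the function name collision_bound are my own choices for illustration,
not something stated in this message.

```python
def collision_bound(num_blocks, hash_bits):
    """Upper bound on the chance that any two of num_blocks distinct
    blocks share a hash value, assuming the hash behaves like a uniform
    random function (birthday bound: pairs / 2**hash_bits)."""
    pairs = num_blocks * (num_blocks - 1) // 2
    return pairs / 2 ** hash_bits


# Assumed example sizes: one exabyte of data stored as 8 KiB blocks.
num_blocks = 10 ** 18 // 8192
p_sha1 = collision_bound(num_blocks, 160)  # SHA-1 produces 160-bit digests
p_md5 = collision_bound(num_blocks, 128)   # MD5 produces 128-bit digests
print(p_sha1, p_md5)
```

Under those assumptions the SHA-1 bound comes out below 10^-20.  Note
that this only covers accidental collisions; it says nothing about
collisions constructed on purpose.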

                                                  -Dave Dodge