On Thu, 10 May 2007 02:02:04 -0400, Dave Dodge <dododge@dododge.net>
wrote:

>On Wed, May 09, 2007 at 04:19:17PM +0200, Philippe Lhoste wrote:
>> One (frequent?) use of MD5 is to compute a hash value for files and use 
>> it for fast comparison (duplicates, is this file changed?, and so on).
>> I suppose that for this use, it is still OK, at worst involving a binary 
>> comparison to be sure (for some uses).
>
>Aside: The Plan9 OS has a filesystem called "Venti", based on the idea
>that each block of data in a stored file can be indexed by its hash
>value.  This allows it to store only one copy of the block's data, for
>any number of files or copies of files that contain that data.  It's
>intended for archival storage.  This design requires that every unique
>block of data, in every file in the filesystem, has a unique hash
>value.  A collision results in data loss.
>
>    "Using the Sha1 hash function, the probability of a collision is
>    less than 10^-20. Such a scenario seems sufficiently unlikely that
>    we ignore it [...]"
>
>http://plan9.bell-labs.com/sys/doc/venti.html

For how big a filesystem? If you have enough blocks, the probability
of a collision will be 1.
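
To put rough numbers on that, here is a minimal sketch using the standard
birthday-bound approximation, p ~= 1 - exp(-n(n-1)/(2N)), with N = 2^160
for a 160-bit hash such as SHA-1. The function name and the block counts
tried below are my own illustrative assumptions, not figures taken from
the quoted text, and the model assumes hash outputs are uniformly random.

```python
import math

HASH_BITS = 160  # SHA-1 digest size in bits


def collision_probability(n_blocks, hash_bits=HASH_BITS):
    """Approximate chance that at least two of n_blocks distinct blocks
    share a hash value, assuming uniformly distributed hash outputs."""
    space = 2 ** hash_bits
    if n_blocks > space:
        # Pigeonhole: more blocks than hash values forces a collision.
        return 1.0
    # Expected number of colliding pairs, n(n-1)/2 divided by N.
    exponent = n_blocks * (n_blocks - 1) / 2 / space
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    # Hypothetical store sizes, in number of unique blocks.
    for n in (10**9, 10**14, 2**80, 2**90):
        print(f"{n:.3e} blocks -> p ~= {collision_probability(n):.3e}")
```

Under this approximation the probability stays below 10^-20 up to around
10^14 blocks, reaches roughly 0.39 at 2^80 blocks, and is a certainty only
once the block count exceeds 2^160.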

Steve