• Subject: Re: [OT] Re: [ANN] MD5 1.0.2 Released (OT)
• From: Robert Raschke <rrlua@...>
• Date: Thu, 10 May 2007 16:12:34 +0100

```> For how big a filesystem? If you have enough blocks, the probability
> of a collision will be 1.

Hmm, quoting http://plan9.bell-labs.com/sys/doc/venti.pdf in a bit
more detail, says that for 1 exabyte stored in 8K blocks using a 160
bit Sha1 hash for a block, the collision probability is less than 10^-20 :

Are the 160 bit hash values generated by Sha1 large enough to
ensure the fingerprint of every block is unique?  Assuming
random hash values with a uniform distribution, a collection
of n different data blocks and a hash function that generates
b bits, the probability p that there will be one or more
collisions is bounded by the number of pairs of blocks
multiplied by the probability that a given pair will collide,
i.e.
p <= (n(n-1)/2)(1/(2^b))
Today, a large storage system may contain a petabyte (10^15
bytes) of data.  Consider an even larger system that contains
an exabyte (10^18 bytes) stored as 8 Kbyte blocks (~10^14
blocks).  Using the Sha1 hash function, the probability of a
collision is less than 10^-20.  Such a scenario seems
sufficiently unlikely that we ignore it and use the Sha1 hash
as a unique identifier for a block.  Obviously, as storage
technology advances, it may become feasible to store much more
than an exabyte, at which point it maybe necessary to move to
a larger hash function.  NIST has already proposed variants of
Sha1 that produce 256, 384, and 512 bit results.  For the
immediate future, however, Sha1 is a suitable choice for
generating the fingerprint of a block.

Robby

--
r dot raschke at tombob dot com

```