[Date Prev][Date Next][Thread Prev][Thread Next]
[Date Index]
[Thread Index]
**Subject**: **Re: [OT] Re: [ANN] MD5 1.0.2 Released (OT)**
**From**: Robert Raschke <rrlua@...>
**Date**: Thu, 10 May 2007 16:12:34 +0100

> For how big a filesystem? If you have enough blocks, the probability
> of a collision will be 1.
Hmm, quoting http://plan9.bell-labs.com/sys/doc/venti.pdf in a bit
more detail, says that for 1 exabyte stored in 8K blocks using a 160
bit Sha1 hash for a block, the collision probability is less than 10^-20 :
Are the 160 bit hash values generated by Sha1 large enough to
ensure the fingerprint of every block is unique? Assuming
random hash values with a uniform distribution, a collection
of n different data blocks and a hash function that generates
b bits, the probability p that there will be one or more
collisions is bounded by the number of pairs of blocks
multiplied by the probability that a given pair will collide,
i.e.
p <= (n(n-1)/2)(1/(2^b))
Today, a large storage system may contain a petabyte (10^15
bytes) of data. Consider an even larger system that contains
an exabyte (10^18 bytes) stored as 8 Kbyte blocks (~10^14
blocks). Using the Sha1 hash function, the probability of a
collision is less than 10^-20. Such a scenario seems
sufficiently unlikely that we ignore it and use the Sha1 hash
as a unique identifier for a block. Obviously, as storage
technology advances, it may become feasible to store much more
than an exabyte, at which point it maybe necessary to move to
a larger hash function. NIST has already proposed variants of
Sha1 that produce 256, 384, and 512 bit results. For the
immediate future, however, Sha1 is a suitable choice for
generating the fingerprint of a block.
Robby
--
r dot raschke at tombob dot com