lua-users home
lua-l archive

[Date Prev][Date Next][Thread Prev][Thread Next] [Date Index] [Thread Index]


> For how big a filesystem? If you have enough blocks, the probability
> of a collision will be 1.

Hmm, quoting http://plan9.bell-labs.com/sys/doc/venti.pdf in a bit
more detail, says that for 1 exabyte stored in 8K blocks using a 160
bit Sha1 hash for a block, the collision probability is less than 10^-20 :

	Are the 160 bit hash values generated by Sha1 large enough to
	ensure the fingerprint of every block is unique?  Assuming
	random hash values with a uniform distribution, a collection
	of n different data blocks and a hash function that generates
	b bits, the probability p that there will be one or more
	collisions is bounded by the number of pairs of blocks
	multiplied by the probability that a given pair will collide,
	i.e.
		p <= (n(n-1)/2)(1/(2^b))
	Today, a large storage system may contain a petabyte (10^15
	bytes) of data.  Consider an even larger system that contains
	an exabyte (10^18 bytes) stored as 8 Kbyte blocks (~10^14
	blocks).  Using the Sha1 hash function, the probability of a
	collision is less than 10^-20.  Such a scenario seems
	sufficiently unlikely that we ignore it and use the Sha1 hash
	as a unique identifier for a block.  Obviously, as storage
	technology advances, it may become feasible to store much more
	than an exabyte, at which point it maybe necessary to move to
	a larger hash function.  NIST has already proposed variants of
	Sha1 that produce 256, 384, and 512 bit results.  For the
	immediate future, however, Sha1 is a suitable choice for
	generating the fingerprint of a block.


Robby

--
r dot raschke at tombob dot com