lua-users home
lua-l archive

In the same vein:

Distributed Source Code Management (DSCM) systems such as git, Mercurial, Bzr and Monotone identify files, file trees and/or changesets using an SHA-1 hash as well. This makes comparing files for equality cheap, and the identifier doubles as a checksum.
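
To make the "identifier doubles as a checksum" point concrete, here is a minimal sketch of how git's classic SHA-1 object format names a file's content (a "blob"). It covers git only; the other systems use their own schemes, and the function name git_blob_id is just an illustrative choice.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 ID git's classic object format gives a blob.

    The hash covers a small header ("blob <size>\\0") followed by the
    raw file content, not the content alone.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Identical content always gets the same ID, so checking two files for
# equality reduces to comparing two short strings.
a = git_blob_id(b"hello\n")
b = git_blob_id(b"hello\n")
c = git_blob_id(b"hello!\n")
print(a == b, a == c)
```

The result should match what `git hash-object` reports for the same bytes, which is an easy way to check the sketch against a real repository.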

BTW: it still astounds me sometimes how much easier branching and merging are in these systems compared to CVS and Subversion.

On May 10, 2007, at 2:02 AM, Dave Dodge wrote:

On Wed, May 09, 2007 at 04:19:17PM +0200, Philippe Lhoste wrote:
One (frequent?) use of MD5 is to compute a hash value for files and use 
it for fast comparison (duplicates, is this file changed?, and so on).
I suppose that for this use, it is still OK, at worst involving a binary 
comparison to be sure (for some uses).
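
A minimal sketch of that approach, assuming the files fit the usual "group by digest, then confirm" pattern: MD5 is used only as a fast filter, and a byte-for-byte comparison confirms each candidate. The helper names file_digest and find_duplicates are made up for this example.

```python
import filecmp
import hashlib

def file_digest(path, chunk_size=65536):
    """MD5 of a file, read in chunks so large files need little memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(paths):
    """Return (earlier, later) pairs of files with identical content.

    Files are grouped by digest first; only files whose digests match
    are compared byte by byte, so a hash collision cannot produce a
    false duplicate.
    """
    by_digest = {}
    dupes = []
    for path in paths:
        digest = file_digest(path)
        for seen in by_digest.get(digest, []):
            if filecmp.cmp(seen, path, shallow=False):
                dupes.append((seen, path))
                break
        by_digest.setdefault(digest, []).append(path)
    return dupes
```

Dropping the filecmp step gives the "hash only" variant, which is faster but trusts the hash completely.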

Aside: The Plan9 OS has a filesystem called "Venti", based on the idea
that each block of data in a stored file can be indexed by its hash
value.  This allows it to store only one copy of the block's data, for
any number of files or copies of files that contain that data.  It's
intended for archival storage.  This design requires that every unique
block of data, in every file in the filesystem, has a unique hash
value.  A collision results in data loss.
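
The idea can be sketched with an in-memory toy. This is not Venti's actual interface or on-disk format; the BlockStore class, the store_file helper, and the 8192-byte block size are all assumptions made for illustration.

```python
import hashlib

class BlockStore:
    """Toy content-addressed store: each block is filed under its SHA-1."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        # If the key already exists the new data is NOT stored. That is
        # what gives deduplication, and also why a hash collision would
        # silently lose the second block's data.
        self._blocks.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blocks[key]

    def __len__(self):
        return len(self._blocks)

def store_file(store, data: bytes, block_size: int = 8192):
    """Split data into fixed-size blocks and return the list of block keys."""
    return [store.put(data[i:i + block_size])
            for i in range(0, len(data), block_size)]

store = BlockStore()
keys_a = store_file(store, b"A" * 8192 + b"B" * 8192)
keys_b = store_file(store, b"A" * 8192 + b"C" * 8192)
print(len(store))  # the shared "A" block is stored only once
```

A file is then just its list of keys, and reading it back means fetching each block by key and concatenating them.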

    "Using the Sha1 hash function, the probability of a collision is
    less than 10^-20. Such a scenario seems sufficiently unlikely that
    we ignore it [...]"
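
For a feel of where a figure of that size can come from, here is a rough birthday-bound estimate. The block count of 10^14 is an assumption picked for illustration, and the bound treats SHA-1 as a uniformly random 160-bit function, so it says nothing about deliberately constructed collisions.

```python
from fractions import Fraction

def collision_bound(n_blocks: int, hash_bits: int = 160) -> Fraction:
    """Upper bound on the chance that any two of n_blocks distinct
    blocks share a hash value: n(n-1)/2 pairs, each colliding with
    probability 2**-hash_bits.
    """
    return Fraction(n_blocks * (n_blocks - 1), 2 * 2**hash_bits)

# Assumed store size: 10**14 distinct blocks.
p = collision_bound(10**14)
print(float(p))
print(p < Fraction(1, 10**20))
```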

                                                  -Dave Dodge

Gé Weijers