Hash collision

The ideal property of a one-way hash algorithm is that every distinct input produces a distinct output. Because the hash output has a fixed length and is almost always smaller than the input, there are more possible inputs than possible outputs, so this property cannot be achieved completely. When two different inputs produce the same hash output, it is called a hash collision.
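
To make the counting argument concrete, here is a minimal sketch in Python. It assumes a deliberately weakened toy hash (the hypothetical `tiny_hash`, which keeps only the first two bytes of SHA-256) so that a collision turns up quickly. With only 65,536 possible outputs, a repeat must occur within 65,537 distinct inputs. This illustrates the principle only; it is not an attack on SHA-256 itself.

```python
import hashlib


def tiny_hash(data: bytes, nbytes: int = 2) -> bytes:
    """Hypothetical toy hash: the first `nbytes` bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:nbytes]


def find_collision(nbytes: int = 2):
    """Hash the inputs b"0", b"1", b"2", ... until two share a toy hash."""
    seen = {}  # maps toy hash -> first input that produced it
    counter = 0
    while True:
        msg = str(counter).encode()
        digest = tiny_hash(msg, nbytes)
        if digest in seen:
            # Two different inputs with the same output: a collision.
            return seen[digest], msg, digest
        seen[digest] = msg
        counter += 1


first, second, digest = find_collision()
print(first, second, digest.hex())
```

The two inputs printed are different byte strings, yet their truncated hashes are identical. A full-length hash makes such a search far more expensive, but it cannot make collisions impossible.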

Hash collisions are not always a problem. Some techniques, such as hashcash, rely on finding partial collisions as a "proof of work". Problems arise when similar data sets collide. This is the basis of the MD5 collision paper presented at Crypto 2004. Further research has produced a method of creating quite similar documents that have the same MD5 hash value. If the two documents are, for example, slightly different versions of a contract, and the MD5 hash value is used to attest that the original document has not been altered, the problem becomes obvious: the hash value cannot show which version is the original.
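
The proof-of-work idea mentioned above can be sketched as follows. This is a simplified illustration in the spirit of hashcash, not the actual hashcash stamp format: the function names `mint` and `verify`, the `challenge:nonce` layout, and the 12-bit difficulty are all assumptions made for the example. The worker searches for a nonce whose SHA-1 digest begins with a given number of zero bits. Finding one takes many attempts, while checking it takes a single hash.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def mint(challenge: str, bits: int = 12) -> int:
    """Search for a nonce whose digest has at least `bits` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= bits:
            return nonce


def verify(challenge: str, nonce: int, bits: int = 12) -> bool:
    """Check a claimed proof of work with a single hash computation."""
    digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


nonce = mint("example-challenge", bits=12)
print(nonce, verify("example-challenge", nonce, bits=12))
```

Each extra bit of difficulty roughly doubles the expected number of attempts for `mint`, while `verify` stays equally cheap. That asymmetry is what makes the result usable as evidence of work done.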