May 8, 2004 Here's how to use this code to find a N-bit MD5 collision. First, recompile with the appropriate definition of HASHBIT. There's been very little testing with non-byte-aligned values, so stick to multiples of 8. % rm *.o % make OPT=-DHASHBIT=48 It should compile with no warnings (or perhaps very few if you have a newer GCC). The code is currently hard-wired to use 16-bit DPs, and to store them to disk with no pre-truncation. This means that the on-disk database will have 2 leading zero bytes, and also that the largest value of HASHBIT that works is 96 (because the on-disk format stores only 12 bytes of hash). The next step is to run the search. You'll need to specify a path to the DB file for output; currently this does not grow very large so don't worry about free space (10 MB is plenty for HASHBIT=48, 100MB should suffice for HASHBIT=80). % ./md5dp local /tmp/db This is the long-running portion of the code, so you may wish to run it under screen(1) or with nice(1). If you want to use multiple machines to search, you can start a server on one machine with "./md5dp server /tmp/db [port]" and clients with "./md5dp client [server [port]]". However, note that it currently does not do enough buffering, so blocking for the network harms efficency. (Probably, running two clients for each CPU would go a long ways towards alleviating this problem.) The "local" search and the client use the following output: 27746238984 iter 25729.93 sec 1078364.48 iter/sec dc2b07c52e23 422720 This client has done 27 billion iterations in 25,700 seconds (7 hours), at a rate of just over 1 million ops/sec. It's found 420,000 distinguished points. After the database has accumulated enough DPs, open another shell on the host with the DB, and cd into the directory holding it. The code is structured to accumulate a certain number of DPs (64 MB worth, currently) and then process them. The code that does the processing currently does not work for interesting values of HASHBIT, so if your run is going to generate more than 64MB you'll have to go edit 'maxq' in store.c:datum(). Instead of the automatic processing, the following hack process works. First, copy the db.queue file to a temporary; then, run the sorting process on it; then run queries on the sorted DB. % cp db.queue a.q % ./md5dp pushq a a.q If your run has found any HASHBIT-sized collisions, the pushq will print out some messages like this: % ./md5dp pushq a a.q md5dp compiled for 56 bit (7 byte) hashes duplicate DPs 000000e8fdee540000000000 c21c2bd0 5807491 000000e8fdee540000000000 2b363438 2832737 If it doesn't print any "duplicate DPs" messages, there shouldn't be any; wait for a while and try again. (Or, if you run "checkcoll" and do find some, then that's a bug and you should let me know.) After the pushq completes, run the collision-finder on the sorted queue: % ./md5dp checkcoll a.q md5dp compiled for 56 bit (7 byte) hashes Found 112 colliding DPs [...] 263bd3ed 8cd257f7 4042801+37951 4970591+31682 e1=0000b908a860740000000000 p1=0000a8df713ebc0000000000 e2=0000b908a860740000000000 p2=000021fda143030000000000 collision at i=6342 z1=aa8078efa00e08000000000000000000 z2=8207714ec4975d000000000000000000 This means that the streamIDs with the collisions were 263bd3ed and 8cd257f7. For stream 263bd3ed, the last DP before the collision was at iteration 4,042,801 and the colliding DP was 37,951 iterations later. For stream 8cd257f7, the prior DP was at 4,970,591 with the collision 31,682 iterations later. The four DPs are printed (e1, p1, e2, p2) and then the code iterates p1, p1+1, p1+2, ... and p2, p2+1, p2+2, ... until the collision is found 6,342 iterations later. (That's counting from the DP of stream 8cd257f7, because it had a lower iteration count between the prior DP and the colliding DP.) Finally, the z1 and z2 values are the hashes which generated the collision. We can use hash2msg to generate the message and see for ourself: % ./md5dp hash2msg 8207714ec4975d000000000000000000 > m1 md5dp compiled for 48 bit (6 byte) hashes % ./md5dp hash2msg aa8078efa00e08000000000000000000 > m2 md5dp compiled for 48 bit (6 byte) hashes % md5sum m1 m2 f9b40a4962c2e3293862eaa34437436d m1 f9b40a4962c2e33b30b650d1b8aea4fe m2 Note that the hash values are the same for the first 14 hex characters (7 bytes); this example was done with HASHBIT=56. % diff -U10 m1 m2 --- m1 2004-05-08 08:54:26.000000000 -0500 +++ m2 2004-05-08 08:54:26.000000000 -0500 @@ -1,8 +1,8 @@ Orders from your Captain -String Jack Shaftoe up by his thumbs. +Pay Jack Shaftoe 100 pieces of eight.
Cap'n Hook - + Now if Cap'n Hook and his men use a pirated version of PGP that only signs 56 bits of the MD5 hash (they're not very good at math, you know!) Jack can convince the Cap'n to sign the "String him up" message, transfer the signature to the "100 pieces of eight" message, and make off with the dubloons.