It turns out there's a significant flaw in the approach. Because both the medallion and hack license numbers follow predictable patterns, it was trivial to hash every possible value with the same MD5 algorithm and then compare the results to the data contained in the 20GB file. Software developer Vijay Pandurangan did just that, and in less than two hours he had completely de-anonymized all 173 million entries.
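To make the weakness concrete, here is a minimal sketch of that kind of exhaustive lookup. It is not Pandurangan's actual code, and the ID format used (one digit, one uppercase letter, two digits, such as `5X55`) is an assumption chosen for illustration. The real medallion and hack license formats are not given here. The names `md5_hex`, `candidate_ids`, and `build_lookup` are hypothetical.

```python
import hashlib
import itertools
import string


def md5_hex(value: str) -> str:
    """Return the hex MD5 digest of an ASCII string."""
    return hashlib.md5(value.encode("ascii")).hexdigest()


def candidate_ids():
    """Yield every ID of the ASSUMED form digit-letter-digit-digit (e.g. '5X55')."""
    for d1, letter, d2, d3 in itertools.product(
        string.digits, string.ascii_uppercase, string.digits, string.digits
    ):
        yield f"{d1}{letter}{d2}{d3}"


def build_lookup() -> dict:
    """Map each candidate's MD5 digest back to the plaintext ID."""
    return {md5_hex(candidate): candidate for candidate in candidate_ids()}


# Hash the whole (small) ID space once, then reverse any "anonymized" value.
lookup = build_lookup()
anonymized = md5_hex("5X55")  # stands in for a hashed field from the data set
print(len(lookup), lookup.get(anonymized))
```

Under this assumed format there are only 10 × 26 × 10 × 10 = 26,000 possible IDs, so the whole table is built almost instantly. Unsalted hashing of a small, structured ID space hides nothing, because an attacker can enumerate every input.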
He says, "If a sequence of conventional mathematical operations isn't patentable, then no software should enjoy patent protection. For example, the 'data compression' patents that Justice Kennedy wants to preserve simply claim formulas for converting information from one digital format to another. If that's not a mathematical algorithm, nothing is. This is the fundamental confusion at the heart of America's software patent jurisprudence: many judges seem to believe that mathematical algorithms shouldn't be patented but that certain kinds of software should be patentable. ... If a patent claims a mathematical formula simple enough for a judge to understand how it works, she is likely to recognize that the patent claims a mathematical formula and invalidate it. But if the formula is too complex for her to understand, then she concludes that it's something more than a mathematical algorithm and upholds it."