US District Court Judge Richard Jones's recent ruling in Johnson v. Microsoft has been much ridiculed for saying that IP addresses are not "personally identifiable information" (PII) because they identify computers, not individual users. Legions of critics have pointed out that this is like saying home addresses are not PII because they identify houses, not people. And it was pretty silly for Jones to say that "the only reasonable interpretation" of PII would be to exclude IP addresses from the definition — when, as the plaintiffs pointed out, Microsoft's own website defined PII to include IP addresses. (Microsoft has since removed from that definition from their online glossary and replaced with a link to their privacy statement.)
But the open secret in the privacy tech industry is that nobody knows exactly what "personally identifiable information" means anyway, and nobody cares, either. This is not because industry leaders don't care about privacy and security. They do. But being a good, privacy-conscious software architect has nothing to do with nit-picking the details of what counts as PII. If you're designing the new Hotmail, you should just know that passwords should be encrypted when users log in over the Web, that third parties should not be able to query the Hotmail database and harvest e-mail addresses, that users shouldn't be able to extract personal data such as birthdates that are associated with another user's e-mail address, etc. If you don't instinctively know those things already, then memorizing a definition for "PII" is not going to make you a good security-conscious programmer.
There are indeed published definitions of "PII" — the US Office of Management and Budget Memo 07-16 defines PII as:
"information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc."
But that doesn't pass the test of what makes a good definition, which is: If two different people read that definition, and then you gave them an example of a piece of data (such as the school that someone graduated from), would they usually be able to agree on whether that data counts as "PII?" How about IP addresses? From the written definition alone, there's no way to tell for sure.
I actually worked as a contractor at Microsoft at the onset of the PII craze, and in order to commence working on what would eventually become Windows Live, we all had to watch a streaming video about PII, what it was, how to secure it, etc. Near the beginning, the narrator gave some examples of PII, including e-mail addresses, and mentioned that PII should be encrypted when transmitted over the Internet. (I'm not violating any confidentiality; these standards were all publicly released later.) Full of first-week-on-the-job idealism, I looked up the narrator in the company directory and earnestly typed out an e-mail raising some points, such as: Doesn't Hotmail display your e-mail address over an unencrypted connection when you're signed in to Hotmail? And anyway, because the standard e-mail protocols always transmit To: and From: addresses unencrypted over the Internet, how would it ever be possible to "encrypt e-mail addresses in transit" anyway? Wouldn't it make more sense to specify that individual e-mail addresses can be transmitted in the clear one at a time, but if we're ever transferring a large number of them in bulk, it would be wise to encrypt the list, to reduce the chance of it falling into the hands of a spammer?
Then the video kept rolling, and making more statements that seemed to contradict earlier ones, or that were too vague to give me any idea of what I was actually supposed to do in a given situation, and eventually I got the point: We do care about privacy and security. But, there is no algorithm that can determine unambiguously what counts as "PII" or what you're supposed to do in order to safeguard it. You just have to use your common sense and ask around if you're not sure. The main point of the video is to reinforce how important this is, not to impart any actual information.
So Judge Jones could have picked from many possible definitions of "PII," and nobody would be able to call him "wrong," as long as the industry doesn't know what it means, either. What he was really trying to decide was whether Microsoft violated its promise "not to collect PII" during the Windows Update process, because the IP addresses of users doing the downloads were visible to Microsoft's servers. The plaintiffs made some other claims in Johnson v. Microsoft that I think have more merit (basically, arguing that the "Windows Genuine Advantage" anti-piracy tool should not have been foisted on users without their consent as part of the Windows Update process), but on this particular point, I think they were bound to lose on the claim that collecting IP addresses during a download was a privacy violation. After all, if the judge had ruled in their favor on this point, Microsoft would have had to discontinue Windows Update in order to comply with the ruling, and I don't think anybody wants that.
So, maybe Judge Jones just decided that he didn't want to be known as the judge who outlawed Windows security updates, so he determined in advance that he was going to rule that Microsoft did not violate users' privacy by collecting IP addresses during Windows Update. Then he worked backwards from there to find reasoning that supported this conclusion. That's not really how it's supposed to work, but at least he could have had good intentions.
Unfortunately, the reasoning that he hit on was the absurd argument that IP addresses are not PII because they identify computers, not the people who own them. Here's something that he could have said instead:
"I'm not counting IP addresses as PII, because in order to find out who was using an IP address at a particular time, you have to subpoena the ISP. That's what makes them different from names and home addresses, which can be matched to individual people without a subpoena. As long as Microsoft isn't subpoenaing ISPs to find out who was using a particular IP address, for all practical purposes they are not 'personally identifiable.'"
Judge Jones actually started out in that direction by quoting from another case, Klimas v. Comcast Cable Communications, Inc., where the court wrote, "We further note that IP addresses do not in and of themselves reveal 'a subscriber's name, address, [or] social security number.' That information can only be gleaned if a list of subscribers is matched up with a list of their individual IP addresses." And that list matching up subscribers with the IP addresses they were using at a given time, can only be obtained with a subpoena. Jones could have quit while he was ahead and stuck with that reasoning, and he would have avoided all the ridicule that came from his statement about IP addresses.
Or maybe Judge Jones could have just said,
"Look, you don't have a standard definition for PII anyway. You adapt it to each individual situation, in order to determine what privacy protections should be built into each program, by using your common sense. So that's what I'm doing to do in this situation too. And my common sense tells me that having IP addresses visible to Microsoft's servers during the Windows Update process, is not a privacy violation, because that's how downloads work."
That's as good a definition of PII as any. Now let's get back to the real work of stopping Russian porno spammers from pwning our machines in the first place.