P.I.I. In the Sky
US District Court Judge Richard Jones's recent ruling in Johnson v. Microsoft has been much ridiculed for saying that IP addresses are not "personally identifiable information" (PII) because they identify computers, not individual users. Legions of critics have pointed out that this is like saying home addresses are not PII because they identify houses, not people. And it was pretty silly for Jones to say that "the only reasonable interpretation" of PII would be to exclude IP addresses from the definition when, as the plaintiffs pointed out, Microsoft's own website defined PII to include IP addresses. (Microsoft has since removed that definition from its online glossary and replaced it with a link to its privacy statement.)
But the open secret in the privacy tech industry is that nobody knows exactly what "personally identifiable information" means anyway, and nobody cares, either. This is not because industry leaders don't care about privacy and security. They do. But being a good, privacy-conscious software architect has nothing to do with nit-picking the details of what counts as PII. If you're designing the new Hotmail, you should just know that passwords should be encrypted when users log in over the Web, that third parties should not be able to query the Hotmail database and harvest e-mail addresses, that users shouldn't be able to extract personal data such as birthdates that are associated with another user's e-mail address, etc. If you don't instinctively know those things already, then memorizing a definition for "PII" is not going to make you a good security-conscious programmer.
Conversely, the major security threats facing Windows users (malware infection through security holes in Windows and Internet Explorer) have nothing to do with the definition of PII or the finer points of Microsoft's privacy policy. There may even be public relations gurus at Microsoft who are glad to see the "IP addresses as PII" controversy in the headlines, if that relatively minor privacy issue distracts the public from the vastly more serious threats posed by browser security holes.
There are indeed published definitions of "PII" — the US Office of Management and Budget Memo 07-16 defines PII as:
"information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother's maiden name, etc."
But that doesn't pass the test of what makes a good definition, which is: If two different people read that definition, and then you gave them an example of a piece of data (such as the school that someone graduated from), would they usually be able to agree on whether that data counts as "PII?" How about IP addresses? From the written definition alone, there's no way to tell for sure.
I actually worked as a contractor at Microsoft at the onset of the PII craze, and in order to commence working on what would eventually become Windows Live, we all had to watch a streaming video about PII, what it was, how to secure it, etc. Near the beginning, the narrator gave some examples of PII, including e-mail addresses, and mentioned that PII should be encrypted when transmitted over the Internet. (I'm not violating any confidentiality; these standards were all publicly released later.) Full of first-week-on-the-job idealism, I looked up the narrator in the company directory and earnestly typed out an e-mail raising some points, such as: Doesn't Hotmail display your e-mail address over an unencrypted connection when you're signed in to Hotmail? And anyway, because the standard e-mail protocols always transmit To: and From: addresses unencrypted over the Internet, how would it ever be possible to "encrypt e-mail addresses in transit" anyway? Wouldn't it make more sense to specify that individual e-mail addresses can be transmitted in the clear one at a time, but if we're ever transferring a large number of them in bulk, it would be wise to encrypt the list, to reduce the chance of it falling into the hands of a spammer?
Then the video kept rolling, and making more statements that seemed to contradict earlier ones, or that were too vague to give me any idea of what I was actually supposed to do in a given situation, and eventually I got the point: We do care about privacy and security. But, there is no algorithm that can determine unambiguously what counts as "PII" or what you're supposed to do in order to safeguard it. You just have to use your common sense and ask around if you're not sure. The main point of the video is to reinforce how important this is, not to impart any actual information.
So Judge Jones could have picked from many possible definitions of "PII," and nobody would be able to call him "wrong," as long as the industry doesn't know what it means, either. What he was really trying to decide was whether Microsoft violated its promise "not to collect PII" during the Windows Update process, because the IP addresses of users doing the downloads were visible to Microsoft's servers. The plaintiffs made some other claims in Johnson v. Microsoft that I think have more merit (basically, arguing that the "Windows Genuine Advantage" anti-piracy tool should not have been foisted on users without their consent as part of the Windows Update process), but on this particular point, I think they were bound to lose on the claim that collecting IP addresses during a download was a privacy violation. After all, if the judge had ruled in their favor on this point, Microsoft would have had to discontinue Windows Update in order to comply with the ruling, and I don't think anybody wants that.
So, maybe Judge Jones just decided that he didn't want to be known as the judge who outlawed Windows security updates, so he determined in advance that he was going to rule that Microsoft did not violate users' privacy by collecting IP addresses during Windows Update. Then he worked backwards from there to find reasoning that supported this conclusion. That's not really how it's supposed to work, but at least he could have had good intentions.
Unfortunately, the reasoning that he hit on was the absurd argument that IP addresses are not PII because they identify computers, not the people who own them. Here's something that he could have said instead:
"I'm not counting IP addresses as PII, because in order to find out who was using an IP address at a particular time, you have to subpoena the ISP. That's what makes them different from names and home addresses, which can be matched to individual people without a subpoena. As long as Microsoft isn't subpoenaing ISPs to find out who was using a particular IP address, for all practical purposes they are not 'personally identifiable.'"
Judge Jones actually started out in that direction by quoting from another case, Klimas v. Comcast Cable Communications, Inc., where the court wrote, "We further note that IP addresses do not in and of themselves reveal 'a subscriber's name, address, [or] social security number.' That information can only be gleaned if a list of subscribers is matched up with a list of their individual IP addresses." And that list, matching up subscribers with the IP addresses they were using at a given time, can only be obtained with a subpoena. Jones could have quit while he was ahead and stuck with that reasoning, and he would have avoided all the ridicule that came from his statement about IP addresses.
Or maybe Judge Jones could have just said,
"Look, you don't have a standard definition for PII anyway. You adapt it to each individual situation, in order to determine what privacy protections should be built into each program, by using your common sense. So that's what I'm going to do in this situation too. And my common sense tells me that having IP addresses visible to Microsoft's servers during the Windows Update process is not a privacy violation, because that's how downloads work."
That's as good a definition of PII as any. Now let's get back to the real work of stopping Russian porno spammers from pwning our machines in the first place.
Absurd? (Score:5, Insightful)
I would disagree with the premise. (Score:5, Insightful)
"A judge rules that IP addresses are not 'personally identifiable information' (PII) because they identify computers, not people. That's absurd,
I think that is not absurd. IPs could be utterly random, changed by anything... there's no process or standard or central authority or anything that guarantees that it's even your computer. In order for you to have a computer identifier that is legally bound to you, you have to go through a quasi-governmental process that has
a) the applicant providing proof of identification
b) the registrar validating that identification and issuing the IP to the person...
c) payment or proof of payments to associate the identification with the applicant.
d) finally, the IP should remain the property of the applicant, but the government should track transfers.
If you did all that, then, yes, you might say the IP belongs to a person, because that's the only process that can eliminate reasonable doubt.
Re:Postal addresses identify houses!I (Score:2, Insightful)
Sure, as soon as his home address and car license plates randomly shuffle while requiring an ISP to give you the rest of the information about the location.
Then you can go and post the information.
Absurd? Are you taking the piss? (Score:5, Insightful)
Seriously, the IP address of a computer in your public library, or a school, or in a house with more than one person, how is that personally identifiable information? Talk about absurd...
Re:Postal addresses identify houses!I (Score:5, Insightful)
Obligatory car analogy (Score:5, Insightful)
I think the judge is correct. If your car was leaving a crime scene, and the license plate were noted, your defense attorney would correctly note that someone else could have been driving the car. If your IP address is noted doing something nefarious, your lawyer would again correctly note that someone else could have been using the computer. That indicates that the information isn't uniquely identifying.
PII is usually the information that uniquely identifies a person. Name, SSN, and birthdate are the holy trinity of PII, with account numbers for a business close behind. The data security droids usually lump in address and phone, but I think that's an error in reasoning because of the above observation. I think they could correctly be described as sensitive, and certainly businesses and developers should treat them as such. But I don't think addresses and phone numbers are deserving of the protection that your name, birthdate and SSN get, because you can't go open a checking account in my name just by knowing my address.
Re:Absurd? Are you taking the piss? (Score:2, Insightful)
Sounds like a fantastic precedent to me. The only thing the RIAA has to identify the people they sue are IP addresses. The judge said IP addresses cannot be used to identify people. You can't sue a computer. This is a wookie. Case closed.
Misunderstanding the issue (Score:5, Insightful)
Re:I would disagree with the premise. (Score:4, Insightful)
Reasonable doubt may be the standard in American criminal cases, but it is not the standard in a civil case such as the one being discussed here. In an American civil case the usual standard is preponderance of the evidence, which is a 'more likely than not' or '50% + 1' standard.
Thus, for households in which the computer is used primarily by one adult, an IP address is personally identifiable in that knowing the IP address (in conjunction with information from the ISP) makes it more likely than not that the adult in question was using the computer at the time the transaction with that IP was logged. The problem is that many households have multiple computers and/or multiple users and information from the ISP is necessary to tie an IP to an individual or household. So Microsoft, which had only the IP addresses, did not have personally identifiable information.
Re:Postal addresses identify houses!I (Score:2, Insightful)
So, even with dynamic IPs, if you know the time and date when an activity took place, you can effectively tell who was responsible given the IP and the cooperation of the ISP, neither of which is particularly difficult to get.
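To make the dynamic-IP point concrete, here is a minimal sketch of the kind of lookup being described. It assumes the ISP keeps a log of which account held which address during which interval. The `Lease` record, the account names, the addresses, and the dates are all invented for illustration and are not taken from any real ISP system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Lease:
    """One hypothetical ISP log entry: an address held by an account for a time span."""
    ip: str
    subscriber: str   # made-up account identifier, not a person's name
    start: datetime   # inclusive
    end: datetime     # exclusive


# Invented sample log: the same address is handed to two accounts on the same day.
LEASES = [
    Lease("203.0.113.7", "account-A", datetime(2009, 7, 1, 0, 0), datetime(2009, 7, 1, 12, 0)),
    Lease("203.0.113.7", "account-B", datetime(2009, 7, 1, 12, 0), datetime(2009, 7, 2, 0, 0)),
]


def subscriber_for(ip: str, when: datetime, leases: List[Lease]) -> Optional[str]:
    """Return the account that held `ip` at time `when`, or None if the log has no match."""
    for lease in leases:
        if lease.ip == ip and lease.start <= when < lease.end:
            return lease.subscriber
    return None


morning = subscriber_for("203.0.113.7", datetime(2009, 7, 1, 9, 30), LEASES)
evening = subscriber_for("203.0.113.7", datetime(2009, 7, 1, 18, 0), LEASES)
no_log = subscriber_for("203.0.113.7", datetime(2009, 7, 1, 9, 30), [])
```

Note that the lookup only works with both the timestamp and the lease log. With an empty log (the position of anyone who holds nothing but the IP address) the function returns `None`, and even a successful match names an account rather than whoever was at the keyboard.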
Re:I would disagree with the premise. (Score:1, Insightful)
Even with the process you outline it would be difficult to prove that the IP address belonged to an individual since spoofing an IP address can be a relatively simple proposition.
Additionally, I would argue that in most cases IP addresses do not even identify computers but rather access points for computers. In my home we have a minimum of 4 computers that are running on a regular basis, all using a single IP (as far as the outside world is concerned at least). Add to that the number of insecure and/or improperly configured wireless access points that are available, and I don't know how you could even begin to assume that an IP address is PII.
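A rough sketch of why several machines look like one address from the outside, assuming a home router doing ordinary port-based address translation. The `Nat` class, the port numbers, and all addresses here are hypothetical and simplified. A real router tracks far more state than this.

```python
class Nat:
    """Toy port-address translation table (illustrative only)."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._table = {}  # (private_ip, private_port) -> public_port

    def translate(self, private_ip: str, private_port: int):
        """Map an internal (ip, port) pair to the (public_ip, port) pair a server sees."""
        key = (private_ip, private_port)
        if key not in self._table:
            self._table[key] = self._next_port
            self._next_port += 1
        return (self.public_ip, self._table[key])


nat = Nat("198.51.100.23")

# Four machines in one household, each opening a connection from the same local port.
household = ["192.168.1.10", "192.168.1.11", "192.168.1.12", "192.168.1.13"]
seen_by_server = [nat.translate(ip, 51000) for ip in household]

# From the remote server's side, every connection comes from the same address.
public_ips = {ip for ip, _port in seen_by_server}
```

Here `public_ips` ends up holding a single address even though four machines connected. Only the router's internal table, which the remote server never sees, can tell them apart.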
Re:I would disagree with the premise. (Score:4, Insightful)
Untrue. You identify an internet device. Just because an IP was used to perpetrate an act, you can never use that information to link to a person. Anyone can be sitting at a keyboard, or using a smart phone, or tapping an iPod, not just the "owner" of the device.
If my computer's IP was used to steal personal information in a phishing scam, not even mentioning that the computer could be doing this unbeknownst to me while I'm sitting here, anyone else who has physical access to my home, legally or otherwise, could be using this computer at any time.
Technical Reasoning vs Legal Reasoning (Score:4, Insightful)
The author suggests this:
There are several problems with this. First, reliance on common sense and deference to the individual situation creates uncertainty, which in turn invites litigation. Such non-rules create problem spaces that can only be mapped through large amounts of expensive trial and error. Well defined rules eliminate uncertainty and discourage litigation by making the result obvious from the outset.
Second, this is a district court case. The district judge is concerned with the specific problem in front of him or her: are IP addresses personally identifiable information or not. The district court has neither the time nor the need (nor the authority, really) to create rules with broad scope.
Third, this case isn't about the meaning of 'personally identifiable information' generally. It's about the meaning of the phrase within the Windows XP End User License Agreement. The ruling is about construing the language of a contract, not privacy law as such.
Fourth, this is a federal court case dealing with a state contract law issue, in this case the law of the state of Washington (note the judge's citations to Washington contract cases like Seabed Harvesting v. Dep't of Natural Resources and Elliott Bay Seafoods v. Port of Seattle). When dealing with a state law claim, the federal courts are supposed to apply the law of the state as it would be applied by a state court; they are not empowered to make new state law. Erie Railroad v. Tompkins. Thus, it would be wrong for a federal court to make broad statements about the meaning of the term 'personally identifiable information' in contracts under Washington state law. Instead, the judge did the right thing and addressed only the specific problem at hand.
Re:Obligatory car analogy (Score:3, Insightful)
The only PII that really matters is my bank account. It is all about following the money, who cares who you are as long as you can pay for whatever it is you want. SSN is almost like a public key.