Microsoft Privacy

Concerns Over Microsoft's Internet User Profiling 144

jcatcw writes "Microsoft research on Internet user profiling could lead to tools that help repressive regimes identify anonymous dissidents, the Reporters Without Borders advocacy group warned last Friday. Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time. "In China, it is conceivable that this type of technology would be used to spot Internet users who regularly access such 'subversive' content as news and information websites critical of the regime," the group said."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Monday June 04, 2007 @09:45AM (#19381897)
    I'd be uneasy about partnering with a bunch of totalitarian control freaks like Microsoft.
    • by Anonymous Coward on Monday June 04, 2007 @09:53AM (#19382011)
      That said, I reckon I could guess the gender of a web surfer 80% of the time.

      if (site.equals("slashdot.org")) male = true;
      else if (site.equals("cutepuppydogs.com")) male = false;
      • Re: (Score:2, Funny)

        by nschubach ( 922175 )
        Great.. I can see it now:

        Due to a recent trend in programming, feminist groups have decided they don't like being referred to as !male. They demand a rule-defined pairing of the Boolean values "male" and "female", where setting male to false would set female to true. More on this at 11.
      • by mrmeval ( 662166 )
        z0MG PONIES
    • by willabr ( 684561 )
      Jeez... what next, a Microsoft van taking pictures of me?
      • Unnecessary. You know those little lights on your electronic equipment? They told you those are just light emitting diodes. What they didn't tell you is that those are actually little cameras. And Big Bill is watching you... And I really mean you, Willabr. So put on your pants and comb your hair. We, members of the oppressive regime secretly installed in your country, have feelings too, you know.
  • oh noes! (Score:4, Funny)

    by Anonymous Coward on Monday June 04, 2007 @09:46AM (#19381921)
    Unless you happen to be the only 20-25 year old male in China I think you're safe.
    • Re:oh noes! (Score:4, Insightful)

      by Short Circuit ( 52384 ) <mikemol@gmail.com> on Monday June 04, 2007 @10:24AM (#19382445) Homepage Journal

      Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time. "In China, it is conceivable that this type of technology would be used to spot Internet users who regularly access such 'subversive' content as news and information websites critical of the regime,"
      With that kind of inaccuracy, it could hardly be considered a reliable indicator of identity. It seems much more tuned to generating demographic data.

      Not that China couldn't use that in targeted propaganda...
    • This entire post has just gotten censored (self censorship) [wikipedia.org], I am still hoping for a free copy [official-softwares.com] of Vista [microsoft.com] from Microsoft for posting "favorable" comments on Slashdot.
      Not their idea, mine. Leaving no stone unturned [quotationspage.com], I say.
      If I did actually get a free copy of Vista, I'd put it on the living room table right next to the picture of Thomas Jefferson (1743-1826). [whitehouse.gov] His picture is on the Two Dollar Bill [thinkquest.org] laying there.

      As you can see, I am horribly bored this evening, but I am doing a test:

      I'm running the linux OS [geocities.com]

  • Comment removed (Score:4, Informative)

    by account_deleted ( 4530225 ) on Monday June 04, 2007 @09:50AM (#19381967)
    Comment removed based on user account deletion
    • Re:Poor accuracy (Score:5, Insightful)

      by Lockejaw ( 955650 ) on Monday June 04, 2007 @09:58AM (#19382093)
      I am worried about such low rates. People seem inclined to believe anything a sufficiently large computer says.
    • Re:Poor accuracy (Score:5, Insightful)

      by ajanp ( 1083247 ) on Monday June 04, 2007 @10:07AM (#19382211)

      That same day, Erik Bratt, a Microsoft marketing communications manager, fired a preemptive salvo about the company's age- and gender-guessing research. Bratt first downplayed the research results, saying: "[The researchers] actually found that they could not, with a high degree of accuracy, predict age from Web browsing activity." He also swore the Redmond, Wash. developer off using the resulting algorithms. "Microsoft currently has no plans to use the capabilities found through this research in our products and services," Bratt said.
      So... they tell people that they have the ability to kind of, maybe, sort of, predict age and gender, but that their current algorithms are basically just a bunch of BS. Then they decide to also mention that despite the fact that the foundation for this technology doesn't really work accurately and they have no plans to use it in any products, they're going to continue working on it anyways.

      Microsoft's researchers said they would expand their work to other demographic attributes, such as occupation, educational degree and geographic location
      I mean, of course that's the logical thing to do so that they could hopefully get it to work more accurately, but why mention that you're even working on the technology if you're going to immediately dismiss it by saying that it doesn't really work?

      Then again, now I guess it solidifies the opinion about why Microsoft is really so sore about the Google-Doubleclick deal http://slashdot.org/article.pl?sid=07/04/16/0217203 [slashdot.org]. Google's got a leg up on what they have apparently already been working on and now they're at a disadvantage.

    • Isn't that worse?

      "We know a guy of 30-35years was looking at anti-freedom/love/justice/whatever material in this net cafe on Thursday at 3pm, would that be you sir? No? Well they all say no at first..."

      While that 20 year old woman is giggling from her window as she beat "the system" yet again.

      At least if it's accurate you're okay if you behave (no I am not supporting it, just saying), whereas if it's inaccurate.. well fuck, better odds on a coin flip.
      • Re: (Score:3, Funny)

        We need to start actively throwing their algorithm off. If every one of us guys were to start visiting hellokitty.com and mypinkfluffybarbieplayingwithponies.com and all the women were to start reading slashdot, they wouldn't be able to tell us apart!
        • I'm already doing my part. While my surfing habits are definitely male-oriented, I'm sure they would determine me to be 14-16 tops.
          • by rtb61 ( 674572 )
            When it comes to data poisoning the best way is an automated extension that just distorts your profile by mixing it with others, when done properly we all end up looking like one homogeneous mass. From junk data embedded in emails, to fake search routines and even random background browsing across a whole range of sites, no matter how much data their privacy invasive computers can store, hundreds of millions of users can create even more random worthless data and flood them under. What is cool is as their c
        • There are girls on the internet? Amazing!
    • How accurate it is doesn't matter. What matters is if the person who would make money deploying it can pitch it correctly to the people in charge. If so, they'll still try it. China would be the perfect place to start as they don't seem as though they'd be worried about false positives.
    • Re: (Score:3, Interesting)

      by reddburn ( 1109121 )

      80% and 60% are both actually very poor accuracies. I wouldn't be worried; this won't be taken seriously as any type of reliable profiling.
      It depends on the context. If you were talking baseball, it'd be a pretty damn good batting average. It's also good enough to get a warrant, and good enough for a grand jury to indict, and those latter two are what should worry you.
      • Baseball's being a defensive sport aside, I'd argue that 80% accuracy should certainly not be good enough for a grand jury. Let's examine a hypothetical case where someone is brought before a grand jury for crime X. The situation:
        • A profiling procedure has been developed which can tell with 80% accuracy if this person is the type of person who would commit crime X.
        • Out of the population at large, let's say X is a very common crime, and 10% of the population has committed it. (In other words P(Crime) = 0.1
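
The comment above is cut off before it finishes the arithmetic, so what follows is only a sketch of where the setup appears to lead, not the poster's own calculation. It assumes "80% accuracy" means the profile flags 80% of people who committed crime X and clears 80% of people who did not; the function name `posterior` and both rate parameters are illustrative assumptions. Under those assumptions, Bayes' theorem gives the chance that a flagged person actually committed the crime:

```python
def posterior(prior, sensitivity, specificity):
    """Return P(committed X | flagged by the profile) via Bayes' theorem.

    prior        -- P(Crime), the base rate in the population (0.1 in the comment)
    sensitivity  -- P(flagged | committed X); assumed to be the quoted 80%
    specificity  -- P(not flagged | did not commit X); also assumed to be 80%
    """
    true_positives = sensitivity * prior
    false_positives = (1.0 - specificity) * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


p = posterior(prior=0.1, sensitivity=0.8, specificity=0.8)
print(f"P(crime | flagged) = {p:.3f}")  # 0.08 / 0.26, roughly 0.308
```

With a 10% base rate, roughly two out of three flagged people would be innocent under these assumptions, which supports the poster's point that 80% is weak grounds for an indictment.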
    • Well you you why...

      ps: Humour

    • by catbutt ( 469582 )
      Regarding the age, how can they say "60%", and how can you say it is not accurate, when the tolerance isn't specified? Do they mean to the year? (I kinda doubt it)
    • by acvh ( 120205 )
      this is their first stab. they'll get better at it.
    • But it could correctly indicate you were 60% likely to be a dissident, which is good enough in most jurisdictions.
  • by supersnail ( 106701 ) on Monday June 04, 2007 @09:52AM (#19382003)
    ... wouldn't it be easier to look up the IP address and persuade the ISP to hand over the user details?

    The old ways are often the best.
    • Or block access to the websites in question? It's going to be a cold day in Hell when a totalitarian regime starts relying on Microsoft to do the spying for them.
    • wouldn't it be easier to look up the IP address and persuade the ISP to hand over the user details?

      Requiring the ISP to keep records with "wiretapping" laws and then getting the details is the US method. Farming out the collation of records to a company like Choice Point goes beyond the laws and is both cheap and efficient.

      In China, the regime is the ISP and they have the best equipment and methods that US companies could provide.

    • Re: (Score:2, Funny)

      by monk.e.boy ( 1077985 )

      Hey! I live in the UK, where they can just look through my window with one of the ubiquitous CCTV cameras and just watch me browse the net...

      ...monitor my books, my mail...

      ...the rate at which I scratch my arse (there are by-laws that I won't go into, but let's just say if I do it more than 5 times in an hour and don't immediately go see the doctor, the para's turn up)...

      Microsoft are waaay behind.

      monk.e.boy

  • What percentage of net users are male? Female? If 80% are male, and the algorithm just guessed male all the time, unless they bought 'ladies items' online, then it would have pretty good accuracy. Age group usage would follow a bell curve, so it's not difficult to skew your results again to reflect favourably on your algorithm.
    Unless of course these results were made under strict scientific observance and impartiality.. nah!
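
The baseline argument above can be made concrete. This is a minimal sketch, assuming the poster's hypothetical 80/20 split (replies in this thread dispute that figure); `majority_baseline_accuracy` is an illustrative name, not anything from the Microsoft paper.

```python
def majority_baseline_accuracy(class_shares):
    """Accuracy of a 'classifier' that always predicts the most common class."""
    return max(class_shares.values())


# Hypothetical population split taken from the comment, not measured data.
shares = {"male": 0.8, "female": 0.2}
baseline = majority_baseline_accuracy(shares)
reported = 0.8  # gender accuracy quoted in the summary

print(f"always-guess-majority baseline: {baseline:.2f}")
print(f"improvement over baseline:      {reported - baseline:.2f}")
```

If the real split is closer to even, the same function returns about 0.5 and the reported 80% looks considerably better, so the conclusion depends entirely on the assumed shares.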
    • Actually, more women than men use the internet.
      • Only since they figured out they can get at their partners' money via online banking
      • Re: (Score:3, Informative)

        Actually, no. More women under the age of 65 than their male counterparts use the Internet, according to Pew Internet. If you take into account all women and men of all age groups, then women still trail men by a few percentage points.

        If anything, you could say that men and women use the Internet about equally.

        Now, I'm almost positive that at least 80% of Slashdot readers are male, though there are an increasing number of females on this site. Many of them I think hide and don't reveal that they are fema
    • by PsEvo ( 1075643 )
      What about all those men buying 'ladies items' :O
  • Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time.

    Link to paper [bell-labs.com]. I don't claim to be knowledgeable about this stuff but that success rate doesn't look too remarkable to me. China's sex ratio is hardly so skewed (yet, anyway) that this could remotely identify someone from a pool of a billion users, or even out of a single Internet cafe.

    I'd wonder more about the quality of research Microsoft is getting out of their Beijing site if the

  • by erroneous ( 158367 ) on Monday June 04, 2007 @09:53AM (#19382021) Homepage
    If you read this post you are probably male. (80% correct)
    You are probably in the 20-35 age group. (60% correct)

    (I know I'll only get negative responses to this post of the type "I'm reading this and I'm a 47-year-old woman!" That's Ok. You're in the other bracket.)

    My algorithm is as good as Microsoft's. Can I have a research grant please?
    • by spellraiser ( 764337 ) on Monday June 04, 2007 @10:02AM (#19382157) Journal

      You got me pegged. Curse you!

      Can I have my privacy violation lawsuit settlement, please?

    • Also, the time is currently 11:45. (Correct 2 times per day.)
    • by PhxBlue ( 562201 )

      "I'm reading this and I'm a 47-year-old woman!"

      As a 48-year-old grandmother, I'm offended that you called me 47!

      Actually ... I don't think that Slashdot-ism works here. Go figure. :)

    • sure, it works for slashdot, but does it work for a site dedicated to something more neutral, like say migraine headaches (predominantly female, 75/25 maybe) or cnn.com? Things that are somewhat gender neutral?

      I know, you were trying to be funny, but if this thing works across the board, 4 out of 5 positive ID's ain't bad.
    • Re: (Score:1, Funny)

      by Anonymous Coward
      Are you trying to say that Slashdot is made up of mostly males!? And all this time I've been thinking this was a great place to meet chicks.
  • by giafly ( 926567 ) on Monday June 04, 2007 @09:53AM (#19382023)

    Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time.
    Just include {MERGE GENDER} and {MERGE AGE} in the comment area. For example...

    Your gender is ... MALE
    Your age is ... 18-30
  • Why don't we just enter our SSN every time we browse the web, and we can avoid all this 60-80% nonsense? All sites visited would be logged in a central database, and would be used to deliver targeted advertising. Just think! Based on my browsing habits, every website will look like torrentspy.com.
  • by taupin ( 1047372 ) on Monday June 04, 2007 @09:57AM (#19382067)

    Printable Version [computerworld.com]

    Right now this doesn't worry me too much - after all, how much "identification of anonymous dissidents" could someone do based only on one's gender and a rough estimate of age? On the other hand, if Microsoft do expand to geographical location, occupation, and educational degree as mentioned, then it's rather worrying.

  • This is another example of why it's important to ensure that corporations aren't allowed to collect and store huge amounts of data about individuals. The fact that they can analyse it in some way or another is irrelevant if privacy is respected in the first place.
  • O Rly? (Score:5, Insightful)

    by vivaoporto ( 1064484 ) on Monday June 04, 2007 @10:00AM (#19382121)
    First things first: why China? (The same question applies to Venezuela, Russia, Brazil or whatever is the target of the Slashdot "fifteen minutes of hate" of the day). Of course people should be concerned about what these countries do wrt losses of privacy and basic rights, but what about the U.S. and E.U.? As we talk, they are working on a new agreement to share data from passengers on trans-Atlantic flights [iht.com], a much more effective way to profile people, because it contains name, address, gender, destination, credit card number, everything, without needing to make any kind of assumption; everything is plain and clear. This is why I think that not only "in China", as the summary states, but in most countries in the world, this information can and will be used to tag people indiscriminately, subversive or not, terrorist or not, law abiding or not. So, take care of your own backyard before pointing at the poison ivy in your neighbor's.

    Second, it is not as if Microsoft were the only one researching and developing in this field and, more than that, even if Microsoft were not researching in this field, any government interested in this kind of technology would research it itself, or fund research at its public universities. So, throwing Microsoft's name into the mix only reinforces my point: this submission is nothing but flamebait, the flame targets being the usual suspects, proprietary software and communism.

    • Re: (Score:3, Funny)

      by bahstid ( 927038 )
      think of the child^H^H nese!
    • Agreed. In addition, China's already shown a propensity for messing with the internet, so they could just institute national policies that directly link your activities with your name. I know it wouldn't be perfect and that you could get around it, but it would be a whole lot better than profiling by surfing habits.

      The real places this could be used are in countries that have enough freedom to not be able to meddle at a base level, but that have or desire a high degree of surveillance on their citizens (
    • >So, throwing Microsoft name on the mix only reinforces my point, this submission is nothing but a flamebait

      Nonsense. Microsoft's power and influence alone justifies concern about whatever they do. The same could be said of China. Your point is thus totally lost on me.
    • by Prune ( 557140 )
      Funny that communism and proprietary software would be the 'usual suspects' as you say, given that the opposite of proprietary software, open source, is an ideology not that unfitting in a communist framework.
    • As we talk, they are working on a new agreement to share data from passengers on trans-Atlantic flights, a much more effective way to profile people, because it contains name, address, gender, destination, credit card number, everything, without needing to make any kind of assumption, everything is plain and clear.

      In short, everything that an international traveler has had to disclose to authorities since the beginning of the modern era.

      Identity. Citizenship. Financial responsibility. No legal barriers t

    • Re: (Score:3, Interesting)

      by Hatta ( 162192 )
      First things first: why China?

      Exactly, there's just as much reason to be afraid of these things in oppressive first world regimes [prisonplanet.com]. The US government is already getting some websites to turn over [slashdot.org] their user lists. How long until they talk Microsoft or Google into giving them access to their data mining facilities? If they can get 80% accuracy, well that's probable cause, and that will get them a warrant for a search. Eventually we'll all have to be very careful what we search for, lest we end up on a list
      • 80% accuracy of a given demographic is not probable cause to search anything. 80% accuracy for the selection of an individual is ok, but for a demographic is not enough to search every member of that demographic (or any selected members) (yet)
  • by m0nkyman ( 7101 )
    What? You think that the government of China doesn't have the resources or smarts to do this research and development themselves? Come on, shutting down research because the Chinese government might use it badly is very, very silly.
    • And what does this have to do with guns (or giants, depending on how you interpret the acronym)? I only ask because you brought it up...
  • Any technology that reduces anonymization could have this effect. It's a tradeoff--ease of mobility for dissidents vs eliminating obnoxious assholes. So every time you have to suffer through a troll, you're protecting freedom-loving Chinese!
  • So now in addition to Tor et al and the things that help privacy (sending Google random data as search queries) all we will have to do is have something in the background opening up male/female sites over all popular age ranges. Way to have to cream everyone's bandwidth. Sheesh, is there anything you CAN get right, MicroJerk?

    Profiling is akin to racism in my book. It's against democracy any way you look at it.
  • I realize the concept that this software, if it became more accurate, could be used by repressive regimes against their citizens. But as far as priorities go, I think they would do better to concentrate on bringing attention to human rights violations, and educating people about the rule of law.
  • The Usual Suspects (Score:4, Interesting)

    by blueZhift ( 652272 ) on Monday June 04, 2007 @10:10AM (#19382257) Homepage Journal
    This reminds me of the scenes in Casablanca where the police are told to round up the usual suspects. Ultimately the accuracy doesn't matter to the government anyway. Worldwide, I think we are moving in a direction of less freedom rather than more, spearheaded by wrong-headed anti-terrorism hysteria in the US. So why should they care about accuracy, they'll just round up whoever fits the profile and sort it out later, or not.
  • "Our new algorithm you ask? Well, we took a look at the MSN search log and counted the ratio of keywords such as 'boobs' and 'anal'." Considering the various interests and beliefs of men and women, no algorithm could accurately guess a gender, even age.
  • by ajs318 ( 655362 ) <.sd_resp2. .at. .earthshod.co.uk.> on Monday June 04, 2007 @10:18AM (#19382355)
    What about somebody writing a browser extension that performs bogus searches in the background, for no better reason than to frustrate "profiling" attempts? Is this feasible?
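
As a rough feasibility check on the question above, here is a minimal sketch of the decoy-generation half of such an extension. It only builds the bogus query strings; nothing is sent anywhere, and the word list, the function name `decoy_queries`, and its parameters are all hypothetical. A real extension would also need realistic timing and click behaviour, which this does not attempt.

```python
import random


def decoy_queries(vocabulary, count, words_per_query=2, seed=None):
    """Build `count` bogus search strings from randomly chosen vocabulary words.

    A seed can be supplied for repeatable output; leave it as None for
    different decoys on every run.
    """
    rng = random.Random(seed)
    return [
        " ".join(rng.sample(vocabulary, words_per_query))
        for _ in range(count)
    ]


# Hypothetical word list mixing interests across ages and genders.
VOCAB = [
    "knitting", "carburetor", "retirement", "skateboard",
    "mortgage", "lipstick", "homework", "fishing",
]

for query in decoy_queries(VOCAB, count=3):
    print(query)
```

Whether noise like this would actually defeat a profiler is an open question; a profiler that weights repeated, consistent behaviour could plausibly filter out uniformly random queries.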
  • They detailed how much of the information they gather is from using MS related software such as Internet Explorer, etc.

    Which is to say, how much info can be gathered using a non Ms browser such as Firefox with a Non MS operating system such as Apple or Linux, and avoiding non MS- dominated web sites?

    The more important questions are a) to what extent is this important to free societies and breaking the grip of totalitarian regimes on their societies?, and b) to what extent do we as members in free societies nee
    • how much info can be gathered using a non Ms browser such as Firefox with a Non MS operating system such as Apple or Linux, and avoiding non MS- dominated web sites?

      Pretty much everything anyone could want.

      When you click on a prostate cancer site it isn't hard to guess your likely age and gender.

      • True. I guess the better question would have been "how much of the statistics are derived from M$ server software and/or M$ client software?" -- because a server can always collect information at an extreme level.
  • I don't get it. No matter what age-group/gender combination you think of, even combined with geodata and occupation/education levels, this doesn't even come close to identifying individuals. Unless you actually believe "a male farmer in his 30s in the Shanghai region" or "a female grandmother in the suburbs of Houston" is significantly detailed.

    It makes more sense to worry about accidentally getting linked to personal details left in instant messaging, e-mail, community profiles and/or conversations.
  • by Phat_Tony ( 661117 ) on Monday June 04, 2007 @10:30AM (#19382529)
    Right, so they're about 50% sure that it's someone who's both male and aged 24-30, living in China. It should be easy to pick out the individual from there.
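
The "about 50%" figure above follows from multiplying the two quoted accuracies, assuming the gender and age guesses are right or wrong independently of each other (the summary does not say whether they are). The sketch below shows that arithmetic and then how large the matching pool would still be; the user count and demographic share are made-up placeholders, not real statistics.

```python
def joint_accuracy(*accuracies):
    """Probability that every guess is right, assuming independent errors."""
    result = 1.0
    for accuracy in accuracies:
        result *= accuracy
    return result


def candidate_pool(total_users, share_matching_profile):
    """Number of users who fit a demographic profile, rounded to whole people."""
    return round(total_users * share_matching_profile)


both_right = joint_accuracy(0.8, 0.6)  # gender and age figures from the summary
print(f"chance both guesses are right: {both_right:.2f}")

# Hypothetical inputs for illustration only.
pool = candidate_pool(total_users=1_000_000, share_matching_profile=0.1)
print(f"people fitting the profile: {pool}")
```

Even when both guesses are right, the profile narrows the search to a demographic slice, not to an individual, which is the poster's point.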
  • Must be Monday... (Score:1, Flamebait)

    by stubear ( 130454 )
    ...because the slashmonkeys are arguing that technology that can be abused is now considered to be bad. Tomorrow there will be an article about how the RIAA/MPAA are trying to shut down P2P networks and the sheep will graze on the other side of the field once more.
  • by petes_PoV ( 912422 ) on Monday June 04, 2007 @10:55AM (#19382903)
    ... which is a completely different and only slightly correlated attribute.
  • Microsoft's new algorithms correctly guessed the gender of a Web surfer 80% of the time, and his or her age 60% of the time.

    A bit strange in my opinion. "Guessing the gender correctly" already has a 50% chance even if you don't have any data about the user. So there is not much improvement here.
    But the age... if you really guess the age, that's more difficult. If we say we have everyone on the Internet up to the age of 100, you have a 1% chance of guessing the age - much less than 50%.
    So even if you improve only
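
The comparison the poster is building can be expressed as lift over blind guessing. This is a sketch under the comment's own assumptions: two equally likely genders and 100 equally likely ages guessed to the exact year. The research more likely used age brackets (another comment in this thread notes that the tolerance was never specified), which would raise the chance level and shrink the lift; `lift_over_chance` is an illustrative name.

```python
def lift_over_chance(accuracy, num_equally_likely_classes):
    """How many times better than uniform random guessing an accuracy is."""
    chance = 1.0 / num_equally_likely_classes
    return accuracy / chance


gender_lift = lift_over_chance(0.8, 2)    # 80% versus a coin flip
age_lift = lift_over_chance(0.6, 100)     # 60% versus 1-in-100, per the comment

print(f"gender: {gender_lift:.1f}x better than chance")
print(f"age:    {age_lift:.1f}x better than chance")
```

With five age brackets instead of 100 single years, the same call, `lift_over_chance(0.6, 5)`, gives a lift of about 3, so the impressiveness of the 60% figure hinges on the bucket size.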
  • ...what sort of profile they'd get running that on /. posts?
    • Re: (Score:2, Funny)

      by Bugs42 ( 788576 )
      99% male

      75% 18-30 years old

      100% single and living in Mom's basement

      Hey, look, I got even more data than MS! I'll take a check now, please.
  • This doesn't sound like a problem at all.
    To use it the regime would have to know about other sites the user had visited to input that information in to the algo.
    If they can already uniquely identify a user across multiple sites then they would already know who they were.
    All this is useful for is processing information for marketing after using your phishing-detecting (IE7, Google Toolbar) software to spy on your users.
  • Are there a lot of people who, for example stay logged in to Google while checking email, searching the web, looking up maps etc? I log out every time. No personalized search for me thank you very much. Speaking of... lemme clear my cookies. hehehe
  • It's really at:
    http://www.rsf.org/article.php3?id_article=22379 [rsf.org]

    the other,
    http://www.rsf.org/rubrique.php3?id_rubrique=20 [rsf.org]
    is a splash page, methinks

  • Unfortunately it is TOO effective. As a result, Americans spend way more money than they should. People stopped making informed purchases; they just buy the product of whoever has the most effective ad. This translates 100% into politics. The public no longer cares about budget deficits, nor do they care about the candidate's stance on all issues. They are more likely to get behind a candidate with the best ads.
  • willingly give up all rights of privacy so they can be good citizen comrades in the willing partnership of the Corporate State and the citizen comrades!

    All power to the Rights Fuhrer Bill Gates and Citizen Comrade Chiefs who free us from worry about nasty anonymity!

  • Screw communism (no offense, read what I have to say). What do you think Democracies are going to do with this? I can imagine congressmen buying $60,000.00 copies of this product to determine exactly who's in their voting region, match up the IPs on the internet, and then use that to further harass us every time an election comes around! It's madness! At least in Communist states they put you out of your misery and do you in! Here in America they keep you alive so they can keep exploiting you... "It's
