
Anonymous Cowards, Deanonymized

Posted by Unknown Lamer
from the we-know-who-you-are dept.
mbstone writes "Arvind Narayanan writes: What if authors can be identified based on nothing but a comparison of the content they publish to other web content they have previously authored? Narayanan has a new paper to be presented at the 33rd IEEE Symposium on Security & Privacy. Just as individual telegraphers could be identified by other telegraphers from their 'fists,' Narayanan posits that an author's habitual choices of words, such as, for example, the frequency with which the author uses 'since' as opposed to 'because,' can be processed through an algorithm to identify the author's writing. Fortunately, and for now, manually altering one's writing style is effective as a countermeasure." In this exploration the algorithm's first choice was correct 20% of the time, with the poster being in the top 20 guesses 35% of the time. Not amazing, but: "We find that we can improve precision from 20% to over 80% with only a halving of recall. In plain English, what these numbers mean is: the algorithm does not always attempt to identify an author, but when it does, it finds the right author 80% of the time. Overall, it identifies 10% (half of 20%) of authors correctly, i.e., 10,000 out of the 100,000 authors in our dataset. Strong as these numbers are, it is important to keep in mind that in a real-life deanonymization attack on a specific target, it is likely that confidence can be greatly improved through methods discussed above: topic, manual inspection, etc."
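The precision and recall figures in the summary are easier to follow as arithmetic. What follows is only a rough sketch: the function name is made up for illustration, and it treats the quoted "over 80%" as exactly 80%, so the number of attempted guesses it prints is an upper bound rather than a figure from the paper.

```python
def attack_summary(authors: int, recall_pct: int, precision_pct: int) -> tuple[int, int]:
    """Return (correct, attempted) guesses for the given recall and precision.

    recall_pct: share of all authors identified correctly, in percent.
    precision_pct: share of attempted guesses that are correct, in percent.
    Integer arithmetic keeps the illustration exact.
    """
    correct = authors * recall_pct // 100
    attempted = correct * 100 // precision_pct
    return correct, attempted


# Figures from the summary: 100,000 authors, recall halved from 20% to 10%,
# precision assumed to be exactly 80% (the summary says "over 80%").
correct, attempted = attack_summary(100_000, 10, 80)
print(f"{correct} correct out of {attempted} attempted guesses")
```

Under that assumption the algorithm stays silent on roughly seven out of eight authors and is right about 10,000 of the ones it does name.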

  • Re:First (Score:5, Interesting)

    by FriendlyLurker (50431) on Tuesday February 21, 2012 @09:11AM (#39109413)
    This just begs a "reanonymize" browser plugin to alter one's writing style...
  • better way. (Score:5, Interesting)

    by Anonymous Coward on Tuesday February 21, 2012 @09:20AM (#39109497)

    This is, of course, not really new.

    A couple of years ago, there was some news (cannot find the link now) that some researchers tried this with a more statistical approach. As an implementation they used a compression algorithm.

    I had a try with this on a forum. Somebody posted a long story anonymously, but I suspected the author. I gathered 10 posts from 5 authors, including the suspect. Then I cut the amount of text to equal length. Subsequently I added the anonymous text to each of the 10 samples and bzipped the resulting text.

    The resulting zipped file was shortest in the case where I added the unknown text to the samples from the suspected author. The bzip algorithm apparently decided there was more similarity between the posts.

    Although this was by no means a real scientific test, I turned out to be correct and was rather pleased with the result. Seems to me such an approach could also be useful for other things. Why log in on /. when it can just figure out who you are based on what you have just written?

    To maintain anonymity you would just have to insert random shit into your posts.

    Bonus points for the slashdotter who can deduce my identity based on the non-randomness of this post.
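The bzip2 experiment described in this comment can be sketched roughly as follows. This is a minimal sketch, assuming Python's standard `bz2` module; the function names and the dict of samples are made up for illustration and are not the poster's actual script. It cuts the known samples to equal length, appends the unknown text to each one, and ranks the candidates by compressed size, smallest first.

```python
import bz2


def compressed_size(text: str) -> int:
    """Size in bytes of the bzip2-compressed UTF-8 encoding of text."""
    return len(bz2.compress(text.encode("utf-8")))


def rank_authors(samples: dict[str, str], unknown: str) -> list[tuple[str, int]]:
    """Rank candidate authors by the compressed size of (sample + unknown).

    samples maps a (hypothetical) author name to that author's known text.
    A smaller combined size suggests more shared patterns with the unknown text.
    """
    # Cut every sample to the same length so no author is favoured by size alone.
    n = min(len(text) for text in samples.values())
    scores = {
        author: compressed_size(text[:n] + "\n" + unknown)
        for author, text in samples.items()
    }
    # Smallest combined archive first.
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_authors({"suspect": ..., "someone_else": ...}, anonymous_post)` returns the candidates with the best match first. With real forum posts the size differences tend to be small and noisy, so the ranking is a hint, not proof.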

  • by bigsexyjoe (581721) on Tuesday February 21, 2012 @09:22AM (#39109529)

    If it can identify you based on your idiosyncrasies, I suppose that means writers could use software based on these techniques to identify the idiosyncrasies in their own writing. From there, they can learn new ways to express themselves and write in a more colorful and varied manner.

    Heck, it can even be a tool that teaches you to think in a more varied manner.

  • by ardiri (245358) on Tuesday February 21, 2012 @09:45AM (#39109775) Homepage

    If you're stupid enough to not change your posting style when trolling, that's your own fault.

  • Re:First (Score:5, Interesting)

    by hairyfeet (841228) <bassbeast1968@NOsPAM.gmail.com> on Tuesday February 21, 2012 @09:47AM (#39109795) Journal

    Yes, but just like speech patterns, folks got a habit of using similar phrases, which I'm sure this picks up. For example, I use folks where some would use people or persons, or if I think something is lame I often say it "Sucks the big wet titty", and I often make reference to the south and southerners since that is my area. I'm sure if it went through every post of every place where I have the same UID (which is most of the places I hang out) it could quite trivially find both my real name (thanks to Yahoo comments using real first names and not UIDs) and any other places where I use a different UID.

    In the end we humans are creatures of habit; we easily fall into patterns and routines, and if there's one thing computers excel at, it's pattern matching. So frankly this doesn't surprise me at all, and given a little time to tweak it I wouldn't be surprised if they get 95%+ accuracy, given a large enough data set from a suspected poster. So you might pick up ONE of my phrases, hell maybe even two, but I seriously doubt you'd pick up enough of my mannerisms that this thing would mistake Ethanol Fueled for Hairyfeet or vice versa.
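The habitual word choices mentioned here ("folks" versus "people", or "since" versus "because" in the summary) are easy to measure. The sketch below is only an illustration of that one kind of feature, not the method from the paper; the function name and the choice of marker words are assumptions. It reports how often each marker word appears per 1,000 words of a text.

```python
import re
from collections import Counter


def marker_rates(text: str, markers: list[str]) -> dict[str, float]:
    """Occurrences of each marker word per 1,000 words of text."""
    # Lowercase and split into simple word tokens (letters and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return {marker: 1000 * counts[marker] / total for marker in markers}


rates = marker_rates(
    "Folks say folks like it since people do",
    ["folks", "people", "since", "because"],
)
print(rates)
```

Comparing such rates across two sets of posts gives a crude fingerprint; a real attack would combine many more features than a handful of word counts.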

  • Re:First (Score:4, Interesting)

    by hairyfeet (841228) <bassbeast1968@NOsPAM.gmail.com> on Tuesday February 21, 2012 @09:51AM (#39109833) Journal
    But wouldn't that just butcher the flow? I mean a trivial way to do it would be to run it through a translator, say take your English, convert it to German, then have it converted back to English, and you'd have this Chingrish kinda speech that was kinda sorta similar to what you said but not. Would you really want your ideas that mangled? Hell why even post at all if nobody is gonna understand you clearly?
  • Re:First (Score:5, Interesting)

    by lightknight (213164) on Tuesday February 21, 2012 @10:23AM (#39110161) Homepage

    And easily defeated. One of the projects of my senior class at university was building software to defeat that kind of detection. It was crafted primarily so dissidents in foreign countries could speak without fear, by analyzing the author's writing patterns and offering suggestions to shift the writing to a different style.

  • by rarrar (671411) on Tuesday February 21, 2012 @10:54AM (#39110557)
    Schools already use programs like "White Smoke" (http://www.whitesmoke.com/) and "Style Writer" (http://www.stylewriter-usa.com/) to identify grammar errors and stylistic errors, and suggest corrections. These programs are able to identify active and passive voice, clarity and readability of writing, ambiguous words, gender specific words, cliches, and more. I'm not sure the use of such software is such a great idea. I guess it's OK as long as a teacher reviews the results. Then again, if the teacher doesn't do as good a job as the program does...
