Unmasking Anonymous Email Senders

Posted by CmdrTaco
from the i-see-you-over-there dept.
alphadogg writes "Just because you send an email anonymously doesn't mean people can't figure out who you are anymore. A new technique developed by researchers at Concordia University in Quebec could be used to unmask would-be anonymous emailers by sniffing out patterns in their writing style from use of all lowercase letters to common typos. Their research, published in the journal Digital Investigation, describes techniques that could be used to serve up evidence in court, giving law enforcement more detailed information than a simple IP address can produce."

  • by gatkinso (15975) on Tuesday March 08, 2011 @03:49PM (#35422404)

    run it thru pretty print or some other formatter before sending it.

    • by Anonymous Coward on Tuesday March 08, 2011 @03:54PM (#35422464)

      They seriously think an 80% success rate is good enough to be used in court?

      I'm betting the real reason is so they can go to a judge with their pseudo-evidence to get a warrant for more invasive spying.

      • I can't even see something as good as an 80% match rate on anything less than a full page of text, you'd need a damn huge sample size if you're going to be using typos and capitalization as "fingerprinting".

        Also, doesn't this mean that a simple spellchecker and the auto-capitalization function on many smartphones would defeat this technology?

      • by Anonymous Coward on Tuesday March 08, 2011 @04:09PM (#35422678)

        According to Wikipedia an 80% success rate is good enough for most civil cases, and indictment for criminal cases. These are based off a "preponderance of the evidence," or "more likely than not" standard (>50%). Criminal case decisions are based on a standard of "clear and convincing evidence," but 80% would be more than enough to get them in the door.

        http://en.wikipedia.org/wiki/Legal_burden_of_proof#Examples

      • by spun (1352) <loverevolutionary@nOSpam.yahoo.com> on Tuesday March 08, 2011 @04:10PM (#35422690) Journal

        They seriously think an 80% success rate is good enough to be used in court?

        Why not? 19 states and many countries still admit polygraph tests into court, despite the fact that they are wildly inaccurate, and people can be specifically taught to deceive them.

        http://en.wikipedia.org/wiki/Polygraph#Validity [wikipedia.org]

      • by mikael_j (106439)

        80% success rate is worse than a lot of properly trained text categorization tools. I'm also suspecting from skimming the article that this system is even easier to throw off track than most text categorization tools built on solid algorithms.

        Just word yourself a little differently, use the British spelling of a few words instead of your usual American spelling, try to use shorter or longer paragraphs than you would usually use, et voilà, you are now a very poor match for those anonymous death threats s

        • by spun (1352) <loverevolutionary@nOSpam.yahoo.com> on Tuesday March 08, 2011 @04:23PM (#35422816) Journal

          Even worse than false negatives would be false positives. Maybe those death threats to your boss sound just like you, use the same words you use, the same grammar, everything. That's because your jealous coworker pirated himself a copy of this program, fed some writing of yours through it, and then kept editing those death threats until the program claimed they sounded just like you.

          • by westlake (615356)

            That's because your jealous coworker pirated himself a copy of this program, fed some writing of yours through it, and then kept editing those death threats until the program claimed they sounded just like you.

            The poison pen is ego driven. The ultimate DIY project.

            Here is a sampling of the real thing:

            Perverts welcome.

            When a male teacher was in the mood for a little sex between classes, militant teacher [Jane Doe] was happy to help out, by writing a hall pass for a female student.

      • 80% success is better than fingerprint matching OR DNA. So yeah, I think it's likely to stand up in court.
        • by gd2shoe (747932)
          80% might be better than some ill advised partial fingerprint match. It's unfortunate, but does happen from time to time. It isn't better than a real match or DNA evidence.
          • I invite you to be horrified by taking a look at the actual science behind those CSI shows. The threshold for use in court is far far lower than you might imagine it to be. Furthermore, I'm going to go out on a limb here and say that you don't have the foggiest idea how DNA evidence is handled in a courtroom, or for that matter a criminal lab. You'll be most pleased to know that not only were you a match to the sample we have, but so are all your immediate male family members, most of your extended famil
      • by Spazmania (174582)

        An 80% success rate is good for two things:

        1. Exculpatory evidence. "Defendant often filled the office printer with paper; that's why his fingerprint is on the death threat letter. Our software indicates only a 20% match to defendant's writing style, so it wasn't him."

        2. Narrowing the suspect list. "We ran the writing analysis software against samples from 20 suspects. It got an 80% match against suspect #1, a 70% match against suspect #2 and no better than a 20% match against anybody else. So we focused th

        • I'd imagine this software would be very easy to fool if you wanted to commit the resources to. Or even just keep tweaking your fake document until you produce the desired result from the software.

          • by Spazmania (174582)

            Well, sure, if you're some kind of evil genius you can research how the cops figure you out, avoid all that and then hide the fact that you researched it too.

            • by Weezul (52464)

              Imagine a Stasi, Egyptian SSIS, FBI, etc. officer who really really wants to lock you up for terrorist talk. He'll first write out what he wants you to say. He'll then look at your writing, make some changes, check the score, and repeat until score says conviction.

        • by syousef (465911)

          An 80% success rate is good for two things:

          1. Exculpatory evidence. 2. Narrowing the suspect list.

          Sounds more like a good way to misdirect your effort because you're chasing the wrong guy after incorrectly excluding the right one.

      • They seriously think an 80% success rate is good enough to be used in court? I'm betting the real reason is so they can go to a judge with their pseudo-evidence to get a warrant for more invasive spying.

        How is this modded insightful? This is how the justice system works - when the police can show reasonable probable cause, a judge issues a search warrant. Nor is it pseudo evidence - no more than partial fingerprint matches, noting that you're a left handed red head who matches the description of the left

      • by PRMan (959735)

        Well then, it's a good thing I spell perfectly and don't make typos!

        Dang it! I just incriminated myself, since I seem to be in the 1% of Americans that can.

      • by westlake (615356)

        They seriously think an 80% success rate is good enough to be used in court?

        I'm betting the real reason is so they can go to a judge with their pseudo-evidence to get a warrant for more invasive spying.

        The geek never gets these things right.

        The standards for discovery in a civil case and for a warrant in a criminal case are not the same as the burden of proof at trial.

        Judges do not like excluding relevant evidence even if that evidence is in some ways uncertain or imperfect. Few things in life are certain and perfected.

        The burden of proof in a civil case is light: "More probable than not." You may not even need a unanimous verdict.

        Civil cases in state court

        Size and unanimity requirements in civil cases vary considerably under state laws. Less than half the states require twelve-person juries, and about half the states allow for non-unanimous verdicts.

        Calling textual analysis "pseudo evidence" leads nowhere. Courts have been a

    • run it thru pretty print or some other formatter before sending it.

      If it's looking for writing style, not just punctuation, spacing, caps, etc., then you might also want to do an auto-translate back and forth from your language. But that would potentially provide another way to find you if you used an online translator.

    • by OzPeter (195038) on Tuesday March 08, 2011 @04:12PM (#35422722)

      run it thru pretty print or some other formatter before sending it.

      Nah .. run it twice though Google translate

      Nah .. ejecutarlo dos veces a través de Google Translate

      Nah .. twice run through Google Translate

      • The vodka is good, but the meat is rotten!

        • by OzPeter (195038)

          The vodka is good, but the meat is rotten!

          I just tried that famous phrase via google and got ..

          The flesh is weak but the spirit is willing. => Russian => the flesh is weak but the spirit is ready.

          I'm sure a native Russian speaker could debate the choice of "ready" over "willing" (Especially if I could paste from translate to here!)

          • by Cyberax (705495) on Tuesday March 08, 2011 @04:50PM (#35423136)
            I'm a native Russian speaker and this phrase, indeed, can't be mistranslated this way (I just used it as a well known example). However, it's true that attempting to automatically translate ANYTHING non-trivial from English to Russian invariably results in hilarity.
            For example, I've tried to translate the next Slashdot article's blurb:

            "Google Voice users learned late Monday that the service now has a way of making purely Internet-based phone calls. Making a SIP call with a "sip:" prefix, the Google Voice phone number and @sip.voice.google.com skips the conventional phone network entirely, saving users cellphone minutes. Disruptive Telephony tested it and found that a call worked "great.""

            "Disruptive" was translated as "explosive" in the sense of "trinitrotoluene", and "great" was translated as "big". Translating it back resulted in:

            "Google Voice users learned late Monday that the service is now a way to make a clean Internet phone calls Make a call with SIP. "Sip:" prefix, Google Voice phone transmits the number and@sip.voice.google.com common telephone network fully, saving minutes of mobile phone users. Explosive Telephone tested it and found that the call worked "big""

            You can probably still guess the meaning, but it's not exactly easy.

            • Sounds just like those outsourced businesses sending you news feeds advertising their products.
      • by hedwards (940851)

        OTOH, that's more efficient than the old system of cutting letters out of the newspaper.

      • by vlueboy (1799360)

        thanks.

        you forgot another logical step.
        first, we do what you said,
        then just run it thru a case changer.
        all lowercase sounds good.
        all caps sounds better.

        break up sentences like i'm doing, pretending that
        we're holding a typewritter-style car-return over-compensation.
        all this adds mental noise
        "proving" the fact that the writer
        is nothing more than yet another internet noob.
        why not double any question marks and exclamation points!!??
        throw in unnecessary ellipsis at the end of sentences for suspense ...
        if your ma

        • This is called "stylometry": the algorithmic analysis of authorship based on the content of the work in question. There are many scholarly articles describing various algorithms that you can find and read. Early efforts in this area involved testing the Shakespeare/Bacon hypothesis, who wrote which of the Federalist papers, and establishing the authorship of the 15th Oz novel.

          The basic concept is pretty easy. I played a fair bit with the idea back a few years ago when I wanted to prove to m
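To make the basic concept concrete, here is a minimal sketch of one classic family of stylometric features: the relative frequency of common function words. Everything in it is an assumption for illustration (the `FUNCTION_WORDS` list, the names `style_profile`, `profile_distance` and `closest_author`, and the plain sum-of-differences distance); it is not the Concordia method and not the poster's own code.

```python
import re
from collections import Counter

# Illustrative list only; real studies use far larger, tuned word lists.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it")

def style_profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def profile_distance(a, b):
    """Sum of absolute differences between two profiles (smaller = closer)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def closest_author(unknown, samples):
    """Pick the author name whose known sample is nearest to the unknown text."""
    target = style_profile(unknown)
    return min(
        samples,
        key=lambda name: profile_distance(target, style_profile(samples[name])),
    )

samples = {
    "alice": "the cat sat in the hat and the dog sat in the log",
    "bob": "it is that it was that it will be that",
}
print(closest_author("the bird and the fish in the pond", samples))
```

With samples this short the result means very little; as other posters note, a usable match would need far more text per author.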

          • Do me a favor and find out if the user "Clippy" [lesswrong.com] on LessWrong.com is one of the other posters and who that asshole is. Will pay in Bitcoins.

            -- long time angry member of LessWrong.com pissed off at the fucking paperclip maximizer.

      • It would be interesting to run that exercise with a non trivial message and then give the results to someone who hasn't seen the original and see if they can still make sense of it. (I'm guessing not, translation has a lot of subtleties even when using a standard vocabulary.)

    • by bugnuts (94678)

      In the old days, we used to pipe our play-by-email instructions through jive(6) to avoid giving away too much info.

      I could see anon email services offering a filter. The fastest way is to convert it to French and back.

    • by ron_ivi (607351)

      Translate to and from some other language repeatedly until the translations are the same.
      That way the writing style will resemble the translation program's more than your own.
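The "translate until the translations are the same" loop is a fixed-point iteration, and can be sketched roughly as follows. The `translate` argument is a hypothetical callable taking `(text, source, target)`; no real translation service or API is assumed, and `toy_translate` is only a lossy stand-in so the sketch runs on its own. The language codes are arbitrary labels.

```python
def translate_to_equilibrium(text, translate, home="en", other="xx", max_rounds=20):
    """Round-trip text through another language until it stops changing.

    Returns the final text and the number of round trips performed.
    """
    current = text
    for rounds in range(1, max_rounds + 1):
        round_trip = translate(translate(current, home, other), other, home)
        if round_trip == current:
            return current, rounds  # fixed point reached
        current = round_trip
    return current, max_rounds  # gave up without converging

# Toy stand-in translator: collapses several synonyms onto one token,
# which is the kind of information loss the technique relies on.
_TO_OTHER = {"huge": "G", "large": "G", "big": "G"}
_TO_HOME = {"G": "big"}

def toy_translate(text, source, target):
    table = _TO_OTHER if target == "xx" else _TO_HOME
    return " ".join(table.get(word, word) for word in text.split())

print(translate_to_equilibrium("a huge dog", toy_translate))
```

The `max_rounds` cap is there because nothing guarantees a real translator ever settles; round trips could in principle cycle between two phrasings.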

      An example, using this technique on the above text: http://translationparty.com/#8957181 [translationparty.com]
      "By repeating the same part of the translation has been translated into other languages. Style, translation program, this method is beyond ourselves."

      • Original:

        There were three men came out of the west, their fortunes for to try
        And these three men made a solemn vow
        John Barleycorn must die
        They've ploughed, they've sown, they've harrowed him in
        Threw clods upon his head
        And these three men made a solemn vow
        John Barleycorn was dead
        They've let him lie for a very long time, 'til the rains from heaven did fall
        And little Sir John sprung up his head and so amazed them all
        They've let him stand 'til Midsummer's Day 'til he looked both pale and wan
        And little Sir John'

    • by arivanov (12034)

      Not good enough. Lexical analysis, dictionary analysis, etc produce enough information to be used as a piece of evidence in court and have been used in court for decades long before the Internet.

      They are not reliable enough by themselves, but taken together with other indirect evidence they can tip the scales towards a conviction.

    • Google translate to German, then back to English, nobody will ever be able to restore the original message!
  • For the lulz (Score:4, Insightful)

    by burnit999 (1845596) on Tuesday March 08, 2011 @03:51PM (#35422426)
    Sooo... if I want to write an anonymous letter I just switch from my usual grammar natzi mode to my OMFG i c4/Vz p0ns0r your org MANNNN!
    • by eyrieowl (881195)
      Being a grammar 'natzi' apparently being distinct from being a spelling nazi... ;)
    • by Idbar (1034346)
      You can always type something, and then translate it sequentially into 5 different languages using an on-line translator. After that either the grammar is good or the message is completely scrambled. In any case, no tracks of your writing style.
  • by SMoynihan (1647997) on Tuesday March 08, 2011 @03:52PM (#35422434)

    Turns out most spam is written by e e cummings.

    Who'd have thought it?

  • by _0xd0ad (1974778) on Tuesday March 08, 2011 @03:53PM (#35422450) Journal

    who always types part of the body of his message in the subject line.

  • by trollertron3000 (1940942) on Tuesday March 08, 2011 @03:54PM (#35422466)

    Yes but unlike writing this can be easily duplicated. Writing using someone else's style isn't an easy task. Doing it with a keyboard, very easy.

  • Pretty sure profiling and behavioral analysis has been around for a long time.

    • Re: (Score:3, Funny)

      by OrangeCowHide (810076) *
      But this is on a computer... On the internet. That's like double implicit innovation.
      • by Minwee (522556)

        But this is on a computer... On the internet. That's like double implicit innovation.

        So you can patent it.

  • wherefore did I ever adopt such a distinctive writing style.

  • This only really applies when you have something to compare it with. Besides, this technique is just forensic document examination, which is older than computers are. How is this news?
  • by zindorsky (710179) <zindorsky@gmail.com> on Tuesday March 08, 2011 @03:56PM (#35422494)

    I'm not saying the research is worthless, but their techniques are easily defeated.
    It would be simple to write a program that would iteratively "fuzz" your message with typos, lowercase/uppercase toggling, etc. and check the result against their algorithm until the message could no longer be tied to you.
    I'm sure someone could do it in 10 lines of Perl, or less.
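Not 10 lines of Perl, but the loop described above can be sketched in Python. This is a sketch under loud assumptions: `style_score` is a hypothetical stand-in that compares only uppercase ratio and average word length, NOT the researchers' algorithm, and `fuzz_once`, `fuzz_until_unmatched` and the 0.8 threshold are made up for illustration.

```python
import random

def style_score(text, reference):
    """Hypothetical stand-in scorer: 1.0 means identical crude features."""
    def features(s):
        letters = [c for c in s if c.isalpha()]
        upper = sum(c.isupper() for c in letters) / max(len(letters), 1)
        words = s.split()
        avg_len = sum(len(w) for w in words) / max(len(words), 1)
        return upper, avg_len / 10.0
    a, b = features(text), features(reference)
    return max(0.0, 1.0 - sum(abs(x - y) for x, y in zip(a, b)))

def fuzz_once(text, rng):
    """Apply one random mutation: toggle a character's case or swap neighbours."""
    if not text:
        return text
    chars = list(text)
    i = rng.randrange(len(chars))
    if rng.random() < 0.5:
        chars[i] = chars[i].swapcase()
    elif i + 1 < len(chars):
        chars[i], chars[i + 1] = chars[i + 1], chars[i]  # simulated typo
    return "".join(chars)

def fuzz_until_unmatched(message, reference, threshold=0.8, max_steps=10000, seed=0):
    """Keep mutations that do not raise the score; stop once below threshold."""
    rng = random.Random(seed)
    best, best_score = message, style_score(message, reference)
    for _ in range(max_steps):
        if best_score < threshold:
            break
        candidate = fuzz_once(best, rng)
        score = style_score(candidate, reference)
        if score <= best_score:
            best, best_score = candidate, score
    return best, best_score

ref = "please send the report to the office before friday afternoon"
fuzzed, score = fuzz_until_unmatched(ref, ref)
print(fuzzed, round(score, 2))
```

The same loop run with the comparison flipped (keep mutations that raise the score) is the framing attack other posters worry about, which is why the choice of scorer matters more than the fuzzing itself.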

    • by Asic Eng (193332)
      Surely they are already doing this? The spam I'm getting is universally atrociously written, probably in an attempt to escape spam filters, I suppose.
    • by Whalou (721698)

      I'm sure someone could do it in 10 lines of Perl, or less.

      Isn't there a law or something like that that states that anything that can be written in 10 lines of Perl can be written in 1 line of Perl?

      (Sorry for the amount of 'that's in the previous sentence.)

    • by vux984 (928602)

      I'm not saying the research is worthless, but their techniques are easily defeated.

      And...?

      It would be simple to write a program that would iteratively "fuzz" your message with typos, lowercase/uppercase toggling, etc. and check the result against their algorithm until the message could no longer be tied to you. I'm sure someone could do it in 10 lines of Perl, or less.

      Of course it would be easy to defeat. But document analysis techniques have been around for decades... maybe not this specific algorithm, but

      • by hedwards (940851)

        As has been pointed out by others, in the past you couldn't auto-translate it into another language and back. You lose virtually all of the identifiable information that would help them analyze the document like that.

        • by vux984 (928602)

          As has been pointed out by others, in the past you couldn't auto-translate it into another language and back. You lose virtually all of the identifiable information that would help them analyze the document like that.

          And people still don't bother most of the time; so the tech is still useful.

          For example, forensic fingerprinting technology is defeated by wearing gloves, but that hasn't rendered the technology irrelevant either.

          • And people still don't bother most of the time;

            They will now. And in the real world, there's going to be far too many coincidences and people who, accidentally or not, use a different writing style than usual. Doing so is even more simple than wearing a glove. 80% in the real world? Not at all.

    • So what you'd be writing is a fairly simple minded encryption program... The output of which would likely be fairly recognizable and very likely would still contain the 'fingerprints' of the original writing. I.E. with sufficient text, they'd still be able to tell the difference between a message written by you, and a message written by me - but with the additional disadvantage of not having a 'reverse' function and thus making the text difficult to read by the intended recipient.

      So I suspect it's much le

  • Simple (Score:5, Informative)

    by LWATCDR (28044) on Tuesday March 08, 2011 @03:56PM (#35422498) Homepage Journal

    Use Google translate. Translate it into Spanish, then into German, then back into English, then into LEET.

    It should be simple to obscure the style and weaknesses of the author with this method.

    • Re:Simple (Score:4, Interesting)

      by 0100010001010011 (652467) on Tuesday March 08, 2011 @04:00PM (#35422564)

      With Google Translate. Translate into Spanish, then German, then English, then in LEET.

      It should be easy to hide the style of the author and weaknesses with this method

      I was expecting some hilariously screwed up result, but that turned out rather well. It also masked your writing style.

      • It's also a fairly simple and trivial message. I suspect that a longer passage, more like normal text, would not survive too well. Translation is fairly complex, and even though Google Translate does a fairly good (albeit mechanical) job... I find that it often misses the nuances and suffers greatly if you use a vocabulary much above the grade school level.

        Case in point, the text above run through the process above:

        There is also a very simple message and trivial. I suspect that a longer p

      • by LWATCDR (28044)

        I too was surprised, since I meant it mainly as a joke. I thought that English to Spanish to German would cause some amusement. Probably better to go outside of the European languages.

        I was also surprised, especially since I did not mean it is a joke. I thought that the English and Spanish and German would cause some amusement. Probably best to go out of the EU languages.

        So that was English to Swahili to Finnish to English. Not bad actually just one small edit for meaning and it would be pretty dang good.

    • by nametaken (610866)

      I'd guess any repetitive technique like this with more than one obfuscation would make it increasingly unique and identifiable, no? If we're looking to lower the confidence of matching, maybe aim for the common denominator.

      Perhaps it's better to write to a fourth grade level and just run everything against a common spell check engine, like the one in Outlook?

    • And make quite sure you're incomprehensible. Even one pass would make sure you sounded like a complete dick. But think about it, this wouldn't change your sentence length distribution, for instance.
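For what "sentence length distribution" means in practice, a minimal sketch follows. The function names and the 5-word bucket size are assumptions for illustration. The demo only shows that a character-level change such as uppercasing leaves the distribution untouched; whether machine translation preserves it is the poster's claim and is not tested here.

```python
import re
from collections import Counter

def sentence_lengths(text):
    """Split on ., ! and ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]

def length_histogram(text, bucket=5):
    """Count sentences per length bucket (0-4 words, 5-9 words, ...)."""
    return Counter((n // bucket) * bucket for n in sentence_lengths(text))

text = "Short one. This sentence has exactly six words. Tiny!"
print(sentence_lengths(text))
print(length_histogram(text))
# Changing case does not move any sentence boundary or word count.
print(sentence_lengths(text.upper()) == sentence_lengths(text))
```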

    • by Dan541 (1032000)

      So, give Google the evidence. I think I'll pass.

  • I developed a bad habit in very early days of usenet when there was a weird bug with Pnews where you had to begin a post with a blank line -- so to this day I still start every email (written in Pine) with a blank line first for some reason.
    • by idontgno (624372)

      Ah. It's been a long time since I've thought about sacrifices to the line eater. [catb.org]

      An old religion worshiping an unforgiving and primitive god. [catb.org]

      I guess if your online writing style was incubated in the Usenet era, it might have enough quirks and idiosyncrasies to be identifiable.

      • by weave (48069)
        Nice. Thanks for those links. Interesting. I forgot the details and reasons for the initial empty line besides "you were just supposed to do it" (and I still do) ... back in the days when an entire big 7 news feed was about 20 megabytes a day.
    • by Culture20 (968837)

      I totally know what you mean.
      .
      ^D
      ^D^D^D
      ^C
      ^Zbg
      ???
  • by dim5 (844238) on Tuesday March 08, 2011 @04:00PM (#35422560)
    This is why I cut & paste each word of anonymous emails from an online dictionary.

    Untraceable.
  • by sumdumass (711423) on Tuesday March 08, 2011 @04:01PM (#35422570) Journal

    It used to be that people would cut words from magazines and other papers to make ransom notes so no one could recognize their handwriting.

    With this concept moving to the computer and internet, it will be trivial to find words, phrases, auto generation scripts and so on to do the digital equivalent. In fact, I think there are several programs out there that will pull random lines of text from several sources on the internet, take a real message and create an image of some sort to lay information over top of it, all just to get around spam filters. (disable the display of images in your email and you will be surprised at what is underneath them sometimes).

    But one real problem I can see is how easy this might make it to set someone else up to take a fall. Suppose you and I have emailed each other for quite some time now. I saved all our correspondence and farmed them to find phrases and word misspellings, cut and pasted them to make statements you never intended to make, then sent them off to threaten the president. Something even more disturbing, suppose we know each other in real life and I have the hots for your wife. I make my way into your house, plant some pipes and fertilizer beside some diesel fuel in one of your closets, get on your computer, sign up for a free email address from it using fake information and start spamming chat rooms and emailing government officials your intent to kill the president.

  • ...the King's English shall be for thee to hide thine criminal ways.

  • by Sara Chan (138144) on Tuesday March 08, 2011 @04:05PM (#35422616)
    The actual research paper is at

    http://www.dfrws.org/2008/proceedings/p42-iqbal.pdf [dfrws.org]

    Note that it was published in 2008. So Slashdot is reporting relatively quickly here.
  • I long ago gave up any idea that my writing would be very anonymous...

    As an American working in software companies in India for ten years, whenever managers sent out surveys they said would be "totally anonymous" I always figured with my American writing style (complete sentences, very few typos, no "spel it like u sa it", active voice, writing out our product and company name in full) everyone would recognize it was my writing anyway... And that was usually the case, as people who weren't supposed to know

  • Does this really come as a surprise?
  • I actually write in different styles, and used that for different RPG game systems and stories - now all I have to do is go to a nearby cafe (cant go a block without running into two) and use their free computers using different personas.

    In fact, I think I'll start studying the writing styles of Cheney, Rove, and Fnarf and using them as writing templates for my next posts ...

    Pretty easy to do.

    I think most of my current personae are quite radically different in writing style from my other published pseudonym

  • I'm sure I saw this in an episode of Numb3rs once. :-P

  • Their conclusion is completely off base. Even if their software is 100% accurate, it can only categorize a certain style of writing as having come from a single person (and that's still debatable since it's not too hard to duplicate type-written styles). What if every anonymous writer uses the same script to turn their text into "1337"-speak? The software would not have the ability to match the style to any one person -- it can only conclude that it is very likely that such types of a message was written by
  • It won't ever BE evidence, but it will lead to evidence. I'm sure the NSA uses software like this along with speech recognition software and voiceprint recognition software to create investigative leads for follow-up.

  • Every once in awhile, I get a trollish and insulting comment on my blog. Usually, the commenter leaves the name field anonymous but leaves a valid email address as an invitation for me to take the bait and respond. A quick google search of the email often reveals other trollish comments posted by the same user elsewhere on the internet, and usually they slip up at least once and leave their name. From there, it's pretty easy to find out more personal information.
    • by Suhas (232056)
      I don't buy this, What kind of an idiot will leave their real name on an internet post?

      -Trey
  • I hire someone from 4chan to ghost write all my correspondence....

    ... Y00 n00b!

  • Dennis Montgomery and his phony secret terrorist message decoding software comes to mind for some reason...
