Unmasking Anonymous Email Senders
alphadogg writes "Just because you send an email anonymously doesn't mean people can't figure out who you are anymore. A new technique developed by researchers at Concordia University in Quebec could be used to unmask would-be anonymous emailers by sniffing out patterns in their writing style from use of all lowercase letters to common typos. Their research, published in the journal Digital Investigation, describes techniques that could be used to serve up evidence in court, giving law enforcement more detailed information than a simple IP address can produce."
Pretty print it first (Score:5, Interesting)
run it thru pretty print or some other formatter before sending it.
Re:Pretty print it first (Score:5, Insightful)
They seriously think an 80% success rate is good enough to be used in court?
I'm betting the real reason is so they can go to a judge with their pseudo-evidence to get a warrant for more invasive spying.
Re: (Score:3)
I can't even see something as good as an 80% match rate on anything less than a full page of text, you'd need a damn huge sample size if you're going to be using typos and capitalization as "fingerprinting".
Also, doesn't this mean that a simple spellchecker and the auto-capitalization function on many smartphones would defeat this technology?
Re:Pretty print it first (Score:4, Interesting)
According to Wikipedia an 80% success rate is good enough for most civil cases, and for indictment in criminal cases. These are based on a "preponderance of the evidence," or "more likely than not," standard (>50%). Criminal convictions require proof "beyond a reasonable doubt," but 80% would be more than enough to get them in the door.
http://en.wikipedia.org/wiki/Legal_burden_of_proof#Examples
Re:Pretty print it first (Score:4, Informative)
They seriously think an 80% success rate is good enough to be used in court?
Why not? 19 states and many countries still admit polygraph tests into court, despite the fact that they are wildly inaccurate, and people can be specifically taught to deceive them.
http://en.wikipedia.org/wiki/Polygraph#Validity [wikipedia.org]
Re:Pretty print it first (Score:5, Funny)
"It's not a lie if _you_ believe it."
In totally unrelated news, my dick is a foot long.
Well I'd like to see that stand up in court.
Re: (Score:2)
80% success rate is worse than a lot of properly trained text categorization tools. I'm also suspecting from skimming the article that this system is even easier to throw off track than most text categorization tools built on solid algorithms.
Just word yourself a little differently, use the British spelling of a few words instead of your usual American spelling, try to use shorter or longer paragraphs than you would usually use et voilà, you are now a very poor match for those anonymous death threats s
I can imitate your writing style (Score:4, Insightful)
Even worse than false negatives would be false positives. Maybe those death threats to your boss sound just like you, use the same words you use, the same grammar, everything. That's because your jealous coworker pirated himself a copy of this program, fed some writing of yours through it, and then kept editing those death threats until the program claimed they sounded just like you.
Re: (Score:2)
That's because your jealous coworker pirated himself a copy of this program, fed some writing of yours through it, and then kept editing those death threats until the program claimed they sounded just like you.
The poison pen is ego driven. The ultimate DIY project.
Here is a sampling of the real thing:
Perverts welcome.
When a male teacher was in the mood for a little sex between classes, militant teacher [Jane Doe] was happy to help out, by writing a hall pass for a female student.
Re: (Score:2)
An 80% success rate is good for two things:
1. Exculpatory evidence. "Defendant often filled the office printer with paper; that's why his fingerprint is on the death threat letter. Our software indicates only a 20% match to defendant's writing style, so it wasn't him."
2. Narrowing the suspect list. "We ran the writing analysis software against samples from 20 suspects. It got an 80% match against suspect #1, a 70% match against suspect #2 and no better than a 20% match against anybody else. So we focused th
good enough if they're poor and/or the wrong race (Score:2)
I'd imagine this software would be very easy to fool if you wanted to commit the resources to it. Or even just keep tweaking your fake document until you produce the desired result from the software.
Re: (Score:2)
Well, sure, if you're some kind of evil genius you can research how the cops figure you out, avoid all that and then hide the fact that you researched it too.
Re: (Score:2)
Imagine a Stasi, Egyptian SSIS, FBI, etc. officer who really really wants to lock you up for terrorist talk. He'll first write out what he wants you to say. He'll then look at your writing, make some changes, check the score, and repeat until score says conviction.
Re: (Score:2)
An 80% success rate is good for two things:
1. Exculpatory evidence. 2. Narrowing the suspect list.
Sounds more like a good way to misdirect your effort because you're chasing the wrong guy after incorrectly excluding the right one.
Re: (Score:2)
How is this modded insightful? This is how the justice system works - when the police can show reasonable probable cause, a judge issues a search warrant. Nor is it pseudo evidence - no more than partial fingerprint matches, noting that you're a left handed red head who matches the description of the left
Re: (Score:2)
Well then, it's a good thing I spell perfectly and don't make typos!
Dang it! I just incriminated myself, since I seem to be in the 1% of Americans that can.
Re: (Score:2)
They seriously think an 80% success rate is good enough to be used in court?
I'm betting the real reason is so they can go to a judge with their pseudo-evidence to get a warrant for more invasive spying.
The geek never gets these things right.
The standards for discovery in a civil case and for a warrant in a criminal case are not the same as the burden of proof at trial.
Judges do not like excluding relevant evidence even if that evidence is in some ways uncertain or imperfect. Few things in life are certain and perfected
The burden of proof in a civil case is light: "More probable than not." You may not even need a unanimous verdict.
Civil cases in state court
Size and unanimity requirements in civil cases vary considerably under state laws. Less than half the states require twelve-person juries, and about half the states allow for non-unanimous verdicts.
Calling textual analysis "pseudo-evidence" leads nowhere. Courts have been a
Re: (Score:3)
run it thru pretty print or some other formatter before sending it.
If it's looking for writing style, not just punctuation, spacing, caps, etc., then you might also want to do an auto-translate back and forth from your language. But that would potentially provide another way to find you if you used an online translator.
Re:Pretty print it first (Score:5, Informative)
run it thru pretty print or some other formatter before sending it.
Nah .. run it twice through Google translate
Nah .. ejecutarlo dos veces a través de Google Translate
Nah .. twice run through Google Translate
The vodka is good, but the meat is rotten! (Score:2)
The vodka is good, but the meat is rotten!
Re: (Score:2)
The vodka is good, but the meat is rotten!
I just tried that famous phrase via google and got ..
The flesh is weak but the spirit is willing. => Russian => the flesh is weak but the spirit is ready.
I'm sure a native Russian speaker could debate the choice of "ready" over "willing" (Especially if I could paste from translate to here!)
Re:The vodka is good, but the meat is rotten! (Score:4, Informative)
For example, I've tried to translate the next Slashdot article's blurb:
"Google Voice users learned late Monday that the service now has a way of making purely Internet-based phone calls. Making a SIP call with a "sip:" prefix, the Google Voice phone number and @sip.voice.google.com skips the conventional phone network entirely, saving users cellphone minutes. Disruptive Telephony tested it and found that a call worked "great.""
"Disruptive" was translated as "explosive" in the sense of "trinitrotoluene", and "great" was translated as "big". Translating it back resulted in:
"Google Voice users learned late Monday that the service is now a way to make a clean Internet phone calls Make a call with SIP. "Sip:" prefix, Google Voice phone transmits the number and@sip.voice.google.com common telephone network fully, saving minutes of mobile phone users. Explosive Telephone tested it and found that the call worked "big""
You can probably still guess the meaning, but it's not exactly easy.
Re: (Score:2)
OTOH, that's more efficient than the old system of cutting letters out of the newspaper.
Re: (Score:2)
thanks.
you forgot another logical step.
first, we do what you said,
then just run it thru a case changer.
all lowercase sounds good.
all caps sounds better.
break up sentences like i'm doing, pretending that ...
we're holding a typewritter-style car-return over-compensation.
all this adds mental noise
"proving" the fact that the writer
is nothing more than yet another internet noob.
why not double any question marks and exclamation points!!??
throw in unnecessary ellipsis at the end of sentences for suspense
if your ma
Re: (Score:2)
This is called "stylometry": the algorithmic analysis of authorship based on the content of the work in question. There are many scholarly articles describing various algorithms that you can find and read. Early efforts in this area involved testing the Shakespeare/Bacon hypothesis, who wrote which of the Federalist papers, and establishing the authorship of the 15th Oz novel.
The basic concept is pretty easy. I played a fair bit with the idea back a few years ago when I wanted to prove to m
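To make the "basic concept" concrete, here is a minimal Python sketch of one generic stylometric comparison: build a profile of function-word frequencies for each text and compare profiles with cosine similarity. This is not the method from the Concordia paper or from the poster above; the word list and the function names `profile` and `cosine` are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a tiny, hand-picked list of common English function words.
# Real studies use far larger feature sets.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

A higher `cosine(profile(known), profile(anonymous))` only suggests a closer style under this one crude feature; with short texts the profiles are mostly noise, which is the sample-size objection raised earlier in the thread.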
Re: (Score:2)
Do me a favor and find out if the user "Clippy" [lesswrong.com] on LessWrong.com is one of the other posters and who that asshole is. Will pay in Bitcoins.
-- long time angry member of LessWrong.com pissed off at the fucking paperclip maximizer.
Re: (Score:2)
It would be interesting to run that exercise with a nontrivial message and then give the results to someone who hasn't seen the original and see if they can still make sense of it. (I'm guessing not, translation has a lot of subtleties even when using a standard vocabulary.)
Re: (Score:2)
I know someone who did that. Unfortunately, native speaker and practical joker are not mutually exclusive.
Re: (Score:2)
In the old days, we used to pipe our play-by-email instructions through jive(6) to avoid giving away too much info.
I could see anon email services offering a filter. The fastest way is to convert it to French and back.
Re: (Score:2)
Translate to and from some other language repeatedly until the translations are the same.
That way the writing style will resemble the translation program's more than your own.
An example, using this technique on the above text: http://translationparty.com/#8957181 [translationparty.com]
"By repeating the same part of the translation has been translated into other languages. Style, translation program, this method is beyond ourselves."
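The "translate until the translations are the same" idea is a fixed-point iteration, and can be sketched in Python as below. The `round_trip` argument is a hypothetical callable standing in for whatever translation service you use (English to another language and back); no real translation API is assumed, and `max_rounds` is an added safety limit in case the text never settles.

```python
from typing import Callable


def translate_until_stable(
    text: str,
    round_trip: Callable[[str], str],
    max_rounds: int = 20,
) -> str:
    """Apply round_trip repeatedly until the output stops changing."""
    for _ in range(max_rounds):
        result = round_trip(text)
        if result == text:
            # Fixed point reached: another pass would change nothing.
            return text
        text = result
    # Gave up after max_rounds; return the latest version.
    return text
```

Passing the translator in as a parameter keeps the loop testable with a toy function and leaves the choice of service (and who gets to log your text) to the caller.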
Re: (Score:2)
Original:
There were three men came out of the west, their fortunes for to try
And these three men made a solemn vow
John Barleycorn must die
They've ploughed, they've sown, they've harrowed him in
Threw clods upon his head
And these three men made a solemn vow
John Barleycorn was dead
They've let him lie for a very long time, 'til the rains from heaven did fall
And little Sir John sprung up his head and so amazed them all
They've let him stand 'til Midsummer's Day 'til he looked both pale and wan
And little Sir John'
Re: (Score:2)
Not good enough. Lexical analysis, dictionary analysis, etc produce enough information to be used as a piece of evidence in court and have been used in court for decades long before the Internet.
They are not reliable enough by themselves, but taken together with other indirect evidence they can tip the scales towards a conviction.
For the lulz (Score:4, Insightful)
Re: (Score:3)
shake shake roll... Natzi!
E E Cummings, that blatant spammer (Score:4, Insightful)
Turns out most spam is written by e e cummings.
Who'd have thought it?
Finally, they can find that one guy (Score:5, Funny)
who always types part of the body of his message in the subject line.
Oh really? Well I wish them (Score:2)
the best of luck in their attempt.
It's best when the meaning changes to something (Score:2)
that's funny.
Too easy to fake (Score:3)
Yes, but unlike handwriting, this can be easily duplicated. Imitating someone else's handwriting isn't an easy task. Doing it with a keyboard is very easy.
Re: (Score:2)
I sense a new market opportunity for "text laundering".
Re: (Score:2)
A perl script with a few lines of punctuation-removal and whitespace normalization would do wonders.
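A rough sketch of that kind of "text laundering", written here in Python rather than Perl. The function name `launder` is made up for illustration, and lowercasing is an extra step beyond what the comment describes.

```python
import re
import string


def launder(text: str) -> str:
    """Strip ASCII punctuation, lowercase, and collapse whitespace."""
    # Delete every ASCII punctuation character.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Lowercase so capitalization habits disappear (an added assumption).
    lowered = no_punct.lower()
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return re.sub(r"\s+", " ", lowered).strip()
```

For example, `launder("Hello,   World!!\n\nThis is... a TEST.")` returns `"hello world this is a test"`. Word choice, spelling mistakes, and sentence length all survive this, so it only removes the most superficial features.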
Re: (Score:2)
I am needing this URGENT but I have one doubt about the same.
Pls send codes to do the needful and revert.
Re: (Score:2)
Now, automate it and blast away!
Very easy to frame someone, too (Score:2)
What can be done can be undone. If this gets accepted as evidence in court, why not get a sample of someone's writing and duplicate it in a compromising message?
Behavioral Profiling rediscovered (Score:2)
Pretty sure profiling and behavioral analysis have been around for a long time.
Re: (Score:2)
But this is on a computer... On the internet. That's like double implicit innovation.
So you can patent it.
Verily, I am scrod (Score:2)
wherefore did I ever adopt such a distinctive writing style.
A new technique? (Score:2)
Re: (Score:2)
It's news BECAUSE it is on Slashdot, silly. :p
Interesting, but easily defeated (Score:3)
I'm not saying the research is worthless, but their techniques are easily defeated.
It would be simple to write a program that would iteratively "fuzz" your message with typos, lowercase/uppercase toggling, etc. and check the result against their algorithm until the message could no longer be tied to you.
I'm sure someone could do it in 10 lines of Perl, or less.
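Here is a hedged Python sketch of that loop (not ten lines of Perl). The `score` callable is hypothetical: it stands in for the researchers' algorithm, which is not available here, and is assumed to return a match strength between 0 and 1. For brevity the sketch only toggles letter case; it does not inject typos.

```python
import random
from typing import Callable


def fuzz_once(text: str, rng: random.Random) -> str:
    """Toggle the case of one randomly chosen character."""
    if not text:
        return text
    i = rng.randrange(len(text))
    return text[:i] + text[i].swapcase() + text[i + 1:]


def fuzz_until_unmatched(
    text: str,
    score: Callable[[str], float],
    threshold: float = 0.5,
    max_steps: int = 1000,
    seed: int = 0,
) -> str:
    """Keep mutating text until score(text) drops below threshold."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    for _ in range(max_steps):
        if score(text) < threshold:
            break  # the (hypothetical) classifier no longer matches
        text = fuzz_once(text, rng)
    return text
```

The same loop run in the other direction, mutating until the score against someone else's profile rises, is the framing attack other posters describe.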
Re: (Score:2)
Isn't there a law or something like that that states that anything that can be written in 10 lines of Perl can be written in 1 line of Perl?
(Sorry for the amount of 'that's in the previous sentence.)
Re: (Score:2)
Isn't there a law or something like that that states that anything that can be written in 10 lines of Perl can be written in 1 line of Perl?
It was ruled unconstitutional last week.
http://www.supremecourt.gov/opinions/10pdf/09-1036.pdf [supremecourt.gov]
Re: (Score:2)
I'm not saying the research is worthless, but their techniques are easily defeated.
And...?
It would be simple to write a program that would iteratively "fuzz" your message with typos, lowercase/uppercase toggling, etc. and check the result against their algorithm until the message could no longer be tied to you. I'm sure someone could do it in 10 lines of Perl, or less.
Of course it would be easy to defeat. But document analysis techniques have been around for decades... maybe not this specific algorithm, but
Re: (Score:3)
As has been pointed out by others, in the past you couldn't auto-translate it into another language and back. You lose virtually all of the identifiable information that would help them analyze the document like that.
Re: (Score:3)
As has been pointed out by others, in the past you couldn't auto-translate it into another language and back. You lose virtually all of the identifiable information that would help them analyze the document like that.
And people still don't bother most of the time; so the tech is still useful.
For example, forensic fingerprinting technology is defeated by wearing gloves, but that hasn't rendered the technology irrelevant either.
Re: (Score:2)
And people still don't bother most of the time;
They will now. And in the real world, there are going to be far too many coincidences and people who, accidentally or not, use a different writing style than usual. Doing so is even simpler than wearing a glove. 80% in the real world? Not at all.
Re: (Score:2)
So what you'd be writing is a fairly simple-minded encryption program... the output of which would likely be fairly recognizable and very likely would still contain the 'fingerprints' of the original writing. I.e., with sufficient text, they'd still be able to tell the difference between a message written by you, and a message written by me - but with the additional disadvantage of not having a 'reverse' function and thus making the text difficult to read by the intended recipient.
So I suspect it's much le
Simple (Score:5, Informative)
Use Google translate. Translate it into Spanish, then into German, then back into English, then into LEET.
It should be simple to obscure the style and weaknesses of the author with this method.
Re:Simple (Score:4, Interesting)
With Google Translate. Translate into Spanish, then German, then English, then in LEET.
It should be easy to hide the style of the author and weaknesses with this method
I was expecting some hilariously screwed up result, but that turned out rather well. It also masked your writing style.
Re: (Score:2)
It's also a fairly simple and trivial message. I suspect that a longer passage, more like normal text, would not survive too well. Translation is fairly complex, and even though Google Translate does a fairly good (albeit mechanical) job... I find that it often misses the nuances and suffers greatly if you use a vocabulary much above the grade school level.
Case in point, the text above run through the process above:
There is also a very simple message and trivial. I suspect that a longer p
Re: (Score:2)
I too was surprised, since I meant it mainly as a joke. I thought that English to Spanish to German would cause some amusement. Probably better to go outside of the European languages.
I was also surprised, especially since I did not mean it is a joke. I thought that the English and Spanish and German would cause some amusement. Probably best to go out of the EU languages.
So that was English to Swahili to Finnish to English. Not bad, actually; just one small edit for meaning and it would be pretty dang good.
Re: (Score:2)
I'd guess any repetitive technique like this with more than one obfuscation would make it increasingly unique and identifiable, no? If we're looking to lower the confidence of matching, maybe aim for the common denominator.
Perhaps it's better to write to a fourth grade level and just run everything against a common spell check engine, like the one in Outlook?
Re: (Score:2)
And make quite sure you're incomprehensible. Even one pass would make sure you sounded like a complete dick. But think about it, this wouldn't change your sentence length distribution, for instance.
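The sentence-length point is easy to check for yourself. The Python sketch below splits text on `.`, `!`, and `?` (a naive rule that will mis-split abbreviations) and counts words per sentence; the function names are invented for illustration.

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Return the word count of each sentence, using a naive splitter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_histogram(text):
    """Map each sentence length to how many sentences have that length."""
    return Counter(sentence_lengths(text))
```

Running `sentence_lengths` on a message before and after a translation round trip shows how much of that distribution survives; if the two lists are close, that feature is still available to an analyst.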
Re: (Score:2)
So, give Google the evidence. I think I'll pass.
Re: (Score:2)
Google makes money data mining. You shouldn't trust them with nefarious anti-government translations; if they never delete one e-mail, they will definitely never lack the same logging for your translation activities.
I'm also starting to think that trusting my web searches to them all these years may not be such a good idea, even if their dashboard claims they've already deleted it.
That said, duckduckgo isn't as good for searching and lacks mapping and similar all-in-one google conveniences. Their translatio
I'd be easy (Score:2)
Re: (Score:2)
Ah. It's been a long time since I've thought about sacrifices to the line eater. [catb.org]
An old religion worshiping an unforgiving and primitive god. [catb.org]
I guess if your online writing style was incubated in the Usenet era, it might have enough quirks and idiosyncrasies to be identifiable.
Re: (Score:2)
I totally know what you mean.
^D
^D^D^D
^C
^Zbg
???
This is why I cut & paste (Score:5, Funny)
Untraceable.
The digital equivalent of cutting up magazines. (Score:5, Interesting)
It used to be that people would cut words from magazines and other papers to make ransom notes so no one could recognize their handwriting.
With this concept moving to the computer and internet, it will be trivial to find words, phrases, auto generation scripts and so on to do the digital equivalent. In fact, I think there are several programs out there that will pull random lines of text from several sources on the internet, take a real message and create an image of some sort to lay information on top of it, all just to get around spam filters. (Disable the display of images in your email and you will sometimes be surprised at what is underneath them.)
But something I can see this really having a problem with is how easy it might make the chance at setting someone else up to take a fall. Suppose you and I have emailed each other for quite some time now. I saved all our correspondence and farmed them to find phrases and word misspellings, cut and pasted them to make statements you never intended to make, then sent them off to threaten the president. Something even more disturbing, suppose we know each other in real life and I have the hots for your wife. I make my way into your house, plant some pipes and fertilizer beside some diesel fuel in one of your closets, get on your computer, sign up for a free email address from it using fake information and start spamming chat rooms and emailing government officials your intent to kill the president.
Forthwith (Score:2)
...the King's English shall be for thee to hide thine criminal ways.
Re: (Score:2)
Word, even unto thy mother.
The actual research paper (Score:5, Informative)
http://www.dfrws.org/2008/proceedings/p42-iqbal.pdf [dfrws.org]
Note that it was published in 2008. So Slashdot is reporting relatively quickly here.
Not Anonymous (Score:2)
I long ago gave up any idea that my writing would be very anonymous...
As an American working in software companies in India for ten years, whenever managers sent out surveys they said would be "totally anonymous" I always figured with my American writing style (complete sentences, very few typos, no "spel it like u sa it", active voice, writing out our product and company name in full) everyone would recognize it was my writing anyway... And that was usually the case, as people who weren't supposed to know
Uhm... duh? (Score:2)
This gives writers like me an edge in AC (Score:2)
I actually write in different styles, and used that for different RPG game systems and stories - now all I have to do is go to a nearby cafe (can't go a block without running into two) and use their free computers using different personas.
In fact, I think I'll start studying the writing styles of Cheney, Rove, and Fnarf and using them as writing templates for my next posts ...
Pretty easy to do.
I think most of my current personae are quite radically different in writing style from my other published pseudonym
Wow ... (Score:2)
I'm sure I saw this in an episode of Numb3rs once. :-P
This is progress... how? (Score:2)
By the way, which one's Pink? (Score:2)
It won't ever BE evidence, but it will lead to evidence. I'm sure the NSA uses software like this along with speech recognition software and voiceprint recognition software to create investigative leads for follow-up.
Just Google it... (Score:2)
Re: (Score:2)
-Trey
Won't work (Score:2)
I hire someone from 4chan to ghost write all my correspondence....
... Y00 n00b!
Dennis Montgomery? (Score:2)
Re:I recall - he is correct, mod up, not down... (Score:3, Informative)
Here is an excerpt that proves anonymous post is correct:
But even Unabombers are not infallible. Exulting in his apparent mastery of the FBI, the master criminal made his mistake, in the form of a 35,000-word treatise on the "Future of Industrial Society", which he submitted to the Washington Post and New York Times. If they published the rambling, anti-technology manifesto, the writer said, he would cease his campaign. After much soul-searching, the two papers did so on 20 September 1995, on the advice of the FBI.
Relatives in Chicago were struck by similarities between some of Ted Kaczynski's earlier writings and the rambling musings of the Unabomber's tract, and eventually his brother informed the FBI. And so the trail of 18 years, dotted with 200 detained suspects along the way, led to a hand-built cabin near the Continental divide. But the tale may not yet be over.
Here is the article from the Independent [independent.co.uk].
I recollected that this was how the Unabomber was finally caught, via relatives who read his writings and recognized him... I respect that some mods might not like anonymous cowards, but if they are correct they should not be modded down, at least if we want to be fair.
Re: (Score:2)
Or maybe that person really COULD care less, but their current level of caring is so low it doesn't matter.