Linguistics Identifies Anonymous Users
mask.of.sanity writes "Researchers have examined writing styles to identify previously anonymous carders and hackers operating on underground forums. Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."
Anonymous First Post (Score:5, Informative)
Anonymous First Post... you'll never guess who I am
Re:Anonymous First Post (Score:4, Funny)
4990.5 more words please.
Re:Anonymous First Post (Score:5, Funny)
I identified you. You are Cicero.
Re: (Score:2)
Oh, clever.
But now future courts can point to your post to show that the idea was common or at least public knowledge, empowering stylometric identification deniability in cases of plausible framing.
Re:Anonymous First Post (Score:4, Insightful)
Nothing, as long as you have a large enough corpus of the framee's writing. If the framee is your friend, this probably isn't a problem. If they're a public figure, maybe not a problem (depending on how much editing and PRing their written statements undergo before they are released.) If they're $RANDOM_PASSERBY, not so easy.
I think a more common usage would be to tweak your own writing just so it doesn't sound like you. Write something you don't want identified as yours (the test sample), check it against a corpus of your own written work. If it detects as your work, rough up the test sample until it doesn't. This would be an easier problem than the framing case since you're not trying to make it look like a specific other person's work, you're trying to make it look like it's ANYONE else's (you don't really care whose) work.
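The check-and-rough-up loop described above can be sketched roughly as follows. This is a minimal illustration, not how any real stylometry tool works: the ten-word FUNCTION_WORDS list, the cosine measure, and the 0.9 threshold are all assumptions picked for the example, and a real analysis would use far more features.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny feature set (an assumption for this sketch).
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "it", "is", "but")


def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def looks_like_me(sample, corpus, threshold=0.9):
    """True if the sample's profile is close to the corpus profile.

    The threshold is an arbitrary illustrative value, not a calibrated one.
    """
    similarity = cosine_similarity(style_profile(sample), style_profile(corpus))
    return similarity >= threshold
```

You would call looks_like_me(test_sample, everything_you_ever_wrote) and keep editing the sample while it returns True. Passing this toy check says nothing about whether a detector you cannot inspect would be fooled.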
Re: (Score:2)
Or
Re: (Score:2)
tl;dr
Re: (Score:3)
We can narrow it down to someone who is particular about correct capitalisation (and therefore probably spelling, punctuation and grammar) denoting an education and attention to detail not normally seen in forum posts. As this is a more technical forum you most likely program in a language where letter case is of paramount importance and have done so for at least 5 years in a professional position. You probably also write reports indicating a level of seniority.
That should reduce the number of likely candid
Re: (Score:2)
Well no, what it indicates is someone who is so insecure and immature as to be worried about the opinions of people he does not know and has never met. Spending time and effort paying attention to things like punctuation and grammar on a forum post (i.e. within a context of no material benefit) is an indication of adolescence - doing so within the context of the workplace, where there is a material benefit, is an indication of adulthood.
I disagree. The opinions of others are immaterial. If something is worth doing, it's worth doing properly. Remind me never to hire you. That is all.
Re: (Score:3)
Classic - kudos to you for a great laugh. I was thinking though, "this study doesn't help much because it's rare to find places where people write more than a line or two anymore."
Go back to the old days of Usenet (80s, early 90s) and posts were long, well thought-out, and useful. Look at OLGA, for example, which collected written music in TAB format for guitarists (ha - remember when THAT was the biggest threat to the music industry?). Tons of useful stuff. Hardly anyone does that anymore; it's mostly
Re: (Score:2)
Not particularly... you just hang out in the 'wrong' places (not just websites/forums, but usenet groups as well), both for the length and for the nature of the writing.
Re:Anonymous First Post (Score:5, Interesting)
Re: (Score:2)
That's right. You have "ticks" instead of "tics"!
That would make him an Intel [wikipedia.org] employee...
I don't usually bother with anonymous, here. It encourages bad habits. Occasionally, I will post anonymously, simply because I'm using a different workstation, and can't be bothered to sign in.
I consider privacy and anonymity on the Internet to be swimming with the fishes. They are long dead, so I try to behave myself; no matter where or how I am logged in.
Re:Anonymous First Post (Score:4, Interesting)
I used to post anonymously much more often, when I had a job with a guvmint agency and a young famly to protect. I do not bother with that much any more. I am not invulnerable, but for the most part I know that I look like too small a fish to be worth going after.
That said, I still occasionally post anonymously when I want to antagonize the astroturfers, Scientology nuts, etc. Especially on slashdot if I am concerned that my post might damage my karma.
Interesting things to do when posting anonymously:
Use a thesaurus to choose synonyms you would not ordinarily use.
L33t 5p33k
Write like Hemmingway. Keep all sentences short. Sentences that do not have subordinate clawses do not have much style to analyse.
Use creative misspellings. "claws" for "clause", etc.
Use Google Translate to do a multilingual hash: translate your work into Russian, then the Russian version back to English. "The spirit is willing but the flesh is weak" becomes "The wine is passable but the meat has gone bad."
Ideally, Anonymous will develop a set of tools that will rewrite any text into one of half a dozen different styles. Let the authorities chase after these six fictional characters.
Re: (Score:2)
Ideally, Anonymous will develop a set of tools that will rewrite any text into one of half a dozen different styles. Let the authorities chase after these six fictional characters.
:)
Re: (Score:2)
plus simple: use newspeak.
Re: (Score:2)
Write like Hemmingway. Keep all sentences short. Sentences that do not have subordinate clawses do not have much style to analyse.
You mean like this?
I wrote a letter to the CEO once (Score:5, Interesting)
I worked for a smallish (but not incredibly tiny, maybe 100 employees) company and wrote a letter to the CEO once. We'd been castigated by someone who'd taken over the local office because the company was doing poorly. A number of austerity measures were implemented. I did not find those to be that annoying because I realized it was either that or not have a job. But the castigation didn't sit well with me. We were in trouble because of the decisions of a few bad managers, not the behavior of average employees.
So I wrote a letter about it. He stripped my name off and presented it in an executive meeting to all the people directly under him. He asked "Why am I getting letters like this?". Everybody who worked in my office immediately knew who it was. I had a distinctive writing voice, and a strong reputation.
It did not lead to me being fired. I was actually highly respected there. It led to me being encouraged to have an honest sit-down talk with the new manager for our division (the guy who'd made the speech I wasn't happy about). I think we both came away from that meeting a lot happier about the other.
But that was a strong lesson to me. If I ever really want to be anonymous I'm going to have to purposely work on adopting a completely different writing style. And I will have to keep a wall up between styles and never 'slip'.
Re: (Score:2)
If you're trying to avoid having two different identities associated when you're having an IRC conversation or something, that could get really tricky.
Re: (Score:3)
Write it in a different language, then run it through 5 different translation engines across a dozen languages, ending in whichever is the native language of the recipient... that should throw them for a loop.
Re: (Score:2)
skip all that and just run it thru the Jive filter.
what it is!
Re:I wrote a letter to the CEO once (Score:4, Interesting)
I've thought about that. That's an interesting and tricky problem. Though, if there's a program that can detect it, that means the patterns are codified well enough that you can write a program to obscure them. The problem is, what about the program that detects these patterns that you don't know the implementation of? Will you actually be fooling it?
Of course, you have the same problem if you adopt a different writing style. Is it different enough? Is something essential slipping through?
You could use both techniques. Have a program assist you in avoiding the use of certain words when using one voice and the use of others when using a different voice.
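A rough sketch of the word-avoidance assistant suggested above, assuming you maintain a list of giveaway words for each voice by hand. The voice names and word lists below are made up for illustration.

```python
import re

# Hypothetical per-voice lists of words that voice should never use.
AVOID = {
    "work": {"castigated", "whilst"},
    "anon": {"honestly", "basically"},
}


def flag_words(text, voice):
    """Return the sorted set of words in text that the given voice must avoid."""
    banned = AVOID[voice]
    words = re.findall(r"[a-z']+", text.lower())
    return sorted({w for w in words if w in banned})
```

For example, flag_words("Honestly, we were castigated.", "anon") flags "honestly" but lets "castigated" through, since that word is only banned for the "work" voice. This only covers vocabulary; sentence length, punctuation habits, and the like would still slip through.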
Re: (Score:3, Informative)
I give you the subject of my term paper that landed me top marks in forensic linguistics:
(tl;dr: yes, there is software that does precisely that: JStylo + Anonymouth)
https://psal.cs.drexel.edu/index.php/JStylo-Anonymouth
http://www.youtube.com/watch?v=-b0Ta9h62_E
Re: (Score:3)
Just Google translate it to and from any language other than English.
the problem is, the meaning might be gone as well by the time it's English-y again.
Re: (Score:2, Insightful)
And Google (a.k.a. "The Evil Empire" TM) will have a cached copy of the original with the IP address you posted from. In other words you'll also need to go through the magic 7 proxies!
Re: (Score:2)
You will end up with easy-if you do this. Beneficiaries can probably think of it Nigerian spam message.
Re: (Score:2)
Alfred, is that you! It's been years, old buddy! How are you?
Re: (Score:2)
Have someone you trust, who is not in the company, rewrite your missive for you. That's probably the safest way.
Re: (Score:2)
I hand wrote a terrible review for one of my trainers at my last job. She matched me based on my signature on the sign in sheet.
Kind of dumb of me to figure anything hand written was really anonymous, though.
Re: (Score:2)
Style may have been important in that case, but your co-workers have a fuller appreciation of your personality than we would on Slashdot. They might also have seen the concerns that were raised as being unique to you, or the fact that you wrote the letter at all might have immediately narrowed the possibilities down considerably, as many people tend to either just bitch behind backs or keep their heads down and tolerate it.
Re: (Score:2)
Yes, you're right. I was probably one of only 2-5 people who would've written such a letter who worked in that office. So yeah, that probably helped at least as much as style.
Re: (Score:2)
I have actually gone back and changed things I wrote before submitting as Anonymous Coward, or on a second account on other forums, because it looked too much like how I write. I've even gone and changed things I submit normally because it felt too much like me. I do find myself making spelling or grammar mistakes that I know are wrong but which just come out when I don't slow down.
So I think smart people could get around this sort of problem. However a lot of posters today just go ahead and post their f
Re: (Score:3)
Only you would do that.
Re: (Score:3)
Ahh, but real dialogue can get one into trouble when dealing with the political minded. You see, there are those out there that are not working towards the same goals as you. Even if you're a part of the same team and of the same company, there are those that think the illusion of them being correct is more important than the welfare of the team.
It can be difficult to have a truly open dialogue with people of this sort, as they are quick to attack your reputation or pull rank and have you removed from the
Re: (Score:2, Insightful)
I think a bigger threat to geeks in business is when they approach such situations without due caution. If you make a claim, you must be prepared to back it up to everyone that could be interested. Real concrete evidence. References. Citations. Etc.
And that IS approaching the situation without due caution. Geeks think that having real concrete evidence means that other people must believe you. Real world people are not like that, especially the political minded ones. Evidence be damned, political minded people play power games without regard to reality, all the way until the company goes bankrupt, then they play their game elsewhere.
Approaching with due caution means you must first prepare by finding someone more powerful to back you up, and be ready t
Re: (Score:2)
Not just WWI. See the Battle of Gettysburg and Robert E. Lee. Look in particular at Little Round Top and Cemetery Ridge. Hubris cost a lot of men their lives. It may have been the determining factor in the end.
https://en.wikipedia.org/wiki/Battle_of_Gettysburg [wikipedia.org]
Can be much complex (Score:2)
I recognise my own writing (Score:4, Insightful)
Re: (Score:2)
iz hard 2 change how u speek [speaklolcat.com]?
Re: (Score:2)
If you want to be taken seriously and understood unambiguously, yes.
Y U NO MAKE SENSE (Score:3)
"Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated."
*sigh* does this mean I must resent people that use this form of communication less?
I'm not so sure I can stoop so low.
I can't think of a non-evil use for this (Score:5, Interesting)
This is so bad I don't know where to begin. There is nothing, ever, that excuses this. For every zodiac crazy serial killer or copyright scofflaw they try to apply this to (and fail) there will be thousands and thousands of people that will be persecuted by organizations and governments for expressing their opinions. While this won't have a big effect in the West for half a generation, oppressive governments are going to be all over this.
And then, in ten or fifteen years, the youth will have grown with this technology and become accustomed to it...accepting it. Just like facebook has been accepted.
I'd move to Mars when it's possible but some bureaucrat will analyze everything I've ever written on the interwebz (and I've been mostly not stupid about shit I've written online since 1995 or so) and make some arbitrary decision about how I'm not acceptable because I'm not a huge fan of authority or some such crap.
Way to go humanity.
Re: (Score:2)
Not to mention: Mars will be worse.
Re:I can't think of a non-evil use for this (Score:5, Informative)
Are you serious?
You write as if some new method had been invented. There is no news in the above article. Authorship identification has been a reliable tool for many decades, a whole branch of linguistics (forensic linguistics) deals with it and similar topics like dialect recognition. Under certain circumstances you can even identify personality traits of the author, check out content analysis software like LIWC [liwc.net] for example.
And, yes, plenty of serial killers and blackmailers have been captured with the help of these methods.
Re: (Score:2)
See, someone has already drunk the kool-aid. :-) Identify personality traits...sigh. You speak in the language of big brother. So once this method/technique/software gets outside of whatever biolab it is currently sequestered in, how long, you think, before it's used for police fishing expeditions?
"Hey Bob, I'm bored...ever since they legalized pot I've had nothing to arrest people for for no reason. What's this I hear about linguistics and personality traits?"
There are MILLIONS of people in prison in t
Re: (Score:2)
Well, I for one look forward to the mess these methods will cause in academia, where it is likely that they can be used to identify the authors of referee reports.
Re: (Score:2)
It's not needed. There's already a limited pool of "peers" to use for "anonymous" peer review, and by definition they all know each other, and are familiar with each other's patterns of thought. "Oh look, Fred at MIT is hassling us about using linear regression again."
Re: (Score:2)
Haven't you heard? We can take "thing X" that has confirmed kills of 260 million people, but if we say, "think of the children" then people take to the streets demanding "thing X".
Re: (Score:2)
Negative...we take "thing X" that has confirmed kills of 2 people and confirmed annoyance of 1 busybody and it's "BWAAAAAAAA THINK OF THE CHILDREN!" and then we arrest everyone for being a pedophile.
Re: (Score:2)
Damnit...I need upvotes on /. for this
google translate (Score:5, Interesting)
One way to change a bunch of the stylistic cues would be to convert your message to another language and back using Google Translate. Depending on the intermediate language(s), and possibly with different translators for each step, this should neutralize some things.
Re:google translate (Score:5, Funny)
using chinese as an intermediary will give you text written by motherboard manual writers. perfect cover.
Re:google translate (Score:4, Funny)
Please to make explaining in swiftness.
Re: (Score:2)
I just tried that with a couple of paragraphs: Google Translate returns the exact text including mis-spellings even though it had correctly identified what the mis-spelled words actually should be.
This suggests that there are language independent methods of "identifying" writers.
Re: (Score:2)
It can also alter the meaning of your text. Translation is an inexact art, at best, even for skilled and experienced practitioners - which automatic translators emphatically are *not*.
This goes times ten if your text includes technical terms, or wording which relies on alternate meanings or connotation. (Things a native reader would either know, or would be reasonably expected to infer from context.) This is why writing in English from non-English speakers (for example) often looks so funny when you enco
Thesis (Score:2)
and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks.
... a good reason to do it like zu Guttenberg then... Nobody will tie any of his underground writings to his thesis...
College essays (Score:3)
Re:College essays (Score:4, Insightful)
Actually, it's the exact opposite.
Anti-plagiarism software searches for the same content with completely different styles.
Writer identification involves searching for the same style amongst completely different content.
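To make the contrast concrete, here is a toy sketch with one function per signal. Both measures are simplified assumptions for illustration: real plagiarism checkers and authorship tools are far more elaborate, and the STOPWORDS list is arbitrary.

```python
import re

# Small, arbitrary stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "it", "that", "was"}


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_overlap(a, b):
    """Jaccard overlap of non-stopword vocabulary: a plagiarism-style signal."""
    ca = set(tokens(a)) - STOPWORDS
    cb = set(tokens(b)) - STOPWORDS
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


def style_distance(a, b):
    """Sum of absolute differences in stopword rates: an authorship-style signal."""
    ta, tb = tokens(a), tokens(b)

    def rate(ts, word):
        return ts.count(word) / len(ts) if ts else 0.0

    return sum(abs(rate(ta, w) - rate(tb, w)) for w in sorted(STOPWORDS))
```

Plagiarism detection cares about a high content_overlap regardless of style_distance; writer identification cares about a low style_distance regardless of content_overlap.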
Re: (Score:2)
On 4chan plagiarism is encouraged. It's called a "meme". In fact copy-pasta is a meme in itself.
Obscurity... (Score:2)
Pad all communications with cut/paste from various, unrelated news articles and such, fore and aft, randomly alternating how much is padded on each side.
Or, you can do what I do and use a different font for each letter.
Is this really such an issue? (Score:2)
For extra fun, change your text so its stylometric markers match up with E. L. James, or the leader writer of the Washington Post.
Re: (Score:2)
But this is the exact kind of evil political use that this stuff is going to be used for. It doesn't make it right because you're using it on Republicans. If anything it makes it MORE wrong because of your acceptability standard...because when they turn around and use it on you they'll have had your prior support.
subject (Score:2)
This same story keeps cropping up in various forms, but we've been doing this at least since the 80s or 90s. I don't know why it keeps being rehashed or why people continually seem surprised by it at this point.
Not really new (Score:2)
"Up to 80 percent of users who wrote at least 5000 words across their posts could be identified using linguistic techniques. Techniques such as stylometric analysis were used to track users who posted across different forums, and could even be used to unveil authors of thesis papers or blogs who had taken to underground networks."
Not really new. I heard about these techniques a long time ago, in the mid-90s, in the context of an MS-DOS tool that, without being designed for it, foiled the identification methods.
It was designed for the Russian and Belarusian languages (but for English I gather the task should be even easier) and was a byproduct of a Prolog-based system for natural language processing and translation. This particular program let you improve or change writing style, e.g. simplify dry legalese or formalize spoken-like text. I
No actual result in TFA? (Score:4, Interesting)
After reading TFA I cannot find any convincing experimental validation. I see a lot of "can" and conditional tense (maybe that's the author's style), but nothing on the validation of the approach. Where is the experimental data, including the number of anonymous users correctly and incorrectly identified on forums?
No they didn't (Score:2)
They didn't identify 80% of the users, they managed to make a guess in 80% of the cases, which they didn't even bother to try to verify. There's no proof that their technique actually works.
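The distinction between making a guess and being right can be written down as two separate numbers. A minimal sketch, assuming you had ground truth to check against (which is exactly what is missing here); the function name is made up for illustration.

```python
def coverage_and_accuracy(guesses, truth):
    """Return (coverage, accuracy) for a list of attribution guesses.

    guesses: one guessed author per user, or None where no guess was made.
    truth:   the real author per user, in the same order.
    coverage = fraction of users for whom a guess was made at all.
    accuracy = fraction of those guesses that were actually correct.
    """
    made = [(g, t) for g, t in zip(guesses, truth) if g is not None]
    coverage = len(made) / len(guesses) if guesses else 0.0
    accuracy = sum(g == t for g, t in made) / len(made) if made else 0.0
    return coverage, accuracy
```

With made-up inputs such as guesses ["a", "b", None, "c", "d"] against truth ["a", "x", "y", "z", "d"], coverage is 0.8 while accuracy is only 0.5. An "80%" headline could be describing either number.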
Lie Detector (Score:2)
Though sadly, a Roberts/Scalia/Thomas Supreme Court would rule against such an individual and for the corporation or state security organs. Dicks.
Those cunning linguists! (Score:2)
Aren't those cunning linguists clever? The answer always seems to be right on the tip of their tongue. They don't diddle around. They seem to be able to lick any problem.
I, for one, (Score:2)
keystroke timings are fingerprint (Score:2)
Zodiac (Score:2)
I'm curious how this would apply to the Zodiac case. Oh wait, it doesn't:
* He used symbols in communication.
* Voice recognition didn't solve the case.
* DNA evidence didn't solve the case.
* Copycats functioned as noise, might've even given him credit.
Re:Damit (Score:5, Funny)
"They know who I am. I will now have to type in random styles."
But not in Gangnam Style or they'll think you're Korean.
Re:I will now have to type in... (Score:2)
You could always type in Gangnam Style!
Re: (Score:2)
They know who I am. I will now have to type in random styles.
Little do you know the AC that posts here is in fact just one person.
Yes, we know.
Re: (Score:2)
Little do you know that half the posts on slashdot are authored by a rogue sentient botnet that has no physical body....
Since on the Internet, nobody knows you're a dog, it becomes also true that nobody knows you're a wild A.I. who has amassed a huge tax free fortune through microtrading and is manipulating the financial markets to study mankind's reactions and determine the best way to subjugate the ugly bags of mostly water.
Re: (Score:2, Funny)
Well your left handed with your frequent use of left keys.
You have small hands given the fact that you were able to press w with out pressing e immediately.
The fact that you have said you look forward to our anonymous overlords or a Beowulf cluster of AC means your reasonably intelligent for Slashdot.
Your not aggressively hassling the editor, previous poster, or the writer. Signifying your female.
You have too much time on your hands posting on Slashdot.
http://www.complex.com/girls/2009/08/sexy-south
Re: (Score:2)
Well you're left handed, with your frequent use of left keys.
Or someone that is comfortable with WASD+Mouse.
Re: (Score:2)
Ends sentence with "hey !", eh?
Clearly a Canuck imposter, eh?
This helps narrow down the poster's identity. We can now exclude all but the 87% of Canadians who do not know the fine art of Canadian Self-Parody.
Re: (Score:2)
Wait, 5000 words? I think I'm safe.
You Anonymous Coward - I bet you write 5k words in just a day!
Re: (Score:2)
I can imagine a pretty big difference between informal communication (Internet fun) and the formal language required of a thesis. I'd be surprised to see big differences among the Internet activities, assuming that what you post on these forums is pretty much of equal value. A disposable Slashdot post is different to a Wikipedia edit that must be bound by the very specific requirements of Wikipedia (or a similar site where rules would affect your usual style of communication).
In terms of online communicatio
Re: (Score:2)
I regularly, like, totally change my typing method between posts.
You could like totally try and figure out who I was even if I typed 5000 words in this post, but you would totally never find me, ye'know what I mean?
But for an unsuspecting target who doesn't realize to change his writing style, it might work effectively.
Re: (Score:2)
I can conclude that Mr Peter "W.H. Smiths, the book store" used the highly efficient MS HTML (in Word et al.) converter to write that analysis page.
Whenever you see tags classed MsoNormal with heaps of inline css, run like the wind.