Privacy Software Science

When Writing, How Anonymous Can You Be, Really? 184

Posted by timothy
from the go-ask-your-ghostwriter dept.
An anonymous reader writes "Do you still think your online writing is, basically, anonymous? Think again! Research has it that people put many of their personal traits into their writing, and computers may just be able to pick them up. That's at least what a recently announced competition on author identification (Given a document, who wrote it?) and author profiling (Given a document, what are its author's age and gender?) wants to find out. Alas, re-using other people's writing is no solution either; there's also a competition on plagiarism detection (Given a document, is it an original?). Wanna revisit your recent rants?"

  • Yes, we know (Score:1, Informative)

    by Anonymous Coward

    As previously reported [slashdot.org] on Slashdot. Now, please identify me. Here's a hint: I have a 5 digit UID.

    • Re: (Score:3, Funny)

      by Anonymous Coward

      I got this one. You sir are Anonymous Coward, with UID 00666. Now, what prize do I get for this?

  • by Anonymous Coward

    >throw machine at 4chan
    >"Identify!"
    >all posters sound the same
    >machine concludes all posters are part of a highly advanced AI
    >machine becomes depressed that it will never create anything wonderful like the spaghetti threads or /mlp/
    >kills itself
    >mfw

    Based on the above, who am I?

    • by Sasayaki (1096761)

      Moot? Is that you?

    • Re: (Score:3, Informative)

      by durrr (1316311)

      >Based on the above, who am I?
      Anonymous

    • >mfw

      Based on the above, who am I?

      I'm guessing a retard who doesn't understand that this abbreviation means "my face when".

      • by durrr (1316311)

        Maybe he don't have a face?

        • by Genda (560240)

          Are you suggesting this poor soul has a butt at both ends?

          • Are you suggesting this poor soul has a butt at both ends?

            Yeah, like that's a rare condition in the world today.

  • Uh huh... (Score:3, Interesting)

    by Anonymous Coward on Sunday December 16, 2012 @06:47PM (#42309029)

    Like facial recognition.... I am sure this works wonderfully when it only has 10 or 20 exemplars to compare against, but it fails miserably as it scales up. Good luck conclusively identifying an author when there are over a million profiles to potentially match with.
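
The scaling worry above can be put into rough numbers. The following is a minimal sketch, assuming each comparison against an unrelated profile has the same small, independent chance of a spurious match; the rate of 1 in 10,000 is a made-up illustration, not a measured figure for any real system, and both function names are hypothetical.

```python
def chance_of_false_match(per_comparison_rate: float, database_size: int) -> float:
    """Probability that at least one unrelated profile matches by chance.

    Assumes every comparison is independent and has the same false-match rate.
    """
    return 1.0 - (1.0 - per_comparison_rate) ** database_size


def expected_false_matches(per_comparison_rate: float, database_size: int) -> float:
    """Average number of unrelated profiles that match by chance."""
    return per_comparison_rate * database_size


if __name__ == "__main__":
    rate = 1e-4  # hypothetical false-match rate per comparison
    for size in (20, 1_000, 1_000_000):
        print(size,
              round(chance_of_false_match(rate, size), 4),
              round(expected_false_matches(rate, size), 2))
```

Under these assumptions a spurious hit is unlikely with 20 exemplars and close to certain with a million profiles, which is the commenter's point: the same matcher behaves very differently as the candidate pool grows.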

    • by Mitreya (579078)

      Like facial recognition.... I am sure this works wonderfully when it only has 10 or 20 exemplars to compare against, but it fails miserably as it scales up. Good luck conclusively identifying an author when there are over a million profiles to potentially match with.

      Or like fingerprints, which start giving a larger number of false positives when compared against a large enough database of entries.

      Consider this: they don't have to conclusively identify the original author. It will be good enough to find someone with similar writing (i.e. also a subversive) and charge them instead of the original perpetrator. And good luck proving that you didn't write that

      Mmmm, a national database of writing samples collected from everyone in school... that sounds like fun.

      • by russotto (537200)

        Mmmm, a national database of writing samples collected from everyone in school... that sounds like fun.

        I never thought not doing my homework would pay off so well :-).

        There's definitely going to be false positives. I've seen other people's writing that was nearly word-for-word identical with my own, and there's no way they saw mine (nor I theirs) before writing it.

        • The larger the sample of a person's writings, the more accurate this thing will become, of course. The nature of the writings will also influence the accuracy. In school, even an essay is going to be very similar to other people's essays, as they are unlikely to contain a lot of original thought. Everyone is doing their best to feed the teacher the responses that they believe the teacher wants to be fed.

          Now, if your ex girlfriend were to give these researchers everything that you ever wrote to her, there

          • Re: (Score:3, Interesting)

            by Anonymous Coward

            Well put.

            As a test, I just looked through my own posts on slashdot and selected a four word string I use pretty often that seemed somewhat unique, but not obviously so.

            I combined that string (in quotes) with site:slashdot.org on Google. At least two of the results returned in the first page were me, made over the course of the last few weeks.

            Now of course there are others that used that in their posts, but had someone picked that string from something I posted AC they'd know there was a good chance it was m
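
The experiment described in this comment (pick a four-word string you use often, then see who else uses it) can be sketched offline. This is only an illustration: the commenter used a Google site search, whereas the sketch below compares two in-memory lists of posts, and the function names `word_ngrams` and `distinctive_ngrams` are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n=4):
    """Return all runs of n consecutive lowercase words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def distinctive_ngrams(own_posts, other_posts, n=4, min_count=2):
    """Find n-word strings used in several of one author's posts and by nobody else.

    own_posts and other_posts are lists of strings. A string qualifies when it
    appears in at least min_count of the author's posts and in none of the others.
    """
    own = Counter()
    for post in own_posts:
        # Use a set so a phrase repeated inside one post counts only once.
        own.update(set(word_ngrams(post, n)))

    seen_elsewhere = set()
    for post in other_posts:
        seen_elsewhere.update(word_ngrams(post, n))

    return sorted(
        phrase for phrase, count in own.items()
        if count >= min_count and phrase not in seen_elsewhere
    )
```

A phrase returned by `distinctive_ngrams` is only a hint, as the comment itself notes: other people may use the same string in text that is not in the comparison set.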

          • Everyone is doing their best to feed the teacher the responses that they believe the teacher wants to be fed.

            Interesting. In fact, I tried to screw the teacher with my original essays because I was smart enough both to do it and afford screwing the teacher. That activity, however, was limited to my native language and English.
            It was fun.

      • by chrismcb (983081)

        Consider this: they don't have to conclusively identify the original author. It will be good enough to find someone with similar writing (i.e. also a subversive) and charge them instead of the original perpetrator

        I doubt a major news network would ever just blindly claim the wrong person did it.

      • they use the analysis to identify a small range of who to watch to find certain confirmation they have the right guy

        law enforcement tools are not limited only to 100% certain ones. the fuzzy ones are used to narrow down a list of targets, where law enforcement's limited manpower can be better spent to find certain confirmation

        It will be good enough to find someone with similar writing (i.e. also a subversive) and charge them instead of the original perpetrator. And good luck proving that you didn't write th

      • by westlake (615356)

        Or like fingerprints, which start giving a larger number of false positives when compared against a large enough database of entries.

        The false positive is significant only when it is plausible.

        Their partial prints may be a close match, but the 86 year old wheelchair-bound Vet in a hospice on Staten Island is probably not the killer who shot up a liquor store in Buffalo last week.

      • re: Mmmm, a national database of writing samples collected from everyone in school... that sounds like fun.
        :>(
        A national database of writing samples... it already exists for a large number of high school and for an even larger number of college students. It's called Turnitin.com http://en.wikipedia.org/wiki/Turnitin [wikipedia.org] and many students are required to submit their homework essays through their site. Some students have sued to not include/submit their work, and some have sued (with two examples of studen
    • by vlm (69642)

      The profiling competition only bins into 10s / 20s / 30s which seems extremely lame. Can't they at least try Myers Briggs category or something? Maybe that would be too patented/copyrighted....

      Wake me when they make something monetizable (OK the categorize tool reports: highly educated, technically oriented, 30s, raised in the midwest, ultra low TV viewing quotient, classical education literature coeff extremely high, also a high sci fi reading coeff, verbal indications of extreme physical attractiveness

    • ... it fails miserably as it scales up.

      Text recognition was good enough to identify Ted Kaczynski [wikipedia.org]. One thing that helped in Ted's case was that they had a lot of text. His manifesto was 35,000 words.

      • by Kjella (173770)

        Quoting from that WP page:

        which led to his brother and his wife recognizing Kaczynski's style of writing and beliefs from the manifesto

        It's a whole different thing to recognize a person's beliefs - if possibly in a more extreme form - than what they've written on an entirely different subject. Quite possibly they recognized specific examples, theories, arguments or conclusions he had used as well. I'd wager this was 99% content and 1% style which really clinched that it wasn't some other crazy nut bag with the same ideas. I recently ran into one online that had some rather unique conspiracy theories, if they start

    • It is comparatively easier to establish that Author A did NOT write a particular document than to prove conclusively that he DID write it. In the latter case, you can establish that he probably wrote it, with a high confidence level in your findings, but it's not conclusive proof. It probably is proof enough to get a warrant to examine his computer(s), in an attempt to get that conclusive proof.

    • by invid (163714)
      They don't have to conclusively determine the author based only on text analysis. It is just one tool to narrow the search sufficiently to use other means to conclusively identify someone.
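
The "narrow the search" idea in this sub-thread can be sketched as a ranking rather than a verdict. The example below is a toy under loud assumptions: it compares relative frequencies of a short, arbitrarily chosen list of function words using cosine similarity, which is one common stylometric baseline and not necessarily what the competition entries or any investigator would use. `FUNCTION_WORDS`, `style_vector`, `cosine` and `shortlist` are all hypothetical names.

```python
import math
import re
from collections import Counter

# Arbitrary illustrative word list; real systems use far larger feature sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in",
                  "that", "it", "is", "but", "not", "with")


def style_vector(text):
    """Relative frequency of each function word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[word] / total for word in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def shortlist(unknown_text, known_samples, k=3):
    """Return the k authors whose samples look most like unknown_text.

    known_samples maps an author name to a writing sample.
    """
    target = style_vector(unknown_text)
    scored = [
        (cosine(target, style_vector(sample)), author)
        for author, sample in known_samples.items()
    ]
    scored.sort(reverse=True)
    return [author for _, author in scored[:k]]
```

The output is a list of candidates to look at by other means, not an identification, which matches the point made above about fuzzy tools being used to prioritise limited manpower.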
  • by sandytaru (1158959) on Sunday December 16, 2012 @06:49PM (#42309039) Journal
    Google thinks I'm a 20 year old male. I'm in my early thirties and a gal. I think visiting Slashdot so much throws off its algorithm, as does all the video game sites I hang out at. You'd think the searches for things like "gel nails" might tip them off, but it's probably further confused by my lack of visits to Pinterest.

    I'd be interested to see if this program can do any better at analyzing my writing than Google does analyzing my search history.
    • by monkeyhybrid (1677192) on Sunday December 16, 2012 @06:55PM (#42309073)

      Thank you for updating your age and gender details in our databases.

      Yours sincerely,
      Google.

    • by Kergan (780543)

      I second that. According to Google, I'm an old, obese dude in desperate need of new abs and Viagra. Go figure.

    • by Anonymous Coward on Sunday December 16, 2012 @07:00PM (#42309105)

      Google thinks I'm a 20 year old male. I'm in my early thirties and a gal. I think visiting Slashdot so much throws off its algorithm, as does all the video game sites I hang out at.

      I think you misunderstand the purpose of the algorithm. A writing sample is, of course, insufficient to detect your age and gender precisely.

      There is a good chance that your writing style matches that expected of a male in their twenties, in which case the algorithm has done well. You may be a gal, but your interests and behavior are perhaps more similar to those of a male in their twenties, and for the purposes of predicting what to sell you or what to expect from you, that's actually more accurate than your actual stats.

      • by pla (258480)
        Why did you post as AC? You have the single most insightful comment yet!

        No one interested in this tech cares what reproductive hardware you have - They care what you'll buy, simple as that.

        Kudos for the good call!
    • by Anonymous Coward

      Since this is Slashdot, I'm betting you actually are a 20 year old male.

      • See, that's where the Google algorithm programmers got lazy. They assume that too.
        • M.I.T. (instead of GaTech ) has an affiliated program at the University of Georgia? I would have thought/expected that Georgia Tech would be the affiliate program, rather than Massachusetts Institute of Technology...
          • It's the acronym for the degree program - Master of Internet Technology.
            • Gracias! Pardon my misunderstanding, and Thanks for telling me. The first thing that came to my mind was M.I.T. the school in Massachusetts. (My dad said that when he first moved to San Diego he kept misreading the "S.D." initials in the local paper as "South Dakota" and kept wondering why they would print so much news about South Dakota here in California. His "L.A." frame of mind must have rubbed off on me) Good luck with your Masters Degree!
    • by demonlapin (527802) on Sunday December 16, 2012 @07:38PM (#42309339) Homepage Journal

      You'd think the searches for things like "gel nails" might tip them off

      Nah, just makes it think you're emo.

    • by pla (258480)
      I think visiting Slashdot so much throws off its algorithm, as does all the video game sites I hang out at.

      Back in my youth, a friend consciously chose a handwriting style specifically to throw off so-called "handwriting" analysts. Of course, he chose to incorporate all the worst traits possible, meaning anyone looking at a sample of his writing would either immediately get the joke, or would back away slowly in fear for their life.

      Funny to think that in the modern world, "handwriting" has become an a
    • I'd be interested to see if this program can do any better at analyzing my writing than Google does analyzing my search history.

      I don't know where you can see what Google thinks you are, but I have used various tools that analyze my writing and then try to guess my age and gender, and well, all those tools typically guess me as a 40+ male. I do not expect any program based on this research to be any better.

  • by vlm (69642)

    This would have been a lot more fun about two months ago to detect paid political astroturfers.

    The ultimate AI-ish application would be an astroturfer plugin for chrome probably called "AstroturfBlock". So the site is a "tech" site, the contents are pure politics, and the text analysis system indicates an unemployed liberal arts degree holder... Go ahead and block it.

    • Re:astroturfers (Score:4, Insightful)

      by sco08y (615665) on Sunday December 16, 2012 @07:25PM (#42309247)

      This would have been a lot more fun about two months ago to detect paid political astroturfers.

      The ultimate AI-ish application would be an astroturfer plugin for chrome probably called "AstroturfBlock". So the site is a "tech" site, the contents are pure politics, and the text analysis system indicates an unemployed liberal arts degree holder... Go ahead and block it.

      How is it going to detect whether people were paid to write something?

      • by Mitreya (579078)

        The ultimate AI-ish application would be an astroturfer plugin for chrome probably called "AstroturfBlock".

        How is it going to detect whether people were paid to write something?

        You also need a blacklist database of known astroturfers (well, their writing samples, you don't need their identity) for this system to work

      • by Jeng (926980)

        There are usually key words they are paid to promote in their writings, for search purposes.

      • by vlm (69642)

        How is it going to detect whether people were paid to write something?

        Thanks for pointing out a minor bug in my project design. The answer, of course, is it doesn't matter. If a "tech" site is getting flooded with unemployed journalism grads posting stereotypical political talking points who cares if they're being paid or not, block the fools.

        AstroturfBlock would be exactly like how I don't care if an ad account is in collections with the middlemen, or it's a donation, or whatever, I just want adblock to block ads.

        • by sco08y (615665)

          Thanks for pointing out a minor bug in my project design. The answer, of course, is it doesn't matter. If a "tech" site is getting flooded with unemployed journalism grads posting stereotypical political talking points who cares if they're being paid or not, block the fools.

          AstroturfBlock would be exactly like how I don't care if an ad account is in collections with the middlemen, or it's a donation, or whatever, I just want adblock to block ads.

          Okay, fair enough. The major bug, then, is that astroturf works because people buy it. Like all the fake shit *constantly* going around Facebook.

          You're trying to solve the troll problem: blocking the troll is easy peasy, it's blocking all the assholes who feed the troll that's the problem.

          Okay, granted, "deny: *.facebook.com", but there are a lot of false positives there.

  • One example is company performance surveys that are supposed to be anonymous. I can't answer questions like 'how do you think the company leadership is doing' without effectively giving away who I am - my opinion is based on my position, and thus is easily inferred.

    • by Mitreya (579078)

      I can't answer questions like 'how do you think the company leadership is doing' without effectively giving away who I am - my opinion is based on my position, and thus is easily inferred.

      You _could_ try talking to people in different positions (and write from their perspective) to solve that problem :)

      It could be that one of your underlings is already writing responses tailored to look like it is written by someone in your position in hierarchy.

      Anonymous surveys are easily gamed.

    • Well I think a more impressive piece of software than this one would be a program that can "understand" writing and deconstruct the meaning of the message to its simplest form. If all anonymous writers used the same software, they would all have the same "style."

      Obviously you would only use it for writing that needed to be anonymous, because a large part of writing is the personality you put into it.

      Here's an example of software that seems to "understand" language, posted to /. in the past:

      http://web.mit.e [mit.edu]

  • by StripedCow (776465) on Sunday December 16, 2012 @07:04PM (#42309131)

    Of course, authors can use these tools too, and then iteratively change their texts until they cannot be correctly identified or profiled.

    Just like spammers can check whether their e-mails end up in spam filters before sending them.

    It will be a never-ending cat and mouse game.

    • The only "authors" who would benefit from this would be undercover agents and trolls. What would be the point of mutating the way you write so that you can no longer be identified or linked as the author of what you wrote before?

      An example to make my point clear. Suppose you're an Islamic fundamentalist ranting about US cultural imperialism. Using the tools you gradually change what you write, under a sequence of aliases, until soon you have the online opinions of a Neocon!

      It would have been easier if you s

      • What would be the point of mutating the way you write so that you can no longer be identified ...

        If you are writing characters for a story, you might want them all to have unique, easily identifiable speech patterns.

        Also the traits that stand out and identify you most are probably really annoying.
        You might want to reduce them.
        For example, you might want to not use the phrase "might want" nearly so much if it was brought to your attention.

      • How about a protestor in a repressive regime, like China? Wouldn't they rightfully want to be anonymous?

        You basically argued "nothing to hide, nothing to fear."

  • When Writing, How Anonymous Can You Be, Really?

    No.

    • When Writing, How Anonymous Can You Be, Really?

      No.

      No? As an answer to that rhetorical question? Answering "No" doesn't make any sense at all; did you read the question?

    • by game kid (805301)

      Took me a while to figure out if that was just bad spelling or a chant to Cthulhu.

      • by mcgrew (92797) *

        I had no problem reading it, since I have two daughters in their twenties. The GP has seldom if ever used a keyboard and does all his writing on a numberpad-only feature phone.

  • As a professional writer, I wish to be less anonymous. Hello, New Yorker?

    As one of billions who are exposed, I doubt that I will attract any attention regardless of this technology. Perhaps they will figure out who really wrote Shakespeare's plays, but surely they will devote fewer resources to the rest of us.

  • by Spottywot (1910658) on Sunday December 16, 2012 @07:56PM (#42309469)

    We can all (I hope) recognise quotes from authors with whom we have some familiarity, even if we haven't read the passage in question before. Terry Pratchett quotes, for instance, stand out a mile; Frank Herbert can be identified by the fact that he'll use the word 'subtle' at least twice a paragraph. Even here on /. certain posters' styles identify them without having to read their UID. Girlintraining is an example (for me at least); hell, I can spot her posts purely based on the responses to her posts, for god's sake.

    With the privacy arms race going on right now on the internet, identifying people based on what they write *and* their style, is not only the magic bullet for Big Brother, but quite achievable given a big enough sample,

    • by onyxruby (118189)

      Interesting claim, since I know girlintraining. I would be curious to see if you can identify alternate accounts girlintraining has used.

      • No I don't know any of her alternate accounts, but then again I haven't been looking for any, unlike the software suggested in TFA. Now that I *do* know I'll maybe keep an eye out for them, though even suggesting that sounds a bit creepy.
  • The problem with anonymity is that we have become addicted to digital..well..everything. Once you have the data in a digital format it is merely a matter of algorithms, storage, and computational power to pretty much wring whatever you want out of the data. I was a loud mouth Libertarian for quite a few years.. I ranted and threw in my 2 cents at a lot of places online.. then things like the att closet data capture and facebook image recognition started popping up and the writing on the digital wall was pre
  • Timothy's put-downs have been getting a lot of undeserved attention recently. For starters, I don't care what others say about Timothy. He's still nasty, two-faced, and he intends to dig a grave in which to bury liberty and freedom. Now stay with me a moment here; I am making a point. Specifically, if my own experience has taught me anything, it's that he thinks that he's a tribune of the oppressed. However, his endeavors are so lewd that they are easily taken up and assimilated by spiteful, fork-tongued au
  • are only afforded by the rich, connected and well-armed. For the others, be careful what you say, anywhere.

  • assimilation rape (Score:5, Interesting)

    by epine (68316) on Monday December 17, 2012 @01:47AM (#42311559)

    Wanna revisit your recent rants?

    I can't stand how every slashdot story submission has to end with a pink flamingo smoke grenade. I'm guessing that sober "just the facts, ma'am" submissions still exist, but rarely make it through the selection hoop of our post-counting overlords.

    I have several online pseudonyms which I make an effort to keep separate. I rarely post the same idea under more than one identity. If I post it here, it doesn't go there. I prefer to keep things separate so far as I can. I also have some background in computational linguistics. I've known for fifteen years that there is absolutely no way to win this battle long term. Only the most insipid comments will escape long-term annealing. If the word "gay" is the all season tire on your social media K-car, then your identity is safely concealed within the deep-wank weeds.

    If every post you write contains colourful language or idiom such as "all-season tire of deep-wank camouflage" you're toast and you know it, clap your hands. Merely getting my possessives and plurals and possessive plurals right more often than not narrows the net substantially. I might pedantically write Harry S Truman without putting a dot after the S (Snopes: "Although the 'S' was not technically an abbreviation and therefore did not need to be followed by a period, Truman's full name was generally rendered as 'Harry S. Truman' during his lifetime ..."). I make use of colons, semicolons (these come and go), mdash appositives, and parenthetical side-notes--at least one of these in almost every paragraph I write. I post way more links than the average person. My thoughts meander. There is playful use of language with double readings. I subvert cliche to achieve double readings that enable me to circle away from my target, then loop back from an unexpected angle. My unit of thought is the paragraph more so than the sentence.
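
The signatures listed in this paragraph (colons, semicolons, dash appositives, parenthetical side-notes, links, all judged per paragraph) are easy to count mechanically. The following is a minimal sketch of such a count, assuming paragraphs are separated by blank lines and dashes are typed as a double hyphen or an em dash character; the function name `punctuation_profile` and the choice of markers are illustrative only.

```python
import re


def punctuation_profile(text):
    """Average number of each stylistic marker per paragraph of text."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    count = len(paragraphs) or 1  # avoid dividing by zero on empty text

    markers = {
        "colons": r":(?!//)",             # skip the colon in http:// and https://
        "semicolons": r";",
        "dashes": r"--|\u2014",           # double hyphen or em dash character
        "parentheticals": r"\([^()]*\)",  # innermost balanced (...) groups
        "links": r"https?://\S+",
    }
    return {
        name: len(re.findall(pattern, text)) / count
        for name, pattern in markers.items()
    }
```

Counts like these say nothing about word choice, which the poster calls the harder signal to characterize; they only cover the surface habits he enumerates.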

    Even with all those signatures, originality in word selection is my neon tattoo. The corpus analysis algorithms likely don't do much (yet) with originality. Hard to characterize. For a while my anonymity might pass through the gun-metal algorithms unmelded by virtue of my writing being too bright and distinctive and easy to trace. But not for long. Even the fractal filigrees of originality will be coded eventually. (Pay no attention to the alliteration: an accident, not a stylistic signature.)

    Frankly, my dear, I don't give a damn.

    This is about respect. We all live a double life, pretty much all the time. We speak differently in front of our mothers (most of us) than with the lady-killing rough necks at the peanut bar or power tie horn-dogs at the chichi sushi bar.

    I value anonymity because I don't wish to own everything I say on a literal level, stripped of context, devoid of my original conceit or persona.

    I happen to regard linearity as a social construct. Humans are not inherently linear in cognition or constitution. We learn how to cultivate linear facades in our areas of competence (but not necessarily around the edges: this is why a competent accountant consults his astrologer Madam Threenipple). If you like the primary facade you have, and it suits all purposes, then I suppose you'll see the charm in proclaiming it from the RealName rafters.

    If you're a Baptist homosexual (I've known a few), you might wish to string your public identity by separate ropes.

    Or maybe you've just got things to work out. You're figuring things out on the fly and trying them on for size and you don't wish to fall prey to the Joseph McCarthy clean-nose auto-da-fe "have you ever". Implication: Anything you've ever said will be permanently recorded and will classify you irretrievably. This despite 0/1 statistics never passing T-scores. If the same person also has an NRA membership and has been a career employee of the Hoover Institute for two decades? Still a communist. Ten times more dangerous.

    The kind of person most willing t

  • About two years into my current job, I was able to guess which of my longer serving colleagues had written or contributed to various anonymous documents and reports floating around the office. The process is easy: learning which words they use, misuse or confuse, who writes in a more formal or a more chatty style, who seems unable to leave out detail or write a precis when appropriate, etc. What confuses this is copy-editing and the numerous copied passages that are typically found in such docu
  • by acid_andy (534219) on Monday December 17, 2012 @03:37AM (#42311925)
    I just have to turn my writing English Finnish, Russian, and, finally, through the back to English again. Analysis software!
    • by acid_andy (534219)
      Just for completeness here, that started out as: I'll just translate my writing from English to Finnish, through Russian and finally back to English again. Analyse that, software!
  • There are 4 simple rules that will help you to avoid this type of identification:
    1. Be brief
    2. Write seldom
    3. Plagiarize!
    4. Do not write in your mother tongue.

  • This isn't really news. I've been having discussions online since before AOL & Windows 3.1 existed, when the hot things were email lists and Usenet.

    Trolls were around even then and once they would get booted off or blocked they would don new aliases, which fooled nobody.

    Their style of writing gave them away.
