The Courts Government Privacy News Science

Writing Style Fingerprint Tool Easily Fooled 96

Urchin writes "Some of the techniques used by literary detectives and courts of law to identify the authorship of text are easily fooled, say US researchers. They found that non-professional writers could hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy. Stylometric methods have been used in a number of high-profile legal cases in recent decades, including the 'Unabomber' trial. 'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."

  • by Peter Steil ( 1619597 ) on Thursday August 20, 2009 @04:15AM (#29130591)
    ....from the beginning. Sure it may work on a limited set of individuals. It's the same thing as a polygraph test, it's not based on any sort of quantifiable data but mere suspicion at best. It is completely subjective and there is no real hard science to support such tests. This is the reason why polygraphs are not admissible in court, and why writing analysis shouldn't be either. Be sure to watch for writing analysis to show up on the next Maury show!
    • by Anonymous Coward on Thursday August 20, 2009 @04:22AM (#29130635)
      Some analysis of handwriting can be useful. In forgery, for instance, a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.
      • by Jason Levine ( 196982 ) on Thursday August 20, 2009 @07:42AM (#29131591) Homepage

        I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors. For example, when we purchased our house I had to sign my name to a dozen or more papers. The first signature looked "normal" but the later signatures were glorified scribbles. If I needed to sign a check last and just scribbled my signature on the back, would the bank (not privy to my signature's declining quality in the previous paperwork) be able to tell that it wasn't a bad fake?

        • by jbudofsky ( 1279064 ) on Thursday August 20, 2009 @08:35AM (#29132091)

          I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors.

          Signatures written on paper are not all that helpful for a few reasons. First off, they are easy to forge. Second off, a single person might sign his name twice and produce two signatures which look very different to both the naked eye and some forms of analysis - hence not accurate. Where they actually are accurate, however, is when written on pressure-sensitive pads (such as those seen on newfangled credit card swipers). If you were to do an analysis of the pressure and speed at which the signer signed various parts of the signature, you would actually produce some very reliable information. This is because even when you sign your name in slightly different manners you have the tendency to use the same speed/pressure on certain parts of certain letters. Personally I would just use digital signatures...but calculating hash functions on the back of your restaurant receipt is never fun. It's also difficult to fit a 256-bit output on that minuscule "sign here" line.

          • Dear Sirs,

            I find highly offensive your suggestion that styles of writing may be subject to gimmickry and impersonation. I wish to complain in the strongest possible terms about the broadcast, and am deeply dismayed at the judgment displayed by the BBC in funding and producing such rubbish. Many of my best friends groom haddock and other north Atlantic fishes and only a few of them are transvestites.

            Yours faithfully, Brigadier Sir Charles Arthur Strong (Mrs.)

            • Re: (Score:3, Funny)

              by a whoabot ( 706122 )

              Dear Sirs and Madam,

              I wish to complain about that last complaint. I can assure you that all groomers of haddock and every other species in order Gadiformes are indeed transvestites. This is in fact a necessary grade to be reached in the apprenticeship process for the Gadiformes Groomers Guild (GGF). If the former complainant indeed knows of any non-transvestite groomers as such, then he should report them both to the GGF and to the Ministry of Fish Groomers in Luton at once!

              Angrily,
              Mr. Pint

          • I don't know, some of those pads are OK at capturing my signature but others leave it a jumbled mess worse than any signature I've ever written with a pen. And that includes the "just signed my name 100 times, here's another paper to sign for my house" signature. I'm guessing that the differences are either expense (places that go with cheap pads get horrid looking signatures) or when the pad was purchased (earlier ones worse at capturing signatures than later ones).

            As a side note, am I the only one who d

            • by feandil ( 873841 )

              what's wrong with a PIN ? that cannot be forged. civilised countries do use them for credit card payments now you know.

          • Signatures written on paper are not all that helpful...Where they actually are accurate, however, is when written on pressure-sensitive pads (such as those seen on newfangled credit card swipers)

            This may be slightly offtopic (but hopefully interesting to the slashdot crowd), so I apologize in advance. I've been trying to figure out how to use electronic signature pads to verify job authorizations, and haven't been able to come up with a way that they seem airtight to me if a customer denies issuing the

          • Re: (Score:1, Interesting)

            by Anonymous Coward

            Over in Japan, we use Hanko, which are simply ink stamps.
            While signatures can be forged, Hanko is susceptible to theft AND duplication from the stamp.
            I think signatures work on the assumption that signatures are like "artifacts" of one's personality - pretty much like statistics that describe
            the character of a population. The same goes for stylometrics.
            These, like MD5, are good for match identification, but not for authentication.
            Using stylometrics as evidence IMHO is a misuse of technology.

        • by sjames ( 1099 )

          The fact is, our entire banking system is built upon little more than trust. Neither teller nor merchant has any idea if the squiggle on the check is MY squiggle or not. The cost of analysis to gain any level of certainty exceeds the value of a typical check.

          The security features on the check don't mean a lot either. The check printer has no real way of knowing that I am or am not the person whose information they are printing on the check. In any event, a bank will cash anything check-like they are given.

      • a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.

        Which is a totally arbitrary differentiation, considering that a confident, arrogant, or unconcerned forger might well write less hesitantly than a person worried about their handwriting quality, or whether they actually have enough in the bank to cover what they're signing for.

      • sorry, but RTFA. this is stylometry, not handwriting analysis.
        Wikipedia: Stylometry [wikipedia.org]
        Wikipedia: Graphology [wikipedia.org]
      • by mcmonkey ( 96054 )

        Some analysis of handwriting can be useful. In forgery, for instance, a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.

        Perfect example! What you've detected is the speed and deliberation of the signer. In using this method to detect forgeries, you must make many assumptions regarding the state of mind of the signer.

        Using myself as an exampl

      • This is writing analysis, not handwriting analysis. It looks at the words and punctuation you write, not the shape of the letters you write, so it can be used for typed documents. If they were looking at my writing for example, they would look at my vocabulary, the fact I use British rather than American spellings and words and so on.

    • by KibibyteBrain ( 1455987 ) on Thursday August 20, 2009 @04:31AM (#29130677)
      I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful. If someone who was an unpublished author in any significant form "went unabomber" and started to write letters as a calling card, very similar writing styles and structures between the incriminating work and the unpublished/unpopularized previous work would be evidence to at least raise suspicion that the writer of the previous work was somehow uniquely tied to the crimes, even if not directly. Of course, all bets are off if it is plausible that someone could have pre-analyzed the author to imitate. It's also of note that this is only a positive test (i.e. a failed match in analysis makes no claim at all as to whether or not someone wrote it). A good example would be a set of writing that demonstrates an idiom used only in a certain locale, a business term used only in a certain company, and an ideological term used only in a certain fringe political movement. This is reasonable *evidence* of authorship, where of course evidence != proof. The polygraph, on the other hand, is complete BS because the only real thing a polygraph achieves is to psychologically motivate the taker to tell the truth due to "faith" in the fact he will be outed for lying by the device. It doesn't actually measure anything related to the statements, only the physiological condition which can depend on millions of independent factors.
      • "I do not think anyone has ever sold as an analysis of writing a unique identifier. But it can be useful. If one was an unpublished author in any way, and then is Unabomber, "and began to write letters as a calling card, can be deduced from very similar writing styles and structures of work and unpublished incriminating / unpopularized previous evidence that at least raise the suspicion that the writer of the earlier work was somehow tied to the crimes, though not directly. Of course, all bets are off if it
        • by KibibyteBrain ( 1455987 ) on Thursday August 20, 2009 @04:43AM (#29130719)
          Again, that's why it's clear that writing analysis is only a positive test. If steps are taken to actively change the style of writing, of course it will fail. It is something like saying an audio recording of someone's voice in a phone call is invalid, because it is possible to speak in a different voice. While true, this doesn't significantly weaken the positive test value.
        • by mpe ( 36238 )
          If one was an unpublished author in any way, and then is Unabomber, "and began to write letters as a calling card, can be deduced from very similar writing styles and structures of work and unpublished incriminating / unpopularized previous evidence that at least raise the suspicion that the writer of the earlier work was somehow tied to the crimes, though not directly. Of course, all bets are off if it is possible that someone could have analyzed previously the author to imitate.

          Or that either their prev
      • by Moraelin ( 679338 ) on Thursday August 20, 2009 @05:11AM (#29130845) Journal

        Yes, but the problem is this:

        1. It's not just that it's possible to fake not being myself, it's also that I can pretty much frame someone else. E.g., given enough messages written by KibibyteBrain (just clicking on the user name or id will give me a list of them), it's trivial to do a stylistic analysis on those and not just get an idea of how to write in the same style, but run the same analysis on the result and refine it until the match is outstanding.

        2. From what I understand, the people in this test fooled it by merely being told to write in the style of someone else, without the help of any analysis tools, and still fooled it majorly. That's some pretty damn fragile "evidence" if anyone asks me. It's something Joe Sixpack can do by himself. Add some tools and it can only get crappier.

        Even such idioms as you mention, are trivial to notice even without any tools. E.g., with only a little correspondence with another team here and reading some of their docs, I can tell that they use "solution" instead of "application".

        3. While it can be handwaved as "eh, nobody said it's perfect", some people do seem to take it as less fallible than it really is. Even you just called it "This is reasonable *evidence* of authorship, where of course evidence != proof." And that's the whole point. Something that can be fooled by almost any Joe Sixpack without any tools or much effort, isn't reasonable evidence at all.

        We allow evidence like handwriting, signatures, fingerprints, or DNA because they're supposedly very very hard to fake well. Ok, so DNA turned fakable as well, but you need a fair bit of expensive lab equipment and knowledge. It's something a biology prof at a medical college could probably do, but not something Joey Three-fingers the small time smuggler would even know where to start if he wants to plant someone else's fake blood at his latest shootout scene. Or fingerprints turned out easy to fake for the purpose of fooling a fingerprint reader, but it's still very very hard to transfer to an object in a way that looks genuine.

        But here we have something that untrained people fooled by just being told to try. I'm sorry, but for me then it shouldn't be evidence at all.

        • I think it's about as much evidence as having someone's IP address. It can be spoofed, it's not necessarily linkable to that exact person - but it is sort of a pointer in the direction of that person, as occam's razor would suggest that it is more likely to be real than a frame.

          So I would not say it should be admissible in court, or if it is it should come with a giant caveat, but I could see it pointing investigators in the direction of someone to try to find more hard evidence.

          • While you can attempt to write in someone else's style, you're going to run into problems duplicating it strongly enough for a stylometric analysis to implicate them. Even if you lifted exact phrases from previous works you will invariably need to come up with original words, phrases, and sentence structures to fill the gaps where the original author has not written. These should be enough to put reasonable doubt as to the authorship of the faked text.

          Moreover, if it's identified as a fake, by eliminating t

          • RTFA, seriously (Score:4, Informative)

            by Moraelin ( 679338 ) on Thursday August 20, 2009 @10:14AM (#29133235) Journal

            From TFA: "Each volunteer was then asked to write a description of their neighbourhood in a way that masked their personal style, before writing a further passage in the style of novelist and playwright Cormac McCarthy." [...] "the techniques consistently identified Cormac McCarthy as the author of the imitations of his work."

            So, yes, the whole bloody experiment was precisely about disguising your style as someone else, and no, it did not give the tests any reasonable doubt. People trying to imitate Cormac McCarthy were consistently identified as Cormac McCarthy by the stylistic analysis techniques. It doesn't get more clear cut than this, really.

            So, yes, it is very possible for an average Joe Sixpack to incriminate someone else, if they so choose.

            • Somebody fairly well versed in these techniques ought to create a tool to help spoof another person. Upload the spoofing text, a substantial volume of the spoof victim's writing, hit go, and it comes back with a match rating, and perhaps suggestions for improving it (e.g.: longer sentences, compound sentences, more frequent use of the word "unfortunately", etc.)

              That would pretty much doom the whole enterprise (or at least force it to advance beyond the current state of the art).
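A rough idea of what the "match rating" part of such a tool might look like is sketched below. This is only an illustration under loud assumptions: the feature set (relative frequencies of a few common English function words) and the cosine-similarity score are stand-ins chosen for brevity, not the techniques tested in TFA, and the names `FUNCTION_WORDS`, `profile` and `match_rating` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed, illustrative feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it",
                  "is", "was", "he", "for", "with", "as", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def match_rating(text_a, text_b):
    """Cosine similarity between the two function-word profiles."""
    a, b = profile(text_a), profile(text_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A spoofer would call `match_rating(draft, victim_corpus)` after each revision and keep the changes that push the score up. A serious tool would presumably need far more features (sentence length, punctuation habits, vocabulary richness) and far more text than a toy like this.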

        • Re: (Score:3, Insightful)

          Comment removed based on user account deletion
      • by Xenographic ( 557057 ) on Thursday August 20, 2009 @06:31AM (#29131157) Journal

        > I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.

        One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.

        Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how well you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than accept the fact that they have no idea what's going on.

        In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.

    • Ah, the irony of someone saying "I could have told you" and then saying that it's "completely subjective" and has "no real hard science to support [it]"!

      Writing style probably can be useful evidence where the style isn't known by others in advance, but it is quite easy to fake a style (much like having a "normal written style" and a "formal report style").

    • Re: (Score:3, Insightful)

      by Lillesvin ( 797939 )

      It is completely subjective and there is no real hard science to support such tests.

      I beg to differ. There's very little subjective in stylometrics, the subjective part is interpreting the results, but definitely not producing them. Take a look at http://en.wikipedia.org/wiki/Stylometry [wikipedia.org] and tell me which of the methods described there you think is "completely subjective".

      The main problem with stylometry is not the methods, but the data. As TFA describes, changing writing style throws off the results - at l

      • The main problem with stylometry is not the methods, but the data. As TFA describes, changing writing style throws off the results - at least to some extent...if someone is aware that the text they are producing might be subjected to stylometric analyses, they can employ various mechanisms to avoid identification and will probably have a better chance at succeeding than if writing casually. However, most texts used in court have been produced casually (letters, emails, text messages) and almost always have some unique traits specific to their author.

        But therein lies the rub: how can you be certain that the actual author didn't consider that the text might be subject to stylometric analysis? Even as a kid, if I wrote something that I didn't want traced back to me, I made an effort to disguise my handwriting and writing style. If I thought of that back when I was a semi-delinquent teen/pre-teen (okay, not really delinquent, but I did get a little mischievous once or twice), I can just about guarantee that anyone who is doing something that might land

        • That is true, but that's where the habitual aspect comes in. While you may be conscious about various aspects of your writing style, there are certain areas that are less prone to conscious manipulation --- e.g. certain syntactical constructions or your active vocabulary. No one (ie. no forensic linguists) will believe that you are Douglas Coupland if the frequency of certain prepositions in your text deviates wildly from his works. And yes, you can of course tamper with such frequencies, but the point is t

  • by Anonymous Coward on Thursday August 20, 2009 @04:16AM (#29130603)

    hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy

    ... or Anonymous Coward.

    • Re: (Score:3, Informative)

      by Thanshin ( 1188877 )

      What a crappy joke. I wish I could find you and kill you.

      I mean...

      Oh! A bad pun! Should we cross our paths, I'd rather extinguish your life.

      My dear sir.

  • Duh! (Score:4, Insightful)

    by k.a.f. ( 168896 ) on Thursday August 20, 2009 @04:26AM (#29130647)
    If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them. As long as the algorithm outputs "no" for any reformulation of your message, you can easily find it, by generate-and-test if necessary. The only question is, how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.
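The generate-and-test idea can be sketched in a few lines. Everything here is a toy under stated assumptions: `SYNONYMS` is a hypothetical three-entry substitution table, and `toy_classifier` is a made-up stand-in for a stylometric test that simply flags two tell-tale words; no real stylometry software is being modelled.

```python
from itertools import product

# Hypothetical synonym table; a real attempt would need far richer paraphrasing.
SYNONYMS = {
    "whilst": ["whilst", "while"],
    "colour": ["colour", "color"],
    "perhaps": ["perhaps", "maybe"],
}

def candidates(words):
    """Yield every rewording obtainable by swapping in listed synonyms."""
    options = [SYNONYMS.get(w, [w]) for w in words]
    for combo in product(*options):
        yield " ".join(combo)

def generate_and_test(message, points_to_me):
    """Return the first rewording the classifier does not attribute to the author."""
    for text in candidates(message.split()):
        if not points_to_me(text):
            return text
    return None  # every candidate was flagged

def toy_classifier(text):
    """Toy stand-in for a stylometric test: flags two tell-tale words."""
    return any(marker in text.split() for marker in ("whilst", "colour"))
```

For example, `generate_and_test("perhaps the colour faded whilst we slept", toy_classifier)` walks through the rewordings until one passes. Note that the number of candidates is the product of the option counts per word, so a blind search like this only stays cheap while the substitution table is small.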
    • Did you RTFA? (Score:5, Informative)

      by argent ( 18001 ) <(peter) (at) (slashdot.2006.taronga.com)> on Thursday August 20, 2009 @05:08AM (#29130839) Homepage Journal

      If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them.

      Their volunteer "attackers" lacked formal training in linguistics and had no access to stylometry software.

      • Re:Did you RTFA? (Score:5, Insightful)

        by Opportunist ( 166417 ) on Thursday August 20, 2009 @06:00AM (#29131055)

        No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else. If I know in advance that my writing would be used to find me.

        You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.

        I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

        • Re: (Score:3, Interesting)

          by k.a.f. ( 168896 )

          No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else. If I know in advance that my writing would be used to find me.

          You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.

          I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

          That's what I meant, sorry: even a computer program could outwit such analyses. Given the current state of automatic language analysis (Disclaimer: IAA computational linguist), I consider it obvious that a determined person can fool the discriminators enough to appear as someone else.

        • Don't you know, ransom notes are compiled by pasting the individual letters cut from magazines. They do it in the movies all the time.

          • Still, you cut and paste to write what you plan to express. This may already be a lead. Not to mention that your choice of newspaper is a good hint for a profiler, amongst other things, how you cut and glue the paper snippets, how you choose words...

            I'd write an email.

        • well, maybe not in English, I don't tend to have the word pool to draw from

          I'm impressed. Your spelling, grammar and punctuation are much, much better than a good portion of the native speakers/writers posting here :)

          I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.

          Exactly. If you have any inkling that your texts will be analyzed to determine who the actual author is, and you don't want them traced back to you, then, as TFA states, it is trivial even for amateurs to mimic other writing styles to hide the actual author's identity. Even if writing is found in your possession that looks like your style, can you prove that some

    • Re: (Score:3, Funny)

      by bitt3n ( 941736 )

      how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.

      as fast as: type it out, auto-translate it into french, auto-translate it back into: "the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich."

      • the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich.

        Oddly, if you translate that in to French and back (using Google translate), you get "the person who is hated by myself is to be killed by myself by using the method of the bomb save conflagration if it is that I receive an allocation of that amount is to a million dollars. sandwich.", which is (IMHO) slightly MORE readable than your original!

      • how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.

        as fast as: type it out, auto-translate it into french, auto-translate it back into: "the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich."

        FBI Agent 1: "This guy wants a million dollars. And a sandwich."
        FBI Agent 2: "Bastard must be hungry. Let's try starving him out."

    • ... you can easily find it, by generate-and-test if necessary.

      If you think generate-and-test is an easy way to find it, then I've got some NP-complete problems for you to solve. While you're at it, I also have some public keys I'd like you to crack.

      /sarcasm

      (Not that I think fooling stylometry is hard, but generate-and-test is generally not useful for anything but the smallest problems.)

  • No surprise (Score:5, Interesting)

    by AmiMoJo ( 196126 ) on Thursday August 20, 2009 @04:33AM (#29130681) Homepage Journal

    This should not really come as a surprise to anyone. Like all evidence that has to be interpreted, the interpretation can be flawed.

    Shows like CSI have computers getting an exact match on fingerprints and DNA, but the real world is not like that. Fingerprint matching is entirely subjective and the print recovered from a crime scene is rarely a nice clean one like they show on TV. DNA often has to be manipulated before a match can be made (due to the sample found at the scene being too small or of poor quality) and even then it often matches more than one person.

    Even when you do get a match, it's not proof that someone was at a specific place because DNA and fingerprints can easily be transferred. Someone broke into my car a few years ago and despite there being fingerprints the police decided not to prosecute because they were on the outside of the car and the accused could just claim he leant on it on his way home from the pub.

    There have been a few cases where fingerprint and DNA evidence have been challenged in the UK courts and shown to be unreliable, with innocent people spending years in jail before being cleared. Yet, the police seem to have started asking for everyone in the area of a crime to "volunteer" their DNA. Presumably if you don't "volunteer" you become a suspect.

    The idea that handwriting is any more unique than those two and at all reliable is laughable.

    • Re:No surprise (Score:4, Insightful)

      by abigsmurf ( 919188 ) on Thursday August 20, 2009 @05:04AM (#29130815)
      There was a good article here (or possibly some other social news type site) about the inherent flaw in DNA databases and the weight given to DNA evidence.

      The theory goes like this: the chances of getting a false positive on a part sample are something like 1 in 50 million. You have 50 million people on the database. This means you'd expect a false positive on every search. If you're unlucky enough to live close enough to a crime to have committed it, you could easily find yourself in court.

      You'll then have to defend yourself based on a 1 in 50 million probability to a jury who won't understand the statistics. If you haven't got a solid alibi, it would be a tough thing to do.

      There's probably a good Terry Pratchett quote about 1 in a million chances to be used here.
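The arithmetic behind that claim is easy to check. The sketch below assumes, as the comment does, that each database entry is an independent chance of a false match with the same probability; the function names are made up for illustration, and the 1 in 50 million figure is the commenter's "something like" estimate rather than a sourced number.

```python
def expected_false_matches(db_size, p_false):
    """Average number of innocent entries that match on one search."""
    return db_size * p_false

def prob_at_least_one(db_size, p_false):
    """Chance that a single search turns up one or more false matches."""
    return 1.0 - (1.0 - p_false) ** db_size

db_size = 50_000_000        # entries in the database (figure from the comment)
p_false = 1 / 50_000_000    # per-entry false positive rate (figure from the comment)

print(expected_false_matches(db_size, p_false))  # about 1 per search
print(prob_at_least_one(db_size, p_false))       # roughly 0.63, i.e. 1 - 1/e
```

So "a false positive on every search" holds on average, while the chance that any particular search produces at least one is closer to two in three. Either way, a 1 in 50 million figure quoted to a jury says little once the whole database has been trawled.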
      • This is the problem with fingerprint evidence as opposed to DNA evidence

        DNA evidence is normally matched on a small number of key points against a database of these points. The probability of a mismatch is ~1 in 50 million, so with a world population of 6.5 billion you will get mismatches, and with a US population of 300 million you will get mismatches. CODIS has 5 million entries so far, so mismatches are less likely but not impossible.

        NB if you have two samples then they can be matched exactly with total confidence

        • It can be matched exactly with total confidence if you scan the entire DNA. However, it took the human genome project a long time to do that with just one sample, so I don't think that is being done with police samples.

          • For one, it isn't being done with police samples, and it would be utterly stupid to do so, and not for the reason you think.

            It is unlikely that any two cells selected at random from your body share exactly the same DNA. Every cell division introduces errors. Some of these errors cause the new cells to malfunction and die (or malfunction and become cancer), but many will not. A total DNA comparison would rarely, if ever, return a perfect match except by chance (either the chance of having picked the right

      • by AmiMoJo ( 196126 )

        An excellent point well made.

        There is also danger of a match being made on another member of your family, but you being the one somehow tied to the case (in the same city or something) and so you get arrested. Siblings have close enough DNA that such matches can apparently be made.

        I question the "1 in 50 million" statistic too. It's far too simplistic, as there are different ways of collecting and matching DNA. Also, so-called experts have been wrong about this sort of thing in the past. Remember that poor

    • There was a study done where fingerprint "experts" were asked to identify fingerprints from a "crime scene". Except these fingerprints were actually fingerprints that the expert had previously identified. The experts identified the fingerprints the same the second time around at a rate of less than 50% (my recollection is that only 1 in 10 of the experts gave the same identification for the fingerprints the second time around).
  • by Trepidity ( 597 ) <[delirium-slashdot] [at] [hackish.org]> on Thursday August 20, 2009 @04:36AM (#29130685)

    Stylometrics is essentially a correlational field: it's not that people inherently must write in unique styles that are identifiable from a few measurable features; there is no strong genetic causation for handwriting or anything like that, which would mean that a handwriting style really does truly identify an individual or narrow set of individuals. Rather, it's that, all else being equal, people, in practice, do tend to write in a way that lets the stylometric features distinguish them. But, when all else isn't equal, and people are actively trying to thwart that sort of analysis, they are, unsurprisingly, able to do so in a lot of cases.

    I suspect that a lot of forensic analysis runs into this problem: it takes some fact that empirically is true among the general population, but only because the general population is not actively trying to thwart you. The set of robust empirical truths about people, that hold up even when the person is aware that you're trying to use it against them and actively trying to keep you from doing so, is much smaller.

  • The real issue is why we continue to ban 'criminals' when forensics are both available for testimony but often not for further examination because of deliberate overuse. We've now been shown data that even DNA evidence can be manufactured, if it's not first tested for methyl levels. And that is totally independent of physical specification. Which brings back the essential question that we've not had updated since 2000: What are we willing to expend energy for?

  • by hansraj ( 458504 ) on Thursday August 20, 2009 @04:48AM (#29130737)

    What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.

    I can't figure out what his style exactly is, and I certainly would not be able to fake it as the participants were supposed to. And the participants were supposed to not be literary geniuses.

    • Re: (Score:1, Flamebait)

      Well, I was just as confused as you were, but it's fairly obvious that he writes in some gimmicky convoluted way that people think is cool. Did you ever try to read Michel Foucault? He would string together sentences with about 6 clauses each containing participle/subject/predicate lists of 3 or 4 items, and the whole damn sentence would take up about a page or more.

      And then there is James Joyce in Finnegans Wake, and that dumb guy we all had to read in college that told the story about the retarded guy a

      • Re: (Score:2, Informative)

        by SappoMan ( 51574 )
        This is the epilogue from "Blood Meridian", a novel by McCarthy:
        "In the dawn there is a man progressing over the plain by means of holes, which he is making in the ground. He uses an implement with two handles and he chucks it into the hole and he enkindles the stone into the hole with his steel, hole by hole, striking the fire out of the rock, which God has put there. On the plain behind him are the wanderers in search of bones, and those who do not search. And they move haltingly in the light, like mech
    • The only thing of his I've read is "The Road", which is a great post-apocalyptic novel. I do remember his style was a little unusual... it's been a few years, but I'm thinking sentence fragments, half-finished thoughts, etc.

      Whatever it was, though, it wasn't distracting enough to prevent me from finishing the book. I'm trying to read some Margaret Atwood now, and not really enjoying it...

    • Most of the story is written as 3rd person but has various parts written as 1st person.
    • What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.

      Admittedly, they should have included an excerpt of reference text. From a randomly selected [quarterlyc...sation.com] website because I'm too lazy to walk to my bookshelf for something newer:

      In large part, The Orchard Keeper is written with the same stylistic tics that Harold Bloom would later celebrate in Blood Meridian as, to paraphrase, the m

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Ummm, not fair. 20 years ago the exam board just labeled that 'Bad Grammar' and failed me.

    • As you have already got some pointers to the style of McCarthy, let me tell a little anecdote. Some literary scientist once tried, only half-jokingly, to come up with a measure [jhu.edu] for the "southernness" of books. After some research, he found out that the deeper the southern roots of the author, the more dead mules appear in his texts. By this metric, Cormac McCarthy is the undisputed king of the genre, with over 100 dead mules in his novel "Blood Meridian" alone. He kills 50 alone when he lets them drop ove
    • The style of an author has many factors. Mostly it deals with word choice, sentence structure, and information flow. Does the author overuse certain words? Does he/she have a large vocabulary? Are the descriptive words long or short? Are the sentences short or rambling? Are the sentences passive or active? How much detail and how long does it take for the author to convey his/her ideas?

      In college, I took a linguistics class on stylistics. One of the best courses I ever took, and I wish I had taken i
  • by Lundse ( 1036754 )
    If you can describe something in enough detail to put it in a certain category (X writes like this), then you can also imitate that category from that same description (I will now write like this in order to seem like X).

    I do not really see how you would ever expect different.
  • no 1 can f00l teh l33t XpertZ just bi change D way D write stuff kekeke stupit fags
  • by digitig ( 1056110 ) on Thursday August 20, 2009 @05:36AM (#29130949)

    As the article says "the study only attacked some of the less complex stylometry techniques". In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing. It's usually high at the beginning of a text, usually (not always) gradually falls off, jumps when they change subject, and so on. I'm not aware of its being used in forensic linguistics (although it is used in analysing texts to identify, for example, objective divisions within a text).

    The sort of thing that they used in the Derek Bentley [wikipedia.org] case (which contributed to the partial posthumous pardon) was analysis of his statement, which had:

    • an unusually high proportion of passive constructions
    • the use of police jargon
    • use of language that was not consistent with an educationally sub-normal 17-year-old
    • word frequencies that didn't correlate well with general spoken or written English but that did correlate very well with police reports
    • unusual precision in the expression of times
    • frequent post-positioning of "then" after the subject ("I then went..." instead of "then I went..."), again characteristic of police reports

    That all pointed to the statement not being Bentley's own words, but rather being the police version of his answers to a series of police questions that had been removed from the statement. One aspect of his original trial was a statement "I did not know he was going to use the gun", which was taken as evidence that he knew his accomplice, Craig, had a gun (and the inconsistency with the denial that he knew this, later in the statement, was taken as evidence that he was lying). Since the linguistic analysis shows that this was probably a reply to a question, it seems more likely that it went something like:

    Police: Did you know he was going to use the gun?
    Bentley: No.

    Which makes sense because he knew at the time of the interview that Craig had a gun.

    Yes, of course this sort of thing can be gamed, but it wasn't credible that Bentley would have been capable of such sophisticated gaming. The important thing as far as this thread is concerned is that forensic linguistics doesn't plug in a single measure, turn a handle and come out with a yes/no answer; it uses a whole range of measures and builds up an overall picture of what probably happened.
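One of the measures listed above, the placement of "then" relative to the subject, is simple enough to sketch. This is only an illustration, not the method used in the Bentley analysis: it looks at a handful of subject pronouns, and the function name and sample text are invented here.

```python
import re

# Subject pronouns only; a real analysis would need a parser to find
# full noun-phrase subjects such as "the officer then went".
PRONOUNS = r"(?:I|he|she|we|they)"


def then_placement(text):
    """Count 'then' placed after vs. before a subject pronoun."""
    post = re.findall(rf"\b{PRONOUNS}\s+then\b", text, flags=re.IGNORECASE)
    pre = re.findall(rf"\bthen\s+{PRONOUNS}\b", text, flags=re.IGNORECASE)
    return {"post_positioned": len(post), "pre_positioned": len(pre)}


# Invented sample text, not taken from any real statement.
sample = "I then went to the door. Then I knocked. We then waited and he then left."
print(then_placement(sample))
```

A count like this is only one feature among many. As the comment says, it would be combined with word frequencies, passive constructions and the rest before anyone drew a conclusion.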

    • Some of the techniques tested by Brennan and Greenstadt discard prepositions because they are deemed to have no information content, says Michael Oakes, a computational linguist at the University of Sunderland, UK. This filters out the words that could have helped most, he says.

      "deemed to have no information content" is actually a positive feature for analysis. Vocabulary is one thing, but the little things, like prepositions, malapropisms, punctuation and favorite constructions are harder to fake. If someo

      • If someone consistently uses it's as a possessive and writes "for all intensive purposes", it'll be difficult for that person to suddenly start writing consistently.

        Well that right there would identify 90% of Slashdot, Fark, and Digg users as being the same author.

    • In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing.

      Did they? Yes, the article mentions lexical density, but it then* goes on to describe token/type ratio, which is a different beast entirely.

      The problem with FAs is that they're anything but a primary source....

      HAL.

      * Am I hiding my writing style here or adhering to it...?
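To make the distinction between the two measures concrete, here is a minimal sketch. It assumes the usual textbook definitions: type/token ratio is distinct word forms divided by total words, and lexical density is the share of words that are content words. The tiny function-word list and the function names are illustrative only, and this is not how the paper computed either measure.

```python
import re

# Tiny illustrative function-word list; a real analysis would use a
# part-of-speech tagger or a much fuller stop list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "at",
    "by", "for", "with", "is", "was", "it", "he", "she", "they", "i",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by total words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def lexical_density(tokens):
    """Share of tokens that are content words, i.e. not function words."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


sample = "The dog chased the cat and the cat chased the dog"
tokens = tokenize(sample)
print(type_token_ratio(tokens), lexical_density(tokens))
```

For the sample sentence the two numbers differ (5 types over 11 tokens versus 6 content words over 11 tokens), which is the point: repetition lowers the type/token ratio without changing lexical density.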

  • The fact that one person may write in the style of another is nothing new. While the use of such writing-style analysis may still have a valid use in some cases, it is clear that it, like any other forensic tool (even DNA analysis), can be beaten.

    Prior to contemporary times, I believe the number of people who would have had access to enough writing samples (of persons other than authors, columnists, and other published figures) to successfully mimic another's style would have been limited to family members
  • by BigHungryJoe ( 737554 ) on Thursday August 20, 2009 @07:23AM (#29131457) Homepage

    "'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."

    Of course, this assumes that law enforcement actually cares about the guilt or innocence of the people they convict. They don't. They only care about putting as many people in prison as they can.

    • Re: (Score:3, Informative)

      by TimSSG ( 1068536 )

      They only care about putting as many people in prison as they can.

      Wrong!

      The Basic Metric used on the police is case closed.
      In other words, it is easy to say a dead person committed a crime; because it closes a case.

      Metrics have very bad sides.

      Tim S.

  • by JoshuaZ ( 1134087 ) on Thursday August 20, 2009 @08:25AM (#29131971) Homepage
    So handwriting analysis has problems. Another recent Slashdot article was about how DNA evidence might be falsifiable. And we all know that eye-witnesses have serious problems. We don't, however, reject any of these. Why not? Because we don't care about single pieces of evidence but rather about bodies of evidence. It is the collective narrative which matters. It might be possible for one or two types of evidence to be wrong or falsified. But it is extremely difficult to falsify four or five. The real problem is when overzealous prosecutors try to portray something like handwriting analysis as a CSI-style magic bullet. This is, moreover, being balanced by a problem in the opposite direction, with juries increasingly wanting all sorts of technical evidence to convict even when it would be unnecessary, prohibitively expensive or, in some cases, a form of evidence that really only exists in fiction.
  • This article is NOT about handwriting in any way; it is about writing style, which is the style you write in. Nowhere does it mention anything about handwriting!
  • I read the book "Author Unknown" which talked about this for the forensic side. It was only an okay book. Here's how I would do it.
    1. Inconsistently spell things wrong. Misspell a word one way, then down a few paragraphs, misspell it another way.
    2. Type in all caps. All capitalization errors you might normally make go away.
    3. Don't use your regional sayings for things. Use some other region's, or use all of them.
    4. Run it back & forth with translation services to really obfuscate it.

    easy peasy.

  • The "Gender Genie" works surprisingly well, even tried a female blogging about math, and a male blogging about gay-bashing...

    The Gender Genie [bookblog.net]

  • Well, it got me wrong.