Urchin writes "Some of the techniques used by literary detectives and courts of law to identify the authorship of text are easily fooled, say US researchers. They found that non-professional writers could hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy. Stylometric methods have been used in a number of high-profile legal cases in recent decades, including the 'Unabomber' trial. 'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."
....from the beginning. Sure it may work on a limited set of individuals. It's the same thing as a polygraph test, it's not based on any sort of quantifiable data but mere suspicion at best. It is completely subjective and there is no real hard science to support such tests. This is the reason why polygraphs are not admissible in court, and why writing analysis shouldn't be either. Be sure to watch for writing analysis to show up on the next Maury show!
by Anonymous Coward
on Thursday August 20, @04:22AM (#29130635)
Some analysis of handwriting can be useful. In forgery, for instance, a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.
I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors. For example, when we purchased our house I had to sign my name to a dozen or more papers. The first signature looked "normal" but the later signatures were glorified scribbles. If I needed to sign a check last and just scribbled my signature on the back, would the bank (not privy to my signature's declining quality in the previous paperwork) be able to tell that it wasn't a bad fake?
I've always wondered just how accurate signatures are. I've noticed that my own signature varies widely depending on various factors.
Signatures written on paper are not all that helpful for a few reasons. First off, they are easy to forge. Second off, a single person might sign his name twice and produce two signatures which look very different to both the naked eye and some forms of analysis - hence not accurate. Where they actually are accurate, however, is when written on pressure-sensitive pads (such as those seen on newfangled credit card swipers). If you were to do an analysis of the pressure and speed at which the signer signed various parts of the signature, you would actually produce some very reliable information. This is because even when you sign your name in slightly different manners you have the tendency to use the same speed/pressure on certain parts of certain letters.
Personally I would just use digital signatures...but calculating hash functions on the back of your restaurant receipt is never fun. It's also difficult to fit a 256-bit output on that minuscule "sign here" line.
Dear Sirs,
I find highly offensive your suggestion that styles of writing may be subject to gimmickry and impersonation. I wish to complain in the strongest possible terms about the broadcast, and am deeply dismayed at the judgment displayed by the BBC in funding and producing such rubbish. Many of my best friends groom haddock and other north Atlantic fishes and only a few of them are transvestites.
Yours faithfully, Brigadier Sir Charles Arthur Strong (Mrs.)
Dear Sirs and Madam,
I wish to complain about that last complaint. I can assure you that all groomers of haddock and every other species in order Gadiformes are indeed transvestites. This is in fact a necessary grade to be reached in the apprenticeship process for the Gadiformes Groomers Guild (GGF). If the former complainant indeed knows of any non-transvestite groomers as such, then he should report them both to the GGF and to the Ministry of Fish Groomers in Luton at once!
Angrily,
Mr. Pint
a signature can show as false when compared to an authentic one by the presence of a "forger's tremor", because the forger must proceed more slowly to produce the signature than the person to whom it properly belongs.
Which is a totally arbitrary differentiation, considering that a confident, arrogant, or unconcerned forger might well write less hesitantly than a person worried about their handwriting quality, or whether they actually have enough in the bank to cover what they're signing for.
I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful. If someone was an unpublished author in any significant form, and then "went unabomber" and started to write letters as a calling card, very similar writing styles and structures between the incriminating work and the unpublished/unpopularized previous work would be evidence to at least raise suspicion that the writer of the previous work was somehow uniquely tied to the crimes, even if not directly. Of course, all bets are off if it is plausible that someone could have pre-analyzed the author to imitate. It's also of note that this is only a positive test (i.e. a failed match in analysis makes no claim at all as to whether or not someone wrote it). A good example would be a set of writing that demonstrates an idiom used only in a certain locale, a business term used only in a certain company, and an ideological term used only in a certain fringe political movement. This is reasonable *evidence* of authorship, where of course evidence != proof.
The polygraph, on the other hand, is complete BS because the only real thing a polygraph achieves is to psychologically motivate the taker to tell the truth due to "faith" that he will be outed for lying by the device. It doesn't actually measure anything related to the statements, only the physiological condition, which can depend on millions of independent factors.
"I do not think anyone has ever sold as an analysis of writing a unique identifier. But it can be useful. If one was an unpublished author in any way, and then is Unabomber, "and began to write letters as a calling card, can be deduced from very similar writing styles and structures of work and unpublished incriminating / unpopularized previous evidence that at least raise the suspicion that the writer of the earlier work was somehow tied to the crimes, though not directly. Of course, all bets are off if it
Again, that's why it's clear that writing analysis is only a positive test. If steps are taken to actively change the style of writing, of course it will fail. It is something like saying an audio recording of someone's voice in a phone call is invalid, because it is possible to speak in a different voice. While true, this doesn't significantly weaken the positive test value.
If one was an unpublished author in any way, and then is Unabomber, "and began to write letters as a calling card, can be deduced from very similar writing styles and structures of work and unpublished incriminating / unpopularized previous evidence that at least raise the suspicion that the writer of the earlier work was somehow tied to the crimes, though not directly. Of course, all bets are off if it is possible that someone could have analyzed previously the author to imitate.
Yes, but the problem is this:
1. It's not just that it's possible to fake not being myself, it's also that I can pretty much frame someone else. E.g., given enough messages written by KibibyteBrain (which just clicking on the user name or id will give me a list of), it's trivial to do a stylistic analysis on those and not just get an idea of how to write in the same style, but run the same analysis on the result and refine it until the match is outstanding.
2. From what I understand, the people in this test fooled it by merely being told to write in the style of someone else, without the help of any analysis tools, and still fooled it majorly. That's some pretty damn fragile "evidence" if anyone asks me. It's something Joe Sixpack can do by himself. Add some tools and it can only get crappier.
Even such idioms as you mention, are trivial to notice even without any tools. E.g., with only a little correspondence with another team here and reading some of their docs, I can tell that they use "solution" instead of "application".
3. While it can be handwaved as "eh, nobody said it's perfect", some people do seem to take it as less fallible than it really is. Even you just called it "This is reasonable *evidence* of authorship, where of course evidence != proof." And that's the whole point. Something that can be fooled by almost any Joe Sixpack without any tools or much effort, isn't reasonable evidence at all.
We allow evidence like handwriting, signatures, fingerprints, or DNA because they're supposedly very very hard to fake well. Ok, so DNA turned fakable as well, but you need a fair bit of expensive lab equipment and knowledge. It's something a biology prof at a medical college could probably do, but not something Joey Three-fingers the small time smuggler would even know where to start if he wants to plant someone else's fake blood at his latest shootout scene. Or fingerprints turned out easy to fake for the purpose of fooling a fingerprint reader, but it's still very very hard to transfer to an object in a way that looks genuine.
But here we have something that untrained people fooled by just being told to try. I'm sorry, but for me then it shouldn't be evidence at all.
I think it's about as much evidence as having someone's IP address. It can be spoofed, and it's not necessarily linkable to that exact person - but it is sort of a pointer in the direction of that person, as Occam's razor would suggest that it is more likely to be real than a frame.
So I would not say it should be admissible in court, or if it is it should come with a giant caveat, but I could see it pointing investigators in the direction of someone to try to find more hard evidence.
While you can attempt to write in someone else's style, you're going to run into problems duplicating it strongly enough for a stylometric analysis to implicate them. Even if you lifted exact phrases from previous works you will invariably need to come up with original words, phrases, and sentence structures to fill the gaps where the original author has not written. These should be enough to put reasonable doubt as to the authorship of the faked text.
More over, if it's identified as a fake, by eliminating t
From TFA: "Each volunteer was then asked to write a description of their neighbourhood in a way that masked their personal style, before writing a further passage in the style of novelist and playwright Cormac McCarthy." [...] "the techniques consistently identified Cormac McCarthy as the author of the imitations of his work."
So, yes, the whole bloody experiment was precisely about disguising your style as someone else, and no, it did not give the tests any reasonable doubt. People trying to imitate Cormac McCarthy were consistently identified as Cormac McCarthy by the stylistic analysis techniques. It doesn't get more clear cut than this, really.
So, yes, it is very possible for an average Joe Sixpack to incriminate someone else, if they so choose.
It sounds to me like this "evidence" is just another case of bullet matching [sfgate.com], which, for those that haven't heard the term, was all the rage at the FBI for a while, and I'm sure there are innocent people rotting in jail right now over its bogus findings.
What we have to be seriously careful about with these pseudoscience "tests", is the simple fact that juries love CSI style mumbo jumbo that makes solving a case little more than a magic box pointing out someone and saying "He did it". And just like bullet matching
> I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.
One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.
Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how good you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than to accept the fact that they have no idea what's going on.
In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.
Ah, the irony of someone saying "I could have told you" and then saying that it's "completely subjective" and has "no real hard science to support [it]"!
Writing style probably can be useful evidence where the style isn't known by others in advance, but it is quite easy to fake a style (much like having a "normal written style" and a "formal report style").
It is completely subjective and there is no real hard science to support such tests.
I beg to differ. There's very little subjective in stylometrics, the subjective part is interpreting the results, but definitely not producing them. Take a look at http://en.wikipedia.org/wiki/Stylometry [wikipedia.org] and tell me which of the methods described there you think is "completely subjective".
The main problem with stylometry is not the methods, but the data. As TFA describes, changing writing style throws off the results - at l
If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them. As long as the algorithm outputs "no" for any reformulation of your message, you can easily find it, by generate-and-test if necessary. The only question is, how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.
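To make the generate-and-test idea concrete, here is a minimal toy sketch. It assumes the "known method" is nothing more than a function-word frequency profile compared by cosine similarity; the word list, the sample texts, and the helper names (`profile`, `cosine`, `least_attributable`) are all made up for illustration and are not what any court or the researchers actually use.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: a handful of common English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]

def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def least_attributable(candidates, known_sample):
    """Generate-and-test: score every candidate rewording against the
    author's known profile and keep the one that matches least."""
    target = profile(known_sample)
    return min(candidates, key=lambda c: cosine(profile(c), target))

# Made-up sample of the author's known writing and two rewordings of a message.
known = ("It is the case that the results of the test are in, "
         "and it is clear that the method is weak.")
candidates = [
    "It is clear that the method is weak and that the test is of no use.",
    "Weak method. Useless test. Nobody should trust this.",
]
best = least_attributable(candidates, known)
print(best)
```

With a measure this crude, the clipped second rewording wins simply because it uses none of the tracked words. A real system would use many more features, but the loop stays the same: reword, score, keep the lowest.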
No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else. If I know in advance that my writing would be used to find me.
You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.
I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.
That's what I meant, sorry: even a computer program could outwit such analyses. Given the current state of automatic language analysis (Disclaimer: IAA computational linguist), I consider it obvious that a determined person can fool the discriminators enough to appear as someone else.
how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.
as fast as: type it out, auto-translate it into french, auto-translate it back into: "the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich."
... you can easily find it, by generate-and-test if necessary.
If you think generate-and-test is an easy way to find it, then I've got some NP-complete problems for you to solve. While you're at it, I also have some public keys I'd like you to crack.
/sarcasm
(Not that I think fooling stylometry is hard, but generate-and-test is generally not useful for anything but the smallest problems.)
This should not really come as a surprise to anyone. Like all evidence that has to be interpreted, the interpretation can be flawed.
Shows like CSI have computers getting an exact match on fingerprints and DNA, but the real world is not like that. Fingerprint matching is entirely subjective and the print recovered from a crime scene is rarely a nice clean one like they show on TV. DNA often has to be manipulated before a match can be made (due to the sample found at the scene being too small or of poor quality) and even then it often matches more than one person.
Even when you do get a match, it's not proof that someone was at a specific place because DNA and fingerprints can easily be transferred. Someone broke into my car a few years ago and despite there being fingerprints the police decided not to prosecute because they were on the outside of the car and the accused could just claim he leant on it on his way home from the pub.
There have been a few cases where fingerprint and DNA evidence have been challenged in the UK courts and shown to be unreliable, with innocent people spending years in jail before being cleared. Yet, the police seem to have started asking for everyone in the area of a crime to "volunteer" their DNA. Presumably if you don't "volunteer" you become a suspect.
The idea that handwriting is any more unique than those two and at all reliable is laughable.
There was a good article here (or possibly some other social news type site) about the inherent flaw in DNA databases and the weight given to DNA evidence.
The theory goes like this: the chances of getting a false positive on a part sample are something like 1 in 50 million. You have 50 million people on the database. This means you'd expect about one false positive per search, on average. If you're unlucky enough to live close enough to a crime to have committed it, you could easily find yourself in court.
You'll then have to defend yourself based on a 1 in 50 million probability to a jury who won't understand the statistics. If you haven't got a solid alibi, it would be a tough thing to do.
There's probably a good Terry Pratchett quote about 1 in a million chances to be used here.
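The database arithmetic above can be worked through in a few lines. This is only a sketch using the poster's rough figures (1 in 50 million, 50 million profiles), and it assumes every comparison is independent, which a real database will not strictly satisfy. The function names are made up for illustration.

```python
def expected_false_matches(p_false_match, database_size):
    """Average number of innocent profiles that match in one search."""
    return p_false_match * database_size

def prob_at_least_one(p_false_match, database_size):
    """Chance that a search turns up one or more false matches,
    assuming each comparison is independent (an assumption)."""
    return 1 - (1 - p_false_match) ** database_size

p = 1 / 50_000_000   # poster's figure for a partial-sample false positive
n = 50_000_000       # poster's figure for the database size

print(f"expected false matches per search: {expected_false_matches(p, n):.2f}")
print(f"chance of at least one false match: {prob_at_least_one(p, n):.3f}")
```

Under those assumptions the average is one false match per search, and the chance of getting at least one is roughly 63% (about 1 - 1/e), so "every search" overstates it a little but the point stands.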
This is the problem with fingerprint evidence as opposed to DNA evidence
DNA evidence is normally matched on a small number of key points against a database of these points. The probability of a mismatch is ~1 in 50 million, so with a world population of 6.5 billion you will get mismatches, and with a US population of 300 million you will get mismatches. CODIS has 5 million entries so far, so mismatches are less likely but not impossible.
NB if you have two samples then they can be matched exactly with total confidence
A DNA match does not establish motive and other important things. All it shows is "it is very likely that your DNA is here".
Once you add up everything (other evidence, alibi found to be false, etc) else you might have something. But it is certainly not "short of a confession", it is nowhere even close to a confession.
It's certainly very useful to help find out who else to investigate (and who to investigate first:) ).
Same goes for the "writing style" tool. Even if it's easily fooled, it may still be a usef
Stylometrics is essentially a correlational field: it's not that people inherently must write in unique styles that are identifiable from a few measurable features: there is no strong genetic causation for handwriting or anything like that, which would mean that a handwriting style really does truly identify an individual or narrow set of individuals. Rather, it's that, all else being equal, people in practice, do tend to write in a way that lets the stylometric features distinguish them. But, when all else isn't equal, and people are actively trying to thwart that sort of analysis, they are, unsurprisingly, able to do so in a lot of cases.
I suspect that a lot of forensic analysis runs into this problem: it takes some fact that empirically is true among the general population, but only because the general population is not actively trying to thwart you. The set of robust empirical truths about people, that hold up even when the person is aware that you're trying to use it against them and actively trying to keep you from doing so, is much smaller.
The real issue is why we continue to ban 'criminals' when forensics are both available for testimony but often not for further examination because of deliberate overuse. We've now been shown data that even DNA evidence can be manufactured, if it's not first tested for methyl levels. And that is totally independent of physical specification. Which brings back the essential question that we've not had updated since 2000: What are we willing to expend energy for?
What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.
I can't figure out what his style exactly is, and I certainly would not be able to fake it as the participants were supposed to. And the participants were supposed to not be literary geniuses.
The only thing of his I've read is "The Road", which is a great post-apocalyptic novel. I do remember his style was a little unusual... it's been a few years, but I'm thinking sentence fragments, half-finished thoughts, etc.
Whatever it was, though, it wasn't distracting enough to prevent me from finishing the book. I'm trying to read some Margaret Atwood now, and not really enjoying it...
What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.
Admittedly, they should have included an excerpt of reference text. From a randomly selected [quarterlyc...sation.com] website because I'm too lazy to walk to my bookshelf for something newer:
In large part, The Orchard Keeper is written with the same stylistic tics that Harold Bloom would later celebrate in Blood Meridian as, to paraphrase, the m
This is the epilogue from "Blood Meridian", a novel of McCarthy:
"In the dawn there is a man progressing over the plain by means of holes, which he is making in the ground. He uses an implement with two handles and he chocks it into the hole and he enkindles the stone into the hole with he steel, hole by hole, striking the fire out of the rock, which God has put there. On the plain behind him are the wanderers in search of bones, and those who do not search. And they move haltingly in the light, likes mech
If you can describe something in enough detail to put it in a certain category (X writes like this), then you can also imitate that category from that same description (I will now write like this in order to seem like X).
I do not really see how you would ever expect different.
As the article says "the study only attacked some of the less complex stylometry techniques". In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing. It's usually high at the beginning of a text, usually (not always) gradually falls off, jumps when they change subject, and so on. I'm not aware of its being used in forensic linguistics (although it is used in analysing texts to identify, for example, objective divisions within a text).
The sort of thing that they used in the Derek Bentley [wikipedia.org] case (which contributed to the partial posthumous pardon) was analysis of his statement, which had:
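For anyone who wants to see that variation for themselves, a rough sketch follows. It takes lexical density to mean the share of content words in a stretch of text and tracks it over fixed-size windows. The stop list here is a tiny made-up stand-in (a proper analysis would use a part-of-speech tagger), and the function names and window size are arbitrary.

```python
import re

# Illustrative stop list only; an assumption, not a standard resource.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "on", "at", "by", "for", "with",
    "and", "or", "but", "is", "are", "was", "were", "it", "that", "this",
    "he", "she", "they", "we", "i", "you", "not", "as", "be", "had", "has",
}

def lexical_density(words):
    """Fraction of the given words that are not on the stop list."""
    if not words:
        return 0.0
    content = [w for w in words if w not in FUNCTION_WORDS]
    return len(content) / len(words)

def density_by_window(text, window=50):
    """Lexical density of each consecutive block of `window` words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [lexical_density(words[i:i + window])
            for i in range(0, len(words), window)]

print(density_by_window("The cat sat on the mat", window=3))
```

Running `density_by_window` over a long text and looking at how much the numbers drift from block to block is a quick way to check how unstable the measure is within one author.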
unusually high proportion of passive constructions
the use of police jargon
use of language that was not consistent with an educationally sub-normal 17-year-old
word frequencies that didn't correlate well with general spoken or written English but that did correlate very well with police reports
unusual precision in the expression of times
frequent post-positioning of "then" after the subject ("I then went..." instead of "then I went..."), again characteristic of police reports
That all pointed to the statement not being Bentley's own words, but rather being the police version of his answers to a series of police questions that had been removed from the statement. One aspect of his original trial was a statement "I did not know he was going to use the gun", which was taken as evidence that he knew his accomplice, Craig, had a gun (and the inconsistency with the denial that he knew this, later in the statement, was taken as evidence that he was lying). Since the linguistic analysis shows that this was probably a reply to a question, it seems more likely that it went something like:
Police
Did you know he was going to use the gun?
Bentley
No.
Which makes sense because he knew at the time of the interview that Craig had a gun.
Yes, of course this sort of thing can be gamed, but it wasn't credible that Bentley would have been capable of such sophisticated gaming. The important thing as far as this thread is concerned is that forensic linguistics doesn't plug in a single measure, turn a handle and come out with a yes/no answer; it uses a whole range of measures and builds up an overall picture of what probably happened.
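As an illustration of how mechanical some of these measures are, here is a small sketch that counts one of the features listed above: "then" placed after the subject versus before it. It only recognises a few pronoun subjects, which is a simplifying assumption, and the function name and sample sentence are made up. It is one measure among the many an analyst would combine, not a test in itself.

```python
import re

# Assumption: only these pronoun subjects are considered.
SUBJECTS = r"(?:i|he|she|we|they)"

def then_positions(text):
    """Count 'subject then' (e.g. 'I then went') versus
    'then subject' (e.g. 'then I went') in the text."""
    lowered = text.lower()
    post = len(re.findall(rf"\b{SUBJECTS}\s+then\b", lowered))
    pre = len(re.findall(rf"\bthen\s+{SUBJECTS}\b", lowered))
    return {"subject_then": post, "then_subject": pre}

sample = "I then went to the door. Then I saw him. We then ran."
print(then_positions(sample))
```

A high "subject then" count compared with ordinary speech or writing is the kind of signal described above as characteristic of police reports.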
Some of the techniques tested by Brennan and Greenstadt discard prepositions because they are deemed to have no information content, says Michael Oakes, a computational linguist at the University of Sunderland, UK. This filters out the words that could have helped most, he says.
"deemed to have no information content" is actually a positive feature for analysis. Vocabulary is one thing, but the little things, like prepositions, malapropisms, punctuation and favorite constructions are harder to fake. If someo
The fact that one person may write in the style of another is nothing new. While the use of such writing-style analysis may still have a valid use in some cases, it is clear that it, like any other forensic tool (even DNA analysis) can be beaten.
Prior to contemporary times, I believe the number of people who would have had access to enough writing samples (of persons other than authors, columnists, and other published figures) to successfully mimic another's style would have been limited to family members
"We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."
Of course, this assumes that law enforcement actually cares about the guilt or innocence of the people they convict. They don't. They only care about putting as many people in prison as they can.
So handwriting analysis has problems. Another recent Slashdot article was about how DNA evidence might be falsifiable. And we all know that eye-witnesses have serious problems. We don't, however, reject any of these. Why not? Because we don't care about single pieces of evidence but rather about bodies of evidence. It is the collective narrative which matters. It might be possible for one or two types of evidence to be wrong or falsified. But it is extremely difficult to falsify four or five. The real problem is when overzealous prosecutors try to portray something like handwriting analysis as a CSI-style magic bullet. This is, moreover, balanced by a problem in the opposite direction, with juries increasingly wanting all sorts of technical evidence to convict even when it would be unnecessary, prohibitively expensive or, in some cases, a form of evidence that really only exists in fiction.
Could have told you writing analysis was bogus.... (Score:3, Insightful)
Re:Could have told you writing analysis was bogus. (Score:5, Informative)
Re:Could have told you writing analysis was bogus. (Score:5, Interesting)
Re:Could have told you writing analysis was bogus. (Score:5, Funny)
Re: (Score:2)
Re: (Score:3, Funny)
Re: (Score:2)
Re:Could have told you writing analysis was bogus. (Score:5, Insightful)
Re: (Score:2)
Re:Could have told you writing analysis was bogus. (Score:5, Interesting)
Re: (Score:2)
Or that either their prev
Yes, but here's the problem (Score:5, Interesting)
Yes, but the problem is this:
1. It's not just that it's possible to fake not being myself, it's also that I can pretty much frame someone else. E.g., given enough messages written by KibibyteBrain (which just clicking on the user name or id will give me a list of), it's trivial to do a stylistical analysis on those and not just get an idea of how to write in the same style, but run the same analysis on the result and refine it until the match is outstanding.
2. From what I understand, the people in this test fooled it by merely being told to write in the style of someone else, without the help of any analysis tools, and still fooled it majorly. That's some pretty damn fragile "evidence" if anyone asks me. It's something Joe Sixpack can do by himself. Add some tools and it can only get crappier.
Even such idioms as you mention, are trivial to notice even without any tools. E.g., with only a little correspondence with another team here and reading some of their docs, I can tell that they use "solution" instead of "application".
3. While it can be handwaved as "eh, nobody said it's perfect", some people do seem to take it as less fallible than it really is. Even you just called it "This is reasonable *evidence* of authorship, where of course evidence != proof." And that's the whole point. Something that can be fooled by almost any Joe Sixpack without any tools or much effort, isn't reasonable evidence at all.
We allow evidence like handwriting, signatures, fingerprints, or DNA because they're supposedly very very hard to fake well. Ok, so DNA turned fakable as well, but you need a fair bit of expensive lab equipment and knowledge. It's something a biology prof at a medical college could probably do, but not something Joey Three-fingers the small time smuggler would even know where to start if he wants to plant someone else's fake blood at his latest shootout scene. Or fingerprints turned out easy to fake for the purpose of fooling a fingerprint reader, but it's still very very hard to transfer to an object in a way that looks genuine.
But here we have something that untrained people fooled by just being told to try. I'm sorry, but for me then it shouldn't be evidence at all.
Re: (Score:2)
I think it's about as much evidence as having someone's IP address. It can be spoofed, it's not necessarily linkable to that exact person - but it is sort of a pointer in the direction of that person, as Occam's razor would suggest that it is more likely to be real than a frame.
So I would not say it should be admissible in court, or if it is it should come with a giant caveat, but I could see it pointing investigators in the direction of someone to try to find more hard evidence.
Article doesn't talk about incriminating others (Score:3, Interesting)
While you can attempt to write in someone else's style, you're going to run into problems duplicating it strongly enough for a stylometric analysis to implicate them. Even if you lifted exact phrases from previous works you will invariably need to come up with original words, phrases, and sentence structures to fill the gaps where the original author has not written. These should be enough to put reasonable doubt as to the authorship of the faked text.
Moreover, if it's identified as a fake, by eliminating t
RTFA, seriously (Score:4, Informative)
From TFA: "Each volunteer was then asked to write a description of their neighbourhood in a way that masked their personal style, before writing a further passage in the style of novelist and playwright Cormac McCarthy." [...] "the techniques consistently identified Cormac McCarthy as the author of the imitations of his work."
So, yes, the whole bloody experiment was precisely about disguising your style as someone else, and no, it did not give the tests any reasonable doubt. People trying to imitate Cormac McCarthy were consistently identified as Cormac McCarthy by the stylistic analysis techniques. It doesn't get more clear cut than this, really.
So, yes, it is very possible for an average Joe Sixpack to incriminate someone else, if they so choose.
Re: (Score:3, Insightful)
It sounds to me like this "evidence" is just another case of bullet matching [sfgate.com], which for those that haven't heard the term was the rage at the FBI for a while, and I'm sure there are innocent people rotting in jail right now over its bogus findings.
What we have to be seriously careful about with these pseudoscience "tests", is the simple fact that juries love CSI style mumbo jumbo that makes solving a case little more than a magic box pointing out someone and saying "He did it". And just like bullet matching
No information is better than bad information... (Score:5, Insightful)
> I don't think anyone has ever sold writing analysis as a unique identifier. But it can be useful.
One problem with that is the human tendency to be overconfident as to how good these tests are. This happens everywhere. Court, business, whatever.
Say you have some metric at work (e.g. lines of code) that's easy to measure. If it's the only measure management has, it's what they'll use to measure how well you're doing. This applies even if the results are absurd, because they would rather believe that they have *some* idea what's going on than accept the fact that they have no idea what's going on.
In summary, sometimes NO information is better than bad information, but people are very reluctant to accept that fact.
Re: (Score:2)
Ah, the irony of someone saying "I could have told you" and then saying that it's "completely subjective" and has "no real hard science to support [it]"!
Writing style probably can be useful evidence where the style isn't known by others in advance, but it is quite easy to fake a style (much like having a "normal written style" and a "formal report style").
Re: (Score:3, Insightful)
It is completely subjective and there is no real hard science to support such tests.
I beg to differ. There's very little subjective in stylometrics, the subjective part is interpreting the results, but definitely not producing them. Take a look at http://en.wikipedia.org/wiki/Stylometry [wikipedia.org] and tell me which of the methods described there you think is "completely subjective".
The main problem with stylometry is not the methods, but the data. As TFA describes, changing writing style throws off the results - at l
Concealing style (Score:4, Funny)
hide their identity from 'stylometric' techniques by writing in the style of novelist Cormac McCarthy
... or Anonymous Coward.
Re: (Score:3, Informative)
What a crappy joke. I wish I could find you and kill you.
I mean...
Oh! A bad pun! Should we cross our paths, I'd rather extinguish your life.
My dear sir.
Duh! (Score:4, Insightful)
Did you RTFA? (Score:5, Informative)
If the methods a stylometry analysis uses are known (and they couldn't very well be a secret to hold up in court), of course you can game them.
Re:Did you RTFA? (Score:5, Insightful)
No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else, if I know in advance that my writing would be used to find me.
You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.
I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.
Re: (Score:3, Interesting)
No, but they knew they were being analyzed and for what. It's trivial to change my style (well, maybe not in English, I don't tend to have the word pool to draw from) and become someone else, if I know in advance that my writing would be used to find me.
You can, probably, given time and persistence, sift through the thousands and millions of board messages posted everywhere on the internet and find out who I am in other boards. I didn't try to hide my identity against comparison of writing styles.
I could see this working if applied to notes and texts written by someone who didn't have any reason to assume it would become the subject of an investigation. I'd deem it utterly worthless, though, when applied to ransom notes and the like.
That's what I meant, sorry: even a computer program could outwit such analyses. Given the current state of automatic language analysis (Disclaimer: IAA computational linguist), I consider it obvious that a determined person can fool the discriminators enough to appear as someone else.
Re: (Score:3, Funny)
how fast can you generate a text that (a) says what you intend and (b) does not point to you? Very fast, I'd wager.
as fast as: type it out, auto-translate it into french, auto-translate it back into: "the person who is being hated by myself is to be killed by myself by employment of the method of the bomb conflagration saving if it is the case that I am receiving the stipend of an amount that is one million of dollars. sandwich."
P vs. NP (Score:2)
... you can easily find it, by generate-and-test if necessary.
If you think generate-and-test is an easy way to find it, then I've got some NP-complete problems for you to solve. While you're at it, I also have some public keys I'd like you to crack.
(Not that I think fooling stylometry is hard, but generate-and-test is generally not useful for anything but the smallest problems.)
No surprise (Score:5, Interesting)
This should not really come as a surprise to anyone. Like all evidence that has to be interpreted, the interpretation can be flawed.
Shows like CSI have computers getting an exact match on fingerprints and DNA, but the real world is not like that. Fingerprint matching is entirely subjective and the print recovered from a crime scene is rarely a nice clean one like they show on TV. DNA often has to be manipulated before a match can be made (due to the sample found at the scene being too small or of poor quality) and even then it often matches more than one person.
Even when you do get a match, it's not proof that someone was at a specific place because DNA and fingerprints can easily be transferred. Someone broke into my car a few years ago and despite there being fingerprints the police decided not to prosecute because they were on the outside of the car and the accused could just claim he leant on it on his way home from the pub.
There have been a few cases where fingerprint and DNA evidence have been challenged in the UK courts and shown to be unreliable, with innocent people spending years in jail before being cleared. Yet, the police seem to have started asking for everyone in the area of a crime to "volunteer" their DNA. Presumably if you don't "volunteer" you become a suspect.
The idea that handwriting is any more unique than those two and at all reliable is laughable.
Re:No surprise (Score:4, Insightful)
The theory goes like this: the chances of getting a false positive on a part sample are something like 1 in 50 million. You have 50 million people on the database. This means you'd expect a false positive on every search. If you're unlucky enough to live close enough to a crime to have committed it, you could easily find yourself in court.
You'll then have to defend yourself based on a 1 in 50 million probability to a jury who won't understand the statistics. If you haven't got a solid alibi, it would be a tough thing to do.
There's probably a good Terry Pratchett quote about 1 in a million chances to be used here.
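To put rough numbers on that, here is a minimal sketch. It assumes the figures quoted above (a 1 in 50 million false positive rate, 50 million database entries) and also assumes each comparison is independent, which real databases may not satisfy. The function names are made up for illustration.

```python
def expected_false_positives(db_size, fp_rate):
    """Average number of coincidental hits when one sample is searched
    against db_size unrelated profiles."""
    return db_size * fp_rate


def prob_at_least_one_false_positive(db_size, fp_rate):
    """Chance that a single search returns one or more coincidental hits,
    assuming independent comparisons."""
    return 1.0 - (1.0 - fp_rate) ** db_size


DB_SIZE = 50_000_000          # entries in the database (figure from the comment)
FP_RATE = 1 / 50_000_000      # per-comparison false positive rate (figure from the comment)

print(expected_false_positives(DB_SIZE, FP_RATE))
print(round(prob_at_least_one_false_positive(DB_SIZE, FP_RATE), 3))
```

Under these assumptions the average is one coincidental hit per search, and the chance of getting at least one is roughly 63% (about 1 - 1/e), so "a false positive on every search" is a slight overstatement but the point about the jury stands.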
Re: (Score:2)
This is the problem with fingerprint evidence as opposed to DNA evidence
DNA evidence is normally matched on a small number of key points against a database of these points. The probability of a mismatch is ~1 in 50 million, so with a world population of 6.5 billion you will get mismatches, and with a US population of 300 million you will get mismatches. CODIS has 5 million entries so far, so mismatches are less likely but not impossible.
NB if you have two samples then they can be matched exactly with total confidence
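As a back-of-the-envelope check on those pool sizes, the sketch below multiplies each one by the ~1 in 50 million rate quoted above. The rate and the population figures are taken from the comment as given, not verified, and the function name is hypothetical.

```python
MISMATCH_RATE = 1 / 50_000_000  # per-comparison rate quoted in the comment

# Pool sizes as quoted in the comment.
POOLS = {
    "world": 6_500_000_000,
    "US": 300_000_000,
    "CODIS": 5_000_000,
}


def expected_coincidental_matches(pool_size, rate=MISMATCH_RATE):
    """Average number of unrelated people in the pool who would match by chance."""
    return pool_size * rate


for name, size in POOLS.items():
    print(f"{name}: about {expected_coincidental_matches(size):.1f} expected chance matches")
```

On these numbers a partial profile would match on the order of a hundred people worldwide and a handful in the US, while a 5 million entry database would throw up a chance match in roughly one search out of ten.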
Re: (Score:2)
All it shows is "it is very likely that your DNA is here".
Once you add up everything (other evidence, alibi found to be false, etc) else you might have something. But it is certainly not "short of a confession", it is nowhere even close to a confession.
It's certainly very useful to help find out who else to investigate (and who to investigate first
Same goes for the "writing style" tool. Even if it's easily fooled, it may still be a usef
a common feature of correlations (Score:4, Insightful)
Stylometrics is essentially a correlational field: it's not that people inherently must write in unique styles that are identifiable from a few measurable features: there is no strong genetic causation for handwriting or anything like that, which would mean that a handwriting style really does truly identify an individual or narrow set of individuals. Rather, it's that, all else being equal, people in practice, do tend to write in a way that lets the stylometric features distinguish them. But, when all else isn't equal, and people are actively trying to thwart that sort of analysis, they are, unsurprisingly, able to do so in a lot of cases.
I suspect that a lot of forensic analysis runs into this problem: it takes some fact that empirically is true among the general population, but only because the general population is not actively trying to thwart you. The set of robust empirical truths about people, that hold up even when the person is aware that you're trying to use it against them and actively trying to keep you from doing so, is much smaller.
Re:a common feature of correlations (Score:4, Insightful)
Misdirection (Score:2)
The real issue is why we continue to ban 'criminals' when forensics are both available for testimony but often not for further examination because of deliberate overuse. We've now been shown data that even DNA evidence can be manufactured, if it's not first tested for methyl levels. And that is totally independent of physical specification. Which brings back the essential question that we've not had updated since 2000: What are we willing to expend energy for?
Cormac McCarthy Style? (Score:3, Interesting)
What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.
I can't figure out what his style exactly is, and I certainly would not be able to fake it as the participants were supposed to. And the participants were supposed to not be literary geniuses.
Re: (Score:2)
The only thing of his I've read is "The Road", which is a great post-apocalyptic novel. I do remember his style was a little unusual... it's been a few years, but I'm thinking sentence fragments, half-finished thoughts, etc.
Whatever it was, though, it wasn't distracting enough to prevent me from finishing the book. I'm trying to read some Margaret Atwood now, and not really enjoying it...
Re: (Score:2)
Re: (Score:2)
What exactly is the "Cormac McCarthy style"? The article doesn't mention it at all. I even skimmed through the paper and all it does is quote a paragraph from some work of Cormac McCarthy.
Admittedly, they should have included an excerpt of reference text. From a randomly selected [quarterlyc...sation.com] website because I'm too lazy to walk to my bookshelf for something newer:
Re: (Score:2, Interesting)
Ummm, not fair. 20 years ago the exam board just labeled that 'Bad Grammar' and failed me.
Re: (Score:2, Informative)
"In the dawn there is a man progressing over the plain by means of holes, which he is making in the ground. He uses an implement with two handles and he chucks it into the hole and he enkindles the stone into the hole with his steel, hole by hole, striking the fire out of the rock, which God has put there. On the plain behind him are the wanderers in search of bones, and those who do not search. And they move haltingly in the light, like mech
Selfevident, isn't it? (Score:2, Interesting)
I do not really see how you would ever expect different.
lol gay (Score:2)
Misrepresents forensic linguistics (Score:5, Insightful)
As the article says "the study only attacked some of the less complex stylometry techniques". In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing. It's usually high at the beginning of a text, usually (not always) gradually falls off, jumps when they change subject, and so on. I'm not aware of its being used in forensic linguistics (although it is used in analysing texts to identify, for example, objective divisions within a text).
The sort of thing that they used in the Derek Bentley [wikipedia.org] case (which contributed to the partial posthumous pardon) was analysis of his statement, which had
That all pointed to the statement not being Bentley's own words, but rather being the police version of his answers to a series of police questions that had been removed from the statement. One aspect of his original trial was a statement "I did not know he was going to use the gun", which was taken as evidence that he knew his accomplice, Craig, had a gun (and the inconsistency with the denial that he knew this, later in the statement, was taken as evidence that he was lying). Since the linguistic analysis shows that this was probably a reply to a question, it seems more likely that it went something like:
No.
Which makes sense because he knew at the time of the interview that Craig had a gun.
Yes, of course this sort of thing can be gamed, but it wasn't credible that Bentley would have been capable of such sophisticated gaming. The important thing as far as this thread is concerned is that forensic linguistics doesn't plug in a single measure, turn a handle and come out with a yes/no answer; it uses a whole range of measures and builds up an overall picture of what probably happened.
Bad Assumptions too (Score:2)
"deemed to have no information content" is actually a positive feature for analysis. Vocabulary is one thing, but the little things, like prepositions, malapropisms, punctuation and favorite constructions are harder to fake. If someo
Re: (Score:2)
In fact, I'm surprised that they even considered lexical density because that varies greatly within a single author's writing.
Did they? Yes, the article mentions lexical density, but it then* goes on to describe token/type ratio, which is a different beast entirely.
The problem with FAs is that they're anything but a primary source....
HAL.
* Am I hiding my writing style here or adhering to it...?
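For anyone who wants to see the difference between the two measures, here is a minimal sketch. It uses the common definitions (type/token ratio as distinct words over total words, lexical density as content words over total words), which may not be exactly what the paper computed. The tokenizer and the tiny function-word list are assumptions for illustration only; real analyses use far larger lists or a part-of-speech tagger.

```python
import re

# Tiny hand-picked function-word list (an assumption, for illustration only).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "in", "on", "to", "and", "or", "but",
    "is", "was", "he", "it", "with", "by", "which",
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(text):
    """Share of tokens that are content words, i.e. not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


sample = "the cat sat on the mat and the dog sat on the cat"
print(type_token_ratio(sample))   # 7 distinct words out of 13
print(lexical_density(sample))    # 6 content words out of 13
```

The first number measures how varied the vocabulary is; the second measures how much of the text carries content rather than grammatical glue. A text can score high on one and low on the other, which is why treating them as the same thing is misleading.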
Re: (Score:2)
When writing samples abound (Score:2)
Prior to contemporary times, I believe the number of people who would have had access to enough writing samples (of persons other than authors, columnists, and other published figures) to successfully mimic another's style would have been limited to family members
as if law enforcement cares (Score:3)
"'We would strongly suggest that courts examine their methods of stylometry against the possibility of adversarial attacks,' say the researchers."
Of course, this assumes that law enforcement actually cares about the guilt or innocence of the people they convict. They don't. They only care about putting as many people in prison as they can.
Re: (Score:3, Informative)
They only care about putting as many people in prison as they can.
Wrong!
The Basic Metric used on the police is case closed.
In other words, it is easy to say a dead person committed a crime; because it closes a case.
Metrics have very bad sides.
Tim S.
All evidence is tentative (Score:3, Insightful)