FBI Fights Testing For False DNA Matches
Statesman writes "The Los Angeles Times reports that an Arizona crime lab technician found two felons with remarkably similar genetic profiles, so similar that they would ordinarily be accepted in court as a match, but one felon was black and the other white. The FBI estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. Dozens of similar matches have been found, and these findings raise questions about the accuracy of the FBI's DNA statistics. Scientists and legal experts want to test the accuracy of official statistics using the nearly 6 million profiles in CODIS, the national system that includes most state and local databases. The FBI has tried to block distribution of the Arizona results and is blocking people from performing similar searches using CODIS. A legal fight is brewing over whether the nation's genetic databases ought to be opened to wider scrutiny. At stake is the credibility of the odds often cited in DNA cases, which can suggest an all but certain link between a suspect and a crime scene."
Re:So, the 1:113 Billion estimate is wrong (Score:5, Informative)
Have a read. It shows that you can't trust statistics when you only have half of the picture, and why it can be so dangerous to do so.
Re:So, the 1:113 Billion estimate is wrong (Score:1, Informative)
It's not wrong (at least not by much).
Well, a quick calculation shows that with 122 matches out of a database of 65,000 (2 billion pairwise comparisons) and 908 matches out of a database of 220,000 (24 billion comparisons), the chance of a match for any given pair, when looking at all pairs, is about 1/20,000,000.
Of course, this does not take into account that he's looking for any 9 matches out of 13. As the article mentioned, if any of the loci do not match, an overall mismatch is called. So the probability goes down further, because the match has to fall on one exact set of 9 of the 13 loci (10+ matches out of 13 are rare), and there are 715 such sets, which gives a factor of 1/715. That gets us to roughly 1 in 14 billion for that kind of a match, within an order of magnitude of 1 in 113 billion.
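For anyone who wants to check that arithmetic, here is a rough Python sketch. It assumes each database was searched against itself (every pair compared once), and it follows this comment's own reasoning about the 9-of-13 factor, not any official FBI method. The function and variable names are made up for illustration.

```python
from math import comb

def pairs_compared(db_size):
    """Number of distinct pairs when a database is searched against itself."""
    return comb(db_size, 2)

def one_in(matches, db_size):
    """Observed rate, expressed as 'one match per this many pairs'."""
    return pairs_compared(db_size) / matches

# Figures quoted in the comment: (matches found, database size)
searches = {
    "65,000 profiles": (122, 65_000),
    "220,000 profiles": (908, 220_000),
}

# Number of ways to choose which 9 of the 13 loci are the matching ones
nine_of_thirteen = comb(13, 9)

for label, (matches, size) in searches.items():
    per_pair = one_in(matches, size)
    print(f"{label}: {pairs_compared(size):,} pairs, "
          f"about 1 match per {per_pair:,.0f} pairs, "
          f"or 1 in {per_pair * nine_of_thirteen:,.0f} for one specific set of 9 loci")
```

With these inputs the per-pair rate comes out between roughly 1 in 17 million and 1 in 27 million, and the figure for one specific set of 9 loci lands between roughly 1 in 12 billion and 1 in 19 billion.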
Maybe the numbers are slightly off, but they don't seem wrong by much.
They should've gotten a statistician to explain the results to any judge and jury who needed to hear it rather than fighting it tooth and nail like they did. Looks like the FBI doesn't understand it properly either, which is definitely worrisome.
There's nothing to see here, just people who don't realize how statistics apply when dealing with a large number of comparisons.
Birthday paradox (Score:5, Informative)
Re:Birthday paradox (Score:3, Informative)
(yeah, I suck, forgot "plain old text")
The FBI says that the chance of any given person matching another unrelated person is 1 in 113 billion. They claim that the reason the Arizona lab tech found as many matches as she did ("dozens") is because she was checking the whole database (6 million entries) against itself. This is a straightforward birthday paradox issue, then.
According to the Wikipedia birthday problem page, the number of collisions expected given d = 113 billion different "birthdays" and n = 6 million "people in the room" is n - d + d((d-1)/d)^n. This is about 160 matches! So in fact the FBI may be right.
Note that the chance of a given person matching _anyone_ in the database is about 0.0053%, which is much greater than 1 in 113 billion.
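Here is a minimal Python sketch of those two numbers, assuming (as this comment does) that every profile is an independent draw from d = 113 billion equally likely "birthdays" and that the database holds n = 6 million profiles. The function names are hypothetical. The formula is evaluated with log1p/expm1 because computing ((d-1)/d)^n directly loses precision when d is this large.

```python
import math

def expected_collisions(n, d):
    """Expected collisions: n - d + d*((d-1)/d)**n, evaluated stably."""
    # d*((1 - 1/d)**n - 1) is computed as d*expm1(n*log1p(-1/d))
    return n + d * math.expm1(n * math.log1p(-1.0 / d))

def chance_of_matching_anyone(n, d):
    """Probability that one given profile matches at least one of the other n-1."""
    return -math.expm1((n - 1) * math.log1p(-1.0 / d))

n = 6_000_000          # profiles in the database (figure from the story)
d = 113_000_000_000    # 1-in-113-billion random match odds (figure from the story)

print(f"expected collisions: {expected_collisions(n, d):.1f}")
print(f"chance a given profile matches anyone: {chance_of_matching_anyone(n, d):.4%}")
```

The simpler approximation n^2 / (2d) gives nearly the same collision count for these inputs, which is a handy sanity check.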
Re:We're seeing no such thing. (Score:1, Informative)
Your math is basically sound, however they are only using THIRTEEN "markers" to make their identification/match.
If they used the entire thing, I suspect your math would be completely correct.
Would you care to re-do your math using only 13 points as the profile?
Birthday Paradox (Score:4, Informative)
If I'm not mistaken, what you've described is the Birthday Paradox:
http://en.wikipedia.org/wiki/Birthday_paradox/ [wikipedia.org]
Re:DNA can disprove only (Score:5, Informative)
Re:We're seeing no such thing. (Score:1, Informative)
That math - simple as it is - is too complex to explain to the average viewer in a 30 second news byte... what the media will do is take those 159 matches and blow them into a sensational story about the possibility (not probability) that DNA nabbed the wrong guy. If they can sufficiently suppress this story, they will have a lot fewer jurors quoting the news byte as absolute proof that DNA evidence can't be trusted.
Still, they should do the test - I'm not worried if there are 50, 159, or 300 "matches" - I'd like to know if there are 1500+
Re:well, well... (Score:3, Informative)
Hmmmm [usdoj.gov]...
At midyear 2007 there were 4,618 black male sentenced prisoners per 100,000 black males in the United States, compared to 1,747 Hispanic male sentenced prisoners per 100,000 Hispanic males and 773 white male sentenced prisoners per 100,000 white males.
Almost 6 to 1. And how many people will use these numbers to justify their racist attitudes instead of realizing who's being targeted? An economic breakdown might be even more revealing.
Re:well, well... (Score:5, Informative)
If person A has a DNA profile that matches one other person in the country, it is still very strong evidence.
If upon checking the other states there was found to be an average of one matching person per state, 50 matches, still strong evidence, but not nearly so conclusive. Would now require stronger supporting evidence to be "beyond reasonable doubt".
If (prison population being approx 1%) there are found to be 100 matches per state, 5000 matches, then DNA becomes more useful as evidence for acquittal than for conviction, i.e. non-matching still proves it wasn't you but matching doesn't prove it was you.
Re:Transparent government (Score:3, Informative)
Everything, I suppose, but your nic.
Since presumably he is not a member of the government, "radical transparency" does not apply to his identity.
Re:We're seeing no such thing. (Score:4, Informative)
Re:well, well... (Score:4, Informative)
Re:well, well... (Score:2, Informative)
I can see it now. Spartacus II: "We Are Sarcasticus!"
See, that was a horrible joke.
Re:well, well... (Score:3, Informative)
Re:I wonder... (Score:4, Informative)
That is to say, the chance of having marker A might be 1% and the chance of having marker B might be 5%, but the chance of having BOTH might very well be higher (or lower) than .05%.
IANAFG (I am not a forensic geneticist) but the co-segregation of genetic markers is such a fundamental and well understood process that I would have a hard time believing that they wouldn't know and correct for the rates of their chosen set when calculating the probabilities of a matched set.
Of course the statistics they calculate are probably based on estimates of pairwise segregation. Some higher-order effects may be at work that change the statistics relative to a basic model like independent pairwise segregation.
For example, allele A of gene 1 and allele B of gene 2 may not segregate according to a previously measured pairwise statistic in the presence of allele C of gene 3. Such higher-order effects may have a significant impact on the statistics but would require a *lot* of data to reveal.
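To put numbers on the quoted 1% and 5% example, here is a small Python sketch of the pairwise case only. It uses the standard linkage disequilibrium coefficient D, defined as P(A and B) minus P(A)P(B); the value of D used below is invented purely for illustration, and the function name is hypothetical. It does not model the higher-order effects described above.

```python
def joint_frequency(p_a, p_b, d=0.0):
    """Frequency of carrying both markers, given pairwise disequilibrium d.

    d = 0 means the markers are independent, so the frequencies just multiply.
    """
    # d is bounded so that all four combined frequencies stay between 0 and 1
    lowest = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
    highest = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    if not lowest <= d <= highest:
        raise ValueError(f"d must lie between {lowest} and {highest}")
    return p_a * p_b + d

independent = joint_frequency(0.01, 0.05)        # the naive product, 0.05%
linked = joint_frequency(0.01, 0.05, d=0.002)    # hypothetical positive association

print(f"independent: {independent:.4%}")
print(f"with d=0.002: {linked:.4%}")
```

Even a small positive D makes the combined profile several times more common than the product rule predicts, which is the worry raised here when the same assumption is stacked across 13 loci.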
Re:We're seeing no such thing. (Score:5, Informative)
>>There is a big difference between telling a lay jury "this match had a one in a 113 billion chance of occurring at random" versus "this is an event that occurs randomly on a routine basis." Non-statisticians have a hard time getting their head around the concept of correction for multiple hypothesis testing.
To give an apocryphal quote by Mark Twain: "People use statistics the same way drunks use lampposts - for support, not illumination."
The lack of ability to reason statistically is extremely common in America. I mean extremely common - even in grad students publishing papers on stats, or in the technologically literate crowd. I used to write up examples of egregiously bad stats from papers and news reports in my livejournal, but gave up because it was so common.
The DNA testing example is actually an example we studied in the Bayesian/conditional chapter of my stats textbook. It described an actual court case in LA where a guy was convicted solely by DNA evidence (there was no other evidence to convict him, and he wasn't lucky enough to have an alibi) because the prosecutor confused the odds of a random match (in this case, only one in a million) with the odds that the guy was innocent, and those are some pretty powerful odds. Of course, that would mean that in LA alone, there would be 6 people (on average) matching the DNA, and so the chance of the guy being guilty is actually only 1/6 or so.
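A back-of-the-envelope version of that reasoning in Python, as a sketch under stated assumptions: a pool of 6 million possible sources (implied by the "6 people" figure, not stated outright), a one-in-a-million random match probability, no other evidence, and every person in the pool equally likely beforehand. The function name is hypothetical.

```python
def posterior_source_probability(population, random_match_prob):
    """Chance that a matching person is the true source, with no other evidence.

    Assumes the true source is in the population and always matches, and that
    each of the other people matches independently with random_match_prob.
    """
    expected_innocent_matches = (population - 1) * random_match_prob
    return 1.0 / (1.0 + expected_innocent_matches)

population = 6_000_000       # assumed size of the pool of possible sources
random_match_prob = 1e-6     # "one in a million", as quoted by the prosecutor

p = posterior_source_probability(population, random_match_prob)
print(f"expected coincidental matches: {(population - 1) * random_match_prob:.1f}")
print(f"chance the matching defendant is the source: {p:.3f}")
```

Counting the true source alongside the roughly six coincidental matches gives about 1/7, the same neighborhood as the "1/6 or so" above and nowhere near 999,999 in a million.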
The problem I have with the DNA "this has a one in 113 billion chance of matching" is that this is an extrapolated number based on certain premises of independence between the different loci. Whereas the more we learn about DNA, the more we learn that there is a high degree of covariability, certainly enough that (as the article shows), the odds of a match are actually much much higher.
Re:Birthday Paradox (Score:3, Informative)
Re:We're seeing no such thing. (Score:3, Informative)
I don't think they are even using markers. I thought they were using a process that basically duplicates the DNA a massive number of times, then use gravity vs. capillary action to weigh the different chromosomes which may or may not have been through a blender 1st. They are not comparing gigabits of data to verify a DNA match.
Re:You're all missing the point! (Score:2, Informative)
Or that's what they expect you to conclude.
These tests are chosen so they can tell a person apart from his family, even his own twin in some cases! The "extremely close matching DNA" you mention consists of a very small portion of the subject's DNA which in most cases encodes nothing we know of.
These tests can conclude "subject X is not the same as suspect A", but otherwise they can only say "there's a very high probability of suspect A being the same as subject X".
There's racial tracers in DNA that can tell how long ago your lineage forked from its branch, like in the National Geographic Global Gene Project. And that proves differences among human races.
Re:Birthday Paradox (Score:5, Informative)
If I'm not mistaken, what you've described is the Birthday Paradox:
http://en.wikipedia.org/wiki/Birthday_paradox/ [wikipedia.org]
You aren't mistaken, but the Wikipedia reference is actually Birthday problem [wikipedia.org].
surprising it took this long (Score:3, Informative)
Forensic "science" (Score:4, Informative)
As I've said time and time again. Forensic science is a scam. Second rate statisticians and second rate politicians team up with second rate scientists and second rate TV shows to convince the public that forensic superheroes can detect evidence of any evil crime you commit. It's just a way to keep the people under control.
Re:We're seeing no such thing. (Score:3, Informative)
The test would immediately give a 100% result of it as non-human DNA.
One of the reasons for that is the fact that humans have 23 pairs of chromosomes whereas chimps and the other great apes have 24 pairs. We didn't "lose" a chromosome - one strand of DNA got glued on to the end of one of the other strands of DNA, so all of the same genetic information is still there.