Tuesday, April 21, 2009

Judicial use of DNA "evidence" and Misuse of Statistics: The Prosecutor's Fallacy

A recent article in the NYT described the judicial system's adoption of a technology that began as a biomedical research tool. (I resist to some extent the notion that DNA technology has directly been a boon to clinical patient care.) (See: http://www.nytimes.com/2009/04/19/us/19DNA.html.) This powerful technology, when used appropriately in appropriate circumstances, provides damning evidence of guilt because of its high specificity - the probability of a coincidental match is stated to be as low as 1 x 10^-9. Thus, in a case such as that of the infamous (and nefarious) OJ Simpson, in which there is strong suspicion of guilt BEFORE the DNA evidence is evaluated, a positive match, in the absence of laboratory error or misconduct (neither of which can be routinely discounted - see: http://www.nytimes.com/2001/09/26/us/police-chemist-accused-of-shoddy-work-is-fired.html), essentially proves, beyond any reasonable doubt, the genetic identity of the person to whom the sample belongs. (Yes, that does indeed mean that OJ Simpson is the perpetrator of the heinous murder of Nicole Brown Simpson, he said unapologetically.)

In the case of old OJ, he was one among perhaps 10, let's say 100, suspects. Let's assume that the LAPD had their act together (this also requires a leap of faith) and that the perpetrator is among the suspects that have been rounded up, but we have no evidence to differentiate their respective probabilities of guilt. Thus, each of the 100 has a 1% probability of being guilty, on the basis of circumstantial evidence alone, or a relation to or relationship with the victim(s), or just being in the wrong place at the wrong time, whatever. Given that 1% prior probability of guilt, we can make a 2x2 table representing the probability of guilt given a positive test, which is ultimately what we want to know. I don't know the sensitivity of DNA fingerprinting, but it doesn't really matter because the high specificity of the test drives the likelihood ratio. I will assume it's 50% for simplicity:
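For readers who want to check the arithmetic, here is a minimal Python sketch of the same calculation using Bayes' theorem. The function name posterior_guilt is my own invention for illustration. The inputs are the ones stated above: a 1% prior, an assumed 50% sensitivity, and a coincidental match probability of 1 x 10^-9 taken as the false positive rate. It ignores laboratory error and misconduct entirely.

```python
def posterior_guilt(prior, sensitivity, false_positive_rate):
    """Return P(guilty | match) by Bayes' theorem.

    Hypothetical helper for illustration only; it ignores lab error.
    """
    true_pos = prior * sensitivity                    # guilty AND matches
    false_pos = (1.0 - prior) * false_positive_rate   # innocent AND matches
    return true_pos / (true_pos + false_pos)


prior = 1 / 100              # one perpetrator among 100 equally likely suspects
sensitivity = 0.5            # assumed in the text for simplicity
false_positive_rate = 1e-9   # coincidental match probability quoted above

p_suspects = posterior_guilt(prior, sensitivity, false_positive_rate)
print(f"P(guilty | match) among 100 suspects: {p_suspects:.9f}")
```

Under these assumptions the result is roughly 0.9999998. Changing the sensitivity barely moves it, because the false positive term is so small relative to the prior.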

In this "population" of 100 suspects (by suspects, I mean persons whose probability of having committed the crime is enhanced over that of a random member of the overall population by virtue of other evidence), even if all 100 suspects have equiprobable guilt, a DNA "match" is damning indeed and all but assures the guilt of the matching suspect (with the caveats mentioned above.)

But consider a different situation, one in which there are no convincing suspects. Suppose that the law enforcement authorities compare a biological sample with a large DNA database to look for a match. Note that we do not use the term "suspect" here - because it implies that there is some suspicion that has limited this population from the overall population. When a database (of unsuspected persons) is canvassed, no such suspicion exists. Rather, a fishing expedition ensues, and the probabilities, when computed, come out quite different. Suppose there are DNA samples from 100 million individuals in the database, and the entire database is canvassed. Now our 2x2 table looks like this:
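The same arithmetic can be sketched in Python for the database case. Two things here are my assumptions and not established facts: that every person in the database is equally likely a priori, and the parameter p_in_database (a hypothetical name), which is the chance that the perpetrator's DNA is in the database at all. The sensitivity and coincidental match probability are the same 50% and 1 x 10^-9 used earlier.

```python
def posterior_guilt(prior, sensitivity, false_positive_rate):
    """Return P(guilty | match) by Bayes' theorem (ignores lab error)."""
    true_pos = prior * sensitivity
    false_pos = (1.0 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)


database_size = 100_000_000  # size of the canvassed database, from the text
sensitivity = 0.5            # same simplifying assumption as before
false_positive_rate = 1e-9   # coincidental match probability quoted earlier

# Expected number of innocent people who match purely by coincidence.
expected_false_matches = database_size * false_positive_rate

results = {}
# ASSUMPTION: p_in_database is hypothetical; 1.0 is the most generous case.
for p_in_database in (1.0, 0.1):
    prior = p_in_database / database_size  # no suspicion: everyone equal
    results[p_in_database] = posterior_guilt(
        prior, sensitivity, false_positive_rate
    )
    print(f"p_in_database={p_in_database}: "
          f"P(guilty | match) = {results[p_in_database]:.3f}")

print(f"Expected coincidental matches: {expected_false_matches:.2f}")
```

Under these assumptions a match leaves the probability of guilt at about 5 in 6 if the perpetrator is certainly in the database, and about 1 in 3 if there is only a 10% chance that he is. Either figure is a long way from the near-certainty of the suspect scenario.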

Whereas in our previous example of a population of "suspects" guilt was all but assured based on a "match", in this example of canvassing a database, guilt is dubious. But what do you suppose will happen in such an investigation? Who will suspend his judgment and conduct a fair investigation of this "matching" individual, who is now a "suspect" based only on "evidence" from this misused test? How tempting will it be for detectives to selectively gather information and see reality through the distorted lens of the "infallible" DNA testing? How can such a person hope to exonerate himself?

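This is the Prosecutor's Fallacy: the tiny probability of a coincidental match is mistaken for the probability that the matching person is innocent. The danger it poses bolsters arguments by the ACLU and others that the trend of snowballing DNA sample collection should be curtailed, and that limits should be placed on canvassing efforts to solve crimes.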

One way to limit the impact of the Prosecutor's Fallacy and false positive "matches" from canvassing efforts would be to force investigators to assign certain profiles to the imaginary "suspect" whom they hope to find in the database and to canvass only the subgroup of the database that matches those characteristics. For example, if the crime occurred in Seattle, the canvassing effort could be limited to the subset of the database that lives in or near Seattle, since it is unlikely that a person in Baltimore committed the crime. Other characteristics that are probabilistically associated with certain crimes could be used to limit broad canvassing efforts.
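To see why narrowing the canvass helps, here is a rough Python sketch that repeats the calculation for progressively smaller subsets. The subset sizes are hypothetical numbers chosen for illustration. The sketch also assumes, optimistically, that the perpetrator is certainly within each subset; in practice narrowing the search raises the chance of excluding him.

```python
SENSITIVITY = 0.5            # simplifying assumption used throughout the post
FALSE_POSITIVE_RATE = 1e-9   # coincidental match probability quoted earlier


def posterior_for_subset(subset_size):
    """P(guilty | match) when exactly one perpetrator is assumed to be
    among subset_size equally likely people (hypothetical helper)."""
    prior = 1.0 / subset_size
    true_pos = prior * SENSITIVITY
    false_pos = (1.0 - prior) * FALSE_POSITIVE_RATE
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: illustrative sizes, e.g. whole database, one region, one profile.
subset_sizes = [100_000_000, 1_000_000, 10_000]
posteriors = [posterior_for_subset(n) for n in subset_sizes]

for n, p in zip(subset_sizes, posteriors):
    print(f"subset of {n:>11,}: P(guilty | match) = {p:.5f}")
```

The smaller the group that prior evidence justifies searching, the closer a match comes to the near-certainty of the suspect scenario.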

As the use of medical technology expands both inside and outside medicine, we have a responsibility to utilize it wisely and rationally. The strategy of database screening and canvassing is reckless, unwise, and unjust, and should be summarily and duly curtailed.
