I was reading the current issue of the NEJM today and got to the article called Chest Radiography for Presumed Pneumonia in Children, which caught my attention as a medical decision making article. It's of the NEJM genre that poses a clinical quandary and then asks two discussants each to defend a different management course. (Another memorable one was on whether to treat subsegmental PE.)
A couple of things struck me about the discussants' remarks about CXRs for kids with possible pneumonia. The first discussant says that "a normal chest radiograph reliably rules out a diagnosis of pneumonia." That is certainly not true in adults, where the CXR has on the order of 50-70% sensitivity for opacities or pneumonia. So I wondered if kids are different from adults. The second discussant then remarked that CXR has a 98% negative predictive value for pneumonia in kids. This number really should get your attention. Either the test is very, very sensitive and specific, or the prior probability in the test sample was very low, something commonly done to inflate the reported number. (Or, worse, the number is wrong.) I teach trainees to always ignore PPV and NPV in reports and seek out the sensitivity and specificity, which cannot be fudged by selecting a high or low prevalence population.
It then struck me that this question of whether or not to get a CXR for PNA in kids is a classic problem in medical decision making that traces its origins to Ledley and Lusted (Science, 1959) and Pauker and Kassirer's Threshold Approach to Clinical Decision Making. Surprisingly, neither discussant mentioned or referenced that perfectly applicable framework (but they did cite their own work). Here is the Threshold Approach applied to the decision to get a CT scan for PE (Klein, 2004) that is perfectly analogous to the pediatric CXR question.
I was going to write a letter to the editor pointing out that 44 years ago the NEJM published a landmark article establishing a rational framework for analyzing just this kind of question, but I decided to dig deeper and take a look at this 2018 paper in Pediatrics that both discussants referenced as the source for the NPV of 98% statistic.
In order to calculate the 98% NPV, we need to look at the n=683 kids in the study and see which cells they fall into in a classic epidemiological 2x2 table. The article's Figure 2 is the easiest way to get those numbers:
Here is the 2x2 table we can construct using the numbers from Figure 2 in the paper, before the follow-up of the 5 kids that were diagnosed with pneumonia 2 weeks later:
And here is the 2x2 table that accounts for the 5 kids that were initially called "no pneumonia" but were diagnosed with pneumonia within the next two weeks. Five from cell "d" (bottom right) must be moved to cell "c" (bottom left) because they were CXR-/PNA- kids that were moved into the CXR-/PNA+ column after the belated diagnosis:
The NPV has fallen trivially from 90% to 89%, but why are both so far away from the authors' claim of 98%? Because the authors conveniently ignored the 44 kids in cell "c" with an initially negative CXR who were nonetheless called PNA by the physicians. They surely should be counted because, despite a negative CXR, they were still diagnosed with PNA, just 2 weeks earlier than the 5 that the authors concede were false negatives; there is no reason to make a distinction between these two groups of kids, as they are all clinically diagnosed pneumonia with a "falsely negative" CXR (cell "c").
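To make the arithmetic concrete, here is a minimal Python sketch of how the NPV changes depending on which kids are counted in cell "c". The cell counts are made up for illustration (round numbers chosen to give a 90% starting NPV; they are not the study's figures), and the function and variable names are hypothetical.

```python
def npv(false_negatives: int, true_negatives: int) -> float:
    """Negative predictive value: d / (c + d), computed among test-negative kids."""
    return true_negatives / (false_negatives + true_negatives)


# Hypothetical counts for the CXR-negative row, NOT the study's data.
c_initial = 40   # CXR negative, but clinically diagnosed with pneumonia at the visit
d_initial = 360  # CXR negative and not diagnosed with pneumonia at the visit
late_dx = 5      # kids from cell d diagnosed with pneumonia within 2 weeks

# All CXR-negative kids counted, before follow-up.
npv_before = npv(c_initial, d_initial)

# Same, after moving the late diagnoses from cell d to cell c.
npv_after = npv(c_initial + late_dx, d_initial - late_dx)

# Dropping the initial cell-c kids entirely and counting only the late
# diagnoses as false negatives inflates the result.
npv_ignoring_c = npv(late_dx, d_initial - late_dx)

print(f"before follow-up:       {npv_before:.3f}")
print(f"after follow-up:        {npv_after:.3f}")
print(f"ignoring initial cell c: {npv_ignoring_c:.3f}")
```

With these made-up counts, moving 5 kids from "d" to "c" barely changes the NPV, while leaving the initial cell "c" kids out of the denominator pushes it to nearly 99%.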
It is peculiar, or rather astonishing, that the NPV in this study, still being touted and referenced as a pivot for decision making, was miscalculated despite peer review. And while you may be tempted to say that 89% is pretty close to 98%, you would be making a mistake. Using the final sensitivity and specificity from this 2x2 table, we can calculate LR+ and LR- for CXR as a test for PNA: they are 10.8 and 0.26. We can also see from this table that the rate (some may say "prevalence") of PNA in this sample is 32%. What is the posterior probability of PNA based on the "correct" numbers if the pre-test probability (or the rate or prevalence of pneumonia) is 65% instead of 32%? The calculator in the Status Iatrogenicus sidebar can be used to easily calculate it: the NPV in that case is 68%, and of course 1-NPV (the output of the calculator, chosen to emphasize the residual probability of disease in the presence of a negative test) is 32%. Pneumonia in that circumstance is still far above the treatment threshold. By that I mean, if my child had a probability of pneumonia of 32%, I would want them treated. (Because antibiotics are pretty benign, bro; resistance happens at the poultry farm.)
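The calculation behind that kind of calculator is the odds form of Bayes' rule: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back. A minimal Python sketch follows, assuming the rounded LR- of 0.26 quoted above; the function name is hypothetical and this is not the sidebar calculator's actual code.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Post-test probability of disease via the odds form of Bayes' rule."""
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must be strictly between 0 and 1")
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


LR_NEGATIVE = 0.26  # rounded LR- for a negative CXR, as quoted in the text

# Residual probability of pneumonia after a negative CXR (this is 1 - NPV).
residual_at_32 = post_test_probability(0.32, LR_NEGATIVE)
residual_at_65 = post_test_probability(0.65, LR_NEGATIVE)

for pre_test, residual in ((0.32, residual_at_32), (0.65, residual_at_65)):
    print(f"pre-test {pre_test:.0%}: 1-NPV = {residual:.1%}, NPV = {1 - residual:.1%}")
```

At a 32% pre-test probability this reproduces an NPV of about 89%. At 65% the residual probability comes out at roughly one in three; any small difference from the figures in the text presumably reflects the rounding of LR- to 0.26.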
There are more fundamental problems. Like child abuse studies, there is a circular logic here: the kid has pneumonia because the doc says he has pneumonia, but the doc knows the CXR shows "pneumonia"; then the diagnosis of PNA leads to the CXR finding being classified as a true positive. How many of the pneumonia diagnoses were true/false positives/negatives? We can't know because we have no gold standard for pneumonia, just as we have no gold standard for child abuse; we are guessing which cells the numbers go in. This points to another violation of basic Bayesian assumptions: the determination of whether the disease is present must be made independently of the result of the test being evaluated. Here, there is very clearly dependence because the docs are making the pneumonia determination on the basis of the CXR. The study design is fundamentally flawed, and so are all conclusions that ramify from it.
I'm always a little surprised when I go digging into the studies that people bandy about as "evidence" for this and that, as I frequently find that they're misunderstood, misrepresented, or just plain wrong. I can readily imagine a pediatrician (resident or attending) telling me with high confidence that the CXR can "rule out" pneumonia in my kid, because her attendings told her that on the basis of the 2018 Lipsett study, and yet none of them ever looked any deeper into the actual study to find its obvious mistakes and shortcomings.
As they say, "Trust, but verify." Or perhaps more apropos here: "Extraordinary claims require extraordinary evidence." An NPV of 98% (for CXR!) is an extraordinary claim indeed. The evidence for it, however, is not extraordinary. As a trusted mentor once told me, "Scott, don't believe everything you read."
ETA: You can get a 98% NPV using the sensitivity and specificity from the Lipsett data (despite the erroneous assumptions that inhere in them) by using a prevalence of pneumonia of just 7%. To wit: if you want to get to a posterior probability of PNA of 2% (corresponding to the reported 98% NPV in the Lipsett study), you need to start with a population in which only 7 of 100 kids have pneumonia, and you need to do a CXR on all of them, to reduce it by 5 kids so that only 2 of them have PNA. 100 CXRs later, pneumonia cases in the cohort are reduced from 7 cases to 2. Is it worth it to do 100 CXRs to avoid 5 courses of antibiotics? We could do a formal Threshold analysis to answer this question, but apparently that was not the point of the "Clinical Decisions" section of this week's NEJM; rather, it was to highlight reference 1, which turns out to have conclusions based on a miscalculation.
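The 7% figure can be checked by running the odds form of Bayes' rule backwards: start from the target post-test probability and divide the odds by LR-. A minimal Python sketch, assuming the rounded LR- of 0.26 from the corrected table; the function name is hypothetical.

```python
def pre_test_for_target(post_test: float, likelihood_ratio: float) -> float:
    """Pre-test probability that yields the given post-test probability."""
    if not 0.0 < post_test < 1.0:
        raise ValueError("post_test must be strictly between 0 and 1")
    post_odds = post_test / (1.0 - post_test)
    pre_odds = post_odds / likelihood_ratio  # invert: post_odds = pre_odds * LR
    return pre_odds / (1.0 + pre_odds)


LR_NEGATIVE = 0.26          # rounded LR- for a negative CXR, as quoted in the text
target_residual = 1 - 0.98  # a 98% NPV leaves a 2% residual probability

required_prevalence = pre_test_for_target(target_residual, LR_NEGATIVE)
print(f"prevalence needed for a 98% NPV: {required_prevalence:.1%}")
```

With these inputs the required prevalence comes out at a little over 7%, in line with the figure above.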


 
