Sunday, May 16, 2010

What do erythropoietin, strict rate control, torcetrapib, and diagonal earfold creases have in common? The normalization heuristic

I was pleased to see the letters to the editor in the May 6th edition of the NEJM regarding the article on the use of synthetic erythropoietins (see ). The letter writers must have been reading our paper on the normalization heuristic (NH)! (Actually, I doubt it. It's in an obscure journal. But maybe they should.)

In our article (available here: ), we operationalized the definition of the NH and attributed its fallibility as a general clinical hypothesis to four errors in reasoning. Here is the number one reasoning error:

Where the normalization hypothesis is based on the assumption that the abnormal value is causally related to downstream patient outcomes, but in reality the abnormal value is not part of the causal pathway and rather is an epiphenomenon of another underlying process.

The authors of some of the letters to the editor of the NEJM have the same concerns about normalizing hemoglobin values, and the assumptions that this practice involves about our understanding of causal pathways. Which is what I want to focus on. So please turn your attention to, yes, the picture of the billiards.

I wager that the pathophysiological processes that occur in the body are more complex than the 16 balls in the photo, but it serves as a great analogy for understanding the limitations of what we know about what's going on in the body. Suppose that every time (or a high percentage of the time - we can probability adjust and not lose the meaning of the analogy) the cue ball, launched from the same spot at the same speed and angle, hits the 1--2--4--7--11 balls. We know the 11 ball is, say, cholesterol. We have figured this out. And it falls in the corner pocket - it gets "lower". But we don't know what the other balls represent, or even how many of them there are, or where they fall. We needn't know all of this to make some inferences. We see that when the cue ball is launched at a certain speed and angle, the 11 ball, cholesterol, falls. So we think we understand cholesterol. But the playing field is way more complex than the initiating event and the one final thing that we happen to be watching or measuring - the corner pocket. In the whole body, we don't even know how many balls and how many pockets we're dealing with! We can only see what we know to look for!

Suppose also that as a consequence of this cascade, the 7 ball hits the 12 ball, which falls in another corner pocket. We happen to be watching that pocket also. We know what it does. For lack of a better term, let's call it the "reduced cardiovascular death pocket." Every time this sequence of balls is hit, cholesterol (number 11) falls in one corner pocket, and the 12 ball falls in another pocket, and we infer that cholesterol is part of the causal pathway to cardiovascular death. But look carefully at the diagram. We can remove the 11 ball altogether, and the 7 ball will still hit the 12 and sink it, thus reducing cardiovascular death. So it's not the cholesterol at all! We misunderstood the causal pathway! It's not cholesterol falling per se, but rather some epiphenomenon of the sequence.
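The billiards argument can be put in code. Below is a toy simulation (all numbers invented, and emphatically not a claim about cholesterol itself): an upstream process U (the cue-ball cascade) drives both a measurable marker M (the 11 ball) and the outcome Y (the 12 ball). "Normalizing" M changes nothing about Y, because M is an epiphenomenon rather than a link in the causal chain.

```python
# Toy causal model of the billiards analogy: upstream process U drives
# both the marker M (the "11 ball") and the outcome Y (the "12 ball").
# Intervening on M leaves Y untouched, because Y depends only on U.
# All probabilities here are invented for illustration.
import random

def run_trial(remove_marker, n=100_000, seed=1):
    rng = random.Random(seed)
    deaths = 0
    for _ in range(n):
        u = rng.random() < 0.5             # upstream cascade fires or not
        marker = u and not remove_marker   # M tracks U unless we intervene
        # NOTE: 'marker' is never used below -- that is the whole point.
        deaths += u and (rng.random() < 0.2)  # Y depends only on U
    return deaths / n

print(run_trial(remove_marker=False))  # marker left alone
print(run_trial(remove_marker=True))   # marker "normalized" -- identical death rate
```

With the same seed, the two runs are bit-for-bit identical: removing the marker sinks the "11 ball" without touching the pocket we actually care about.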

By now, you've inferred who is breaking. His name is atorvastatin (which I fondly refer to as the Lipid-Torpedo). When a guy called torcetrapib breaks, all hell breaks loose. We learn that there's another pocket called "increased cardiovascular death pocket" and balls start falling into there.

(A necessary aside here - I am NOT challenging the cholesterol hypothesis here. It may or may not be correct, and I certainly am not the one to figure that out. I merely wish to emphasize how we COULD make incorrect inferences about causal pathways.)

So when I see an article like the one a couple of weeks ago in the NEJM (see ) about "strict rate control" for atrial fibrillation (AF), I am not surprised that it doesn't work. I am not surprised that there are processes going on in a patient with AF that we can't even begin to understand. And the coincidental fact that we can measure heart rate and control it does not mean that we're interrupting the causal pathway that we wish to.

A new colleague of mine told me the other day of a joke he likes to make, one that makes all of this resonate harmoniously - "We don't go around trying to iron out diagonal earfold creases to reduce cardiovascular mortality." But show us a sexy sequence of molecular cascades that we think we understand, and the sirens begin to sing their irresistible song.

Saturday, May 1, 2010

Everyone likes their own brand - Delta Inflation: A bias in the design of RCTs in Critical Care

At long last, our article describing a bias in the design of RCTs in Critical Care Medicine (CCM) has been published (see: ). Interested readers are directed to the original manuscript. I'm not in the business of criticising my own work on my own blog, but I will provide at least a summary.

When investigators design a trial and do power and sample size (SS) calculations, they must estimate or predict a priori what the [dichotomous] effect size will be in, say, mortality (as is the case with most trials in CCM). This number should ideally be based upon preliminary data, or a minimal clinically important difference (MCID). Unfortunately, it does not usually happen that way; investigators rather choose a number of patients that they think they can recruit with available funding, time, and resources, and they calculate the effect size that they can find with that number of patients at 80-90% power and (usually) an alpha level of 0.05.
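To make the backwards calculation concrete, here is a minimal sketch of solving the standard two-proportion sample-size formula for delta instead of for n. The numbers (300 patients per arm, 40% control-arm mortality) are hypothetical, and the pooled-variance normal approximation is the simplest of several formulas in use:

```python
# Working the standard two-proportion power formula backwards: instead of
# "how many patients do I need to detect delta?", ask "what delta can I
# detect with the patients I can actually recruit?" The trial numbers
# below (n per arm, baseline mortality) are hypothetical.
from statistics import NormalDist

def detectable_delta(n_per_arm, p_control, alpha=0.05, power=0.80):
    """Approximate detectable absolute risk difference for a two-arm
    trial with a binary endpoint (pooled-variance normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = z.inv_cdf(power)
    # n = (z_alpha + z_beta)^2 * 2*p*(1-p) / delta^2, solved for delta:
    return (z_alpha + z_beta) * (2 * p_control * (1 - p_control) / n_per_arm) ** 0.5

# A trial that can recruit 300 patients per arm, with 40% control mortality:
delta = detectable_delta(300, 0.40)
print(f"detectable absolute mortality reduction: {delta:.1%}")  # ~11.2%
```

An 11-point absolute mortality reduction is an enormous effect to be banking on, which is exactly how resource-driven sample sizes inflate predicted deltas.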

If power and SS calculations were being performed ideally, and investigators were using preliminary data or published data on similar therapies to predict delta for their trial, we would expect that, over the course of many trials, they would be just as likely to OVERESTIMATE observed delta as they are to underestimate it. If this were the case, we would expect random scatter around a line representing unity in a graph of observed versus predicted delta (see Figure 1 in the article). If, on the other hand, predicted delta uniformly exceeds observed delta, there is directional bias in its estimation. Indeed, this is exactly what we found. This is no different from the weatherman consistently overpredicting the probability of precipitation, a horserace handicapper consistently setting too long of odds on winning horses, or Tiger Woods consistently putting too far to the right. Bias is bias. And it is unequivocally demonstrated in Figure 1.
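The "random scatter around unity" claim has a simple statistical counterpart: under unbiased prediction, predicted delta should exceed observed delta about half the time, which an exact sign test can check. The paired deltas below are invented purely for illustration (they are not the 38 trials from the paper):

```python
# Sign test for directional bias in delta predictions. Under unbiased
# prediction, P(predicted > observed) = 0.5 for each trial. The paired
# deltas below are invented for illustration, not the paper's data.
from math import comb

predicted = [0.10, 0.08, 0.15, 0.10, 0.05, 0.10, 0.20, 0.07, 0.10, 0.12]
observed  = [0.03, 0.01, 0.06, 0.04, 0.02, 0.00, 0.09, 0.03, 0.05, 0.04]

over = sum(p > o for p, o in zip(predicted, observed))
n = len(predicted)

# Two-sided exact binomial (sign-test) p-value under H0: p = 0.5
k = max(over, n - over)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)

print(f"{over}/{n} trials overpredicted delta; sign-test p = {p_value:.4f}")
```

When every point sits on one side of the unity line, no formal test is really needed, but the sign test puts a number on how unlikely that pattern is under unbiased prediction.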

Another point, which we unfortunately failed to emphasize in the article, is that if the predicted deltas were being guided by a MCID, well, the MCID for mortality should be the same across the board. It is not (Figure 1 again). It ranges from 3%-20% absolute reduction in mortality. Moreover, in Figure 1, note the clustering around numbers like 10% - how many fingers or toes you have should not determine the effect size you seek to find when you design an RCT.

We surely hope that this article stimulates some debate in the critical care community about the statistical design of RCTs, and indeed about what primary endpoints are chosen for them. It seems that chasing mortality is looking more and more like a fool's errand.

Caption for Figure 1:
Figure 1. Plot of observed versus predicted delta (with associated 95% confidence intervals for observed delta) of 38 trials included in the analysis. Point estimates of treatment effect (deltas) are represented by green circles for non-statistically significant trials and red triangles for statistically significant trials. Numbers within the circles and triangles refer to the trials as referenced in Additional file 1. The blue ‘unity line’ with a slope equal to one indicates perfect concordance between observed and predicted delta; for visual clarity and to reduce distortions, the slope is reduced to zero (and the x-axis is horizontally expanded) where multiple predicted deltas have the same value and where 95% confidence intervals cross the unity line. If predictions of delta were accurate and without bias, values of observed delta would be symmetrically scattered above and below the unity line. If there is directional bias, values will fall predominantly on one side of the line, as they do in the figure.

If you want a fair shake, you gotta get a trach (and the sooner the better)

In the most recent issue of JAMA, Terragni et al report the results of an Italian multicenter trial of early (6-8 days) versus delayed (13-15 days) tracheostomy for patients requiring mechanical ventilation ( ). This research complements and continues a line of RCT investigation of early tracheostomy begun by Rumbak et al in 2004. In that earlier trial, the authors showed that [very] early tracheostomy (at 48 hours into a patient's illness) compared to delayed tracheostomy (after 14 days of illness) led to reduced mortality, pneumonia, sepsis, and other complications of critical illness. The mediators of this effect are not known with certainty, but may be related to the effects of reduced sedation requirements with tracheostomy, reduced dead space, facilitation of weaning from mechanical ventilation, or psychological effects on the patients or the physicians caring for them. Regardless of the mediators of the effect, I can say confidently from an anecdotal perspective that the effects appear to be robust. Almost as if by magic, something changes after a patient gets a tracheostomy, and recovery appears to accelerate - patients just "look better" and they appear to get better faster. (Removal of the endotracheal tube (ETT) allows spitting and swallowing, activities which will be required to protect the airway when all artificial airways are removed; it allows lip-reading by families and providers and in some cases speech; it allows sedation to be reduced expeditiously; it facilitates weaning; it allows easier positioning out of bed in a chair and working with physical therapy during weaning; the list goes on...)

What are the drawbacks to such an approach? Traditionally, a tracheostomy has been viewed by practitioners as the admission of a failure of medical care - we couldn't get you better fast, with a temporary airway, so we had to "resort" to a semi-permanent or permanent [surgical] airway. Moreover, a tracheostomy was traditionally a surgical procedure requiring transportation to the operating suite, although that has changed with the advent of the percutaneous dilatational approach. Nonetheless, whichever route is used to establish the tracheostomy, certain immediate and delayed risks are inherent in the procedure, and the costs are greater. So, the basic question we would like to answer is "are there benefits of tracheostomy that outweigh these risks?"

There were several criticisms of the Rumbak study which I will not elaborate upon here, but suffice it to say that the study did not lead to a sweeping change in practice with regard to the timing of tracheostomies, and thus additional studies were planned and performed. One such study, referenced by last week's JAMA article, enrolled only 25% of the anticipated sample of patients, with resulting wide confidence intervals. As a result, few conclusions can be drawn from that study, but it did not appear to show a benefit to earlier tracheostomy. A meta-analysis which included "quasi-randomized" studies (GIGO: Garbage In, Garbage Out) concluded that while not reducing mortality or pneumonia, early tracheostomy reduced the duration of mechanical ventilation and ICU stay. (It seems likely to me that if you stay on the ventilator and in the ICU for a shorter period, since there is a time/dose-dependent effect of these things on complications such as catheter-related blood stream infections and ventilator-associated pneumonia (VAP), these outcomes and outcomes further downstream such as mortality WOULD be affected by early tracheostomy - but the further downstream an outcome is, the more it gets diluted out, and the larger a study you need to demonstrate a significant effect.)

Thus, to try to resolve these uncertainties, we have the JAMA study from last week. This study was technically "negative." But in it, every single pre-specified outcome (VAP, ventilator-free days, ICU-free days, mortality) trended (some significantly) in favor of early tracheostomy. The choice of VAP as a primary outcome (P-value for early versus delayed trach 0.07) is both curious and unfortunate. VAP is notoriously difficult to diagnose and differentiate from other causes of infection and pulmonary abnormalities in mechanically ventilated ICU patients (see ) - it is a "soft" outcome for which no gold standard exists. Therefore, the signal-to-noise ratio for this outcome is liable to be low. What's perhaps worse, the authors used the Clinical Pulmonary Infection Score (CPIS, or Pugin Score: see Pugin et al, AJRCCM, 1991, Volume 143, 1121-1129) as the sole means of diagnosing VAP. This score, while conceptually appealing, has never been validated in such a way that its positive and negative predictive values are acceptable for routine use in clinical practice (it is not widely used), or for a randomized controlled trial (see ). Given this, and the other strong trends and significant secondary endpoints in this study, I don't think we can dichotomize it as "negative" - reality is just more complicated than that.

I feel about this trial, which failed its primary endpoint, much the same as I felt about the Levo versus Dopa article a few weeks back. Multiple comparisons, secondary endpoints, and marginal P-values notwithstanding, I think that from the perspective of a seriously ill patient or a provider, especially a provider with strong anecdotal experience that appears to favor earl(ier) tracheostomy, the choice appears to be clear: "If you want a fair shake, you gotta get a trach."