Saturday, May 1, 2010
Everyone likes their own brand - Delta Inflation: A bias in the design of RCTs in Critical Care
At long last, our article describing a bias in the design of RCTs in Critical Care Medicine (CCM) has been published (see: http://ccforum.com/content/14/2/R77 ). Interested readers are directed to the original manuscript. I'm not in the business of criticising my own work on my own blog, but I will provide at least a summary.
When investigators design a trial and do power and sample size (SS) calculations, they must estimate or predict a priori what the [dichotomous] effect size will be, in say, mortality (as is the case with most trials in CCM). This number should ideally be based upon preliminary data, or a minimal clinically important difference (MCID). Unfortunately, it does not usually happen that way, and investigators rather choose a number of patients that they think they can recruit with available funding, time and resources, and they calculate the effect size that they can find with that number of patients at 80-90% power and (usually) an alpha level of 0.05.
If power and SS calculations were being performed ideally, and investigators were using preliminary data or published data on similar therapies to predict delta for their trial, we would expect that, over the course of many trials, they would be just as likely to OVERESTIMATE observed delta as they are to underestimate it. If this were the case, we would expect random scatter around a line representing unity in a graph of observed versus predicted delta (see Figure 1 in the article). If, on the other hand, predicted delta uniformely exceeds observed delta, there is directional bias in its estimation. Indeed, this is exactly what we found. This is no different from the weatherman consistently overpredicting the probability of precipitation, a horserace handicapper consistently setting too long of odds on winning horses, or Tiger Woods consistently putting too far to the right. Bias is bias. And it is unequivocally demonstrated in Figure 1.
Another point, which we unfortunately failed to emphasize in the article, is that if the predicted deltas were being guided by a MCID, well, the MCID for mortality should be the same across the board. It is not (Figure 1 again). It ranges from 3%-20% absolute reduction in mortality. Moreover, in Figure 1, note the clustering around numbers like 10% - how many fingers or toes you have should not determine the effect size you seek to find when you design an RCT.
We surely hope that this article stimulates some debate in the critical care community about the statistical design of RCTs, and indeed in what primary endpoints are chosen for them. It seems that chasing mortality is looking more and more like a fool's erand.
Caption for Figure 1:
Figure 1. Plot of observed versus predicted delta (with associated 95% confidence intervals for observed delta) of 38 trials included in the analysis. Point estimates of treatment effect (deltas) are represented by green circles for non-statistically significant trials and red triangles for statistically significant trials. Numbers within the circles and triangles refer to the trials as referenced in Additional file 1. The blue ‘unity line’ with a slope equal to one indicates perfect concordance between observed and predicted delta; for visual clarity and to reduce distortions, the slope is reduced to zero (and the x-axis is horizontally expanded) where multiple predicted deltas have the same value and where 95% confidence intervals cross the unity line. If predictions of delta were accurate and without bias, values of observed delta would be symmetrically scattered above and below the unity line. If there is directional bias, values will fall predominately on one side of the line as they do in the figure.