Tuesday, March 23, 2010

"Prospective Meta-analysis" makes as much sense as "Retrospective Randomized Controlled Trial"

A recent article in JAMA (http://jama.ama-assn.org/cgi/content/abstract/303/9/865) reports a meta-analysis of three trials comparing a strategy of higher versus lower PEEP (positive end-expiratory pressure) in Acute Lung Injury (ALI – a less severe form of lung injury) and Acute Respiratory Distress Syndrome (ARDS – a more severe form, at least as measured by oxygenation, one facet of its effects on physiology). The results of this impeccably conducted analysis are interesting enough (high PEEP is beneficial in a pre-specified subgroup analysis of ARDS patients, but may be harmful in the subgroup with less severe ALI), but I am more struck by the discussion as it pertains to the future of trials in critical care medicine – a discussion which was echoed by the editorialist (http://jama.ama-assn.org/cgi/content/extract/303/9/883).

The trials included in this meta-analysis lacked statistical precision for two principal reasons: (1) they used the typical cookbook approach to sample size determination, choosing a delta of 10% without any justification whatever for this number (thus the studies were guilty of DELTA INFLATION); and (2) according to the authors of the meta-analysis, two of the three trials were stopped early for futility, further decreasing the statistical precision of already effectively underpowered trials. The resulting 95% CIs for delta in these trials thus ranged from -10% (in the ARDSnet ALVEOLI trial; i.e., high PEEP may increase mortality by up to 10%) to +10% (in the Mercat and Meade trials; i.e., higher PEEP may decrease mortality by as much as 10%).
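To put a rough number on what delta inflation costs, here is a minimal sketch using the textbook normal-approximation formula for comparing two proportions. The 35% control-arm mortality, the two-sided alpha of 0.05, and the 80% power are illustrative assumptions of mine, not figures taken from the trials, and the function name n_per_arm is made up for this example.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, delta, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect an absolute
    mortality reduction of `delta` (normal approximation, two-sided test)."""
    p_treat = p_control - delta
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    # Sum of the binomial variances in the two arms
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


if __name__ == "__main__":
    # Assumed (hypothetical) control mortality of 35%
    for delta in (0.10, 0.05, 0.03):
        print(f"delta = {delta:.0%}: about {n_per_arm(0.35, delta)} per arm")
```

Under these assumptions, halving the delta from 10% to 5% roughly quadruples the required enrollment, which is exactly why an unjustified 10% is so tempting.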

Because of the lack of statistical precision of these trials, the authors of the meta-analysis appropriately used individual patient data from the trials as meta-analytical fodder, with a likely useful result – high PEEP is probably best reserved for the sickest patients with ARDS, and avoided for those with ALI. (Why there is an interaction between severity of lung injury and response to PEEP is open for speculation, and is an interesting topic in itself.) What interests me more than this main result is the authors' and editorialist's suggestion that we should be doing “prospective meta-analyses” or at least designing our trials so that they easily lend themselves to this application should we later decide to do so. Which raises the question: why not just make a bigger trial from the outset, choosing a realistic delta and disallowing early stopping for “futility”?

(It is useful to note that the term futility is unhappily married to, or better yet enslaved by, alpha (the threshold P-value for statistical significance). A trial is deemed futile if there is no hope of crossing the alpha/P-value threshold. But it is certainly not futile to continue enrolling patients if each additional accrual increases the statistical precision of the final result by narrowing the 95% CI of delta. Indeed, I’m beginning to think that the whole concept of “futility” is a specious one - unless you're a funding agency.)
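The point about accrual and precision can be sketched in a few lines. This is a simple Wald interval for a difference in proportions; the mortality rates of 35% and 30% are hypothetical values chosen for illustration, and ci_half_width is a name I made up.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p_control, p_treat, n_per_arm, level=0.95):
    """Half-width of the Wald confidence interval for the difference
    in mortality between two equally sized arms."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treat * (1 - p_treat) / n_per_arm)
    return z * se


if __name__ == "__main__":
    # Hypothetical mortality: 35% control, 30% treatment
    for n in (250, 500, 1000, 2000):
        hw = ci_half_width(0.35, 0.30, n)
        print(f"{n} per arm: delta estimated to within +/- {hw:.1%}")
```

The half-width shrinks with the square root of enrollment, so every additional patient narrows the interval, whether or not the P-value ever crosses alpha.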

Large trials may be cumbersome, but they are not impossible. The SAFE investigators (http://content.nejm.org/cgi/content/abstract/350/22/2247) enrolled ~7000 patients seeking a delta of 3% in a trial involving 16 ICUs in two countries. Moreover, a prospective meta-analysis doesn’t reduce the number of patients required; it simply splits the population into quanta and epochs, which will hinder homogeneity in the meta-analysis if enrollment and protocols are not standardized or if temporal trends in care and outcomes come into play. If enrollment and protocols ARE standardized, it is useful to ask: then why not just do one large study from the outset, using a realistic delta and sample size? Why not coordinate all the data (American, French, Canadian, whatever) through a prospective RCT rather than a prospective meta-analysis?

Here’s my biggest gripe with the prospective meta-analysis – in essence, you are taking multiple looks at the data, one look after each trial is completed (I’m not even counting intra-trial interim analyses), but you’re not correcting for the multiple comparisons. And most likely, once there is a substantial positive trial, it will not be repeated, for a number of reasons such as all the hand-waving about it not being ethical to repeat it and randomize people to no treatment (one of the cardinal features of science being repeatability notwithstanding). Think ARMA (http://content.nejm.org/cgi/content/extract/343/11/812). There were smaller trials leading up to it, but once ARMA was positive, no additional noteworthy trials sought to test low tidal volume ventilation for ARDS. So, if we’re going to stop conducting trials for our “prospective meta-analysis”, what will our early stopping rule be? When will we stop our sequence of trials? Will we require a P-value of 0.001 or less after the first look at the data (that is, after the first trial is completed)? Doubtful. As soon as a significant result is found in a soundly designed trial, further earnest trials of the therapy will cease and victory will be declared. Only when there is a failure or a “near-miss” will we want a “do-over” to create more fodder for our “prospective meta-analysis”. We will keep chasing the result we seek until we find it, nitpicking design and enrollment details of “failed” trials along the way to justify the continued search for the “real” result with a bigger and better trial.
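The multiple-looks problem is easy to demonstrate with a small simulation. The sketch below is an idealized setup of my own, not anything from the JAMA paper: the therapy truly does nothing, every trial is the same size, the trials are pooled with equal weights, and we test at an uncorrected two-sided alpha of 0.05 after each trial, stopping the sequence the first time we "win". The function name and the simulation settings are assumptions for illustration.

```python
import random
from math import sqrt


def false_positive_rate(max_trials, n_sims=20000, z_crit=1.96, seed=0):
    """Estimate how often a sequence of null trials yields a 'significant'
    pooled result when we test after every trial and stop at the first
    success, with no correction for the repeated looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for k in range(1, max_trials + 1):
            # Each equally sized null trial contributes a standard normal z
            total += rng.gauss(0.0, 1.0)
            # Pooled z-statistic after k trials (equal weights)
            if abs(total / sqrt(k)) > z_crit:
                hits += 1
                break  # victory declared; no further trials are run
    return hits / n_sims


if __name__ == "__main__":
    for looks in (1, 2, 3, 5):
        rate = false_positive_rate(looks)
        print(f"{looks} look(s): false-positive rate ~ {rate:.3f}")
```

With a single look the rate sits near the nominal 5%, and it climbs with every additional uncorrected look. Group sequential designs spend alpha across looks to prevent exactly this; a prospective meta-analysis with no pre-specified stopping rule does not.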

If we’re going to go to the trouble of coordinating a prospective meta-analysis, I don’t understand why we wouldn’t just coordinate an adequately powered RCT based on a realistic delta (itself based on an MCID or preliminary data), and carry it to its pre-specified enrollment endpoint, “futility stopping rules” be damned. With the statistical precision that would result, we could examine the 95% CI of the resulting delta to answer the practical questions that clinicians want answers for, even if our P-value were insufficient to satisfy the staunchest of statisticians. Perhaps the best thing about such a study is that the force of its statistical precision would incapacitate single center trialists, delta inflationists, and meta-analysts alike.

1 comment:

  1. "As soon as a significant result is found in a soundly designed trial, further earnest trials of the therapy will cease and victory will be declared."
    ...unless you are a novel, proprietary agent which lacks immediate positive reinforcement of use for the clinician. In that case, you keep chipping away trying to show it doesn't work - with studies ended prematurely for futility.