Monday, March 10, 2008

The CORTICUS Trial: Power, Priors, Effect Size, and Regression to the Mean

The long-awaited results of another trial in critical care were published in a recent issue of the NEJM. Similar to the VASST trial, the CORTICUS trial was "negative": low-dose hydrocortisone was not shown to benefit patients in septic shock. However, unlike VASST, in this case the results conflict with an earlier trial (Annane et al, JAMA, 2002) that generated much fanfare and which, like the Van den Berghe trial of the Leuven Insulin Protocol, led to widespread [and premature?] adoption of a new therapy. The CORTICUS trial, like VASST, raises some interesting questions about the design and interpretation of trials in which short-term mortality is the primary endpoint.

Jean-Louis Vincent presented data at this year's SCCM conference from which he estimated that only about 10% of trials in critical care are "positive" in the traditional sense. (I was not present, so this is basically hearsay to me - if anyone has a reference, please e-mail me or post it as a comment.) Nonetheless, this estimate rings true. Few are the trials that show a statistically significant benefit in the primary outcome, and fewer still are the trials that confirm those results. This raises the question: are critical care trials chronically, consistently, and woefully underpowered? And if so, why? I will offer some speculative answers to these and other questions below.

The CORTICUS trial, like VASST, was powered to detect a 10% absolute reduction in mortality. Is this reasonable? At all? What is the precedent for a 10% ARR in mortality in a critical care trial? There are few, if any. No large, well-conducted trial in critical care that I am aware of has ever demonstrated (least of all consistently) a 10% or greater reduction in mortality with any therapy, at least not as a PRIMARY PROSPECTIVE OUTCOME. Low tidal volume ventilation? 9% ARR. Drotrecogin-alfa? 7% ARR in all-comers. I therefore argue that all trials powered to detect an ARR in mortality greater than 7-9% are ridiculously optimistic, and that the trials that spring from this unfortunate optimism are woefully underpowered. It is no wonder that, as JLV purportedly demonstrated, so few trials in critical care are "positive". The prior probability that ANY therapy will deliver a 10% mortality reduction is exceedingly low. The designers of these trials are, by force of pragmatic constraints, rolling the proverbial trial dice and hoping for a lucky throw.
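To see why the choice of a 10% ARR matters so much, here is a minimal sketch of the standard normal-approximation sample-size formula for comparing two mortality proportions (two-sided alpha of 0.05, 80% power). The 30% control-arm mortality is a hypothetical assumption chosen for illustration, not a figure taken from CORTICUS or VASST, and the function name is illustrative only.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, arr: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm needed to detect an absolute risk reduction `arr`
    in mortality, using the normal approximation for two proportions."""
    p_treat = p_control - arr
    p_bar = (p_control + p_treat) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_control * (1 - p_control) + p_treat * (1 - p_treat))
    ) ** 2
    return math.ceil(numerator / arr ** 2)


if __name__ == "__main__":
    # Hypothetical 30% baseline mortality; compare optimistic and more modest targets.
    for arr in (0.10, 0.07, 0.05):
        print(f"ARR {arr:.0%}: {n_per_arm(0.30, arr)} patients per arm")
```

Under these assumptions, halving the target ARR from 10% to 5% raises the requirement from roughly 300 to roughly 1,250 patients per arm, because the required sample size scales approximately with the inverse square of the effect size.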

Then there is the issue of regression to the mean. Suppose that the alternative hypothesis (Ha) is indeed correct in the generic sense that hydrocortisone does beneficially influence mortality in septic shock. Suppose further that we interpret Annane's 2002 data as consistent with Ha. In that study, a subgroup of patients (non-responders) demonstrated a 10% ARR in mortality. We can be excused for getting excited about this result: after all, we all want the best for our patients and eagerly await the next breakthrough, and the higher the ARR, the greater the clinical relevance, whatever the level of statistical significance. But shouldn't we regard that estimate with skepticism, since no therapy in critical care has ever shown such a large reduction in mortality as a primary outcome? Since no such result has ever been consistently repeated? Even if we believe in Ha, shouldn't we also expect the 10% Annane estimate to regress to the mean on repeated trials?
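The selection effect behind this kind of regression to the mean (sometimes called the "winner's curse") can be sketched with a small simulation. Every number here is a hypothetical assumption: a true ARR of 3%, a 30% control mortality, and 300 patients per arm. The sketch runs many such trials, keeps only the "positive" ones, and then repeats each positive trial once with fresh patients.

```python
import math
import random
from statistics import NormalDist, mean


def run_trial(rng, p_control, p_treat, n_per_arm):
    """Simulate one two-arm trial; return (observed ARR, two-sided p-value)."""
    deaths_c = sum(rng.random() < p_control for _ in range(n_per_arm))
    deaths_t = sum(rng.random() < p_treat for _ in range(n_per_arm))
    arr = (deaths_c - deaths_t) / n_per_arm
    # Pooled two-proportion z-test.
    pooled = (deaths_c + deaths_t) / (2 * n_per_arm)
    se = math.sqrt(2 * pooled * (1 - pooled) / n_per_arm)
    z = arr / se if se > 0 else 0.0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return arr, p_value


def simulate(true_arr=0.03, p_control=0.30, n_per_arm=300, n_trials=2000, seed=1):
    """Return observed ARRs for all trials, for the 'positive' trials only,
    and for one fresh replication of each positive trial."""
    rng = random.Random(seed)
    p_treat = p_control - true_arr
    all_arrs, positive, replicated = [], [], []
    for _ in range(n_trials):
        arr, p_value = run_trial(rng, p_control, p_treat, n_per_arm)
        all_arrs.append(arr)
        if p_value < 0.05 and arr > 0:
            positive.append(arr)
            # Repeat the "positive" trial once under the same true effect.
            replicated.append(run_trial(rng, p_control, p_treat, n_per_arm)[0])
    return all_arrs, positive, replicated


if __name__ == "__main__":
    all_arrs, positive, replicated = simulate()
    print(f"Mean ARR, all trials:         {mean(all_arrs):.3f}")
    print(f"Mean ARR, 'positive' trials:  {mean(positive):.3f} (n={len(positive)})")
    print(f"Mean ARR, their replications: {mean(replicated):.3f}")
```

With 300 patients per arm, only an observed ARR of roughly 7% or more reaches p < 0.05, so the "positive" trials necessarily overstate the assumed 3% effect, while their replications fall back toward 3%.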

It may be true that therapies with robust data behind them become standard practice, equipoise dissipates, and the trials of the best therapies are not repeated - so they never get the chance to be confirmed. But the knife cuts both ways - if you're repeating a trial, it stands to reason that the data in support of the therapy are not that robust, and you should be more circumspect in your estimates of effect size, taking prior probability and regression to the mean into account.
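One way to make "taking prior probability into account" concrete is a normal-normal Bayesian update of an observed effect. This is a minimal sketch under stated assumptions: a skeptical prior centered on no effect with an SD of 3 percentage points, and a hypothetical observed ARR of 10% with a standard error of 5 points. None of these numbers comes from the Annane or CORTICUS data, and `shrink` is an illustrative name.

```python
import math


def shrink(observed_arr, se, prior_mean=0.0, prior_sd=0.03):
    """Normal-normal update: combine an observed ARR (standard error `se`)
    with a normal prior; return the posterior mean and posterior SD."""
    w_data = 1 / se ** 2        # precision of the trial estimate
    w_prior = 1 / prior_sd ** 2  # precision of the skeptical prior
    post_mean = (w_data * observed_arr + w_prior * prior_mean) / (w_data + w_prior)
    post_sd = math.sqrt(1 / (w_data + w_prior))
    return post_mean, post_sd


if __name__ == "__main__":
    m, s = shrink(0.10, 0.05)
    print(f"Posterior ARR: {m:.3f} (SD {s:.3f})")
```

Under these assumptions, the 10% estimate shrinks to roughly 2.6%. A more precise trial (a smaller `se`) pulls the posterior closer to the observed value, which is the formal counterpart of demanding more robust data before believing a large effect.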

Perhaps we need to rethink how we're powering these trials, and funding agencies need to rethink the budgets they will allow for them. It makes little sense to spend so much time, money, and effort on underpowered trials, and to build a track record in which the majority of our trials are "failures" in the traditional sense, each with a sentence in the discussion section about how its results should influence the design of subsequent trials. Wouldn't it make more sense to conduct one trial that is so robust that nobody would dare repeat it in the future? One that would provide a definitive answer to the question that is posed? Is there something to be learned from the long arc of the steroid pendulum that has been swinging with frustrating periodicity for many decades now?

This is not to denigrate in any way the quality of the trials that I have referred to. The Canadian group in particular as well as other groups (ARDSnet) are to be commended for producing work of the highest quality which is of great value to patients, medicine, and science. But in keeping with the advancement of knowledge, I propose that we take home another message from these trials - we may be chronically underpowering them.


  1. Some background on CORTICUS...this study was underpowered because they ran out of money and were having a hard time recruiting patients because clinicians were giving steroids to their patients. Amazingly, the NEJM did not require the usual CONSORT figure of patients screened but not enrolled in the study. This suggests a couple of things. 1. MDs think steroids work. 2. The enrolled subjects were likely dominated by those for whom their MDs had some clinical equipoise regarding the benefit of steroids. Therefore, this may be a different population than the septic patients we see every day. Steroids both benefit and suffer from their availability. Because there is no need to seek FDA approval for their use, they can be used regardless of the evidence. This is why there was a national shortage of hydrocortisone following Annane's initial study. However, without an industry interest, it is hard to fund a large, well-designed study, and we get these murky results that are hard to interpret. Part of the problem with steroids, IMHO, stems from the tie to the replacement of endogenous cortisol - so-called relative adrenal insufficiency. There are protean problems with this. First, we have intensivists believing they are endocrinologists - not a good sign. We know that the free cortisol fraction is the biologically active fraction - yet we measure total cortisol in the setting of extremely deranged serum protein metabolism. Also, we know that steroid resistance occurs with the inflammation of sepsis. Therefore, truly adrenally insufficient patients are those who have signs of inadequate steroid effects on cells - something not easily measured. There are also many different cortisol assays in clinical labs. The referral of all samples in CORTICUS to a central lab showed clearly that local tests are unreliable: 25% of subjects were misclassified regarding their total cortisol response to the supraphysiologic dose of ACTH.
Finally, it is impractical to think that we would get an ACTH stim test back in time to decide whether to give steroids. There are not adequate data supporting the safety and/or efficacy of the usual practice of giving dexamethasone (which affects cortisol synthesis in response to ACTH) while an ACTH stim test is performed, and stopping steroids if the response is adequate. If there were resources to do a large enough study, I would suggest three arms of treatment of septic shock: placebo, 200 mg/d hydrocortisone, and 100 mg/d hydrocortisone. I would not do ACTH stim tests. I would not act as if this is an effort to normalize cortisol levels (since 200 mg/d of hydrocortisone far exceeds physiologic levels). I suspect this would require something on the order of 2000 patients. At this point, most of us are sick of steroids. They make our brains hurt when we think about the lousy data we have. I doubt such a study will ever occur. I think my practice will be to give steroids to those who I think have impressive septic shock, and to give them early. My first choice of therapy will remain rhAPC, and steroids would be a second choice. Don't know if this is right or not. I do think we need to be more considerate about the use of etomidate. There are more options than benzo + opioid. Perhaps we should learn more from our ED and anesthesia colleagues and be quicker to consider RSI and paralytics. - JMO

  2. JMO
    You should have taken the lead on this one.

  3. Dr Aberegg,
    You stated "...wouldn't it make sense to conduct 1 trial that is so robust that nobody dare repeat it in the future..." but isn't that the idea behind research--if a particular study can be repeated (assuming identical or near identical outcome) then the results are considered propitious for healthcare, which is a good thing.
    Or am I way off base on this and missing the point entirely?


  4. E.A. -

    Your comment brings up some interesting points.

    First, it depends on whether you think an RCT is meant to benefit science, patients, society, or some combination.

    Because a "negative" trial is often a harbinger of doom for a therapy, and because even therapies that have met the burden of proof are taken up slowly if at all (average lag stated to be 17 years from proof of efficacy to widespread adoption), I would say that a small negative trial with the trends in the right direction, while advancing science, does little to advance patient care.

    And I wonder what patients enrolled in these trials would think if they knew that perhaps the majority of CC trials are underpowered and therefore negative. I don't think that would be a good slogan for recruitment.

    I understand the need for "replicability" for scientific validity. However, is it not appropriate to think of a 50-center trial with 5000 patients as equivalent to ten 5-center trials with 500 patients each? Would not the external validity be about the same?
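    At the level of individual trials, though, the two designs are not interchangeable statistically. A minimal sketch, assuming a hypothetical 30% control mortality and a true 5% ARR, and using a normal approximation with the unpooled standard error: the single large trial is very likely to be "positive", while each small trial usually is not, even though a fixed-effect pooling of all ten would carry roughly the same information as the large one.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control, arr, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test for a mortality difference,
    using the unpooled SE and ignoring rejection in the wrong direction."""
    p_treat = p_control - arr
    se = math.sqrt((p_control * (1 - p_control) + p_treat * (1 - p_treat)) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(arr / se - z_alpha)


if __name__ == "__main__":
    p_control, arr = 0.30, 0.05  # hypothetical baseline mortality and true ARR
    big = power_two_proportions(p_control, arr, 2500)   # one trial, 5000 patients
    small = power_two_proportions(p_control, arr, 250)  # each of ten 500-patient trials
    print(f"One 5000-patient trial: power ~ {big:.2f}")
    print(f"Each 500-patient trial: power ~ {small:.2f}")
    print(f"Expected 'positive' small trials out of 10: {10 * small:.1f}")
```

    Under these assumptions, most of the ten small trials would be reported as "negative", which is exactly the pattern that tends to doom a therapy before anyone pools the data.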

    More to the point may be that we appear to have blindly followed the mantra in critical care research that states that the only legitimate and valid outcome measure is mortality. I think that in the future we will come to see this as a sophomoric perspective. Perhaps we should go the way of the cardiologists and use combined endpoints that include mortality as well as days off pressors, days off the vent, and freedom from complications. Such endpoints, I think, require greater consideration of their composition than they have heretofore been given, something that I have stated in an earlier post on this blog. In other words, a vent-free day should not get the same weight as a survival. The problem with this is that people have not yet embraced an economic perspective when designing and interpreting RCTs (one in which expected utility theory would be the standard for evaluating the evidence), and this probably stems from the inherent difficulty of comparing the value of, say, a death to the value of an episode of bleeding. That would force people to consider how many times they would have to bleed before they would rather be dead. There may never be consensus on such an issue, but at least considering it may provide us insights into our own values that until now have been implicit rather than explicit.
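    A minimal sketch of such a weighted composite follows; every weight is a hypothetical placeholder rather than a proposal, and the names are illustrative. Death scores zero, survival scores one, each ventilator-free day adds a small amount, and each major bleed subtracts. Choosing the bleed weight is exactly the "how many bleeds before I'd rather be dead" question: at -0.1 per bleed, ten bleeds cancel out survival on this linear scale.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical weights, for illustration only.
WEIGHTS = {"survival": 1.0, "vent_free_day": 0.01, "major_bleed": -0.1}


@dataclass
class Outcome:
    survived: bool
    vent_free_days: int  # e.g. counted over a 28-day window
    major_bleeds: int    # number of major bleeding episodes


def utility(o: Outcome, w=WEIGHTS) -> float:
    """Score one patient; death scores 0 regardless of other events."""
    if not o.survived:
        return 0.0
    return (
        w["survival"]
        + w["vent_free_day"] * o.vent_free_days
        + w["major_bleed"] * o.major_bleeds
    )


def arm_utility(outcomes):
    """Mean (expected) utility across the patients in one trial arm."""
    return mean(utility(o) for o in outcomes)
```

    Comparing `arm_utility` between arms replaces "did mortality differ?" with "did expected utility differ?", which makes the value judgments embedded in the weights explicit rather than implicit.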

    The issues are complex. But given a paradigm for critical care research that focuses almost blindly on mortality as the only valid endpoint and that far more often than not yields null results that are of little use to patients or practitioners and generate more confusion than they resolve, I would be in favor of a re-examination of these complex issues.