Sunday, March 9, 2008

The "Trials" and Tribulations of Powering Clinical Trials: The Case of Vasopressin for Septic Shock (VASST trial)

Nobody likes "negative" trials. They're just not as exciting as positive ones. (Unless they show that something we're doing is harmful or that a product that Wall Street has bet heavily on is headed for the chopping block.) But "negative" studies such as an excellent one by Russell et al. in a recent NEJM (http://content.nejm.org/cgi/content/abstract/358/9/877 ) show just how difficult it is to design and conduct a "positive" trial. The [non-significant] trends in this study, which favored vasopressin over norepinephrine in reducing mortality in septic shock, emerged from a trial designed with an a priori power of 80%, based on an expected mortality rate of 60% in the control (norepinephrine) group and an anticipated 10% absolute risk reduction (ARR). Actual power in the study was substantially lower, not because, as the authors appear to suggest, the observed control mortality was only ~39%, but rather because the observed effect size fell markedly short of the anticipated 10% absolute mortality reduction. In order to demonstrate a mortality benefit of the magnitude observed in the current trial (~4% ARR) at a significance level of 0.05, approximately 1500 patients in each study arm would be required. This is a formidable number for a critical care trial.
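For readers who want to see how strongly sample size depends on the assumed ARR, here is a minimal sketch using the textbook normal approximation for comparing two proportions. The function name `n_per_arm`, the two-sided test, and the unpooled variance are my assumptions, not the method used by the trial's statisticians, and the rounded mortality rates are taken from the figures quoted above.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect p_control vs p_treatment.

    Assumes a two-sided test at level alpha and the unpooled normal
    approximation for the difference of two independent proportions.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_control - p_treatment                # absolute risk reduction
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


# Design assumption described in the post: 60% control mortality, 10% ARR.
print("design scenario:", n_per_arm(0.60, 0.50))

# Roughly what was observed: ~39% control mortality, ~4% ARR.
print("observed scenario:", n_per_arm(0.39, 0.35))

# Same observed scenario, but asking for 90% power instead of 80%.
print("observed, 90% power:", n_per_arm(0.39, 0.35, power=0.90))
```

Because the required number scales with the inverse square of the ARR, the design scenario needs only a few hundred patients per arm while the observed scenario runs into the thousands. The exact figure is sensitive to the assumed rates, one- versus two-sided testing, and the chosen power, so this sketch will not necessarily reproduce the ~1500 per arm quoted above.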

Thus, this trial illustrates the trials and tribulations of designing and conducting studies with 28-day mortality as an endpoint. These studies not only entail substantial costs, but pose challenges for patient recruitment, necessitating the participation of numerous centers in a multinational setting. The coordination of such a trial is daunting. It is understandable, therefore, that investigators may wish to be optimistic about the ARR they can expect from a therapy, as this will reduce sample size and increase the chances that the trial will be successfully completed in a reasonable period of time. (For an example of a study that had to be terminated early because of these challenges, see Mancebo et al.: http://ajrccm.atsjournals.org/cgi/content/short/200503-353OCv1 ). Powering the trial at 80% instead of 90% likewise represents a compromise between optimism for the efficacy of the therapy and optimism for patient recruitment. In essence, the lower the power, the more "faith" there must be that a roll of the trial dice will confirm the alternative hypothesis.

These realities played out [disappointingly] in the Russell trial. The p-value for the ARR (28-day mortality - the primary endpoint) associated with vasopressin compared with norepinephrine was 0.26, while that associated with 90-day mortality (a prespecified secondary endpoint) was 0.11. Thus, this trial is considered negative by conventional standards.

But its being "negative" does not mean that it is not of value to practitioners. This large experience with vasopressin demonstrates both that this agent is a viable alternative to norepinephrine with regard to raising the MAP to within the goal range, and that we can expect no significant excess of adverse events when this agent is used. In my opinion, this study represents a veritable "green light" for continued use of this agent, as I agree with the editorialist (http://content.nejm.org/cgi/reprint/358/9/954.pdf ) that many patients with sepsis who are not responding to norepinephrine respond dramatically and favorably to this agent.

Perhaps there is a larger lesson here. Should we use the same p-value threshold for a study of, say, an antidepressant as we do for a study of an agent that may reduce mortality? In the former case, we may be most concerned about exposure of patients to a costly drug with no benefits and potential side effects - in essence, we are most concerned with a Type I Error, i.e., concluding that there is a benefit when in reality there is none. Perhaps in a trial of a potentially life-saving therapy (e.g., vasopressin) we should be most concerned with a Type II Error, i.e., concluding that there is no real benefit when in reality one exists. If that were the case, and you may have already guessed that I believe that it should be, we could address this concern by loosening the standard of statistical significance for a study of potentially life-saving agents.
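To make the trade-off concrete, the sketch below estimates how the power of a fixed-size trial changes as the significance threshold is loosened. It is only an illustration: the function `approx_power`, the normal approximation, and the round figure of 400 patients per arm are my assumptions, and the mortality rates are the approximate values discussed earlier.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p_control, p_treatment, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the unpooled normal approximation and ignores the negligible
    chance of a significant result in the wrong direction.
    """
    se = sqrt((p_control * (1 - p_control)
               + p_treatment * (1 - p_treatment)) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p_control - p_treatment) / se - z_alpha)


# Hypothetical trial: ~39% vs ~35% mortality, 400 patients per arm.
for alpha in (0.05, 0.10, 0.20):
    power = approx_power(0.39, 0.35, 400, alpha)
    print(f"alpha={alpha:.2f}  approximate power={power:.2f}")
```

Under these assumptions, a looser threshold buys a meaningful gain in power (a lower Type II error rate) without enrolling a single additional patient, at the price of a higher Type I error rate. Whether that price is acceptable is exactly the judgment call described above.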

The standards notwithstanding, critical care practitioners are free to interpret these data as they see fit. And one reasonable conclusion is that, the trends being in the right direction and the side effect profile being acceptable, we should be using more vasopressin in septic shock.

Or, we must make a tough call: do we want to invest the resources in a much larger trial to determine if vasopressin can be shown to reduce mortality at the conventional p-value level of 0.05? Can we recruit the necessary 3000 patients?
