Medical Evidence Blog: December 2009

Last month JAMA published another article that underscores the need for circumspection when, as by routine, habit, or tradition, we apply the results of laboratory experiments and pathophysiological reasoning to the treatment of intact persons. Olasveengen et al (http://jama.ama-assn.org/cgi/content/abstract/302/20/2222 ) report the results of a Norwegian trial in which people with Out of Hospital (OOH) cardiac arrest were randomized to receive or not receive intravenous medication during resuscitation attempts.

It's not as heretical as it sounds. In 2000, the NEJM reported the results of a Seattle study by Hallstrom et al (http://content.nejm.org/cgi/content/abstract/342/21/1546 ) showing that CPR appears to be as effective (and indeed perhaps more effective) when mouth-to-mouth ventilation is NOT performed along with chest compressions by bystanders. Other evidence with a similar message has since accumulated. With resuscitation, more effort, more intervention does not necessarily lead to better results. The normalization heuristic fails us again.

Several things can be learnt from the recent Norwegian trial. First, recall that RCTs are treasure troves of epidemiological data. The data from this trial reinforce what we practitioners already know, but which is not well-known among uninitiated laypersons: the survival of OOH cardiac arrest is dismal, on the order of 10% or so.

Next, looking at Table 2 of the outcomes data, we note that while survival to hospital discharge, the primary outcome, seems to be no different between the drug and no-drug groups, there are what appear to be important trends in favor of drug - there is more Return of Spontaneous Circulation (ROSC), there are more admissions to the ICU, there are more folks discharged with good neurological function. This is reminiscent of a series of studies in the 1990s (e.g., http://content.nejm.org/cgi/content/abstract/339/22/1595 ) showing that high dose epinephrine, while improving ROSC, did not lead to improved survival. Ultimately, the usefulness of any of these interventions hinges on what your goals are. If your goal is survival with good neurological function, epinephrine in any dose may not be that useful. But if your goal is ROSC, you might prefer to give a bunch of it. I'll leave it to you to determine what your goals are, and whether, on balance, you think they're worthy goals.

There are two other good lessons from this article. In this study, the survival rate in the drug group was 10.5% and that in the no-drug group was 9.2%, for a difference of 1.3%, and this small difference was not statistically significant. Does that mean there's no difference? No, not necessarily. There might be a difference that this study failed to detect because of a Type II error. (The study was designed with 91% power, so there's a 9% chance that a true difference of the size it was powered to detect will be missed, and the chances are even greater since the a priori sample size was not achieved.) If you follow this blog, you know that if the study is negative, we need to look at the 95% confidence interval (CI) around the difference to see if it might include clinically meaningful values. The 95% CI for this difference (not reported by the authors, but calculated by me using Stata) was -5.2% to +2.8%. That is, no drug might be up to about 5% worse or up to about 3% better than drug. Would you stop giving Epi for resuscitation on the basis of this study? Is the CI narrow enough for you? Is a 5% decrease in survival with no drug negligible? I'll leave that for you to decide.
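For readers who want to try this kind of calculation themselves, here is a minimal Python sketch of a Wald (normal-approximation) 95% CI for a difference between two proportions. The survival percentages are the ones quoted above. The group size of 420 per arm is a placeholder assumed purely for illustration (the post does not give the group sizes), and Stata may use a different method, so the result should only be expected to land near the interval quoted above, not to reproduce it.

```python
import math

def risk_difference_ci(p_a, n_a, p_b, n_b, z=1.96):
    """Wald (normal-approximation) CI for the difference p_a - p_b."""
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# ASSUMPTION: 420 patients per arm is a hypothetical placeholder,
# not a figure taken from the post.
N_PER_ARM = 420

# Difference expressed as no-drug minus drug (9.2% vs 10.5% survival).
diff, lower, upper = risk_difference_ci(0.092, N_PER_ARM, 0.105, N_PER_ARM)
print(f"difference: {diff:+.1%}, 95% CI: {lower:+.1%} to {upper:+.1%}")
```

The point of the exercise is the width of the interval: with a few hundred patients per arm and survival around 10%, a difference of a few percentage points in either direction cannot be ruled out.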

(I should not gloss over the alternative possibility which is that the results are also compatible with no-drug being 2.8% better than drug. But if you're playing the odds, methinks you are best off betting the other way, given table 2.)

Now, as an extension of the last blog post, let's look at the relative numbers. The 95% CI for the relative risk (RR) is 0.59 - 1.33. That means that survival might be reduced by as much as 41% with no drug! That sounds like a LOT doesn't it? This is why I consistently argue that relative numbers be avoided in appraising the evidence. RRs give unfair advantages to therapies targeting diseases with survivals closer to 0%. There is no rational reason for such an advantage. A 1% chance of dying is a 1% chance of dying no matter where it falls along the continuum from zilch to unity.
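A small Python sketch makes the point concrete. It holds the absolute gain in survival fixed at one percentage point and shows how the relative gain changes with the baseline. The baseline survival values are hypothetical, chosen only to span the range from low to high.

```python
def relative_gain(baseline_survival, absolute_gain):
    """Relative increase in survival for a fixed absolute gain."""
    return absolute_gain / baseline_survival

ABSOLUTE_GAIN = 0.01  # one percentage point, the same in every row

# ASSUMPTION: hypothetical baseline survival rates for illustration only.
baselines = [0.02, 0.10, 0.50, 0.90]

rows = [(b, relative_gain(b, ABSOLUTE_GAIN)) for b in baselines]
for baseline, rel in rows:
    print(f"baseline survival {baseline:.0%}: "
          f"+1 point absolute = {rel:.1%} relative increase")
```

The same one-point absolute gain looks like a large relative effect when baseline survival is near zero and a trivial one when it is high, which is the unfair advantage described above.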

Lessons from this article: beware of pathophysiological reasoning, and translation from the hamster and molecule labs; determine the goals of your therapy and whether they are worthy goals; absence of evidence is not evidence of absence; look at CIs for the difference between therapies in "negative" trials and see if they include clinically meaningful values; and finally, beware of inflation of perceived benefit caused by presentation of relative risks rather than absolute risks.

Anyone who thought, based on the evidence outlined in the last post on this blog, that dabigatran was going to be a "superior" replacement for warfarin was chagrined last week with the publication in the NEJM of the RE-COVER study of dabigatran versus warfarin in the treatment of venous thromboembolism (VTE): http://content.nejm.org/cgi/content/abstract/361/24/2342 . Dabigatran for this indication is NOT superior to warfarin, but may be non-inferior to warfarin, if we are willing to accept the assumptions of the non-inferiority hypothesis in the RE-COVER trial.

Before we go on, I ask you to engage in a mental exercise of sorts that I'm trying to make a habit. (If you have already read the article and recall the design and the results, you will be biased, but go ahead anyway this time.) First, ask yourself what increase in an absolute risk of recurrent DVT/PE/death is so small that you consider it negligible for the practical purposes of clinical management. That is, what difference between two drugs is so small as to be pragmatically irrelevant? Next, ask yourself what RELATIVE increase in risk is negligible? (I'm purposefully not suggesting percentages and relative risks as examples here in order to avoid the pitfalls of "anchoring and adjustment": http://en.wikipedia.org/wiki/Anchoring .) Finally, assume that the baseline risk of VTE at 6 months is ~2% - with this "baseline" risk, ask yourself what absolute and relative increases above this risk are, for practical purposes, negligible. Do these latter numbers jibe with your answers to the first two questions which were answered when you had no particular baseline in mind?

Note how it is difficult to reconcile your "intuitive" instincts about what is a negligible relative and absolute risk with how these numbers might vary depending upon what the baseline risk is. Personally, I think about a 3% absolute increase in the risk of DVT at 6 months to be on the precipice of what is clinically significant. But if the baseline risk is 2%, a 3% absolute increase (to 5%) represents a 2.5x increase in risk! That's a 150% increase, folks! Imagine telling a patient that the use of drug ABC instead of XYZ "only more than doubles your risk of another clot or death". You can visualize the bewildered faces and incredulous, furrowed brows. But if you say, "the difference between ABC and XYZ is only 3%, and drug ABC costs pennies but XYZ is quite expensive," that creates quite a different subjective impression of the same numbers. Of course, if the baseline risk were 10%, a 3% increase is only a 30% or 1.3x increase in risk. Conversely, with a baseline risk of 10%, a 2.5x increase in risk (RR=2.5) means a 15% absolute increase in the risk of DVT/PE/Death, and hardly ANYONE would argue that THAT is negligible. We know that doctors and laypeople respond better to, or are more impressed by, results that are described as RRR than ARR, ostensibly because the former inflates the risk because the number appears bigger (e-mail me if you want a reference for this). The bottom line is that what matters is the absolute risk. We're banking health dollars. We want the most dollars at the end of the day, not the largest increase over some [arbitrary] baseline. So I'm not sure why we're still designing studies with power calculations that utilize relative risks.
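The arithmetic in the paragraph above can be checked with a short Python sketch. The two helper functions are illustrative names, not anything from the trial reports; the baseline risks and increases are the ones used in the text.

```python
def from_absolute_increase(baseline, increase):
    """Return (new risk, relative risk) for a given absolute increase."""
    new_risk = baseline + increase
    return new_risk, new_risk / baseline

def from_relative_risk(baseline, rr):
    """Return (new risk, absolute increase) for a given relative risk."""
    new_risk = baseline * rr
    return new_risk, new_risk - baseline

# 2% baseline plus 3 points absolute: 5% risk, RR of 2.5
low_new, low_rr = from_absolute_increase(0.02, 0.03)

# 10% baseline plus 3 points absolute: 13% risk, RR of 1.3
high_new, high_rr = from_absolute_increase(0.10, 0.03)

# 10% baseline with RR of 2.5: 25% risk, 15 points absolute
rr_new, rr_increase = from_relative_risk(0.10, 2.5)

print(f"2% baseline, +3 points: risk {low_new:.0%}, RR {low_rr:.2f}")
print(f"10% baseline, +3 points: risk {high_new:.0%}, RR {high_rr:.2f}")
print(f"10% baseline, RR 2.5: risk {rr_new:.0%}, +{rr_increase:.0%} absolute")
```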

With this in mind, let's check the assumptions of the design of this non-inferiority trial (NIT). It was designed with 90% power to exclude a hazard ratio (HR; similar to a relative risk for our purposes) of 2.75. That HR of 2.75 sure SOUNDS like a lot. But with a predicted baseline risk of 2% (which prediction panned out in the trial - the baseline risk with warfarin was 2.1%), that amounts to a risk of only 5.78% (2.75 x 2.1%), or an absolute increase of about 3.68%, which I will admit is close to my own a priori negligibility level of 3%. The authors justify this assignment based on 4 referenced studies all prior to 1996. I find this curious. Because they are so dated and in a rather obscure journal, I have access only to the 1995 NEJM study (http://content.nejm.org/cgi/reprint/332/25/1661.pdf ). In this 1995 study, the statistical design is basically not even described, and there were 3 primary endpoints (ahhh, the 1990s). This is not exactly the kind of study that I want to model a modern trial after. In the table below, I have abstracted data from the 1995 trial and three more modern ones (all comparing two treatment regimens for DVT/PE) to determine both the absolute risk and relative risks that were observed in these trials.
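To see what a relative margin of 2.75 permits in absolute terms, here is a rough Python sketch. It treats the hazard ratio as a relative risk, as the text does, which is an approximation. The 2% and 2.1% baselines come from the post; the 5% and 10% baselines are hypothetical, added only to show how the same margin scales.

```python
def absolute_margin(baseline, ratio_margin):
    """Absolute risk increase permitted by a relative margin.

    Approximation: the hazard ratio is treated as a relative risk.
    """
    return baseline * (ratio_margin - 1)

HR_MARGIN = 2.75  # pre-specified non-inferiority margin from the trial

# 0.02 and 0.021 are from the post; 0.05 and 0.10 are HYPOTHETICAL.
baselines = [0.02, 0.021, 0.05, 0.10]

margins = {b: absolute_margin(b, HR_MARGIN) for b in baselines}
for baseline, margin in margins.items():
    print(f"baseline {baseline:.1%}: margin allows +{margin:.2%} absolute")
```

With the predicted 2% baseline the permitted increase is 3.5 percentage points, and with the observed 2.1% it is about 3.7. Had the baseline risk turned out higher, the same relative margin would have permitted a much larger absolute excess of events.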

Table 1. Risk reductions in several RCTs comparing treatment regimens for DVT/PE. Outcomes are the combination of recurrent DVT/PE/Death unless otherwise specified. *recurrent DVT/PE only; raw numbers used for simplicity in lieu of time to event analysis used by the authors

From this table we can see that in SUCCESSFUL trials of therapies for DVT/PE treatment, absolute risk reductions in the range of 5-10% have been demonstrated, with associated relative risk increases of ~1.75-2.75 (for placebo versus comparator - I purposefully made the ratio in this direction to make it more applicable to the dabigatran trial's null hypothesis [NH] that the 95% CI for dabigatran includes 2.75 HR - note that the NH in an NIT is the enantiomer of the NH in a superiority trial). Now, from here we must make two assumptions, one which I think is justified and the other which I think is not. The first is that the demonstrated risk differences in this table are clinically significant. I am inclined to say "yes, they are" not only because a 5-10% absolute difference just intuitively strikes me as clinically relevant compared to other therapies that I use regularly, but also because, in the cases of the 2003 studies, these trials were generally counted as successes for the comparator therapies. The second assumption we must make, if we are to take the dabigatran authors seriously, is that differences smaller than 5-10% (say 4% or less) are clinically negligible. I would not be so quick to make this latter assumption, particularly in the case of an outcome that includes death. Note also that the study referenced by the authors (reference 13 - the Schulman 1995 trial) was considered a success with a relative risk of 1.73, and that the 95% CI for the main outcome of the RE-COVER study ranged from 0.65-1.84 - it overlaps the Schulman point estimate of RR of 1.73, and the Lee point estimate of 1.83! Based on an analysis using relative numbers, I am not willing to accept the pre-specified margin of non-inferiority upon which this study was based/designed.
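The overlap argument can be written out as a simple check in Python. All the numbers are the ones quoted in the paragraph above; the function name is just an illustrative helper.

```python
def contains(ci, value):
    """True if value lies within the closed interval ci = (lower, upper)."""
    lower, upper = ci
    return lower <= value <= upper

RECOVER_CI = (0.65, 1.84)  # 95% CI for the main outcome in RE-COVER
MARGIN = 2.75              # pre-specified non-inferiority margin (HR)

# Point estimates from trials that were counted as successes.
superiority_estimates = {"Schulman 1995": 1.73, "Lee": 1.83}

meets_margin = RECOVER_CI[1] < MARGIN
overlaps = {name: contains(RECOVER_CI, rr)
            for name, rr in superiority_estimates.items()}

print(f"upper bound below the 2.75 margin: {meets_margin}")
for name, hit in overlaps.items():
    print(f"CI includes the {name} point estimate: {hit}")
```

Both things are true at once: the interval clears the pre-specified margin, and it still includes relative risks that were large enough to count as wins in superiority trials.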

But, as I said earlier, relative differences are not nearly as important to us as absolute differences. If we take the upper bound of the HR in the RE-COVER trial (1.84) and multiply it by the baseline risk (2.1%) we get an upper 95% confidence bound for the risk of the outcome of 3.86%, which corresponds to an absolute risk difference of 1.76%. This is quite low, and personally it satisfies my requirement for very small differences between two therapies if I am to call them non-inferior to one another.
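As a quick check of that arithmetic, here is a minimal Python sketch. It again treats the hazard ratio as a relative risk (an approximation), and compares the worst-case absolute increase with the 3% personal negligibility threshold mentioned earlier in this post.

```python
def worst_case_increase(baseline_pct, ratio_upper):
    """Return (upper-bound risk, absolute increase), both in percent.

    Approximation: the hazard ratio is treated as a relative risk.
    """
    upper_risk = baseline_pct * ratio_upper
    return upper_risk, upper_risk - baseline_pct

BASELINE_PCT = 2.1    # observed risk with warfarin
HR_UPPER = 1.84       # upper bound of the 95% CI in RE-COVER
NEGLIGIBLE_PCT = 3.0  # the author's personal threshold, not a standard

upper_risk, increase = worst_case_increase(BASELINE_PCT, HR_UPPER)
print(f"worst-case risk: {upper_risk:.2f}%")
print(f"worst-case absolute increase: {increase:.2f} points")
print(f"below the {NEGLIGIBLE_PCT}% threshold: {increase < NEGLIGIBLE_PCT}")
```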

So, we have yet again a NIT which was designed upon precarious and perhaps untenable assumptions, but which, through luck or fate was nonetheless a success. I am beginning to think that this dabigatran drug has some merit, and I wager that it will be approved. But this does not change the fact that this and previous trials were designed in such a way as to allow a defeat of warfarin to be declared based on much more tenuous numbers.

I think a summary of sorts for good NIT design is in order:

• The pre-specified margin of non-inferiority should be smaller than the MCID (minimal clinically important difference), if there is an accepted MCID for the condition under study

• The pre-specified margin of non-inferiority should be MUCH smaller than statistically significant differences found in "successful" superiority trials, and ideally, the 95% CI in the NIT should NOT overlap with point estimates of significant differences in superiority trials

• NITs should disallow "asymmetry" of conclusions - see the last post on dabigatran. If the pre-specified margin of non-inferiority is a relative risk of 2.0 and the observed 95% CI must not include that value to claim non-inferiority, then superiority cannot be declared unless the 95% CI also excludes the mirror-image value on the other side of 1.0, namely a relative risk of 0.5 (the reciprocal of 2.0). What did you say? That's impossible, it would require a HUGE risk difference and a narrow CI for that to ever happen? Well, that's why you can't make your delta unrealistically large - you'll NEVER claim superiority, if you're being fair about things. If you make delta very large it's easier to claim non-inferiority, but you should also suffer the consequences by basically never being able to claim superiority either.

• We should concern ourselves with Absolute rather than Relative risk reductions
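The "symmetry" rule in the list above can be sketched as a small Python function. This is one possible reading, not an established standard: it assumes the effect is expressed as a ratio (new drug versus standard, for a bad outcome) and takes the mirror image of a ratio margin delta to be its reciprocal, 1/delta. The RE-COVER interval and margin are from the post; the other inputs are hypothetical.

```python
def classify(ci_lower, ci_upper, delta):
    """Symmetric reading of a ratio CI against a margin delta > 1.

    ASSUMPTION: the mirror image of the margin is taken as 1/delta.
    """
    if delta <= 1:
        raise ValueError("delta must be greater than 1")
    if ci_upper < 1 / delta:
        return "superior"
    if ci_lower > delta:
        return "inferior"
    if ci_upper < delta:
        return "non-inferior"
    return "inconclusive"

# RE-COVER: 95% CI 0.65-1.84 against the pre-specified margin of 2.75.
recover = classify(0.65, 1.84, 2.75)

# HYPOTHETICAL: the same interval against a stricter margin of 1.5.
stricter = classify(0.65, 1.84, 1.5)

# HYPOTHETICAL: an interval extreme enough to clear 1/2.75 (about 0.36).
extreme = classify(0.20, 0.30, 2.75)

print(recover, stricter, extreme)
```

Under this rule a generous delta makes non-inferiority easy to claim and superiority nearly impossible, which is exactly the trade-off argued for above.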

Tuesday, December 29, 2009

How much Epi should we give, if we give Epi at all?

Wednesday, December 16, 2009

Dabigatran and Dabigscam of non-inferiority trials, pre-specified margins of non-inferiority, and relative risks