Wednesday, December 16, 2009

Dabigatran and Dabigscam of non-inferiority trials, pre-specified margins of non-inferiority, and relative risks

Anyone who thought, based on the evidence outlined in the last post on this blog, that dabigatran was going to be a "superior" replacement for warfarin was chagrinned last week with the publication in the NEJM of the RE-COVER study of dabigatran versus warfarin in the treatment of venous thromboembolism (VTE): . Dabigatran for this indication is NOT superior to warfarin, but may be non-inferior to warfarin, if we are willing to accept the assumptions of the non-inferiority hypothesis in the RE-COVER trial.

Before we go on, I ask you to engage in a mental exercise of sorts that I'm trying to make a habit. (If you have already read the article and recall the design and the results, you will be biased, but go ahead anyway this time.) First, ask yourself what increase in an absolute risk of recurrent DVT/PE/death is so small that you consider it negligible for the practical purposes of clinical management. That is, what difference between two drugs is so small as to be pragmatically irrelevant? Next, ask yourself what RELATIVE increase in risk is negligible? (I'm purposefully not suggesting percentages and relative risks as examples here in order to avoid the pitfalls of "anchoring and adjustment": .) Finally, assume that the baseline risk of VTE at 6 months is ~2% - with this "baseline" risk, ask yourself what absolute and relative increases above this risk are, for practical purposes, negligible. Do these latter numbers jibe with your answers to the first two questions which were answered when you had no particular baseline in mind?

Note how it is difficult to reconcile your "intuitive" instincts about what is a negligible relative and absolute risk with how these numbers might vary depending upon what the baseline risk is. Personally, I think about a 3% absolute increase in the risk of DVT at 6 months to be on the precipice of what is clinically significant. But if the baseline risk is 2%, a 3% absolute increase (to 5%) represents a 2.5x increase in risk! That's a 150% increase, folks! Imagine telling a patient that the use of drug ABC instead of XYZ "only doubles your risk of another clot or death". You can visualize the bewildered faces and incredulous, furrowed brows. But if you say, "the difference between ABC and XYZ is only 3%, and drug ABC costs pennies but XYZ is quite expensive, " that creates quite a different subjective impression of the same numbers. Of course, if the baseline risk were 10%, a 3% increase is only a 30% or 1.3x increase in risk. Conversely, with a baseline risk of 10%, a 2.5x increase in risk (RR=2.5) means a 15% absolute increase in the risk of DVT/PE/Death, and hardly ANYONE would argue that THAT is negligible. We know that doctors and laypeople respond better to, or are more impressed by, results that are described as RRR than ARR, ostensibly because the former inflates the risk because the number appears bigger (e-mail me if you want a reference for this). The bottom line is that what matters is the absolute risk. We're banking health dollars. We want the most dollars at the end of the day, not the largest increase over some [arbitrary] baseline. So I'm not sure why we're still designing studies with power calculations that utilize relative risks.

With this in mind, let's check the assumptions of the design of this non-inferiority trial (NIT). It was designed with 90% power to exclude a hazard ratio (HR; similar to a relative risk for our purposes) of 2.75. That HR of 2.75 sure SOUNDS like a lot. But with a predicted baseline risk of 2% (which prediction panned out in the trial - the baseline risk with warfarin was 2.1%), that amounts to only 5.78, or an increase of 3.78%, which I will admit is close to my own a priori negligibility level of 3%. The authors justify this assignment based on 4 referenced studies all prior to 1996. I find this curious. Because they are so dated and in a rather obscure journal, I have access only to the 1995 NEJM study ( ). In this 1995 study, the statistical design is basically not even described, and there were 3 primary endpoints (ahhh, the 1990s). This is not exactly the kind of study that I want to model a modern trial after. In the table below, I have abstracted data from the 1995 trial and three more modern ones (al lcomparing two treatment regimens for DVT/PE) to determine both the absolute risk and relative risks that were observed in these trials.

Table 1. Risk reductions in several RCTs comparing treatment regimens for DVT/PE. Outcomes are the combination of recurrent DVT/PE/Death unless otherwise specified. *recurrent DVT/PE only; raw numbers used for simplicity in lieu of time to event analysis used by the authors

From this table we can see that in SUCCESSFUL trials of therapies for DVT/PE treatment, absolute risk reductions in the range of 5-10% have been demonstrated, with associated relative risk increases of ~1.75-2.75 (for placebo versus comparator - I purposefully made the ratio in this direction to make it more applicable to the dabigatran trial's null hypothesis [NH] that the 95% CI for dabigatran includes 2.75 HR - note that the NH in an NIT is the enantiomer of the NH in a superiority trial). Now, from here we must make two assumptions, one which I think is justified and the other which I think is not. The first is that the demonstrated risk differences in this table are clinically significant. I am inclined to say "yes, they are" not only because a 5-10% absolute difference just intuitively strikes me as clinically relevant compared to other therapies that I use regularly, but also because, in the cases of the 2003 studies, these trials were generally counted as successes for the comparator therapies. The second assumption we must make, if we are to take the dabigatran authors seriously, is that differences smaller than 5-10% (say 4% or less) are clinically negligible. I would not be so quick to make this latter assumption, particularly in the case of an outcome that includes death. Note also that the study referenced by the authors (reference 13 - the Schulman 1995 trial) was considered a success with a relative risk of 1.73, and that the 95% CI for the main outcome of the RE-COVER study ranged from 0.65-1.84 - it overlaps the Schulman point estimate of RR of 1.73, and the Lee point estimate of 1.83! Based on an analysis using relative numbers, I am not willing to accept the pre-specified margin of non-inferiority upon which this study was based/designed.

But, as I said earlier, relative differences are not nearly as important to us as absolute differences. If we take the upper bound of the HR in the RE-COVER trial (1.84) and multiply it by the baseline risk (2.1) we get an upper 95% CI for the risk of the outcome of 3.86, which corresponds to an absolute risk difference of 1.76. This is quite low, and personally it satisfies my requirement for very small differences between two therapies if I am to call them non-inferior to one another.

So, we have yet again a NIT which was designed upon precarious and perhaps untenable assumptions, but which, through luck or fate was nonetheless a success. I am beginning to think that this dabigatran drug has some merit, and I wager that it will be approved. But this does not change the fact that this and previous trials were designed in such a way as to allow a defeat of warfarin to be declared based on much more tenuous numbers.

I think a summary of sorts for good NIT design is in order:

• The pre-specified margin of non-inferiority should be smaller than the MCID (minimal clinically important difference), if there is an accepted MCID for the condition under study

• The pre-specified margin of non-inferiority should be MUCH smaller than statistically significant differences found in "successful" superiority trials, and ideally, the 95% CI in the NIT should NOT overlap with point estimates of significant differences in superiority trials

• NITs should disallow "asymmetry" of conclusions - see the last post on dabigatran. If the pre-specified margin of non-inferiority is a relative risk of 2.0 and the observed 95% CI must not include that value to claim non-inferiority, then superiority cannot be declared unless the 95% confidence interval of the point estimate does not cross -2.0. What did you say? That's impossible, it would require a HUGE risk difference and a narrow CI for that to ever happen? Well, that's why you can't make your delta unrealistically large - you'll NEVER claim superiority, if you're being fair about things. If you make delta very large it's easier to claim non-inferiority, but you should also suffer the consequences by basically never being able to claim superiority either.

• We should concern ourselves with Absolute rather than Relative risk reductions

1 comment:

  1. This comment has been removed by a blog administrator.