Monday, September 21, 2009

The unreliable asymmetric design of the RE-LY trial of Dabigatran: Heads I win, tails you lose

I'm growing weary of this. I hope it stops. We can adapt the diagram of non-inferiority shenanigans from the Gefitinib trial (see ) to last week's trial of dabigatran, which came on the scene of the NEJM with another ridiculously designed non-inferiority trial (see ). Here we go again.

These jokers, lulled by the corporate siren song of Boehringer Ingelheim, had the utter unmitigated gall to declare a delta of 1.46 (relative risk) as the margin of non-inferiority! Unbelievable! To say that a 46% difference in the rate of stroke or arterial clot is clinically non-significant! Seriously!?

They justified this felonious choice on the basis of trials comparing warfarin to PLACEBO as analyzed in a 10-year-old meta-analysis. It is obvious (or should be to the sentient) that an ex-post difference between a therapy and placebo in superiority trials does not apply to non-inferiority trials of two active agents. Any ex-post finding could simply be fortuitously large and may have nothing to do with the MCID (minimal clinically important difference) that is SUPPOSED to guide the choice of delta in a non-inferiority trial (NIT). That warfarin SMOKED placebo in terms of stroke prevention does NOT mean that something that does not SMOKE warfarin is non-inferior to warfarin. This kind of duplicitous justification is surely not what the CONSORT authors had in mind when they recommended a referenced justification for delta.

That aside, on to the study and the figure. First, we're testing two doses, so there are multiple comparisons, but we'll let that slide for our purposes. Look at the point estimate and 95% CI for the 110 mg dose in the figure (let's bracket the fact that they used one-sided 97.5% CIs - it's immaterial to this discussion). There is a non-statistically significant difference between dabigatran and warfarin for this dose, with a P-value of 0.34. But note that in Table 2 of the article, they declare that the P-value for "non-inferiority" is <0.001 [I've never even seen this done before, and I will have to look to see if we can find a precedent for reporting a P-value for "non-inferiority"]. Well, apparently this just means that the RR point estimate for 110 mg versus warfarin is statistically significantly different from a RR of 1.46. It does NOT mean, though it misleadingly suggests, that the comparison between the two drugs on stroke and arterial clot is highly clinically significant. This "P-value for non-inferiority" is just an artificial comparison: had we set the margin of non-inferiority at a[n even more ridiculously] larger value, the P-value would have been smaller still. We can make the "P-value for non-inferiority" as small as we like by just inflating the margin of non-inferiority! So this is a useless number, unless your goal is to create an artificial and exaggerated impression of the difference between these two agents.
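To make the point about the "P-value for non-inferiority" concrete, here is a rough sketch of how such a number behaves as the margin is inflated. It assumes a normal approximation on the log relative-risk scale, which may not be how the trial's analysis was done, and the relative risk and confidence limits below are made-up illustrative values, NOT figures taken from the article.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def se_from_ci(lo, hi, z_crit=1.96):
    """Back out the standard error of log(RR) from a two-sided 95% CI."""
    return (math.log(hi) - math.log(lo)) / (2.0 * z_crit)

def noninferiority_p(rr, lo, hi, margin):
    """One-sided P for H0: true RR >= margin (normal approximation, log scale)."""
    se = se_from_ci(lo, hi)
    z = (math.log(rr) - math.log(margin)) / se
    return normal_cdf(z)

# Hypothetical point estimate and 95% CI, for illustration only.
RR, LO, HI = 0.90, 0.75, 1.10

# The data never change; only the margin does.
for margin in (1.0, 1.2, 1.46, 2.0, 3.0):
    p = noninferiority_p(RR, LO, HI, margin)
    print(f"margin = {margin:4.2f}   one-sided P = {p:.2e}")
```

With the margin set to 1.0 this is just a one-sided test against "no difference", and the result is unimpressive. Every increase in the margin shrinks the P-value further without a single additional patient or event, which is why the number says more about the choice of delta than about the drugs.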

Now let's look at the 150 mg dose. Indeed, it is statistically significantly different from warfarin (I shall resist using the term "superior" here), and thus the authors claim superiority. But here again, the 95% CI is narrower than the margin of non-inferiority, and had the results gone the other direction (in favor of warfarin), as in Scenarios 3 and 4, we would still have claimed non-inferiority, even though warfarin would have been statistically significantly "better than" dabigatran! So it is unfair to claim superiority on the basis of a statistically significant result favoring dabigatran, but that's what they do. This is the problem that is likely to crop up when you make your margin of non-inferiority excessively wide, which you are wont to do if you wish to stack the deck in favor of your therapy.

But here's the real rub. Imagine if the world were the mirror image of what it is now and dabigatran were the existing agent for prevention of stroke in A-fib, and warfarin were the new kid on the block. If the makers of warfarin had designed this trial AND GOTTEN THE EXACT SAME DATA, they would have said (look at the left of the figure and the dashed red line there) that warfarin is non-inferior to the 110 mg dose of dabigatran, but that it was not non-inferior to the 150 mg dose of dabigatran. They would NOT have claimed that dabigatran was superior to warfarin, nor that warfarin was inferior to dabigatran, because the 95% CI of the difference between warfarin and dabigatran 150 mg crosses the pre-specified margin of non-inferiority. And to claim superiority of dabigatran, the 95% CI of the difference would have to fall all the way to the left of the dashed red line on the left. (See Piaggio, JAMA, 2006.)
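The mirror-image argument can be sketched in a few lines. The `claim` function below is my own shorthand for the asymmetric reading of a confidence interval described above (not code from the trial or from CONSORT), the delta of 1.46 is the trial's margin, and the confidence limits are hypothetical round numbers chosen for illustration, NOT the published results.

```python
DELTA = 1.46  # non-inferiority margin on the relative-risk scale

def claim(lo, hi, delta=DELTA):
    """Asymmetric reading of a 95% CI (lo, hi) for the RR of NEW vs. STANDARD."""
    if hi < 1.0:
        return "superior"
    if hi < delta:
        if lo > 1.0:
            # Whole CI sits between 1 and delta: worse, but still "passes".
            return "non-inferior (yet statistically worse)"
        return "non-inferior"
    if lo > delta:
        return "inferior"
    return "not shown non-inferior"

def mirror(lo, hi):
    """Same data with the roles of NEW and STANDARD swapped (RR -> 1/RR)."""
    return (1.0 / hi, 1.0 / lo)

# Hypothetical CIs for dabigatran vs. warfarin, for illustration only.
LOW_DOSE = (0.75, 1.10)
HIGH_DOSE = (0.55, 0.80)

for name, ci in (("low dose", LOW_DOSE), ("high dose", HIGH_DOSE)):
    print(name)
    print("  dabigatran as the new drug:", claim(*ci))
    print("  warfarin as the new drug:  ", claim(*mirror(*ci)))

# A result that went the other way but stayed inside the wide margin.
print("reversed result:", claim(1.05, 1.40))
```

Under these assumed numbers, the same high-dose data yield "superior" when dabigatran is cast as the new drug, but only a failure to show non-inferiority when warfarin is. Nobody gets to call warfarin inferior. The last line shows the other half of the problem: an interval lying entirely above 1 can still be labeled non-inferior when delta is this wide.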

The claims that result from a given dataset should not depend on who designs the trial, and which way the asymmetry of interpretation goes. But as long as we allow asymmetry in the interpretation of data, they shall. Heads they win, tails we lose.


  1. Was waiting for your comments on this. Thanks !

  2. I should say that this may actually end up being a good drug, and I bet the FDA approves it. I largely agree with the editorialist's comments.

    I just grow weary of this NIT nonsense, and love to pick apart these trials because they are unfairly designed and they're tainting the evidence base.

  3. This comment has been removed by a blog administrator.

  4. When I first read this I wondered why wouldn't they just do an equivalence trial (besides the fact that no one does them), but it sounds like your major beef is with their delta?

  5. I am not in total agreement here.

    First, I am glad that Dr. Aberegg has brought this up; it is an alarming issue. A review of 332 NITs found that about one half of the trials claimed an unreasonably large delta (Lange et al).

    Yes, the margin is absurd. However, if you look at the figure of the CONSORT statement (Piaggio, et al), there is no mirrored delta on the left side like Dr. Aberegg’s diagram shows here. As soon as both confidence limits fall onto the left side of the blue line, you have superiority per Piaggio.

    The trial (after very easily claiming non-inferiority) also reported two-tailed p-values for superiority (with a Cox regression to calculate relative risks, confidence intervals, and P values; and chi-squared for rates of discontinuation and adverse events). Hooray for them. Would they have done this if they were scraping the bottom of the margin on the right side? No way, and that's what sucks about the NITs.

    If Boehringer Ingelheim was developing warfarin instead, warfarin would have had an inconclusive non-inferiority result (per Piaggio) against dabigatran 150 BID, and would have fallen on its face with any reported two-tailed p-values for superiority with the exception of MI.

    What I am trying to say is that you don’t need to beat your NIT margin for superiority.

    Conclusion: A questionable NIT margin with better-than-expected results. Clinically relevant? Most likely not at NNT=137 for net clinical benefit endpoint without any overall mortality benefit.

    Also of note: the ximelagatran trials cited the same meta-analysis and picked a delta of 1.65!!! At least the FDA commented on it in their final statistical review and evaluation document.

    Lange S, Freitag G. Choice of delta: requirements and reality--results of a systematic review. Biom J. 2005;47:12-27.

    Piaggio G, Elbourne DR, Altman DG, Pocock SJ, Evans SJW; CONSORT Group. Reporting of noninferiority and equivalence randomized trials: an extension of the CONSORT statement. JAMA. 2006;295:1152-1160.

  6. danleeg -

    I am fond of your thoughtful comments.

    The Piaggio figure is for non-inferiority. If it were for equivalence, it would be two sided.

    So, I guess I disagree with Piaggio et al that we should allow ANY asymmetry.

    Asymmetry of any kind allows us to arrive at contradictory conclusions, depending on which way the trial is designed.

    I stand behind my analysis and my indignation at the "heads I win, tails you lose" situations that asymmetry allows.

  7. a final thought - I do not deny that dabigatran may indeed be superior to warfarin. I only begrudge the path that the investigators took to demonstrate it. They stacked the deck in their favor. Since clinical investigation is an iterative process, our allowing this bunkum over long runs of clinical trials runs the risk, perhaps not realized in this case, that superiority and particularly non-inferiority will be falsely claimed because of the unfair asymmetry.

    If you have a superior product, "well, good for you!".

    Just use a fair trial to prove it to me.

  8. Unfortunately, I also think this will be a growing trend with drug trials since no one understands NITs well; and because there are so many me-toos in the works.

    I wonder if any drugs got the slammer from the FDA due to poor NIT design...

    Also, if you feel like throwing up!: