Sunday, September 6, 2009

There's no such thing as a free lunch - unless you're running a non-inferiority trial. Gefitinib for pulmonary adenocarcinoma

A 20% difference in some outcome is either clinically relevant, or it is not. If A is worse than B by 19% and that's NOT clinically relevant and significant, then A being better than B by 19% must also NOT be clinically relevant and significant. But that is not how the authors of trials such as this one see it: . According to Mok and co-conspirators, if gefitinib is no worse in regard to progression free survival than Carboplatin-Paclitaxel based on a 95% confidence interval that does not include 20% (that is, it may be up to 19.9% worse, but not more worse), then they call the battle a draw and say that the two competitors are equally efficacious. However, if the trend is in the other direction, that is, in favor of gefitinib BY ANY AMOUNT HOWEVER SMALL (as long as it's statistically significant), they declare gefinitib the victor and call it a day. It is only because of widespread lack of familiarity with non-inferiority methods that they can get away with a free lunch like this. A 19% difference is either significant, or it is not. I have commented on this before, and it should come as no surprise that these trials are usually used to test proprietary agents ( ). Note also that in trials of adult critical illness, the most commonly sought mortality benefit is about 10% (more data on this forthcoming in a article soon to be submitted and hopefully published). So it's a difficult argument to subtend to say that something is "non-inferior" if it is less than 20% worse than something else. Designers of critical care trials will tell you that a 10% difference, often much less, is clinically significant.

I have created a figure to demonstrate the important nuances of non-inferiority trials using the gefitinib trial as an example. (I have adapted this from the Piaggio 2006 JAMA article of the CONSORT statement for the reporting of non-inferiority trials - a statement that has been largely ignored: .) The authors specified delta, or the margin of non-inferiority, to be 20%. I have already made it clear that I don't buy this, but we needn't challenge this value to make war with their conclusions, although challenging it is certainly worthwhile, even if it is not my current focus. This 20% delta corresponds to a hazard ratio of 1.2, as seen in the figure demarcated by a dashed red line on the right. If the hazard ratio (for progression or death) demonstrated by the data in the trial were 1.2, that would mean that gefitinib is 20% worse than comparator. The purpose of a non-inferiority trial is to EXCLUDE a difference as large as delta, the pre-specified margin of non-inferiority. So, to demonstrate non-inferiority, the authors must show that the 95% confidence interval for the hazard ratio falls all the way to the left of that dashed red line at HR of 1.2 on the right. They certainly achieved this goal. Their data, represented by the lowermost point estimate and 95% CI, falls entirely to the left of the pre-specified margin of non-inferiority (the right red dashed line). I have no arguments with this. Accepting ANY margin of non-inferiority (delta), gefitinib is non-inferior to the comparator. What I take exception to is the conclusion that gefitinib is SUPERIOR to comparator, a conclusion that is predicated in part on the chosen delta, to which we are beholden as we make such conclusions.

First, let's look at [hypothetical] Scenario 1. Because the chosen delta was 20% wide (and that's pretty wide - coincidentally, that's the exact width of the confidence interval of the observed data), it is entirely possible that the point estimate could have fallen as pictured for Scenario 1 with the entire CI between an HR of 1 and 1.2, the pre-specified margin of non-inferiority. This creates the highly uncomfortable situation in which the criterion for non-inferiority is fulfilled, AND the comparator is statistically significantly better than gefitinib!!! This could have happened! And it's more likely to happen the larger you make delta. The lesson here is that the wider you make delta, the more dubious your conclusions are. Deltas of 20% in a non-inferiority trial are ludicrous.

Now let's look at Scenarios 2 and 3. In these hypothetical scenarios, comparator is again statistically significantly better than gefitinib, but now we cannot claim non-inferiority because the upper CI falls to the right of delta (red dashed line on the right). But because our 95% confidence interval includes values of HR less than 1.2 and our delta of 20% implies (or rather states) that we consider differences of less than 20% to be clinically irrelevant, we cannot technically claim superiority of comparator over gefitinib either. The result is dubious. While there is a statistically significant difference in the point estimate, the 95% CI contains clinically irrelevant values and we are left in limbo, groping for a situation like Scenario 4, in which comparator is clearly superior to gefitinib, and the 95% CI lies all the way to the right of the HR of 1.2.

Pretend you're in Organic Chemistry again, and visualize the mirror image (enantiomer) of scenario 4. That is what is required to show superiority of gefitinib over comparator - a point estimate for the HR whose 95% CI does not include delta or -20%, an HR of 0.8. The actual results come close to Scenario 5, but not quite, and therefore, the authors are NOT justified in claiming superiority. To do so is to try to have a free lunch, to have their cake and eat it too.

You see, the larger you make delta, the easier it is to achieve non-inferiority. But the more likely it is also that you might find a statistically significant difference favoring comparator rather than the preferred drug which creates a serious conundrum and paradox for you. At the very least, if you're going to make delta large, you should be bound by your honor and your allegiance to logic and science to make damned sure that to claim superiority, your 95% confidence interval must not include negative delta. If not, shame on you. Eat your free lunch if you will, but know that the ireful brow of logic and reason is bent unfavorably upon you.


  1. Hi Scott,
    I have to admit I don't get it. I compared your graph to the one from the CONSORT statement. To me it looks like they fully endorse exactly what your castigating. Somehow in the CONSORT graph there is no -Delta.

  2. Jan -

    Thanks for your interest.

    I know it's confusing, and what I'm advocating is an extension of the Consort Statement, with my own twist of symmetry. Their figure is for non-inferiority trials rather than equivalence trials, the former allowing assymetry, the latter not. I refer to the Consort article for education and clarity of principles, not because I entirely agree with it, which I do not. I would advocate Equivalence Trials, perfectly symmetrical, as a way of mitigating these shenanigans.

  3. Interesting post, I think I understood most of the big words. I especially like your use of the phrase 'mitigating shenanigans.'