Friday, September 28, 2007

Badly designed studies - is the FDA to blame?

On the front page of today's NYT is an article describing a report to be released today by the inspector general of the Department of Health and Human Services, which concludes that FDA oversight of clinical trials (mostly trials of drugs for which the industry seeks agency approval) is sorely lacking.

In it, Rosa DeLauro (D-CT) opines that the agency puts industry interests ahead of public health. Oh, really?

Read the posts below and you might be of the same impression. Some of the study designs the FDA approves for testing of agents are just unconscionable. These studies have little or no value for the public health, science, or patients. They serve only as coffer-fillers for the industry. Sadly, they often serve as coffin-fillers when things sometimes go terribly awry. Think Trovan. Rezulin. Propulsid. Vioxx.

The medical community, as consumers of these "data" and the resulting products, has an obligation to its patients which extends beyond those which we see in our offices. We should stop tolerating shenanigans in clinical trials, "me-too" drugs, and corporate profiteering at the expense of patient safety.

Thursday, September 27, 2007

Defaults suggested to improve healthcare outcomes

In today's NEJM, Halpern, Ubel, and Asch describe the use of defaults to improve utilization of evidence-based practices. This strategy, which requires that we give up our status quo and omission biases, could prove highly useful - if we have the gumption to follow their good advice and adopt it.

It is known that patients receive only approximately 50% of the evidence-based therapies indicated in their care (see McGlynn et al.), and that there is a lag of approximately 17 years between substantial evidence of a therapy's benefit and its adoption into routine care.

Given this dismal state of affairs, it seems that the biggest risk is not that a patient will receive a default therapy that is harmful, wasteful, or not indicated, but rather that patients will continue to receive inadequate and incomplete care. The time to institute defaults into practice is now.

Wednesday, September 26, 2007

Dueling with anidulafungin

Our letter to the editor of the NEJM regarding the anidulafungin article (described in a blog post in July - see below) was published today.

To say the least, I am disappointed in the authors' response, particularly in regards to the non-inferiority and superiority issues.

The "two-step" process they describe for sequential determination of non-inferiority followed by superiority is simply the way that a non-inferiority trial is conducted. Superiority is declared in a non-inferiority trial if the CI of the point estimate does not include zero.

The "debate" among statisticians that they refer to is not really a debate at all, but relates to the distinction between a non-inferiority trial and an equivalence trial - in the latter, the CI of the point estimate must not include negative delta; in this case that would mean the 95% CI would have to fall so far to the left of zero that it did not include minus 20, or the pre-specified margin of non-inferiority. Obviously, the choice of a non-inferiority trial rather than an equivalence trial makes it easier to declare superiority. And this choice can create, as it did in this case, an apparent contradiction that the authors try to gloss over by restating the definition of superiority they chose when designing the trial.

Here is the contradiction, the violation of logic. The drug is declared superior because the 95% CI does not cross zero, but of course, that 95% CI is derived from a point estimate, in this case 15.4%. So, 15.4% is sufficient for the drug to be superior. But if your very design implied that a difference less than 20% is clinically negligible (a requirement for the rational determination of a delta, a prespecified margin of non-inferiority), aren't you obliged by reason and fairness to qualify the declaration of superiority by saying something like "but we think that a 15.4% difference is clinically negligible"?

There is no rule that states that you must qualify it in this way, but I think it's only fair. Perhaps we, the medical community, should create a rule - namely that you cannot claim superiority in a non-inferiority trial, only in an equivalence trial. This would prevent the industry from getting one of the "free lunches" they currently get when they conduct these trials, and the apparent contradictions that sometimes arise from them.
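The decision rules at issue here can be made concrete in code. The sketch below is mine, not drawn from any trial's statistical analysis plan; the function and the CI bounds in the examples are hypothetical, and it assumes differences are expressed in percentage points with positive values favoring the new drug:

```python
# Hypothetical helper illustrating the competing definitions discussed above.
# Differences are in percentage points; positive favors the new drug;
# delta is the prespecified margin of clinical negligibility.

def classify(ci_lower: float, ci_upper: float, delta: float) -> list[str]:
    """Classify a trial result from the CI for the treatment difference
    (new drug minus comparator)."""
    verdicts = []
    # Non-inferiority: the CI must lie entirely above -delta.
    if ci_lower > -delta:
        verdicts.append("non-inferior")
    # Equivalence: the CI must lie entirely within (-delta, +delta).
    if -delta < ci_lower and ci_upper < delta:
        verdicts.append("equivalent")
    # Superiority as defined in a non-inferiority trial: CI excludes zero.
    if ci_lower > 0:
        verdicts.append("superior (non-inferiority definition)")
    # Superiority as the post argues it should be judged: the CI must lie
    # entirely beyond +delta, i.e., the advantage is clearly non-negligible.
    if ci_lower > delta:
        verdicts.append("superior (equivalence definition)")
    return verdicts

# Illustrative bounds (made up): a CI that excludes zero but not +delta
# earns "superior" under one definition and not the other.
print(classify(ci_lower=5.0, ci_upper=27.0, delta=20.0))
```

Under the first definition a point estimate of 15.4% with a CI excluding zero is "superior"; under the second it is not, which is exactly the contradiction described above.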

Tuesday, September 25, 2007

Lilly, Xigris, the XPRESS trial and non-inferiority shenanigans

The problem with non-inferiority trials (in addition to the apparent fact that the pharmaceutical industry uses them to manufacture false realities) is that people don't generally understand them (which is what allows false realities to be manufactured and consumed.) One only need look at the Windish article described below to see that the majority of folks struggle with biomedical statistics.

The XPRESS trial, published in AJRCCM Sept. 1st, was mandated by the FDA as a condition of the approval of drotrecogin alfa for severe sepsis. According to the authors of this study, the basic gist is to see if heparin interferes with the efficacy of Xigris (drotrecogin alfa) in severe sepsis. The trial is finally published in a peer-reviewed journal, although Lilly has been touting the findings as supportive of Xigris for quite a while already.

The stated hypothesis was that Xigris+placebo is equivalent to Xigris+heparin (LMWH or UFH). [Confirmation of this hypothesis has obvious utility for Lilly and users of this drug because it would allay concerns about coadministration of Xigris and heparinoids, the latter of which are staunchly entrenched in ICU practice.]

The hypothesis was NOT that Xigris+heparin is superior to Xigris alone. If Lilly had thought this, they would have conducted a superiority trial. They did not. Therefore, they must have thought that the prior probability of superiority was low. If the prior probability of a finding (e.g., superiority) is low, we need a strong study result to raise the posterior probability into a reasonable range - that is, a powerful study which produces a very small p-value (e.g., <0.001). Several aspects of the design and analysis of this study deserve scrutiny:
  • This study used 90% confidence intervals. Not appropriate. This is like using a p-value of 0.10 for significance. I have calculated the more appropriate 95% CIs for the risk difference observed and they are: -0.077 to +0.004.
  • The analysis used was intention to treat. The more conservative method for an equivalence trial is to present the results as "as treated". This could be done at least in addition to the ITT analysis to see if the results are consistent.
  • Here we are doing an equivalence trial with mortality as an outcome. This requires us to choose a "delta" or mortality difference between active treatment and control which is considered to be clinically negligible. Is an increased risk of death of 6.2% negligible? I think not. It is simply not reasonable to conduct a non-inferiority or equivalence trial with mortality as the outcome. Mortality differences would have to be, I would say, less than 1% to convince me that they might be negligible.
  • Because an equivalence design was chosen, the 95% CIs (90% if you're willing to accept that - and I'm not) for the treatment difference would have to fall entirely outside of delta (6.2%) in order for treatment to be declared superior to placebo. Clearly they do not. So any suggestion that Xigris+heparin is superior to Xigris alone based on this study is bunkum. Hogwash. Tripe. Based upon the chosen design, superiority is not even approached. The touted p-value of 0.08 conceals this fact. If they had chosen a superiority design, yes, they would have been close. But they did not.
  • Equivalence was not demonstrated in this trial either, as the 95% (and the 90%) CIs crossed the pre-specified delta. So sorry.
  • The design of this study and its very conception as an equivalence trial with a mortality endpoint is totally flawed. Equivalence was not demonstrated even with a design that would seem to favor its demonstration. (Interestingly, if a non-inferiority design had been chosen, superiority of Xigris+heparin would in fact have been demonstrated - with 90%, but not with 95%, CIs.)
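The effect of the confidence level is easy to demonstrate. The sketch below uses the standard Wald interval for a risk difference; the function name and the counts are placeholders I made up for illustration, not the actual XPRESS data:

```python
from math import sqrt

# Wald confidence interval for a difference in proportions (risk difference).
# z = 1.645 gives a 90% CI (what XPRESS reported); z = 1.96 gives the 95% CI
# argued for above. The counts are illustrative, NOT the XPRESS arm data.

def risk_difference_ci(d1, n1, d2, n2, z=1.96):
    """CI for p1 - p2, where di events occurred among ni patients."""
    p1, p2 = d1 / n1, d2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical arms: 300/1000 vs. 330/1000 deaths.
lo90, hi90 = risk_difference_ci(300, 1000, 330, 1000, z=1.645)
lo95, hi95 = risk_difference_ci(300, 1000, 330, 1000, z=1.96)
print(f"90% CI: ({lo90:+.3f}, {hi90:+.3f})")
print(f"95% CI: ({lo95:+.3f}, {hi95:+.3f})")
```

The 95% interval is always strictly wider than the 90% interval, so a margin that a 90% CI respects can still be crossed at 95% - which is why reporting 90% CIs amounts to testing at a 0.10 significance level.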

The biggest problem I'm going to have is when the Kaplan-Meier curve presented in Figure 3A, with its prominently featured "near miss" p-value of 0.09, is used as ammunition for the argument that Xigris+heparin trended toward superiority in this study. If it had been a superiority trial, I would be more receptive of that trend. But you can't have your cake and eat it too. You either do a superiority trial, or you do an equivalence trial. In this case, the equivalence trial appeared to backfire.

Having said all that, I think we can be reassured that Xigris+heparin is not worse than Xigris+placebo, and the concern that heparin abrogates the efficacy of Xigris should be mostly dispelled. And because almost all critically ill patients are at high risk of DVT/PE, they should all be treated with heparinoids, and the administration of Xigris should not change that practice.

I just think we should stop letting folks get away with these non-inferiority/equivalence shenanigans. In this case, there is little ultimate difference. But in many cases a non-inferiority or equivalence trial such as this will allow the manufacture of a false reality. So I'll call this a case of "attempted manufacture of a false reality".

Friday, September 21, 2007

Medical Residents Don't Understand Statistics

But they want to.

This is but one of many unsettling findings of an excellent article by Windish et al in the September 5th issue of JAMA.

Medical residents correctly answer only approximately 40% of questions pertaining to basic statistics related to clinical trials. Fellows and general medicine faculty with research training fared better, but still have some work to do: they answered approximately 70% of the questions correctly.

An advanced degree in addition to a medical degree conferred only modest benefit: 50% answered correctly rather than 40%.

The solution to this apparent problem is therefore elusive. Even if we encouraged all residents to pursue advanced degrees or research training, we would still have vast room for improvement in the understanding of basic biomedical statistics. And this is not a realistic expectation (that they all pursue advanced degrees or research training).

While it would appear that directed training in medical statistics might have a beneficial effect on performance of this test, with work hours restrictions and the daunting amount of material they must already master for the practice of medicine, it seems unlikely that a few extra courses in statistics during residency is going to make a large and sustainable difference.

Moreover, we must remember that performance on this test is a surrogate outcome - what we're really interested in is how they practice medicine with whatever skills they have. My anecdotal experience is that few physicians are actually keeping abreast of the medical literature - few are actually reading the few journals that they subscribe to - so improving their medical evidence interpretation skills is going to have little impact on how they practice. (For example, few of my colleagues were aware of the Windish article itself, in spite of their practice in an academic center, its publication in a very high impact journal, and their considerable luxury of time compared to our colleagues in private practice.)

In some ways, the encouragement that the average physician critically evaluate the medical literature seems like a far-fetched and idyllic notion. This may be akin to expecting them to stay abreast of the latest technology for running serum specimens, PCR machines, or the sensitivity and specificity of various assays for BNP - they just don't have the time or the training to bother with nuances such as these, which are better left to the experts in the clinical and research laboratories. Likewise, it may be asking too much in the current era of medicine to expect that the average physician will possess and maintain biostatistical and trial analysis skills, consistently apply them to emerging literature, and change practice promptly and accordingly. Empirical evidence suggests that this is not happening, and I don't think it has much to do with lack of statistical skills - it has to do with lack of time.

Perhaps what Windish et al have reinforced is support for the notion that individual physicians should not be expected to keep abreast of the medical literature, but should instead rely upon practice guidelines formulated by those experts properly equipped and compensated to appraise and make recommendations about the emerging evidence.

Saturday, September 15, 2007

Idraparinux, the van Gogh investigators, and clinical trials pointillism: connecting the dots shows that idraparinux increases the risk of death

It eludes me why the NEJM continues to publish specious, industry-sponsored, negative, non-inferiority trials. Perhaps they do it for my entertainment. And this past week, entertained I was indeed.

Idraparinux is yet another drug looking for an indication. Keep looking, Sanofi. Your pipeline problems will not be solved by this one.

First, let me dismiss the second article out of hand: it is not fair to test idraparinux against placebo (for the love of Joseph!) for the secondary prevention of VTE after a recent episode!

It is old news that one can reduce the recurrence of VTE after a recent episode either by using low-intensity warfarin or by extending the duration of warfarin anticoagulation. Therefore, the second van Gogh study does not merit further consideration, especially given the higher rate of bleeding in this study.

Now for the first study and its omissions and distortions. It is important to bear in mind that the only outcome that cannot be associated with ascertainment bias (assuming a high follow-up rate) is mortality, AND that the ascertainment of DVT and PE is fraught with numerous difficulties and potential biases.

The Omission: Failure to report in the abstract that idraparinux use was associated with an increased risk of death in these studies, which was significant in the PE study, and which trended strongly in the DVT study. The authors attempt to explain this away by suggesting that the increased death rate was due to cancer, but of course we are not told how causes of death were ascertained (a notoriously difficult and messy task), and cancer is associated with DVT/PE, which is among the final common pathways of death from cancer. This alone - the minor factoid that idraparinux was associated with an increased risk of death - should doom this drug and should be the main headline related to these studies.

Appropriate headline: "Idraparinux increases the risk of death in patients with PE and possibly DVT."

If we combine the deaths in the DVT and PE studies, we see that the 6-month death rates are 3.4% in the placebo group and 4.5% in the idraparinux group, with an overall (chi-square) p-value of 0.035 - significant!
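For readers who want to check arithmetic like this themselves, here is a minimal Pearson chi-square test for a 2x2 table using only the Python standard library. The counts below are illustrative stand-ins at the quoted rates, not the actual van Gogh arm sizes, so the resulting p-value will not match the 0.035 computed from the real data:

```python
from math import sqrt, erfc

# Pearson chi-square test (1 df) for a 2x2 table. For 1 degree of freedom,
# the chi-square survival function reduces to erfc(sqrt(x / 2)), so no
# stats library is needed.

def chi2_2x2(a, b, c, d):
    """Chi-square statistic and p-value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))

# Illustrative counts at the quoted death rates (NOT the trial's arm sizes):
# comparator: 68 deaths of 2000 (3.4%); idraparinux: 90 of 2000 (4.5%)
stat, p = chi2_2x2(68, 1932, 90, 1910)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

Whether a difference of this size reaches significance depends on the arm sizes, which is why pooling the two studies, as above, can expose a mortality signal that each study alone only hints at.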

This is especially worrisome from a generalizability perspective - if this drug were approved and the distinction between DVT and PE were blurred in clinical practice, as it often is, we would have no way of being confident that we're using it in a DVT patient rather than a PE patient. Who wants such a messy drug?

The Obfuscations and Distortions: Where to begin? First of all, no justification is given for an Odds Ratio of 2.0 as the delta for non-inferiority. Is twice the odds of recurrent DVT/PE insignificant? It is not. This Odds Ratio is too high. Shame.

To give credit where it is due, the investigators at least used a one-sided 0.025 alpha for the non-inferiority comparison.
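To see how generous an OR margin of 2.0 is, here is a sketch of the standard log-odds-ratio non-inferiority check (a one-sided alpha of 0.025 corresponds to the upper limit of a two-sided 95% CI, i.e., z = 1.96). The counts are hypothetical, not the trial's:

```python
from math import log, exp, sqrt

# Non-inferiority on an odds-ratio scale: the upper confidence bound for the
# OR (recurrence on new drug vs. comparator) must fall below the margin.
# Counts below are hypothetical, NOT the van Gogh data.

def or_upper_bound(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table [[events, non-events] per arm] and its
    upper confidence bound via the log-OR normal approximation."""
    or_hat = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_hat, exp(log(or_hat) + z * se_log)

MARGIN = 2.0  # the delta the post argues is far too generous
# Hypothetical arms: 30/1000 recurrences vs. 28/1000 recurrences.
or_hat, upper = or_upper_bound(30, 970, 28, 972)
print(f"OR = {or_hat:.2f}, upper bound = {upper:.2f}, "
      f"non-inferior at margin {MARGIN}: {upper < MARGIN}")
```

With a margin this wide, a drug can "pass" non-inferiority even when the data are compatible with nearly twice the odds of recurrence - which is exactly the objection raised above.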

Second, regarding the DVT study, many if not most patients with DVT also have PE, even if it is subclinical. Given that ascertainment of events (other than death) in this study relied on symptoms and was poorly described, that patients with DVT were not routinely tested for PE in the absence of symptoms, and that the risk of death was increased with idraparinux in the PE study, one is led to an obvious hypothesis: that the trend toward an increased risk of death in the DVT study patients who received idraparinux was due to unrecognized PE in some of these patients. The first part of the conclusion in the abstract - "in patients with DVT, once weekly SQ idraparinux for 3 or 6 months had an efficacy similar to that of heparin and vitamin K antagonists" - obfuscates and conceals this worrisome possibility. Many patients with DVT probably also had undiagnosed PE and might have been more likely to die given the drug's failure to prevent recurrences in the PE study. The increased risk of death in the DVT study might simply have been muted and diluted by the lower frequency of PE among the patients in the DVT study.

Then there is the annoying inability to reverse the effects of this drug, which has a very long half-life.

Scientific objectivity and patient safety mandate that this drug not receive further consideration for clinical use. Persistence with the study of this drug will most likely represent "sunk cost bias" on the part of the manufacturer. It's time to cut bait and save patients in the process.

Wednesday, September 5, 2007

More on Prophylactic Cranial Irradiation

One of our astute residents at OSU (Hallie Prescott, MD) wrote this letter to the editor of the NEJM about the Slotman article discussed 2 weeks ago - unfortunately, we did not meet the deadline for submission, so I'm posting it here:

Slotman et al report that prophylactic cranial irradiation (PCI) increases median overall survival (a secondary endpoint) by 1.3 months in patients with small cell lung cancer. There were no significant differences in various quality of life (QOL) measures between the PCI and control groups. However, non-significant trends toward differences in QOL measures are noted in Table 2. We are not told the direction of these trends, and low compliance (46.3%) with QOL assessments at 9 months limits the statistical power of this analysis. Moreover, significant increases in side effects such as fatigue, nausea, vomiting, and leg weakness may limit the attractiveness of PCI for many patients. Therefore, the conclusion that “prophylactic cranial irradiation should be part of standard care for all patients with small-cell lung cancer” makes unwarranted assumptions about how patients with cancer value quantity and quality of life. The Evidence-Based Medicine working group has proposed that all evidence be considered in light of patients’ preferences, and we believe that this advice applies to PCI for extensive small cell lung cancer.


1. Slotman B, Faivre-Finn C, Kramer G, Rankin E, Snee M, Hatton M et al. Prophylactic Cranial Irradiation in Extensive Small-Cell Lung Cancer. N Engl J Med 2007; 357(7):664-672.
2. Weeks JC, Cook EF, O'Day SJ, Peterson LM, Wenger N, Reding D et al. Relationship Between Cancer Patients' Predictions of Prognosis and Their Treatment Preferences. JAMA 1998; 279(21):1709-1714.
3. McNeil BJ, Weichselbaum R, Pauker SG. Speech and survival: tradeoffs between quality and quantity of life in laryngeal cancer. N Engl J Med 1981; 305(17):982-987.
4. Voogt E, van der Heide A, Rietjens JAC, van Leeuwen AF, Visser AP, van der Rijt CCD et al. Attitudes of Patients With Incurable Cancer Toward Medical Treatment in the Last Phase of Life. J Clin Oncol 2005; 23(9):2012-2019.
5. Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L, Naylor CD et al. Users' Guides to the Medical Literature: XXV. Evidence-Based Medicine: Principles for Applying the Users' Guides to Patient Care. JAMA 2000; 284(10):1290-1296.