Saturday, April 27, 2013

Tell Them to Go Pound Salt: Ideology and the Campaign to Legislate Dietary Sodium Intake


The March 28th, 2013 issue of the NEJM contains a review of sorts by Kotchen et al entitled "Salt in Health and Disease - A Delicate Balance."  My interest in this topic stems from my interest in the question of association versus causation, my personal predilection for salt, my observation that I lose a good deal of sodium during outdoor activities in the American Southwest, and my concern for bias in the generation, and especially the implementation, of medical evidence as public policy.

This is an important topic, especially because sweeping policy changes regarding the sodium content of food are proposed, but it is a nettlesome one to study, rife with hobgoblins.  First we need a well-defined research question.  Does reduction in dietary sodium intake: a.) reduce blood pressure in hypertensive people?  In all people?  And b.) does any such reduction in blood pressure lead to improved outcomes (blood pressure is in some ways a surrogate marker)?  In a utopian world, we would randomize thousands of participants to diets low in sodium and "normal" in sodium, we would measure sodium intake carefully, and we would follow the participants for changes in blood pressure and clinical outcomes for a protracted period.  But alas, this has not been done, and it will not likely be done because of cost and logistics, among other obstacles (including ideology).

Friday, April 19, 2013

David versus Goliath on the Battlefield of Non-inferiority: Strangeness is in the Eye of the Beholder

In this week's JAMA is my letter to the editor about the CONSORT statement revision for the reporting of non-inferiority trials, and the authors' responses.  I'll leave it to interested readers to view for themselves the revised CONSORT statement, and the letter and response.

In sum, my main argument is that Figure 1 in the article is asymmetric, such that inferiority is stochastically less likely than superiority and an advantage is therefore conferred to the "new" [preferred; proprietary; profitable; promulgated] treatment in a non-inferiority trial.  Thus the standards for interpretation of non-inferiority trials are inherently biased.  There is no way around this, save for revising the standards.

The authors of CONSORT say that my proposed solution is "strange" because it would require revision of the standards of interpretation for superiority trials as well.  For me it is "strange" that we would endorse asymmetric and biased standards of interpretation in any trial.  The compromise solution, as I suggested in my letter, is that we force different standards for superiority only in the context of a non-inferiority trial.  Thus, superiority trial interpretation standards remain untouched.  It is only if you start with a non-inferiority trial that you face a higher hurdle to claiming superiority, contingent on evidence of non-inferiority in the trial that you designed.  This would disincentivize the conduct of non-inferiority trials for a treatment that you hope/think/want to be superior.  In the current interpretation scheme, it's a no-brainer - conduct a non-inferiority trial and pass the low hurdle for non-inferiority, and then if you happen to be superior too, BONUS!

In my proposed scheme, there is no bonus superiority that comes with a lower hurdle than inferiority.  As I said in the final sentence of my letter, "investigators seeking to demonstrate superiority should design a superiority trial."  Then, there is no minimal clinically important difference (MCID) hurdle that must be cleared, and a statistical difference favoring the new therapy by any margin lets you declare superiority.  But if you fail to clear that low(er) hurdle, you can't go back and declare non-inferiority.

Which leads me to something that the word limit of the letter did not allow me to express:  we don't let unsuccessful superiority trials test for non-inferiority contingently, so why do we let successful non-inferiority trials test for superiority contingently?
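The asymmetry can be made concrete in a few lines of code. Here is a toy sketch (my own construction, not anything from the CONSORT statement or my letter) of the standard interpretation rules, reading a two-sided confidence interval for the treatment difference (new minus standard) against a non-inferiority margin:

```python
def classify(ci_low, ci_high, margin):
    """Standard interpretation of a non-inferiority trial.

    ci_low, ci_high: confidence interval for the difference (new minus standard).
    margin: the non-inferiority margin (a positive number; the boundary is -margin).
    """
    labels = []
    if ci_low > -margin:
        labels.append("non-inferior")   # CI need only exclude the shifted boundary
    if ci_low > 0:
        labels.append("superior")       # CI need only exclude zero
    if ci_high < -margin:
        labels.append("inferior")       # CI must exclude -margin, a taller order
    return labels or ["inconclusive"]

# Mirror-image evidence, asymmetric verdicts:
print(classify(0.1, 2.0, 1.0))    # evidence favoring new: "non-inferior", "superior"
print(classify(-2.0, -0.1, 1.0))  # identical evidence against new: "inconclusive"
```

A confidence interval of (0.1, 2.0) earns the new treatment a superiority claim, but its mirror image (-2.0, -0.1) earns no symmetric inferiority claim, because the inferiority boundary sits at -margin rather than at zero. That is the stochastic advantage conferred on the "new" treatment.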

Symmetry is beautiful;  Strangeness is in the eye of the beholder.

(See also:  Dabigatran and Gefitinib especially the figures, analogs of Figure 1 of Piaggio et al, on this blog.)

Wednesday, April 17, 2013

Out to Lunch: Nutrition and Supplementation in Critical Illness


A study in this week's issue of the NEJM (Heyland et al, Glutamine in Critical Illness, April 18th, 2013) left me titillated, considering how new evidence demonstrates underlying misconceptions, shortcomings, and biases in our understanding of, and general approach to, disease and its pathophysiology.  Before you read on, try to predict:  will supplemental glutamine and anti-oxidants influence the course of critical illness?

The Canadian Critical Care Trials group has continued the effort to determine the causal role of macro- and micronutrients and their deficiency and supplementation in critical (and other) illness.  The results are discouraging (glutamine and anti-oxidants don't work), but only if we consider RCTs to be a tool for the assessment of the therapeutic value of putative molecules and their manipulation in disease states.  RCTs are such a tool, but only if we happen to be fortunate enough to be pursuing a causal pathway.  In the absence of this good fortune, RCTs remain valuable, but only to help us understand that the associations we have labored to delineate are not causal, and that we should direct our focus to other, potentially more fruitful, investigations.  As I articulated in the last post, this dual role of RCTs represents a paradox which can be the source of great cognitive dissonance (and misunderstanding).  The (properly conducted and adequately powered) RCT is a method for determining whether observational associations are causal, but the promise of confirming causal associations in an RCT by manipulating putative causal factors with a potential therapeutic agent carries with it the possibility of proving the efficacy of a disease treatment.  During this protracted scientific process, there is a tendency to get carried away, such that our hypothesis mutates into a premise that we are studying a causal factor and the RCT is the last hurdle to confirming that we have advanced not only the science of causation, but also clinical therapeutics.  Alas, the historical record shows that we are far better at advancing our understanding (if we are willing to accept the results for what they are) than we are at finding new treatments for disease, because most of the associations we investigate turn out not to be causal.

Sunday, March 24, 2013

Why Most Clinical Trials Fail: The Case of Eritoran and Immunomodulatory Therapies for Sepsis

The experimenter's view of the trees.
The ACCESS trial of eritoran in the March 20, 2013 issue of JAMA can serve as a springboard to consider why every biological and immunomodulatory therapy for sepsis has failed during the last 30 years.  Why, in spite of extensive efforts spanning several decades, have we failed to find a therapy that favorably influences the course of sepsis?  More generally, why do most clinical trials, when free from bias, fail to show benefit of the therapies tested?

For a therapeutic agent to improve outcomes in a given disease, say sepsis, a fundamental and paramount precondition must be met:  the agent/therapy must interfere with part of the causal pathway to the outcome of interest.  Even if this precondition is met, the agent may not influence the outcome favorably for several reasons:
  • Causal pathway redundancy:  redundancy in causal pathways may mitigate the agent's effects on the downstream outcome of interest - blocking one intermediary fails because another pathway remains active
  • Causal factor redundancy:  the factor affected by the agent has both beneficial and untoward effects in different causal pathways - that is, the agent's toxic effects may outweigh/counteract its beneficial ones through different pathways
  • Time dependency of the causal pathway:  the agent interferes with a factor in the causal pathway that is time dependent and thus the timing of administration is crucial for expression of the agent's effects
  • Multiplicity of agent effects:  the agent has multiple effects on multiple pathways - e.g., HMG-CoA reductase inhibitors both lower LDL cholesterol and have anti-inflammatory effects.  In this case, the agent may influence the outcome favorably, but it's a trick of nature - it's doing so via a different mechanism than the one you think it is.

Tuesday, March 12, 2013

Falling to Pieces: Hemolysis of the Hemoglobin Hypothesis


A paramount goal of this blog is to understand the evidence as it applies to the epistemology of medical knowledge, hypothesis testing, and overarching themes in the so-called evidence based medicine movement.  Swedberg et al report the results of a large [Amgen-funded] randomized controlled trial of darbepoetin [to normalize hemoglobin values] in congestive heart failure (published online ahead of print this weekend), which affords us the opportunity to explore these themes afresh in the context of new and prior data.

The normalization heuristic, simply restated, is the tendency for all healthcare providers including nurses, respiratory therapists, nutritionists, physicians, and pharmacists among others, to believe intuitively or explicitly that values and variables that can be measured should be normalized if interventions to this avail are at their disposal.  As an extension, modifiable variables should be measured so that they can be normalized.  This general heuristic is deeply flawed, and indeed practically useless as a guide for clinical care.

Sunday, March 3, 2013

HFOV Fails as a Routine Therapy for moderate-to-severe ARDS. Musings on the Use and Study of “Rescue Therapies”.

Ferguson et al report the results of the OSCILLATE randomized controlled trial of HFOV for moderate to severe ARDS in this week’s NEJM.  (A similar RCT of HFOV, the OSCAR trial, is reported in the same issue but I limit most of my commentary to OSCILLATE because I think it’s a better and more interesting trial and more data are presented in its report.)  A major question is answered by this trial, but an important question remains open:  is HFOV an acceptable and rational option as “rescue therapy” in certain patients with “refractory” ARDS?  I remain undecided about this question, and its implications are the subject of this post.

Before I segue to the issue of the study and efficacy of rescue therapies, let’s consider some nuances of this trial:

·         Patients in both groups received high doses of sedatives (average midazolam dose for the first week: 8.3 mg/hour in the HFOV group versus 5.9 mg/hour in the control group – a 41% increase in the HFOV group).  Was this “too much” sedation?  What if propofol had been used instead?

·         Patients in the HFOV group received significantly more paralytics.  If you believe the Papazian data (I don’t) paralytics should confer a mortality benefit in early ARDS and this should contribute to LOWER mortality in the HFOV group.  What if paralytics had been used less frequently?

·         Does HFOV confer a nocebo effect by virtue of its “unnatural” pattern of ventilation, its “requirement” for more sedation and paralysis, the noise associated with its provision, or its influence on the perceptions of caregivers and patients’ families (recognizing that deaths after withdrawal of life support were similar in HFOV versus conventional ventilation: 55 versus 49%, P=0.12)?

·         The respiratory frequency in the HFOV group (5.5 Hz) was at the low end of the usual range (3-15 Hz).  If a higher frequency (and a lower tidal volume) had been delivered, would the result have changed?  (Probably not.)

·         What about the high plateau pressure in the control group (32 cm H2O) despite the low tidal volume of 6.1 ml/kg PBW?  Why was not tidal volume reduced such that plateau pressure was lower than the commonly recommended target of 30 cm H2O?  Did this make a difference?  (Probably not.)

·         Why was mortality higher in the minority (12%) of control patients who were changed to HFOV (71% mortality)?  Is this related to confounding by indication or reflective of the general harmful effects of HFOV?

·         Why was there a difference between the OSCILLATE study and the OSCAR study, reported in the same issue, in terms of mortality?  Because OSCILLATE patients were sicker?  Because OSCAR control patients received higher tidal volumes, thereby curtailing the advantage of conventional ventilation?  I find this last explanation somewhat compelling.

Monday, January 28, 2013

Coffee Drinking, Mortality, and Prespecified Falsification Endpoints

A few months back, the NEJM published this letter in response to an article by Freedman et al in the May 17, 2012 NEJM reporting an association between coffee drinking and reduced mortality found in a large observational dataset.  In a nutshell, the letter said that there was no biological plausibility for mortality reductions resulting from coffee drinking so the results were probably due to residual confounding, and that reductions in mortality in almost all categories (see Figure 1 of the index article) including accidents and injuries made the results dubious at best.  The positive result in the accidents and injuries category was in essence a failed negative control in the observational study.

Last week in the January 16th issue of JAMA Prasad and Jena operationally formalized this idea of negative controls for observational studies, especially in light of Ioannidis' call for a registry of observational studies.  They recommend that investigators mining databases establish a priori hypotheses that ought to turn out negative because they are biologically implausible.  These hypotheses can therefore serve as negative controls for the observational associations of interest, the ones that the authors want to be positive.  In essence, they recommend that the approach to observational data become more scientific.  At the most rudimentary end of the dataset analysis spectrum, investigators just mine the data to see what interesting associations they can find.  In the middle of the spectrum, investigators have a specific question that they wish to answer (usually in the affirmative), and they leverage a database to try to answer that question.  Prasad and Jena are suggesting going a step further towards the ideal end of the spectrum:  to specify both positive and negative associations that should be expected in a more holistic assessment of the ability of the dataset to answer the question of interest.  (If an investigator were looking to rule out an association rather than to find one, s/he could use a positive control rather than a negative one [a falsification end point] to establish the database's ability to confirm expected differences.)

I think that they are correct in noting that the burgeoning availability of large databases (of almost anything) and the ease with which they can be analyzed poses some problems for interpretation of results.  Registering observational studies and assigning prespecified falsification end points should go a long way towards reducing incorrect causal inferences and false associations.
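A toy simulation shows why a failed negative control is so damning. All numbers below are invented for illustration: a hidden "health-consciousness" trait drives both coffee drinking and lower event rates, and coffee itself does nothing - yet coffee comes out "protective" against death and injuries alike, and the implausible injury association is what unmasks the confounding:

```python
import random

random.seed(0)  # deterministic, for illustration

def simulate(n=100_000):
    """Each subject: a hidden confounder drives both coffee habit and event risks."""
    rows = []
    for _ in range(n):
        healthy = random.random() < 0.5                          # unmeasured confounder
        coffee = random.random() < (0.6 if healthy else 0.4)     # healthy people drink more coffee
        died = random.random() < (0.01 if healthy else 0.03)     # endpoint of interest
        injured = random.random() < (0.01 if healthy else 0.03)  # falsification endpoint
        rows.append((coffee, died, injured))
    return rows

def risk_ratio(rows, idx):
    """Risk of event at column idx in coffee drinkers relative to non-drinkers."""
    def risk(coffee):
        sub = [r for r in rows if r[0] == coffee]
        return sum(r[idx] for r in sub) / len(sub)
    return risk(True) / risk(False)

rows = simulate()
rr_death = risk_ratio(rows, 1)   # below 1.0: coffee "protects" against death
rr_injury = risk_ratio(rows, 2)  # also below 1.0: the negative control fails
```

Both risk ratios land well below 1.0 even though coffee has no effect anywhere in the simulation; the "protection" against injuries is the tell that residual confounding, not causation, is at work.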

I wish I had thought of that.

Added 3/3/2013 - I just realized that another recent study of dubious veracity had some inadvertent unspecified falsification endpoints, which nonetheless cast doubt on the results.  I blogged about it here:  Multivitamins caused epistaxis and reduced hematuria in male physicians.

Sunday, January 27, 2013

Therapeutic Agnosticism: Stochastic Dominance of the Null Hypothesis

Here are some more thoughts on the epistemology of medical science and practice that were stimulated by reading three articles this week relating to monitoring interventions:  monitoring respiratory muscle function in the ICU (AJRCCM, January 1, 2013); monitoring intracranial pressure in traumatic brain injury (NEJM, December 27, 2012); and monitoring of gastric residual volume in the ICU (JAMA, January 16th, 2013).

In my last post about transfusion thresholds, I mused that overconfidence in their understanding of complex pathophysiological phenomena (did I say arrogance?) leads investigators and practitioners to overestimate their ability to discern the value and efficacy of a therapy in medicine.  Take, for instance, the vascular biologist studying pulmonary hypertension who, rounding in the ICU, elects to give sildenafil to a patient with acute right heart failure, and who proffers a plethora of complex physiological explanations for this selection.  Is there really any way for anyone to know the effects of sildenafil in this scenario?

Monday, January 14, 2013

Death by 1000 Needlesticks: The Nocebo effects of Hospitalization

I couldn't decide if this belonged on Status Iatrogenicus or the Medical Evidence Blog.  Since it has relevance to both, I'll post a link here:

http://statusiatrogenicus.blogspot.com/2013/01/death-by-1000-needlesticks-nocebo.html

Hemoglobin In Limbo: How Low Can [should] It Go?

In this post about transfusion thresholds in elderly patients undergoing surgery for hip fracture, I indulged in a rant about the irresistible but dodgy lure of transfusing hospitalized patients with anemia (which I attributed to the normalization heuristic) and the wastefulness and potential harms it entails.  But I also hedged my bets, stating that I could get by with transfusing only one unit of blood a month in non-acutely bleeding patients, while noting in a comment that a Cochrane review of this population was equivocal and the authors suggested an RCT of transfusion in acute upper gastrointestinal hemorrhage.  Little did I know at the time that just such a trial was nearing completion, and that 12 units of PRBCs could probably get me by for a year in just about all the patients I see.

In this article by Villanueva in the January 3, 2013 issue of the NEJM, Spanish investigators report the results of a trial of transfusion thresholds in patients with acute upper gastrointestinal hemorrhage.  After receiving one unit of PRBCs for initial stabilization, patients were randomized to receive transfusions at a hemoglobin threshold of 7 versus 9 g/dL.  And lo! - the probability of transfusion was reduced 35%, survival increased by 4%, rebleeding decreased by 4%, and adverse events decreased by 8% in the lower threshold group - all significant!  So it is becoming increasingly clear that the data belie the sophomoric logic of transfusion.

Tuesday, December 4, 2012

The Cholesterol Hypothesis on the Beam: Dalcetrapib, PCSK9 inhibitors, and "off-target" effects of statins

The last month has witnessed the publication of three lines of research that could tip the balance of the evidence for the cholesterol hypothesis, depending on how things play out.  Followers of this blog know that I have a healthy degree of skepticism for the cholesterol hypothesis, emboldened by studies of torcetrapib (blogged here and here) and anacetrapib that have come to light, along with the failures of vytorin (ezetimibe; blogged here and here and here) and of the addition of niacin to statins to improve cardiovascular outcomes in parallel with improvements in cholesterol numbers.

I think it's finally time to bury the CETP inhibitors. The November 29th NEJM (published online on November 5th) reports the results of the dal-OUTCOMES trial of dalcetrapib in patients with a recent acute coronary syndrome. Almost 16,000 patients were enrolled in this study of high risk patients, providing the study with ample power to detect meaningful improvements in cardiovascular outcomes - but alas, none were detected. The target is HDL, so the LDL hypothesis is not debunked by these data, but I think it is challenged nonetheless.

Bite the Bullet and Pull It: The NIKE approach to extubation.


I was very pleased to see McConville and Kress' Review article in the NEJM this week (December 6, 2012 issue) regarding weaning patients from the ventilator. I have long been a fan of the University of Chicago crew as well as their textbook and their pioneering study of sedation interruption a decade ago.


In their article, they provide a useful review of the evidence relating to the discontinuation of mechanical ventilation (aka weaning, liberation, and various other buzzwords used to describe this process).  Yet at the end of the article, in describing their approach to discontinuation of mechanical ventilation, they provide a look into the crystal ball that I think and hope shows what the future may hold in this area.  In a nutshell, they push the envelope and try to extubate patients as quickly as they can, ignoring inconvenient conventional parameters that may impede this approach in select instances.

Much of the research in this field has been dedicated to trying to predict the result of extubating a patient. (In the case of the most widely cited study, by Yang and Tobin, the research involves predicting the result of a predictor of the ultimate result of interest. This reminds me of Cervantes' Quijote - a story within a story within a story....but I digress.) And this is a curious state of affairs. What other endeavor do we undertake in critical care medicine where we wring our hands and so helplessly and wantonly try to predict what is going to happen? Don't we usually just do something and see what happens, making corrections along the way, in silent acknowledgment that predicting the future is often a fool's errand? What makes extubation so different? Why the preoccupation with prediction when it comes to extubation? Why not "Just Do It" and see what happens?

Wednesday, October 24, 2012

A Centrum a Day Keeps the Cancer at Bay?


Alerted as usual by the lay press to the provocative results of a non-provocative study, I read with interest the article in the October 17th JAMA by Gaziano and colleagues: Multivitamins in the Prevention of Cancer in Men.  From the lay press descriptions (see: NYT summary and a less sanguine NYT article published a few days later), I knew only that it was a positive (statistically significant) study, that the reduction in cancer observed was 8%, that a multivitamin (Centrum Silver) was used, and that the study population included 14,000 male physicians.

Needless to say, in spite of a dormant hope something so simple could prevent cancer, I was skeptical. Despite decades, perhaps eons of enthusiasm for the use of vitamins, minerals, and herbal remedies, there is, to my knowledge (please, dear reader, direct me to the data if this is an omission) no credible evidence of a durable health benefit from taking such supplements in the absence of deficiency. But supplements have a lure that can beguile even the geniuses among us (see: Linus Pauling). So before I read the abstract and methods to check for the level of statistical significance, the primary endpoint, the number of endpoints, and sources of bias, I asked myself: "What is the probability that taking a simple commercially available multivitamin can prevent cancer?" and "what kind of P-value or level of statistical significance would I require to believe the result?" Indeed, if you have not yet seen the study, you can ask yourself those same questions now.

Thursday, September 27, 2012

True Believers: Faith and Reason in the Adoption of Evidence

In last week's NEJM, in an editorial response to an article demonstrating that physicians, in essence, probability adjust (a la Expected Utility Theory) the likelihood that data are true based on the funding source of a study, Editor-in-Chief Jeffrey M. Drazen implored the journal's readership to "believe the data."  Unfortunately, he did not answer the obvious question: "which data?"  A perusal of the very issue in which his editorial appears, as well as this week's journal, considered in the context of more than a decade of related research, demonstrates just how ironic and ludicrous his invocation is.

This November marks the eleventh year since the publication, with great fanfare, of Van den Berghe's trial of intensive insulin therapy (IIT) in the NEJM.  That article was followed by what I have called a "premature rush to adopt the therapy" (I should have called it a stampede), creation of research agendas in multiple countries and institutions devoted to its study, amassing of reams of robust data failing to confirm the original results, and a reluctance to abandon the therapy that is rivaled in its tenacity only by the enthusiasm that drove its adoption.  In light of all the data from the last decade, I am convinced of only one thing - that it remains an open question whether control of hyperglycemia within ANY range is of benefit to patients.
Suffice it to say that the Van den Berghe data have not suffered from lack of believers - the Brunkhorst, NICE-SUGAR, and Glucontrol data have - and it would seem that in many cases what we have is not a lack of faith so much as a lack of reason when it comes to data.  The publication of an analysis of hypoglycemia using the NICE-SUGAR database in the September 20th NEJM, and a trial in this week's NEJM involving pediatric cardiac surgery patients by Agus et al, gives researchers and clinicians yet another opportunity to apply reason and reconsider their belief in IIT and, for that matter, the treatment of hyperglycemia in general.

Thursday, May 24, 2012

Fever, external cooling, biological precedent, and the epistemology of medical evidence

It is a rare occasion when one article allows me to review so many aspects of the epistemology of medical evidence, but alas, Schortgen et al afforded me that opportunity in the May 15th issue of AJRCCM.

The issues raised by this article are so numerous that I shall make subsections for each one. The authors of this RCT sought to determine the effect of external cooling of febrile septic patients on vasopressor requirements and mortality. Their conclusion was that "fever control using external cooling was safe and decreased vasopressor requirements and early mortality in septic shock." Let's explore the article and the issues it raises and see if this conclusion seems justified and how this study fits into current ICU practice.

PRIOR PROBABILITY, BIOLOGICAL PLAUSIBILITY, and BIOLOGICAL PRECEDENTS

These are related but distinct issues that are best considered both before a study is planned, and before its report is read. A clinical trial is in essence a diagnostic test of a hypothesis, and like a diagnostic test, its influence on what we already know depends not only on the characteristics of the test (sensitivity and specificity in a diagnostic test; alpha and power in the case of a clinical trial) but also on the strength of our prior beliefs. To quote Sagan [again], "extraordinary claims require extraordinary evidence." I like analogies of extremes: no trial result is sufficient to convince the skeptical observer that orange juice reduces mortality in sepsis by 30%; and no evidence, however cogently presented, is sufficient to convince him that the sun will not rise tomorrow. So when we read the title of this or any other study, we should pause to ask: What is my prior belief that external cooling will reduce mortality in septic shock? That it will reduce vasopressor requirements?
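The diagnostic-test analogy can be pushed all the way to arithmetic. Here is a sketch (conventional alpha and power assumed purely for illustration) of how a "positive" trial updates a prior belief, exactly as a positive test result updates a pre-test probability:

```python
def posterior_given_positive(prior, alpha=0.05, power=0.80):
    """P(hypothesis true | statistically 'positive' trial).

    The trial is treated as a diagnostic test: power plays the role of
    sensitivity, and alpha plays the role of the false positive rate.
    """
    true_positive = power * prior
    false_positive = alpha * (1 - prior)
    return true_positive / (true_positive + false_positive)

# A toss-up hypothesis is largely settled by one positive trial:
print(round(posterior_given_positive(0.50), 2))  # 0.94
# An orange-juice-grade hypothesis is not:
print(round(posterior_given_positive(0.01), 2))  # 0.14
```

A "significant" result in support of an extraordinary claim leaves the claim more likely false than true, which is why the title of this or any study should prompt the question about priors before the results are read.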

Sunday, December 18, 2011

Modern Day Bloodletting: The Good Samaritan, the Red Cross, and the Jehovah's Witness

How many studies do you suppose we need before doctors realize that their tendency to want to transfuse blood in every manner of patient admitted to the hospital is nothing more than an exercise in futility (I almost wrote stupidity) based on the normalization heuristic?  It's a compelling logic and an irresistible practice, I know.  The hemoglobin level is low; that can't be good for the heart, circulation, perfusion, oxygen delivery, you name it.  If we just give a transfusion or two, everything will be all better.  I can hear family members on their mobile phones reassuring other loved ones that the doctors are acting with great prudence and diligence taking care of Mr. Jones, having perspicaciously measured his hemoglobin (as by routine, for a hospital charge of ~$300/day - the leviathan bill and the confused, incredulous faces come months later - "why does it cost so much?"), discovered that perilous anemia, and ordered two units of life-saving blood to be transfused.  It's so simple but so miraculous!  Thank God for the Red Cross!

Not so fast.  The TRICC trial, published in 1999, demonstrated that, at least in critically ill patients, a lower as compared to a higher transfusion threshold led to a statistically insignificant trend towards improved outcomes.  That is, less blood is better.  For every reason you can think of that transfusion might improve physiological parameters or outcomes, there is a counterargument about how transfusions can wreak havoc on homeostasis and the immune system (see: Marik 2008 in CCM, and others).

Not to mention the cost.  My time-honored estimate of the cost of one unit of PRBCs was about $400.  It may indeed be three times higher.  That's right, $1200 per unit transfused, and for reasons of parity or some other nonsense, in clinical practice they're usually transfused in "twos."  Yep, $2400 a pair.  (Even though Samaritans donate for free, the costs of processing, testing, storage, transportation, etc. drive up the price.)  What value do we get for this expense?

Thursday, November 10, 2011

Post-hOckham analyses - the simplest explanation is that it just plain didn't flipp'n work


You're probably familiar with the Franciscan friar William of Ockham and his sacred saw.  Apparently the principle has been as oversimplified as it has been ignored, as a search of Wikipedia will attest.  Suffice it to say, nonetheless, that this maxim guides us to select the simplest from among multiple explanations for any phenomenon - and this intuitively makes sense, because there are infinite and infinitely complex possible explanations for any phenomenon.

So I'm always amused and sometimes astonished when medical scientists reappraise their theories after they've been defeated by their very own data and begin to formulate increasingly complex explanations and apologies, so smitten and beholden to them as they are. "True Believers" is what Jon Abrams, MD, one of my former attendings, used to call them. The transition from scientist to theist is an insidious and subversive one.

The question is begged: did we design such and such clinical trial to test the null hypothesis or not? If some post-hoc subgroup is going to do better with therapy XYZ, why didn't we identify that a priori? Why didn't we test just THAT group? Why didn't we say, in advance, "if this trial fails to show efficacy, it will be because we should have limited it to this or that subgroup. And if it fails, we will follow up with a trial of this or that subgroup."

Tuesday, November 8, 2011

The Nihilist versus the Trialist: Why Most Published Research Findings Are False

I came across this PLoS Med article today that I wish I had seen years ago: Why Most Published Research Findings Are False.  In this delightful essay, John P. A. Ioannidis describes why you must be suspicious of everything you read, because most of it is spun hard enough to give you a wicked case of vertigo.  He highlights one of the points made repeatedly on this blog, namely that all hypotheses are not created equal, and some require more evidence to confirm (or refute) than others - basically a Bayesian approach to the evidence.  With this approach, the diagnostician's "pre-test probability" becomes the trialist's "pre-study probability" and likelihood ratios stem from the data from the trial as well as alpha and beta.  He creates a function for trial bias and shows how this impacts the probability that the trial's results are true as the pre-study probability and the study power are varied.  He infers that alpha is probably too high (and hence Type I error rates too high) and power too low, i.e., beta too high (both alpha and beta influence the likelihood ratio of a given dataset).  He discusses terms (coined by others whom he references) such as "false positive" for study reports, and highlights several corollaries of his analysis (often discussed on this blog), including:
  • beware of studies with small sample sizes
  • beware of studies with small effect sizes (delta)
  • beware of multiple hypothesis testing and soft outcome measures
  • beware of flexibility of designs (think Prowess/Xigris among others), definitions, outcomes (NETT trial), and analytic modes
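Ioannidis's central result can be condensed into a few lines of arithmetic. Here is a minimal sketch (my own restatement of his published formula, not code from the essay): the positive predictive value of a "significant" finding as a function of the pre-study odds R, the error rates alpha and beta, and a bias term u.

```python
# Sketch of Ioannidis's PPV algebra: the probability that a statistically
# significant finding is actually true. R is the pre-study odds that the
# hypothesis is true; u is the fraction of would-be-null results that bias
# converts into "positive" reports.
def ppv(R, alpha=0.05, beta=0.20, u=0.0):
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

# A plausible, adequately powered hypothesis (1:1 pre-study odds, no bias):
print(round(ppv(1.0), 2))         # 0.94
# A long-shot hypothesis (1:10 odds) with modest bias:
print(round(ppv(0.1, u=0.2), 2))  # 0.26 - most such "positives" are false
```

The corollaries in the bullet list above all fall out of this function: shrink R, shrink power, or grow u, and the PPV collapses.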

Perhaps most importantly, he discusses the role that researcher bias may play in analyzing or aggregating data from research reports - the GIGO (garbage in, garbage out) principle. Conflicts of interest extend beyond the financial to tenure, grants, pride, and faith. Gone forever is the notion of the noble scientist in pursuit of the truth, replaced by the egoist climber of ivory and builder of Babel towers, so bent on promoting his or her (think Greet Van den Berghe) hypothesis that they lose sight of the basic purpose of scientific testing, and the virtues of scientific agnosticism.

Thursday, October 6, 2011

ECMO and H1N1 - more fodder for debate

There is perhaps no better way to revive the dormant blog than to highlight an article published in JAMA yesterday about the role and effect of ECMO in the H1N1 epidemic in England: http://jama.ama-assn.org/content/early/2011/09/28/jama.2011.1471.full . Other than to recognize its limitations which are similar if not identical to those of the CESAR trial, there is little to say about this study beyond that it further bolsters the arguments of my last post about ECMO and the ongoing debate about it.

In light of the recent failures of albuterol and omega-3 fatty acids in ARDS treatment, I echo the editorialist in calling for funding for a randomized controlled trial of ECMO in severe ARDS (see: http://jama.ama-assn.org/content/early/2011/09/28/jama.2011.1504.full ).

Tuesday, April 19, 2011

ECMO and logic: Absence of Evidence is not Evidence of Absence

I have been interested in ECMO for adults with cardiorespiratory failure since the late 1990s during the Hantavirus cardiopulmonary syndrome epidemic in New Mexico, when I was a house officer at the University of New Mexico. Nobody knows for sure if our use of VA ECMO there saved any lives, but we all certainly suspected that it did. There were simply too many patients too close to death who survived. It made an impression.

I have since practiced in other centers where ECMO was occasionally used, and I had the privilege of writing a book chapter on ECMO for adult respiratory failure in the interim.

But alas, I now live in the Salt Lake Valley where, for reasons as cultural as they are scientific, ECMO is taboo. The main reason for this is, I think, an over-reliance on outdated data, along with too much confidence in, and loyalty to, locally generated data.

And this is sad, because this valley was hit with another epidemic two years ago - the H1N1 epidemic, which caused the most severe ARDS I have seen since the Hanta days in New Mexico. To my knowledge, no patients in the Salt Lake Valley received ECMO for refractory hypoxemia in H1N1 disease.


Thus I read with interest the Pro Con debate in Chest a few months back, and revisited in the correspondence of the current issue of Chest, which was led by some of the local thought leaders (and those who believe that, short of incontrovertible evidence, ECMO should remain taboo and outright disparaged) - See: http://chestjournal.chestpubs.org/content/139/4/965.1.citation and associated content.

It was an entertaining and incisive exchange between a gentleman in Singapore with recent ECMO experience in H1N1 disease, and our local thought leaders, themselves led by Dr. Alan Morris. I leave it to interested readers to read the actual exchange, as it is too short to merit a summary here. My only comment is that I am particularly fond of the Popper quote, taken from The Logic of Scientific Discovery: "If you insist on strict proof (or disproof) in the empirical sciences, you will never benefit from experience and never learn from it how wrong you are." Poignant.

I will add my own perhaps Petty insight into the illogical and dare I say hypocritical local taboo on ECMO. ECMO detractors would be well-advised to peruse the first Chapter in Martin Tobin's Principles and Practice of Mechanical Ventilation called "HISTORICAL PERSPECTIVE ON THE DEVELOPMENT OF MECHANICAL VENTILATION". As it turns out, mechanical ventilation for most diseases, and particularly for ARDS, was developed empirically and iteratively during the better part of the last century, and none of that process was guided, until the last 20 years or so, by the kind of evidence that Morris considers both sacrosanct and compulsory. Indeed, Morris, each time he uses mechanical ventilation for ARDS, is using a therapy which is unproved to the standard that he himself requires. And indeed, the decision to initiate mechanical ventilation for a patient with respiratory failure remains one of the most opaque areas in our specialty. There is no standard. Nobody knows who should be intubated and ventilated, and exactly when - it is totally based on gestalt, is difficult to learn or to teach, and is not even addressed in studies of ARDS. Patients must be intubated and mechanically ventilated for entry to an ARDS trial, but there are no criteria which must be met on how, when, and why they were intubated. It's just as big a quagmire as the one Morris describes for ECMO.

And much as he, and all of us, will not stand by idly and allow a spontaneously breathing patient with ARDS to remain hypoxemic with unacceptable gas exchange, those of us with experience with ECMO, an open mind, equipoise, and freedom from rigid dogma will not stand by idly and watch a ventilated patient remain hypoxemic with unacceptable gas exchange for lack of ECMO.

It is the same thing. Exactly the same thing.

Saturday, April 9, 2011

Apixaban: It's been a while since I've read about a new drug that I actually like

In the March 3rd NEJM, Apixaban makes its debut on the scene of stroke prevention in Atrial Fibrillation with the AVERROES trial (see: http://www.nejm.org/doi/full/10.1056/NEJMoa1007432#t=abstract ), and I was favorably impressed. The prophylaxis of stroke in atrial fibrillation is truly an unmet need because of the problematic nature of chronic anticoagulation with coumarin derivatives. So a new player on the team is welcome.

A trusted and perspicacious friend dislikes the AVERROES study for the same reason that I like it - he says that comparing Apixaban to aspirin (placebo, as he called it) is tantamount to a "chump shot". But I think the comparison is justified. There are numerous patients who defer anticoagulation with coumarins because of their pesky monitoring requirements, and this trial assures any such patient that Apixaban is superior to aspirin, beyond any reasonable doubt (P=0.000002). (Incidentally, I applaud the authors for mentioning, in the second sentence of the discussion, that the early termination of the trial might have inflated the results - something that I'm less concerned with than usual because of the highly statistically significant results which all go basically in the same direction.) Indeed, it was recently suggested that "me-too" agents or those tested in a non-inferiority fashion should be tested in the population in which they are purported to have an advantage (see: http://jama.ama-assn.org/content/305/7/711.short?rss=1 ). The AVERROES trial does just that.

Had Apixaban been compared to warfarin in a non-inferiority trial (this trial is called ARISTOTLE and is ongoing) without the AVERROES trial, some[busy]body would have come along and said that non-inferiority to warfarin does not demonstrate superiority over aspirin +/- clopidogrel, nor does it demonstrate safety compared to aspirin, etc. So I respectfully disagree with my friend who thinks it's a useless chump shot - I think the proof of efficacy from AVERROES is reassuring and welcome, especially for patients who wish to avoid coumarins.

Moreover, these data bolster the biological plausibility of efficacy of oral factor Xa antagonists, and will increase my confidence in the results of any non-inferiority trial, e.g., ARISTOTLE. And that train of thought makes me muse as to whether the prior probability of some outcome from a non-inferiority trial might not depend on the strength of evidence for efficacy of each agent prior to the trial (self-evident, I know, but allow me to finish). That is, if you have an agent that has been used for decades compared to one that is a newcomer, is there really equipoise, and should there be equipoise about the result? Shouldn't the prior be higher that the old dog will win the fight, or at least not be beaten? These musings might have greater gravity in light of the high rate of recall from the market of new agents with less empirical evidence of safety buttressing them.

One concerning finding, especially in light of the early termination, is illustrated in Figure 1B. The time-to-event curves for major bleeding were just beginning to separate between 9 and 14 months, and then, inexplicably, there were no more bleeding events documented after about 14 months. It almost looks as if monitoring for severe bleeding stopped at 14 months. I'm not sure I fully understand this, but one can't help but surmise that, if this agent truly is a suitable replacement for warfarin, it will come with a cost - and that cost will be bleeding. I wager that had the trial continued there would have been a statistically significant increase in major bleeding with apixaban.

It is interesting that not a single mention is made by name in the article of a competitor oral factor Xa inhibitor, indeed one that is going to make it to market sooner than Apixaban, albeit for a different indication: Rivaroxaban. Commercial strategizing is also foreshadowed in the last paragraph of the article before the summary: the basic tenets of a cost-benefit analysis are laid bare before us with one exception: the cost of Apixaban. Surely the sponsor will feel justified in usurping all but a pittance of the cost savings from obviated INR monitoring and hospitalizations when they price this agent. Only time will tell.

Thursday, April 7, 2011

Conjugated Equine Estrogen (CEE) reduces breast cancer AFTER the trial is completed?

I awoke this morning to a press release from the AMA, and a front page NYT article declaring that, in a post-trial follow-up of the WHI study, CEE reduces breast cancer in the entire cohort of post-hysterectomy patients, and lowers CHD (coronary heart disease) risk in the youngest age stratum studied.

Here's a couple of links: http://well.blogs.nytimes.com/2011/04/05/estrogen-lowers-risk-of-heart-attack-and-breast-cancer-in-some/?src=me&ref=general

http://jama.ama-assn.org/content/305/13/1305.short

Now why would that be?

One need look no further than the data in Tables 2 and 5 to see that it's a Type I statistical error (a significant result is found by chance when the null hypothesis is true and there is in reality no effect) - that's why.

For the love of Jehovah, did this really make the headlines? The P-value for the breast cancer risk is....well, they don't give a P-value, but the upper bound of the 95% CI is 0.95 so the P-value is about 0.04, BARELY significant. Seriously? This is one of FIFTEEN (15) comparisons in Table 2 alone. Corrected for multiple comparisons, this is NOT a statistically significant effect, NOT EVEN CLOSE. I think I'm having PTSD from flashbacks to the NETT trial.

And table 5? There are TEN outcomes with THREE age strata for each outcome, so, what, 30 comparisons? And look at the width of the 95% CI for the youngest age stratum in the CHD outcome - wide. So there weren't a lot of patients in that group.
And never mind the lack of an a priori hypothesis, or of a legitimate reason to think a difference based on age strata might make biological sense.
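For the arithmetic-minded, a simple Bonferroni correction makes the point concrete. A back-of-envelope sketch (my own; the 0.04 is inferred from the reported CI upper bound of 0.95, not stated in the paper):

```python
# Bonferroni-corrected significance thresholds for the multiple comparisons
# in the WHI follow-up tables, versus the approximate p for breast cancer.
def bonferroni_threshold(alpha, n_comparisons):
    return alpha / n_comparisons

p_breast_cancer = 0.04  # approximate, back-calculated from the 95% CI

print(round(bonferroni_threshold(0.05, 15), 4))          # 0.0033 for the 15 comparisons in Table 2
print(round(bonferroni_threshold(0.05, 30), 4))          # 0.0017 for the ~30 in Table 5
print(p_breast_cancer < bonferroni_threshold(0.05, 15))  # False - not even close
```

Bonferroni is admittedly conservative, but a nominal p of 0.04 misses even more forgiving corrections by a wide margin.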

Bad old habits die hard. Like a former colleague is fond of pointing out, don't assume that because an investigator does not have drug company ties that s/he is not biased. Government funding and entire careers are at stake if an idea that you've been pursuing for years yields to the truth and dies off. Gotta keep stoking the coals of incinerated ideas long enough to get tenure!

Sunday, April 3, 2011

If at first you don't succeed, try, try again: Anacetrapib picks up where Torcetrapib left off

I previously blogged on Torcetrapib because of my interest in causality and in a similar vein, the cholesterol hypothesis. And I was surprised and delighted when the ILLUMINATE trial showed that Torcetrapib, in spite of doubling HDL, failed miserably. Surprised because like so many others I couldn't really believe that if you double HDL that on balance wonderful things wouldn't happen; and delighted because of the possible insights this might give into the cholesterol hypothesis and causality. (See this post: http://medicalevidence.blogspot.com/2007/11/torcetrapib-torpedoed-sunk-by-surrogate.html )

I must have been too busy skiing when this article on Anacetrapib came out last year: http://www.nejm.org/doi/full/10.1056/NEJMoa1009744 . You may recall that after Torcetrapib was torpedoed, the race was on for apologists to find reasons it didn't work besides the obvious one, that it doesn't work. It raised blood pressure and does things to aldosterone synthesis, etc. Which I find preposterous. Here is an agent with profound effects on HDL and mild effects on other parameters (at least the parameters we can measure) and I am supposed to believe that the minor effects outweigh the major ones when it comes time to measure final outcomes? Heparin also affects aldosterone synthesis, but to my knowledge, when used appropriately to treat patients with clots, its major effects prevail over its minor effects and thus it doesn't kill people.


This is no matter to the true believers. Anacetrapib doesn't have these pesky minor effects, and it too doubles HDL, so the DEFINE investigators conducted a safety study to see if its lack of effects on aldosterone synthesis and the like might finally allow its robust effects on HDL to shine down favorably on cardiovascular outcomes (or at least not kill people.) The results are favorable, and there is no increase in blood pressure or changes in serum electrolytes, so their discussion focuses on all the reasons that this agent might be that Holy Grail of cholesterol lowering agents after all. All the while they continue to ignore the lack of any positive signal on cardiovascular outcomes at 72 weeks with this HDL-raising miracle agent, and what I think may be a secret player in this saga: CRP levels.

Only time and additional studies will tell, but I'd like to be on the record as saying that given the apocalyptic failure of Torcetrapib, the burden of evidence is great to demonstrate that this class will have positive effects on cardiovascular outcomes. I don't think it will. And the implications for the cholesterol hypothesis will perhaps be the CETP inhibitors' greatest contributions to science and medicine.

Monday, March 28, 2011

Cultural Relativism in Clinical Trials: Composite endpoints are good enough for Cardiology, but not for Pulmonihilism and Critical Care


There are many differences between cardiology and pulmonary and critical care medicine as medical specialties, and some of these differences are seen in how they conduct clinical trials. One thing is for sure: cardiology has advanced in leaps and bounds in terms of new therapies (antiplatelet agents, coated stents, heparinoids, GP IIb/IIIa inhibitors, direct thrombin inhibitors, AICDs, etc.) in the last 15 years while critical care has....well, we have low tidal volume ventilation, and that's about it. Why might this be?

One possible explanation was visible last week as the NEJM released the results of the PROTECT study - http://www.nejm.org/doi/full/10.1056/NEJMoa1014475 - of Dalteparin ("Dalty") versus unfractionated heparin (UFH) for the prevention of proximal DVT in critically ill patients. Before we delve into the details of this study, imagine that a cardiologist was going to conduct a study of Dalty vs. UFH for use in acute coronary syndromes (ACS) - how would that study be designed?


Well, if it were an industry sponsored study, we might guess that it would be a non-inferiority study with an über-wide delta to bias the study in favor of the branded agent. (Occasionally we're surprised when a "me-too" drug such as Prasugrel is pitted against Plavix in a superiority contest.....and wins.....but this is the exception rather than the rule.) We would also be wise to guess that in addition to being a very large study with thousands of patients, that the endpoint in the cardiologists' study will be a composite endpoint - something like "death, recurrent MI, or revascularization," etc. How many agents currently used in cardiology would be around if it weren't for the use of composite endpoints?

Not so in critical care medicine. We're purists. We want mortality improvements and mortality improvements only. (Never mind if you're alive at day 28 or 60 and slated to die in 6 months after a long run in nursing homes with multiple readmissions to the hospital, with a tracheostomy in place, on dialysis, being fed through a tube, not walking or getting out of bed....you're alive! Pulmonologists pat themselves on the back for "saves" such as those.) After all, mortality is the only truly objective outcome measure, there is no ascertainment bias, and no dispute about the value of the endpoint - alive is good, dead is bad, period.

Cardiologists aren't so picky. They'll take what they can get. They wish to advance their field, even if it does mean being complicit with the profit motives of Big Pharma.

So the PROTECT study is surprising in one way (that proximal DVT rather than mortality was the primary endpoint) but wholly unsurprising in others - the primary endpoint was not a composite, and the study was negative overall, the conclusion being that "Among critically ill patients, dalteparin was not superior to unfractionated heparin in decreasing the incidence of proximal deep-vein thrombosis." Yet another critical care therapy to be added to the therapeutic scrap heap?

Not so fast. Even though it was not the primary endpoint, the authors DID measure a composite: any venous thromboembolism (VTE) or death. And the 95% confidence interval for that outcome was 0.79-1.01, just barely missing the threshold for statistical significance with a P-value of 0.07. So, it appears that Dalty may be up to 1% worse than UFH or up to 21% better than UFH. Which drug do YOU want for YOURSELF or your family in the ICU? (We could devise an even better composite endpoint that includes any VTE, death, bleeding, or HITS, +/- others. Without the primary data, I cannot tell what the result would have been, but I'm interested.)
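In fact, the reported P of 0.07 can be recovered, approximately, from the published confidence interval alone. A rough sketch, assuming normality on the log hazard ratio scale (my own back-of-envelope calculation, not the authors'):

```python
import math

# Recover an approximate two-sided p-value from a reported 95% CI for a
# hazard ratio, assuming the estimate is normal on the log scale.
def p_from_ci(lo, hi, z_crit=1.96):
    log_lo, log_hi = math.log(lo), math.log(hi)
    se = (log_hi - log_lo) / (2 * z_crit)   # standard error of the log HR
    log_hr = (log_hi + log_lo) / 2          # implied point estimate (log scale)
    z = log_hr / se
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))  # standard normal CDF
    return 2 * (1 - phi)

# PROTECT composite (any VTE or death), reported 95% CI 0.79-1.01:
print(round(p_from_ci(0.79, 1.01), 2))  # 0.07, consistent with the reported P
```

The exercise also shows how arbitrary the 0.05 line is: shift either CI bound by a hair and the "negative" composite becomes "positive."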

I don't have an answer for the question of why these cultural differences exist in the conduct of clinical trials in medicine. But I am pretty certain that they do indeed exist. The cardiologists appear to recognize that there are things that can happen to you short of death that have significant negative utility for you. The pulmonihilists, in their relentless pursuit of untainted truth, ignore inconvenient realities and eschew pragmatism in the name of purity. The right course probably lies somewhere in between these two extremes. I can only hope that one day soon pulmonary and critical care studies will mirror the reality that alive but bedridden with a feeding tube and a tracheostomy is not the same as being alive and walking, talking, breathing on your own, and eating. Perhaps we should, if only in the name of diversity, include some cardiologists the next time we design a critical care trial.

Wednesday, February 23, 2011

Burning Sugar in the Brain: Cell Phones Join the Fight Against Obesity

News channels are ablaze with spin on an already spun report of the effects of cell phone radiofrequency (RF) signal on glucose metabolism in the human brain (see: http://jama.ama-assn.org/content/305/8/808.short).

I'm not going to say that this study is an example of a waste of taxpayer money and research resources, but WOW, what a waste of taxpayer money and research resources.

Firstly, why would anybody go looking for an effect of RF from cell phones on brain glucose metabolism anyway? Answer: Because we have a PET scan, and that's what a PET scan can do -not because we have any good reason to believe that changes in brain glucose metabolism are meaningful in this context. This is an example of the hammer dictating the floorplan of the house. We are looking at glucose metabolism simply because we can, not because we have any remote inkling of what changes in glucose metabolism may mean.

Secondly, this whole topic is deeply permeated by a bias that assumes that cell phones are in some way harmful. To date, with the exception of distracted driving, which incidentally gets nobody excited until a law is proposed to reduce it and thus improve public safety, there is no credible evidence that cell phone radiation is harmful. It may be. But it may also be BENEFICIAL. Who's to say that the increase in brain glucose utilization isn't causing positive effects in the brain? Maybe it's making you smarter. That's just as likely as that it's causing harm, but far less likely than that there is no effect.


Thirdly, the experiment is inadequately controlled. What if you strap a cell phone to somebody's hind end and put them in a PET scanner? Does glucose metabolism of the gluteus maximus increase from the RF signal? If it did, we would have the same problem of interpretation: "What does it mean?" But we might somewhat be able to quell all the hand waving about radio signals altering the function of your brain. No, it's simply altering the biochemistry of your cells in a subtle way of unknown significance.

Here is what news organizations are saying, as proudly promulgated by the publicity intoxicated AMA this morning in their member communication:
"We need to rule out that there is not a long-lasting effect in healthy people." - Nora Volkow, first author of the study. [Of course she thinks that - it means millions more dollars in grants for her.]

The study, "by providing solid evidence that cellphone use has measurable effects on brain activity...suggests that the nation's passionate attachment to its 300 million cellphones may be altering the way we think and behave in subtle ways." - The LA Times. [Really? Really?]

Fortunately, the only thing more powerful than the inherent biases about RF signals from cellphones and the lay public's ignorance about PET scanners, glucose metabolism and the like, is the public's penchant for mobile devices, the latter of which will surely overwhelm any concerns about altered sugar burning in the brain, just as it has any concerns about distracted driving.

Monday, January 17, 2011

Like Two Peas in a Pod: Cis-atracurium for ARDS and the Existence of Extra Sensory Perception (ESP)

Even the lay public and popular press (see: http://www.nytimes.com/2011/01/11/science/11esp.html?_r=1&scp=1&sq=ESP&st=cse) caught on to the subversive battle between frequentist and Bayesian statistics when it was announced (ahead of print) that a prominent psychologist was to publish a report purporting to establish the presence of Extra Sensory Perception (ESP) in the Journal of Personality and Social Psychology (I don't think it's even published yet, but here's the link to the journal: http://www.apa.org/pubs/journals/psp). So we're back to my Orange Juice (OJ) analogy - if I published the results of a study showing that the enteral administration of OJ reduced severe sepsis mortality by a [marginally] statistically significant 20%, would you believe it? As Carl Sagan was fond of saying, "extraordinary claims require extraordinary evidence" - which to me means, among other things, an unbelievably small P-value produced by a study with scant evidence of bias.

And I remain utterly incredulous that the administration of a paralytic agent for 48 hours in ARDS (see Papazian et al: http://www.nejm.org/doi/full/10.1056/NEJMoa1005372#t=abstrac) is capable of reducing mortality. Indeed, FEW THERAPIES IN CRITICAL CARE MEDICINE REDUCE MORTALITY (see Figure 1 in our article on Delta Inflation: http://www.ncbi.nlm.nih.gov/pubmed/20429873). So what was the P-value of the Cox regression (read: ADJUSTED) analysis in the Papazian article? It was 0.04. This is hardly the kind of P-value that Carl Sagan would have accepted as Extraordinary Evidence.

The correspondence regarding this article in the December 23rd NEJM (see: http://www.nejm.org/doi/full/10.1056/NEJMc1011677) got me to thinking again about this article. It emphasized the striking sedation practices used in this trial: patients were sedated to a Ramsay score of 6 (no response to glabellar tap) prior to randomization - the highest score on the Ramsay scale. Then they received Cis-at or placebo. Thus the Cis-at group could not, for 48 hours, "fight the vent," while the placebo group could, thereby inducing practitioners to administer more sedation. Could it be that Cis-at simply saves you from oversedation, much as intensive insulin therapy (IIT) a la 2001 Leuven protocol saved you from the deleterious effects of massive dextrose infusion after cardiac surgery?

To explore this possibility further, one needs to refer to Table 9 in the supplementary appendix of the Papazian article (see: http://www.nejm.org/doi/suppl/10.1056/NEJMoa1005372/suppl_file/nejmoa1005372_appendix.pdf ) which tabulates the total sedative doses used in the Cis-at and placebo groups DURING THE FIRST SEVEN (7) DAYS OF THE STUDY. Now, why 7 days was chosen, when the KM curves separate at 14 days (as my former colleagues O'Brien and Prescott pointed out here: http://f1000.com/5240957 ), and when the study reported data on other outcomes at 28 and 90 days, remains a mystery to me. I have e-mailed the corresponding author to see if he can/will provide data on sedative doses further out. I will post any updates as further data become available. Suffice it to say that I'm not going to be satisfied unless sedative doses further out are equivalent.

Scrutiny of Table 9 in the SA leads to some other interesting discoveries, such as the massive doses of ketamine used in this study - a practice that does not exist in the United States, as well as strong trends toward increased midazolam use in the placebo group. And if you believe Wes Ely's and others' data on benzodiazepine use, and its association with delirium and mortality, one of your eyebrows might involuntarily rise. Especially when you consider that the TOTAL sedative dose administered between groups is an elusive sum, because equivalent doses of all the various sedatives are unknown and the total sedative dose calculation is insoluble.

Saturday, September 25, 2010

In the same vein: Intercessory Prayer for Heart Surgery and Neuromuscular Blockers for ARDS

Several years back, the American Heart Journal published a now widely referenced study of intercessory prayer to aid recovery of patients who had had open heart surgery (see: Am Heart J. 2006 Apr;151(4):934-42). This study was amusing for several reasons, not least because, in spite of being funded by a religious organization, the results were "negative," meaning that there was no apparent positive effect of prayer. Of course, the "true believers" called foul, claiming that the design was flawed, etc. (Another ironic twist of the study: patients who knew they were being prayed for actually fared worse than those who had received no prayers.)

The most remarkable thing about this study for me is that it was scientifically irresponsible to conduct it. Science (and biomedical research) must be guided by testing a defensible hypothesis, based on logic, historical and preliminary data, and, in the case of biomedical research, an understanding of the underlying pathophysiology of the disease process under study. Where there is no scientifically valid reason to believe that a therapy might work, no preliminary data - nothing - a hypothesis based on hope or faith has no defensible justification in biomedical research, and its study is arguably unethical.

Moreover, a clinical trial is in essence a diagnostic test of a hypothesis, and the posterior probability of a hypothesis (null or alternative) depends not only on the frequentist data produced by the trial, but also on a Bayesian analysis incorporating the prior probability that the alternative (or null) hypothesis is true (or false). That is, if I conducted a trial of orange juice (OJ) for the treatment of sepsis (another unethical design) and OJ appeared to reduce sepsis mortality by, say, 10% with P=0.03, you should be suspicious. With no biologically plausible reason to believe that OJ might be efficacious, the prior probability of Ha (that OJ is effective) is very low, and a P-value of 0.03 (or even 0.001) is unconvincing. That is, the less compelling the general idea supporting the hypothesis is, the more robust a P-value you should require to be convinced by the data from the trial.

Thus, a trial wherein the alternative hypothesis tested has a negligible probability of being true is uninformative and therefore unethical to conduct. In a trial such as the intercessory prayer trial, there is NO resultant P-value which is sufficient to convince us that the therapy is effective - in effect, all statistically significant results represent Type I errors, and the trial is useless.
(I should take a moment here to state that, ideally, the probability of Ho and Ha should both be around 50%, or not far off, representing true equipoise about the scenario being studied. Based on our data in the Delta Inflation article (see: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887200/ ), it appears that at least in critical care trials evaluating comparative mortality, the prior probability of Ha is on the order of 18%, and even that figure is probably inflated because many of the trials that comprise it represent Type I errors. In any case, it is useful to consider the prior probability of Ha before considering the data from a trial, because that prior is informative. [And in the case of trials for biologics for the treatment of sepsis {be it OJ or drotrecogin, or anti-TNF-alpha}, the prior probability that any of them is efficacious is almost negligibly low.])
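The OJ intuition can be made quantitative with a standard bound. Here is a hedged sketch using the Sellke-Berger minimum Bayes factor, -e*p*ln(p), which gives the MOST a p-value could possibly say in favor of Ha; the priors plugged in below are illustrative, not data:

```python
import math

# Upper bound on the posterior probability of Ha, combining a prior for Ha
# with the Sellke-Berger minimum Bayes factor bound BF0 >= -e * p * ln(p)
# (valid for p < 1/e). This is the best case for Ha; reality is worse.
def max_posterior_prob_ha(prior_ha, p):
    bf0 = -math.e * p * math.log(p)          # smallest possible BF favoring Ho
    prior_odds_h0 = (1 - prior_ha) / prior_ha
    post_odds_h0 = prior_odds_h0 * bf0
    return 1 / (1 + post_odds_h0)

# OJ-for-sepsis: generously grant Ha a 5% prior, then observe p = 0.03.
print(round(max_posterior_prob_ha(0.05, 0.03), 2))  # 0.16 at most
# A typical critical care mortality hypothesis (~18% prior per Delta Inflation):
print(round(max_posterior_prob_ha(0.18, 0.03), 2))  # 0.43 at most - still < 50%
```

Even under the bound most charitable to the alternative hypothesis, a p of 0.03 cannot drag a low-prior hypothesis past the 50% mark, which is the whole point of the intercessory prayer example.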

Which segues me to Neuromuscular Blockers (NMBs) for ARDS (see: http://www.nejm.org/doi/full/10.1056/NEJMoa1005372 ) - while I have several problems with this article, my most grievous concern is that we have no (in my estimation) substantive reason to believe that NMBs will improve mortality in ARDS. They may improve oxygenation, but we long ago abandoned the notion that oxygenation is a valid surrogate end-point in the management of ARDS. Indeed, the widespread abandonment of use of NMBs in ARDS reflects consensus agreement among practitioners that NMBs are on balance harmful. (Note in Figure 1 that, in contrast to the contention of the authors in the introduction that NMBs remain widely used, only 4.3% of patients were excluded because of use of NMBs at baseline.)

In short, these data fail to convince me that I should be using NMBs in ARDS. But many readers will want to know "then why was the study positive?" And I think the answer is staring us right in the face. In addition to the possibility of a simple Type I error, and the fact that the analysis was done with a Cox regression, controlling for baseline imbalances (even ones such as PF ratio which were NOT prospectively defined as variables to control for in the analysis), the study was effectively unblinded/unmasked. It is simply not possible to mask the use of NMBs, the clinicians and RNs will quickly figure out who is and is not paralyzed - paralyzed patients will "ride the vent" while unparalyzed ones will "fight the vent". And differences in care may/will arise.

It is the simplest explanation, and I wager it's correct. I will welcome data from other trials if they become available (should it even be studied further?), but in the meantime I don't think we should be giving NMBs to patients with ARDS any more than we should be praying (or avoiding prayer) for the recovery of open-heart patients.

Friday, August 20, 2010

Heads I Win, Tails it's a Draw: Rituximab, Cyclophosphamide, and Revising CONSORT



The recent article by Stone et al in the NEJM (see: http://www.nejm.org/doi/full/10.1056/NEJMoa0909905 ), which appears to [mostly] conform to the CONSORT recommendations for the conduct and reporting of NIFTs (non-inferiority trials, often abbreviated NIFs, but I think NIFTs ["Nifties"] sounds cooler), allowed me to realize that I fundamentally disagree with the CONSORT statement on NIFTs (see JAMA, http://jama.ama-assn.org/cgi/content/abstract/295/10/1152 ) and indeed the entire concept of NIFTs. I have discussed previously in this blog my disapproval of the asymmetry with which NIFTs are designed such that they favor the new (and often proprietary agent), but I will use this current article to illustrate why I think NIFTs should be done away with altogether and supplanted by equivalence trials.

This study rouses my usual and tired gripes about NIFTs: too large a delta, no justification for delta, use of intention-to-treat rather than per-protocol analysis, etc. It also describes a suspicious statistical maneuver that I suspect is intended to infuse the results (in favor of Rituximab/Rituxan) with extra legitimacy in the minds of the uninitiated: instead of simply stating (or showing with a plot) that the 95% CI excludes delta, thus making Rituxan non-inferior, the authors tested the hypothesis that the lower 95.1% CI boundary is different from delta, a test that results in a very small P-value (<0.001). This procedure adds nothing to the confidence interval in terms of interpretation of the results, but seems to imbue them with an unassailable legitimacy - the non-inferiority hypothesis is trotted around as if iron-clad because of this minuscule P-value, which is really just superfluous and gratuitous.
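The redundancy is easy to demonstrate. In this sketch (with made-up numbers, not the actual Stone et al results), the one-sided test against the margin and the confidence interval are the same decision rule in two costumes: the P-value falls below alpha exactly when the CI bound clears the margin, so reporting the P-value on top of the CI adds nothing:

```python
from statistics import NormalDist

def noninferiority(diff, se, margin, alpha=0.025):
    """diff > 0 favors the new agent; margin is the non-inferiority delta.

    H0: diff <= -margin (new agent worse by at least the margin).
    Non-inferiority is declared when p < alpha, which happens exactly
    when the lower bound of the (1 - 2*alpha) two-sided CI excludes -margin.
    """
    nd = NormalDist()
    ci_lower = diff - nd.inv_cdf(1 - alpha) * se   # lower 95% CI bound
    p = 1 - nd.cdf((diff + margin) / se)           # one-sided p vs the margin
    return p, ci_lower, ci_lower > -margin

# Made-up numbers for illustration only:
p, lo, non_inferior = noninferiority(diff=0.11, se=0.08, margin=0.20)
print(non_inferior, lo > -0.20, p < 0.001)  # → True True True
```

The tiny P-value is guaranteed whenever the CI bound sits comfortably inside the (generous) margin; it is the CI restated, not independent evidence.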

But I digress - time to focus on the figure. Under the current standards for conducting a NIFT, in order to be non-inferior, you simply need a 95% CI for the preferred [and usually proprietary] agent with an upper boundary that does not include delta in favor of the comparator (scenario A in the figure). For your preferred agent to be declared inferior, the LOWER 95% CI boundary for the difference between the two agents must exclude the delta in favor of the comparator (scenario B in the figure). For that ever to happen, the preferred/proprietary agent is going to have to be WAY worse than standard treatment. It is no wonder that such results are very, very rare, especially since deltas are generally much larger than is reasonable. I am not aware of any recent trial in a major medical journal in which inferiority was declared. The figure shows you why this is the case.

Inferiority is very difficult to declare (the deck is stacked this way on purpose), but superiority is relatively easy to declare, because for superiority your 95% CI doesn't have to exclude an obese delta, but rather must just exclude zero with a point estimate in favor of the preferred therapy. That is, you don't need a mirror image of the 95% CI that you need for inferiority (scenario C in the figure); you simply need a point estimate in favor of the preferred agent with a 95% CI that does not include zero (scenario D in the figure). Looking at the actual results (bottom left in the figure), we see that they are very close to scenario D, and that they would only have had to go a little bit more in favor of Rituxan for superiority to have been declared. Under my proposal for symmetry (and fairness, justice, and logic), the results would have had to resemble scenario C, and Rituxan came nowhere near meeting criteria for superiority.

The reason it makes absolutely no sense to allow this asymmetry can be demonstrated by imagining a counterfactual (or two). Suppose that the results had been exactly the same, but they had favored Cytoxan (cyclophosphamide) rather than Rituxan - that is, Cytoxan was associated with an 11% improvement in the primary endpoint. This is represented by scenario E in the figure; and since the 95% CI includes delta, the result is "inconclusive" according to CONSORT. So how can it be that the classification of the result changes depending on which agent we arbitrarily (a priori, before knowing the results) declare to be the preferred one? That makes no sense, unless you're more interested in declaring victory for a preferred agent than you are in discovering the truth - and of course, you can guess my inferences about the motives of the investigators and sponsors in many/most of these studies. In another counterfactual example, scenario F in the figure represents the mirror image of scenario D, which represented the minimum result that would have allowed Stone et al to declare Rituxan superior. But if the results had favored Cytoxan by that much, we would have had another "inconclusive" result, according to CONSORT. Allowing this is just mind-boggling, maddening, and unjustifiable!
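The asymmetry can be reduced to a few lines of code. This hypothetical classifier (differences coded so that positive values favor the preferred agent; all numbers invented, not taken from the trial) applies the CONSORT reading described above, and feeding it a CI together with its mirror image reproduces the scenario D/F absurdity:

```python
def classify_consort(lo, hi, delta):
    """CONSORT-style asymmetric reading of a 95% CI for (preferred - comparator).

    Positive differences favor the preferred agent; delta is the NI margin.
    """
    if hi < -delta:
        return "inferior"        # whole CI beyond the margin, against preferred
    if lo > 0:
        return "superior"        # CI need only exclude zero
    if lo > -delta:
        return "non-inferior"    # lower bound merely clears the margin
    return "inconclusive"

# Mirror-image results with delta = 0.20 (hypothetical numbers):
print(classify_consort(0.01, 0.33, 0.20))    # → superior     (scenario D-like)
print(classify_consort(-0.33, -0.01, 0.20))  # → inconclusive (scenario F-like)
```

The identical evidence gets opposite labels depending only on which agent was pre-declared "preferred" - heads I win, tails it's a draw, in executable form.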

Given this "heads I win, tails it's a draw" standard, it's no wonder that NIFTs are proliferating. It's time we stopped accepting them and required that non-inferiority hypotheses be symmetrical - in essence, making equivalence trials the standard operating procedure, and requiring the same standards for superiority as we require for inferiority.