Thursday, May 24, 2018

You Have No Idea of the Predictive Value of Weaning Parameters for Extubation Success, and You Probably Never Will

As Dr. O'Brien eloquently described in this post, many people misunderstand the Yang-Tobin (f/Vt) index as being a "weaning parameter" that is predictive of extubation success.  Far from that, its sensitivity and specificity and resultant ROC curve relate to the ability of f/Vt after one minute of spontaneous ventilation to predict the success of a prolonged (~ one hour) spontaneous breathing trial.  But why would I want to predict the result of a test (the SBT), and introduce error, when I can just do the test and get the result an hour later?  It makes absolutely no sense.  What we want is a parameter that predicts extubation success.  But we don't have that, and we probably will never have that.

In order to determine the sensitivity and specificity of a test for extubation success, we will need to ascertain the outcome in all patients regardless of their performance on the test of interest.  That means we would have to extubate patients that failed the weaning parameter test.  In the original Yang & Tobin article, their cohort consisted of 100 patients.  60(%) of the 100 were said to have passed the weaning test and were extubated, and 40(%) failed and were not extubated.  (There is some over-simplification here based on how Yang & Tobin classified and reported events - it's not at all transparent in their article - the data to resolve the issues are not reported and the differences are likely to be small.  Suffice it to say that about 60% of their patients were successfully weaned and the remainder were not.)  Let's try to construct a 2x2 table to determine the sensitivity and specificity of a weaning parameter using a population like theirs.  The top row of the 2x2 table would look something like this, assuming an 85% extubation success rate - that is, of the 60 patients with a positive or "passing" SBT score (based on whatever parameter), all were extubated and the positive predictive value of the test is 85% (the actual rate of reintubation in patients with a passing weaning test is not reported, so this is a guess):
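As a rough sketch of that top row, and of why the bottom row is the problem, the arithmetic can be written out in Python.  The 60/40 split and the 85% guess come from the paragraph above; the bottom-row values looped over here are purely hypothetical, since the patients who failed the test were never extubated and their outcomes are unknown.

```python
# Top row of the 2x2 table, using the numbers from the text.
passed = 60        # patients with a passing weaning test (all extubated)
failed = 40        # patients with a failing weaning test (not extubated)
ppv_guess = 0.85   # guessed extubation success rate among those who passed

true_pos = round(passed * ppv_guess)  # passed the test, extubation succeeded
false_pos = passed - true_pos         # passed the test, extubation failed


def sens_spec(false_neg):
    """Sensitivity and specificity for a HYPOTHETICAL bottom row.

    false_neg is the number of test failures who would have been
    extubated successfully had anyone tried (unknown in practice).
    """
    true_neg = failed - false_neg
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Sweep the unknown bottom row to see how far the estimates can move.
for false_neg in (0, 10, 20, 30, 40):
    sens, spec = sens_spec(false_neg)
    print(f"{false_neg:2d} of 40 would have succeeded: "
          f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Both estimates move across a wide range depending on a number nobody observed, which is the point:  without extubating the patients who fail the test, the bottom row cannot be filled in.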



Thursday, May 17, 2018

Increasing Disparities in Infant Mortality? How a Narrative Can Hinge on the Choice of Absolute and Relative Change

An April 11th, 2018 article in the NYT entitled "Why America's Black Mothers and Babies are in a Life-or-Death Crisis" makes the following alarming summary statement about racial disparities in infant mortality in America:
Black infants in America are now more than twice as likely to die as white infants — 11.3 per 1,000 black babies, compared with 4.9 per 1,000 white babies, according to the most recent government data — a racial disparity that is actually wider than in 1850, 15 years before the end of slavery, when most black women were considered chattel.
Racial disparities in infant mortality have increased since 15 years before the end of the Civil War?  That would be alarming indeed.  But a few paragraphs before, we are given these statistics:

In 1850, when the death of a baby was simply a fact of life, and babies died so often that parents avoided naming their children before their first birthdays, the United States began keeping records of infant mortality by race. That year, the reported black infant-mortality rate was 340 per 1,000; the white rate was 217 per 1,000.
The white infant mortality rate has fallen 217-4.9 = 212.1 infants per 1000.  The black infant mortality rate has fallen 340-11.3 = 328.7 infants per 1000.  So in absolute terms, the terms that concern babies (how many of us are alive?), the black infant mortality rate has fallen much more than the white infant mortality rate.  In fact, in absolute terms, the disparity is almost gone:  the absolute difference has fallen from 340-217 = 123 more black infants per 1000 births dying in 1850 to 11.3-4.9 = 6.4 more black infants per 1000 births dying today.

Analyzed a slightly different way, the proportion of white infants dying has been reduced by (217-4.9)/217 = 97.7%, and the proportion of black infants dying has been reduced by (340-11.3)/340 = 96.7%.  So, within about 1 percentage point, black and white babies shared almost equally in the improvements in infant mortality that have been seen since 15 years before the end of the Civil War.  Or, we could do a simple reference frame change and look at infant survival rather than mortality.  If we did that, the current infant survival rate is 98.87% for black babies and 99.51% for white babies.  The rate ratio for black:white survival is .994 - almost parity depending on your sensitivity to variances from unity.
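For readers who want to check the arithmetic, here is a minimal Python sketch using only the four rates quoted from the article.  The variable names are mine, chosen for illustration.

```python
# Infant deaths per 1000 births, as quoted from the NYT article.
rates = {
    "black": {"then": 340.0, "now": 11.3},
    "white": {"then": 217.0, "now": 4.9},
}

# Absolute and relative reduction in mortality for each group since 1850.
absolute_drop = {g: r["then"] - r["now"] for g, r in rates.items()}
relative_drop = {g: absolute_drop[g] / rates[g]["then"] for g in rates}

# Black-white gap expressed as a difference (per 1000) and as a ratio.
gap_then = rates["black"]["then"] - rates["white"]["then"]
gap_now = rates["black"]["now"] - rates["white"]["now"]
ratio_then = rates["black"]["then"] / rates["white"]["then"]
ratio_now = rates["black"]["now"] / rates["white"]["now"]

# Reference frame change: survival instead of mortality.
survival_ratio_now = (1000 - rates["black"]["now"]) / (1000 - rates["white"]["now"])

for group in rates:
    print(f"{group}: fell {absolute_drop[group]:.1f} per 1000 "
          f"({100 * relative_drop[group]:.1f}%)")
print(f"absolute gap: {gap_then:.1f} then, {gap_now:.1f} now")
print(f"mortality rate ratio: {ratio_then:.2f} then, {ratio_now:.2f} now")
print(f"survival rate ratio now: {survival_ratio_now:.3f}")
```

The same four numbers give a shrinking gap when expressed as a difference and a growing gap when expressed as a mortality ratio.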

It's easy to see how the author of the article arrived at different conclusions by looking only at the rate ratios in 1850 and contemporaneously.  But doing the math that way makes it seem as if a black baby is worse off today than in 1850!  Nothing could be farther from the truth.

You might say that this is just "fuzzy math" as our erstwhile president did in the debates of 2000.  But there could be important policy implications also.  Suppose that I have an intervention that I could apply across the US population and I estimate that it will save an additional 5 black babies per 1000 and an additional 3 white babies per 1000.  We implement this policy and it works as projected.  The black infant mortality rate is now 6.3/1000 and the white infant mortality rate is now 1.9/1000.  We have saved far more black babies than white babies.  But the rate ratio for black:white mortality has increased from 2.3 to 3.3!  Black babies are now 3 (three!) times as likely to die as white babies!  The policy has increased disparities even though black babies are far better off after the policy change than before it.
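The hypothetical policy can be run through the same kind of arithmetic.  This is only a sketch of the thought experiment above; the intervention and its effect sizes are the made-up ones from the paragraph, not data.

```python
# Current rates per 1000 births (from the article) and the hypothetical
# numbers of babies saved per 1000 by the imagined intervention.
black_before, white_before = 11.3, 4.9
black_saved, white_saved = 5.0, 3.0

black_after = black_before - black_saved
white_after = white_before - white_saved

# Relative measure: black:white mortality rate ratio.
ratio_before = black_before / white_before
ratio_after = black_after / white_after

# Absolute measure: excess black infant deaths per 1000 births.
gap_before = black_before - white_before
gap_after = black_after - white_after

print(f"rate ratio: {ratio_before:.1f} before, {ratio_after:.1f} after")
print(f"absolute gap per 1000: {gap_before:.1f} before, {gap_after:.1f} after")
```

The ratio worsens while the absolute gap narrows and both groups are better off, so which measure a narrative reports determines whether the policy looks like a success or a failure.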

It reminds me of the bias where people would rather take a smaller raise if it increased their standing relative to their neighbor.  Surprisingly, when presented with two choices:

  1. you make $50,000 and your peers make $25,000 per year
  2. you make $100,000 and your peers make $250,000 per year
many people choose 1, as if relative social standing is worth $50,000 per year in income.  (Note that relative social standing is just that, relative, and could change if you arbitrarily change the reference class.)

So, relative social standing has value and perhaps a lot of it.  But as regards the hypothetical policy change above, I'm not sure we should be focusing on relative changes in infant mortality.  We just want as few babies dying as possible.

Wednesday, May 2, 2018

Hollow Hegemony: The Opportunity Costs of Overemphasizing Sepsis


Protocols are to make complex tasks simple, not simple tasks complex. - Scott K Aberegg

Yet here we find ourselves some 16 years after the inauguration of the Surviving Sepsis Campaign, and their influence continues to metastasize, even after the message has been hollowed out like a piece of fallen, old-growth timber.

Surviving Sepsis was the brainchild of Eli Lilly, who, in the year after the ill-fated FDA approval of drotrecogin-alfa, worried that the drug would not sell well if clinicians did not have an increased awareness of sepsis. That aside, in those days, there were legitimate questions surrounding the adoption and implementation of several new therapies such as EGDT, corticosteroids for septic shock, Xigris for those with APACHE scores over 25, intensive insulin therapy, etc.

Those questions are mostly answered. Sepsis is now, quite simply, a complex of systemic manifestations of infection almost all of which will resolve with treatment of the infection and general supportive care. The concept of sepsis could vanish entirely, and nothing about the clinical care of the patient would change: an infection would be diagnosed, the cause/source identified and treated, and hemodynamics and laboratory dyscrasias supported meanwhile. There is nothing else to do (because lactic acidosis does not exist.)

But because of the hegemony of the sepsis juggernaut (the spawn of the almighty dollar), we are now threatened with a mandate to treat patients carrying the sepsis label (oftentimes assigned by a hospital coder after the fact) with antibiotics and a fluid bolus within one hour of triage in the ED. Based on what evidence?

Weak recommendation, "Best Practice Statement" and some strong recommendations based on low and moderate quality evidence.  So if we whittle it down to just moderate quality of evidence, what do we have?  Give antibiotics for infections, and give vasopressors if MAP less than 65.  But now we have to hurry up and do the whole kit and caboodle boilerplate style within 60 minutes?

Sepsis need not be treated any differently than a gastrointestinal hemorrhage, or for that matter, any other disease.  You make the diagnosis, determine and control the cause (source), give appropriate treatments, and support the physiology in the meantime, all while prioritizing the sickest patients.  But that counts for all diseases, not just sepsis, and there is only so much time in an hour.  When every little old lady with fever and a UTI suddenly rises atop the priorities of the physician, this creates an opportunity cost/loss for the poor bastard bleeding next door who doesn't have 2 large-bore IVs or a type and cross yet because grandma is being flogged with 2 liters of fluid, and in a hurry.  If only somebody had poured mega-bucks into increased recognition and swift treatment of GI bleeds....


Petition to retire the surviving sepsis campaign guidelines:

(Sign the Petition Here.)

Friends,

Concern regarding the Surviving Sepsis Campaign (SSC) guidelines dates back to their inception.  Guideline development was sponsored by Eli Lilly and Edwards Life Sciences as part of a commercial marketing campaign (1).  Throughout its history, the SSC has a track record of conflicts of interest, making strong recommendations based on weak evidence, and being poorly responsive to new evidence (2-6).

The original backbone of the guidelines was a single-center trial by Rivers defining a protocol for early goal-directed therapy (7).  Even after key elements of the Rivers protocol were disproven, the SSC continued to recommend them.  For example, SSC continued to recommend the use of central venous pressure and mixed venous oxygen saturation after the emergence of evidence that they were nonbeneficial (including the PROCESS and ARISE trials).  These interventions eventually fell out of favor, despite the slow response of SSC that delayed knowledge translation. 

SSC has been sponsored by Eli Lilly, manufacturer of Activated Protein C.  The guidelines continued recommending Activated Protein C until it was pulled from international markets in 2011.  For example, the 2008 Guidelines recommended this, despite ongoing controversy and the emergence of neutral trials at that time (8,9).  Notably, 11 of 24 guideline authors had financial conflicts of interest with Eli Lilly (10).

The Infectious Disease Society of America (IDSA) refused to endorse the SSC because of a suboptimal rating system and industry sponsorship (1).  The IDSA has enormous experience in treating infection and creating guidelines.  Septic patients deserve a set of guidelines that meet the IDSA standards.


Guidelines should summarize evidence and provide recommendations to clinicians.  Unfortunately, the SSC doesn’t seem to trust clinicians to exercise judgement.  The guidelines infantilize clinicians by prescribing a rigid set of bundles which mandate specific interventions within fixed time frames (example above)(10).  These recommendations are mostly arbitrary and unsupported by evidence (11,12).  Nonetheless, they have been adopted by the Centers for Medicare & Medicaid Services as a core measure (SEP-1).  This pressures physicians to administer treatments despite their best medical judgment (e.g. fluid bolus for a patient with clinically obvious volume overload).

We have attempted to discuss these issues with the SSC in a variety of forums, ranging from personal communications to formal publications (13-15).  We have tried to illuminate deficiencies in the SSC bundles and the consequent SEP-1 core measures.  Our arguments have fallen on deaf ears. 

We have waited patiently for years in hopes that the guidelines would improve, but they have not.  The 2018 SSC update is actually worse than prior guidelines, requiring the initiation of antibiotics and 30 cc/kg fluid bolus within merely sixty minutes of emergency department triage (16).  These recommendations are arbitrary and dangerous.  They will likely cause hasty management decisions, inappropriate fluid administration, and indiscriminate use of broad-spectrum antibiotics.  We have been down this path before with other guidelines that required antibiotics for pneumonia within four hours, a recommendation that harmed patients and was eventually withdrawn (17).

It is increasingly clear that the SSC guidelines are an impediment to providing the best possible care to our septic patients.  The rigid framework mandated by SSC doesn’t help experienced clinicians provide tailored therapy to their patients.  Furthermore, the hegemony of these guidelines prevents other societies from developing better guidelines.

We are therefore petitioning for the retirement of the SSC guidelines.  In their place, we would call for the development of separate sepsis guidelines by the United States, Europe, ANZICS, and likely other locales as well.  There has been a monopoly on sepsis guidelines for too long, leading to stagnation and dogmatism.  We would hope that these new guidelines are written by collaborations of the appropriate professional societies, based on the highest evidentiary standards.  The existence of several competing sepsis guidelines could promote a diversity of opinions, regional adaptation, and flexible thinking about different approaches to sepsis. 

We are disseminating an international petition that will allow clinicians to express their displeasure and concern over these guidelines.  If you believe that our septic patients deserve more evidence-based guidelines, please stand with us.  

Sincerely,

Scott Aberegg MD MPH
Jennifer Beck-Esmay MD
Steven Carroll DO MEd
Joshua Farkas MD
Jon-Emile Kenny MD
Alex Koyfman MD
Michelle Lin MD
Brit Long MD
Manu Malbrain MD PhD
Paul Marik MD
Ken Milne MD
Justin Morgenstern MD
Segun Olusanya MD
Salim Rezaie MD
Philippe Rola MD
Manpreet Singh MD
Rory Speigel MD
Reuben Strayer MD
Anand Swaminathan MD
Adam Thomas MD
Lauren Westafer DO MPH
Scott Weingart MD

References
  1. Eichacker PQ, Natanson C, Danner RL.  Surviving Sepsis – Practice guidelines, marketing campaigns, and Eli Lilly.  New England Journal of Medicine  2006; 16: 1640-1642.
  2. Pepper DJ, Jaswal D, Sun J, Welsch J, Natanson C, Eichacker PQ.  Evidence underpinning the Centers for Medicare & Medicaid Services’ Severe Sepsis and Septic Shock Management Bundle (SEP-1): A systematic review.  Annals of Internal Medicine 2018; 168:  558-568. 
  3. Finfer S.  The Surviving Sepsis Campaign:  Robust evaluation and high-quality primary research is still needed.  Intensive Care Medicine  2010; 36:  187-189.
  4. Salluh JIF, Bozza PT, Bozza FA.  Surviving sepsis campaign:  A critical reappraisal.  Shock 2008; 30: 70-72. 
  5. Eichacker PQ, Natanson C, Danner RL.  Separating practice guidelines from pharmaceutical marketing.  Critical Care Medicine 2007; 35:  2877-2878. 
  6. Hicks P, Cooper DJ, Webb S, Myburgh J, Seppelt I, Peake S, Joyce C, Stephens D, Turner A, French C, Hart G, Jenkins I, Burrell A.  The Surviving Sepsis Campaign:  International guidelines for management of severe sepsis and septic shock: 2008.  An assessment by the Australian and New Zealand Intensive Care Society.  Anaesthesia and Intensive Care 2008; 36: 149-151.
  7. Rivers ME et al.  Early goal-directed therapy in the treatment of severe sepsis and septic shock.  New England Journal of Medicine 2001; 345: 1368-1377.
  8. Wenzel RP, Edmond MB.  Septic shock – Evaluating another failed treatment.  New England Journal of Medicine 2012; 366:  2122-2124.  
  9. Savel RH, Munro CL.  Evidence-based backlash:  The tale of drotrecogin alfa.  American Journal of Critical Care  2012; 21: 81-83. 
  10. Dellinger RP, Levy MM, Carlet JM et al.  Surviving sepsis campaign:  International guidelines for management of severe sepsis and septic shock:  2008.  Intensive Care Medicine 2008; 34:  17-60. 
  11. Allison MG, Schenkel SM.  SEP-1:  A sepsis measure in need of resuscitation?  Annals of Emergency Medicine 2018; 71: 18-20.
  12. Barochia AV, Xizhong C, Eichacker PQ.  The Surviving Sepsis Campaign’s revised sepsis bundles.  Current Infectious Disease Reports 2013; 15:  385-393. 
  13. Marik PE, Malbrain MLNG.  The SEP-1 quality mandate may be harmful: How to drown a patient with 30 ml per kg fluid!  Anesthesiology and Intensive Therapy 2017; 49(5) 323-328.
  14. Faust JS, Weingart SD.  The past, present, and future of the centers for Medicare and Medicaid Services quality measure SEP-1:  The early management bundle for severe sepsis/septic shock.  Emergency Medicine Clinics of North America 2017; 35:  219-231.
  15. Marik PE.  Surviving sepsis:  going beyond the guidelines.  Annals of Intensive Care 2011; 1: 17.
  16. Levy MM, Evans LE, Rhodes A.  The surviving sepsis campaign bundle:  2018 update.  Intensive Care Medicine.  Electronic publication ahead of print, PMID 29675566.
  17. Kanwar M, Brar N, Khatib R, Fakih MG.  Misdiagnosis of community-acquired pneumonia and inappropriate utilization of antibiotics: side effects of the 4-h antibiotic administration rule.  Chest 2007; 131: 1865-1869.

Sunday, April 22, 2018

The Respiratory Rosenhan Experiment on Obese Patients

Normally this post would be on Status Iatrogenicus, but the implications for "evidence" and the EBM movement are too important, so it goes here.

For those not inclined to read about the Rosenhan Experiment, a brief history.  In the early 1970s, Dr. Rosenhan, a Stanford psychologist, was concerned about the validity of psychiatric diagnoses.  So he and a half dozen confederates faked mental illness and presented themselves to several prominent psychiatric facilities.  Their feigned symptoms were sufficient to have them admitted with psychiatric diagnoses, usually paranoid schizophrenia.  After admission, they behaved normally.  Nonetheless, they had been diagnosed, put on psychotropic medications, and not allowed to leave until they signed documents swearing to continue the medications upon their release.  The damning report of this experiment was published in Science Magazine in 1973 with the title "On Being Sane in Insane Places."

Psychiatric hospitals were blindsided, having been caught unawares and humiliated.  They challenged Rosenhan to "do it again", but the second time they would be vigilant about these dissimulations.  Send us some more fakes, they said.  On the second experiment, they determined that 40-some patients were confederates.  But Rosenhan had the upper RosenHand - he had sent no confederates.  Sensitivity, if it can be called that, suffered to specificity on round two.

Previously I have complained about patients who are intubated but needn't have been (see here, and here.)  Oftentimes, after the fact, it is difficult to determine whether the intubation was necessary, especially with alleged upper airway compromise, and with obese patients.  With the latter, roentgenograms of the chest are difficult to interpret because of "fatelectasis" or atelectasis in obese persons,  "flatelectasis" due to recumbency, and fluid loading after paralysis and intubation, all of which may require high PEEP to counteract.  If you were not present prior to intubation, it is very difficult to determine if respiratory distress preceded the intubation, or if "won't breathe" was mistaken for "can't breathe".  The differentiation between those two entities is "critical".

A man in his late 40s was sent to us recently for "acute respiratory failure" on the ventilator receiving 100% FiO2 and PEEP of 16.  EMS responded to a call to his 18-wheeler that he could not breathe.  His SpO2 was in the 50% range and he was admitted to a local hospital.  There were basilar opacities, and oxygen was administered.   He weighed near 500#.  The opacities were said to represent pneumonia and he was given antibiotics.  Not long after admission, in the middle of the night, he could not be aroused, an ABG was obtained, and his PaCO2 was 90-something with a pH of 7.10 or thereabouts.  This was interpreted to represent acute hypercapnic respiratory failure, on top of his "acute hypoxemic respiratory failure" and an hour-long intubation ensued.  Afterwards he was sent to us on the aforementioned high ventilator settings.

Thursday, February 15, 2018

Ruling Out PE in the ED: Critical Analysis of the PROPER Trial

This post is going to be an in-depth "journal club" style analysis of the PROPER trial.

In this week's JAMA, Freund et al report the results of the PROPER randomized controlled trial of the PERC (pulmonary embolism rule-out criteria) rule for safely excluding pulmonary embolism (PE) in the emergency department (ED) among patients with a "low clinical gestalt" of having PE.  All things pulmonary and all things noninferiority being pet topics of mine, I had to delve deeper into this article because frankly the abstract confused me.

This was a cluster randomized noninferiority trial, but for most purposes, the cluster part can be ignored when looking at the data.  Each of 14 EDs in France was randomized such that during the "PERC period" PE was excluded in patients with a "low gestalt clinical probability" (not yet defined in the abstract) if all of the 8 items of the PERC rule were excluded.  In the "control period" usual procedures for exclusion of PE were followed.  The primary end point was occurrence of a [venous] thromboembolic event (VTE) during 3 months of follow-up.  The delta (pre-specified margin of noninferiority) for the endpoint was 1.5%.  This is a pleasingly low number.  In our meta-research study of 163 noninferiority trials including those in JAMA from 2010-2016, we found that the average delta for those using an absolute risk difference (n=137) was 8.7%, almost 6 times higher!  This is laudable, but was aided by a low estimated event rate in the control group which means that the sample size of ~1900 was feasible given what I assume were relatively low costs of the study.  Kudos to the authors too, for concretely justifying delta in the methods section.

Tuesday, September 26, 2017

DIPSHIS: Diprivan Induced Pseudo-Shock & Hypoxic Illness Syndrome

Here's the occasional cross-post from Status Iatrogenicus, as it has relevance to research as well as medical practice.  Original post here:  DIPSHIS

This would be a very informative case report (and it's true and unexaggerated), but I anticipate staunch editorial resistance (even sans puns), so I'll describe it here and have some fun with it.

Background:  The author has anecdotally observed for many years that so-called "septic shock" follows rather than precedes intubation and sedation.  This raises the possibility that some proportion of what we call septic (or other) shock is iatrogenic and induced by sedative agents rather than progression of the underlying disease process.

Methods:  Use of a case report as a counterfactual to the common presumption that shock occurring after intubation and sedation is consequent to the underlying disease process rather than associated medical interventions.

Results:  A 20-something man was admitted with pharyngitis, multilobar pneumonia (presumed bacterial) and pneumomediastinum (presumed from coughing).  He met criteria for sepsis with RR=40, HR=120, T=39, BP 130/70.  He was treated with antibiotics and supportive care but remained markedly tachypneic with rapid shallow respirations, despite absence of subjective respiratory distress.  A dialectic between a trainee and the attending sought to predict whether he was "tiring out" and/or "going into ARDS", but yielded equipoise/a stalemate.  A decision was made to intubate the patient and re-evaluate the following day.  After intubation, he required high doses of propofol (Diprivan) for severe agitation, and soon had a wide pulse pressure hypotension, which led to administration of several liters of fluids and initiation of a noradrenaline infusion overnight.  He was said to have "gone into shock" and "progressed to ARDS", as his oxygen requirements doubled to 80% from 40% and PEEP had been increased from 8 to 16.  The next morning, out of concern that "shock" and "ARDS" were iatrogenic complications given considerations of temporality to other interventions, sedation and vasopressors were abruptly discontinued, diuresis of 2 liters achieved, and the patient was successfully extubated and discharged from the ICU a day later.

Conclusions:  This case provides anecdotal "proof of concept" for the counterfactual that is often unseen:  Patients "go into shock" and "progress to ARDS" not in spite of treatment, but because of it.  The author terms this syndrome, in the context of Diprivan (propofol) in the ICU setting, "DIPSHIS".  The incidence of DIPSHIS is unknown and may be underestimated because of difficulty in detection fostered by cultural biases in the care of critically ill medical patients.  Anesthesiologists have long recognized DIPSHIS but have not needed to name it, because they do not label as "shock" anesthetic-induced hypotension in the operating theater - they just give some ephedrine until the patient recovers.  DIPSHIS has implications for the epidemiological and therapeutic study of "septic shock" as well as for hospital coding and billing.

Sunday, August 27, 2017

Just Do As I (Vaguely) Say: The Folly of Clinical Practice Guidelines

If you didn't care to know anything about finance, and you hired a financial adviser (paid hourly, not through commissions, of course) you would be happy to have him simply tell you to invest all of your assets into a Vanguard life cycle fund.  But you may then be surprised that a different adviser told one of your contemporaries that the approach was oversimple and that you should have several classes of assets in your portfolio that are not included in the life cycle funds, such as gold or commodities.  In light of the discrepancies, you may conclude that to make the best economic choices for yourself, you need to understand finance and the data upon which the advisers are basing their recommendations.

Making medical decisions optimally is akin to making economic decisions and is founded on a simple framework:  EUT, or Expected Utility Theory.  To determine whether to pursue a course of action versus another one, we add up the benefits of a course multiplied by their probability of accruing (that product is the positive utility of the course of action) and then subtract the product of the costs of the course of action and their probability of accruing (the negative utility).  If utility is positive, we pursue a course of action, and if options are available, we pursue the course with the highest positive utility.  Ideally, anybody helping you navigate such a decision framework would tell you the numbers so you could do the calculus.  Using the finance analogy again, if the adviser told you "Stocks have positive returns.  So do bonds.  Stocks are riskier than bonds" - without any quantification, you may conclude that a portfolio full of bonds is the best course of action - and usually it is not.
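To make the framework concrete, here is a minimal Python sketch of the calculation just described.  The function name, the two options, and every magnitude and probability in it are invented for illustration; they are assumptions, not numbers from any guideline or trial.

```python
def expected_utility(benefits, costs):
    """Net expected utility of one course of action.

    benefits and costs are lists of (magnitude, probability) pairs.
    Each product is summed; costs are then subtracted from benefits.
    """
    positive = sum(magnitude * prob for magnitude, prob in benefits)
    negative = sum(magnitude * prob for magnitude, prob in costs)
    return positive - negative


# HYPOTHETICAL options with made-up utilities on an arbitrary scale.
options = {
    "treat": expected_utility(
        benefits=[(10.0, 0.30)],            # large benefit, 30% chance
        costs=[(4.0, 0.10), (1.0, 1.0)],    # rare harm plus a certain burden
    ),
    "watch and wait": expected_utility(
        benefits=[(10.0, 0.10)],            # same benefit, less likely
        costs=[(0.5, 1.0)],                 # small certain burden
    ),
}

# Pursue the course with the highest positive net utility, if any.
best = max(options, key=options.get)
for name, utility in options.items():
    print(f"{name}: net expected utility = {utility:.2f}")
print(f"preferred course: {best}")
```

None of this can be done without the magnitudes and probabilities, which is why advice that omits the numbers cannot be checked by the person receiving it.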

I regret to report that that is exactly what clinical practice guideline writers do:  provide summary information without any numerical data to support it, leaving the practitioner with two choices:

  1. Just do as the guideline writer says
  2. Go figure it out for herself with a primary data search

Thursday, April 6, 2017

Why Most True Research Findings Are Useless

In his provocative essay in PLOS Medicine over a decade ago, Ioannidis argued that most published research findings are false, owing to a variety of errors such as p-hacking, data dredging, fraud, selective publication, researcher degrees of freedom, and many more.  In my permutation of his essay, I will go a step further and suggest that even if we limit our scrutiny to tentatively true research findings (scientific truth being inherently tentative), most research findings are useless.

My choice of the word "useless" may seem provocative, and even untenable, but it is intended to have an exquisitely specific meaning:  I mean useless in an economic sense of "having zero or negligible net utility", in the tradition of Expected Utility Theory [EUT], for individual decision making.  This does not mean that true findings are useless for the incremental accrual of scientific knowledge and understanding.  True research findings may be very valuable from the perspective of scientific progress, but still useless for individual decision making, whether it is the individual trying to determine what to eat to promote a long healthy life, or the physician trying to decide what to do for a patient in the ICU with delirium.  When evaluating a research finding that is thought to be true, and may at first blush seem important and useful, it is necessary to make a distinction between scientific utility and decisional utility.  Here I will argue that while many "true" research findings may have scientific utility, they have little decisional utility, and thus are "useless".

Tuesday, April 4, 2017

Tipping the Scales of Noninferiority: Abbott's "Emboshield and Xact Carotid Stent System"

I just stumbled across this and think it's worth musing over it a bit.  The recently published ACT I trial by Rosenfield et al is a noninferiority trial of an already approved device, the "emboshield embolic protection system" used in conjunction with the "Xact carotid stent system" both proprietary devices from Abbott.  I'm scrutinizing this trial (and others) to determine if adequate justification is given for the noninferiority hypothesis around which the trial is designed.  One thing I'm looking for is  evidence that there are clear secondary advantages of the novel or experimental therapy that justify accepting some degree of worse efficacy, compared to the active control, which falls within the prespecified margin of noninferiority.  This is what the authors (or their ghosts) write in the introduction:
"Most carotid revascularization procedures in the United States are carotid endarterectomies performed for the treatment of asymptomatic atherosclerotic disease. Revascularization is also performed by means of stenting with devices to capture and remove emboli (“embolic protection” devices).3,4 In the Carotid Revascularization Endarterectomy versus Stenting Trial (CREST), no significant difference was found between carotid endarterectomy and stenting with embolic protection for the treatment of atherosclerotic carotid bifurcation stenosis with regard to the composite end point of stroke, death, or myocardial infarction.5 CREST included both symptomatic and asymptomatic patients, and it was not sufficiently powered to discern whether the carotid endarterectomy and stenting with embolic protection were equivalent according to symptomatic status. The primary aim of the Asymptomatic Carotid Trial (ACT) I was to compare the outcomes of carotid endarterectomy versus stenting with embolic protection in patients with asymptomatic severe carotid-artery stenosis who were at standard risk for surgical complications."
That's a mouthful, to say the least, and probably ought to be expectorated.

Friday, February 10, 2017

The Normalization Fallacy: Why Much of “Critical Care” May Be Neither

Like many starry-eyed medical students, I was drawn to critical care because of the high stakes, its physiological underpinnings, and the apparent fact that you could take control of that physiology and make it serve your goals for the patient.  On my first MICU rotation in 1997, I was so swept away by critical care that I voluntarily stayed on through the Christmas holiday and signed up for another elective MICU rotation at the end of my 4th year.  On the last night of that first rotation, wistful about leaving, I sauntered through the unit a few times thinking how I would miss the smell of the MICU and the distinctive noise of the Puritan Bennett 7200s delivering their [too high] tidal volumes.  By then I could even tell you whether the patient’s peak pressures were high (they often were) by the sound the 7200 made after the exhalation valve released.  I was hooked, irretrievably. 

I still love thinking about physiology, especially in the context of critical illness, but I find that I have grown circumspect about its manipulation as I have reflected on the developments in our field over the past 20 years.  Most – if not all – of these “developments” show us that we were harming patients with a lot of the things we were doing.  Underlying many now-abandoned therapies was a presumption that our understanding of physiology was sufficient that we could manipulate it to beneficial ends.  This presumption hints at an underlying set of hypotheses that we have which guide our thinking in subtle but profound and pervasive ways.  Several years ago we coined the term the “normalization heuristic” (we should have called it the “normalization fallacy”) to describe our tendency to view abnormal laboratory values and physiological parameters as targets for normalization.  This approach is almost reflexive for many values and parameters but on closer reflection it is based on a pivotal assumption:  that the targets for normalization are causally related to bad outcomes rather than just associations or even adaptations.

Wednesday, January 11, 2017

Don't Get Soaked: The Practical Utility of Predicting Fluid Responsiveness

In this article in the September 27th issue of JAMA, the authors discuss the rationale and evidence for predicting fluid responsiveness in hemodynamically unstable patients.  While this is a popular academic topic, its practical importance is not as clear.  Some things, such as predicting performance on an SBT with a Yang-Tobin f/Vt, don't make much sense - just do the SBT if that's the result you're really interested in.  The prediction of whether it will rain today is not very important if the difference in what I do is as small as tucking an umbrella into my bag or not.  Neither the inconvenience of getting a little wet walking from the parking garage nor that of carrying the umbrella is very great.  Similarly, a prediction of whether or not it will rain two months from now when I'm planning a trip to Cancun is not very valuable to me because the confidence intervals about the estimate are too wide to rely upon.  Better to just stick with the base rates:  how much rainfall is there in March in the Caribbean in an average year?

Our letter to the editor was not published in JAMA, so I will post it here:

To the Editor:  A couple of issues relating to the article about predicting responsiveness to fluid bolus1 deserve attention.  First, the authors made a mathematical error that may cause confusion among readers attempting to duplicate the Bayesian calculations described in the article.  The negative predictive value (NPV) of a test is the proportion of patients with a negative test who do not have the condition – the true negative rate.2  In each of the instances in which NPV is mentioned in the article, the authors mistakenly report the proportion of patients with a negative test who do have the condition.  This value, 1-NPV, is the false negative rate - the posterior probability of the condition in those with a negative test.

Second, in the examples that discuss NPV, the authors use a prior probability of fluid responsiveness of 50%.  A clinician who appropriately uses a threshold approach to decision making3 must determine a probability threshold above which treatment is warranted, considering the net utility of all possible outcomes with and without treatment given that treatment’s risks and benefits4.  Because the risk of fluid administration in judicious quantities is low5, the threshold for fluid administration is correspondingly low and fluid bolus may be warranted based on prior probability alone, thus obviating additional testing.  Even if additional testing is negative and suggests a posterior probability of fluid responsiveness of only 10% (with an upper 95% confidence limit of 18%), many clinicians would still judge a trial of fluids to be justified because fluids are considered to be largely benign and untreated hypovolemia is not4.  (The upper confidence limit will be higher still if the prior probability was underestimated.)  Finally, the posterior probabilities hinge critically on the estimates of prior probabilities, which are notoriously nebulous and subjective.  Clinicians are likely intuitively aware of these quandaries, which may explain why empiric fluid bolus is favored over passive leg raise testing outside of academic treatises6.


1. Bentzer P, Griesdale DE, Boyd J, MacLean K, Sirounis D, Ayas NT. Will this hemodynamically unstable patient respond to a bolus of intravenous fluids? JAMA. 2016;316(12):1298-1309.
2. Fischer JE, Bachmann LM, Jaeschke R. A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med. 2003;29(7):1043-1051.
3. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-1117.
4. Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B. Dual Processing Model for Medical Decision-Making: An Extension to Diagnostic Testing. PLoS One. 2015;10(8):e0134800.
5. The ProCESS Investigators. A Randomized Trial of Protocol-Based Care for Early Septic Shock. N Engl J Med. 2014;370(18):1683-1693.
6. Marik PE, Monnet X, Teboul J-L. Hemodynamic parameters to guide fluid therapy. Annals of Intensive Care. 2011;1:1-1.


Scott K Aberegg, MD, MPH
Andrew M Hersh, MD
The University of Utah School of Medicine
Salt Lake City, Utah
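
The arithmetic in the letter above can be checked with a minimal Python sketch.  The sensitivity and specificity of 0.90 are illustrative assumptions chosen because they reproduce the letter's 50% prior and 10% posterior; they are not taken from the JAMA article.  The threshold formula is the common expected-utility form, harm / (harm + benefit), and the benefit and harm weights are hypothetical:

```python
def negative_test_probabilities(prior, sensitivity, specificity):
    """Return (NPV, 1 - NPV) for a patient whose test came back negative."""
    missed = prior * (1 - sensitivity)         # condition present, test negative
    cleared = (1 - prior) * specificity        # condition absent, test negative
    npv = cleared / (cleared + missed)         # P(no condition | negative test)
    return npv, 1 - npv                        # 1 - NPV = P(condition | negative test)


def treatment_threshold(benefit, harm):
    """Probability above which treating has higher expected utility than not treating."""
    return harm / (harm + benefit)


# Assumed test characteristics (illustrative only) with the letter's 50% prior.
npv, posterior = negative_test_probabilities(prior=0.50, sensitivity=0.90, specificity=0.90)

# Hypothetical weights: treating a responder helps ten times as much as
# treating a non-responder hurts.
threshold = treatment_threshold(benefit=10.0, harm=1.0)

print(f"NPV = {npv:.2f}, posterior after negative test = {posterior:.2f}")
print(f"treatment threshold = {threshold:.3f}, still treat: {posterior > threshold}")
```

With weights like these, the posterior probability after a negative test still sits above the treatment threshold, which is the letter's point: when treatment is cheap and safe, the test result may not change the decision.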


Thursday, January 5, 2017

RCT Autopsy: The Differential Diagnosis of a Negative Trial

At many institutions, Journal Clubs meet to dissect a trial after its results are published to look for flaws, biases, shortcomings, limitations.  Beyond the dissemination of the informational content of the articles that are reviewed, Journal Clubs serve as a reiteration and extension of the limitations part of the article discussion.  Unless they result in a letter to the editor, or a new peer-reviewed article about the limitations of the trial that was discussed, the debates of Journal Club begin a headlong recession into obscurity soon after the meeting adjourns.

The proliferation and popularity of online media has led to what amounts to a real-time, longitudinally documented Journal Club.  Named “post-publication peer review” (PPPR), it consists of blog posts, podcasts and videocasts, comments on research journal websites, remarks on online media outlets, and websites dedicated specifically to PPPR.  Like a traditional Journal Club, PPPR seeks to redress any deficiencies in the traditional peer review process that lead to shortcomings or errors in the reporting or interpretation of a research study.

PPPR following publication of a “positive” trial, that is one where the authors conclude that their a priori criteria for rejecting the null hypothesis were met, is oftentimes directed at the identification of a host of biases in the design, conduct, and analysis of the trial that may have led to a “false positive” trial.  False positive trials are those in which either a type I error has occurred (the null hypothesis was rejected even though it is true and no difference between groups exists), or the structure of the experiment was biased in such a way that the experiment and its statistics cannot be informative.  The biases that cause structural problems in a trial are manifold, and I may attempt to delineate them at some point in the future.  Because it is a simpler task, I will here attempt to list a differential diagnosis that people may use in PPPRs of “negative” trials.

Thursday, September 8, 2016

Hiding the Evidence in Plain Sight: One-sided Confidence Intervals and Noninferiority Trials

In the last post, I linked a video podcast of my explaining non-inferiority trials and their inherent biases.  In this videocast, I revisit noninferiority trials and the use of one-sided confidence intervals.  I review the Salminen et al noninferiority trial of antibiotics versus appendectomy for the treatment of acute appendicitis in adults.  This trial uses a very large delta of 24%.  The criteria for non-inferiority were not met even with this promiscuous delta.  But the use of a 1-sided 95% confidence interval concealed a more damning revelation in the data.  Watch the 13-minute videocast to learn what was hidden in plain sight!

Erratum:  at 1:36 I say "excludes an absolute risk difference of 1" and I meant to say "excludes an absolute risk difference of ZERO."  Similarly, at 1:42 I say "you can declare non-inferiority".  Well, that's true, you can declare noninferiority if your entire 95% confidence interval falls to the left of an ARD of 0 or a HR of 1, but what I meant to say is that if that is the case "you can declare superiority."

Also, at 7:29, I struggle to remember the numbers (woe is my memory!) and I place the point estimate of the difference, 0.27, to the right of the delta dashed line at .24.  This was a mistake which I correct a few minutes later at 10:44 in the video.  Do not let it confuse you:  the 0.27 point estimates were drawn slightly to the right of delta and they should have been marked slightly to the left of it.  I would re-record the video (labor intensive) or edit it, but I'm a novice with this technological stuff, so please do forgive me.

Finally, at 13:25 I say "within which you can hide evidence of non-inferiority" and I meant "within which you can hide evidence of inferiority."
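
How a one-sided interval can conceal inferiority can be sketched with a little arithmetic.  This is a minimal Python illustration under stated assumptions: the event counts are hypothetical (they are not the Salminen data), the interval is a simple Wald interval, and the difference is defined as the failure rate on "new" minus the failure rate on "old", so positive values are bad for "new".  Only the delta of 0.24 comes from the post:

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_intervals(fail_new, n_new, fail_old, n_old, level=0.95):
    """Wald intervals for the difference in failure proportions (new minus old)."""
    p_new, p_old = fail_new / n_new, fail_old / n_old
    diff = p_new - p_old
    se = sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z_one = NormalDist().inv_cdf(level)                # about 1.645 for 95%
    z_two = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.960 for 95%
    return {
        "diff": diff,
        "one_sided_upper": diff + z_one * se,          # the only bound a 1-sided CI reports
        "two_sided": (diff - z_two * se, diff + z_two * se),
    }


DELTA = 0.24  # noninferiority margin discussed in the post

# Hypothetical counts: 60/200 failures on "new", 10/200 on "old".
res = risk_difference_intervals(60, 200, 10, 200)
noninferiority_shown = res["one_sided_upper"] < DELTA
inferiority_shown = res["two_sided"][0] > 0  # whole 2-sided CI on the wrong side of zero

print(res)
print("noninferiority shown:", noninferiority_shown)
print("inferiority shown by 2-sided CI:", inferiority_shown)
```

In this made-up example the one-sided report says only that noninferiority was not demonstrated, while the unreported end of the two-sided interval excludes zero, meaning "new" is statistically worse.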

Again, I apologize for these gaffes.  My struggle (and I think about this stuff a lot) in speaking about and accurately describing these confidence intervals and the conclusions that derive from them results from the arbitrariness of the CONSORT "rules" about interpretation and the arbitrariness of the valences (some articles use negative valence for differences favoring "new"; some journals use positive values to favor "new").  If I struggle with it, many other readers, I'm sure, also struggle in keeping things straight.  This is fodder for the argument that these "rules" ought to be changed and made more uniform, for equity and ease of understanding and interpretation of non-inferiority trials.

It made me feel better to see this diagram in Annals of Internal Medicine (Perkins et al July 3, 2012, online ACLS training) where they incorrectly place the point estimate at slightly less than -6% (to the left of the dashed delta line in the Figure 2), when it should have been placed slightly greater than -6% (to the right of the dashed delta line).






Saturday, June 11, 2016

Non-inferiority Trials Are Inherently Biased: Here's Why

Debut VideoCast for the Medical Evidence Blog, explaining non-inferiority trial design and exposing its inherent biases:

In this related blog post, you can find links to the CONSORT statement in the Dec 26, 2012 issue of JAMA and a link to my letter to the editor.

Addendum:  I should have included this in the video.  See the picture below.  In the first example, top left, the entire 95% CI favoring "new" therapy lies in the "zone of indifference", that is, the pre-specified margin of superiority, a mirror image of the pre-specified margin of noninferiority, in this case delta = +/- 0.15.  Next down, the majority of the 95% CI of the point estimate favoring "new" therapy lies in the "margin of superiority" - so even though the lower end of the 95% CI crosses "mirror delta", the best guess is that the effect of therapy falls in the zone of indifference.  In the lowest example, labeled "Truly Superior", the entire 95% confidence interval falls to the left of "mirror delta", thus reasonably excluding all point estimates in the "zone of indifference" (i.e. +/- delta) and all point estimates favoring the "old" therapy.  This would, in my mind, represent "true superiority" in a logical, rational, and symmetrical way that would be very difficult to mount arguments against.
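
The symmetrical logic described above can be written as a small decision rule.  This Python sketch assumes a two-sided 95% CI for the difference ("new" minus "old") on a scale where negative values favor "new", matching the left-favors-new layout described above; the function name, the labels, and the example intervals are hypothetical, and only delta = 0.15 comes from the text:

```python
def classify_ci(lower, upper, delta):
    """Classify a two-sided CI for (new - old); negative values favor 'new'."""
    if upper < -delta:
        # Entire CI lies left of mirror delta: zone of indifference is excluded.
        return "truly superior"
    if upper < 0:
        # Excludes zero, but part or all of the CI sits inside the zone of indifference.
        return "superior by excluding zero only"
    if upper < delta:
        # Cannot exclude zero, but excludes differences worse than delta.
        return "noninferior"
    return "noninferiority not shown"


DELTA = 0.15  # margin used in the addendum

examples = {
    "entirely in zone of indifference": (-0.12, -0.03),
    "crosses mirror delta": (-0.20, -0.05),
    "left of mirror delta": (-0.30, -0.18),
    "straddles zero": (-0.05, 0.10),
    "crosses delta": (0.00, 0.20),
}

for label, (lo, hi) in examples.items():
    print(f"{label}: {classify_ci(lo, hi, DELTA)}")
```

The first two intervals would both be called "superior" under the usual rule, yet neither excludes the zone of indifference; only the third meets the symmetrical standard.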


Added 9/20/16:  For those who question my assertion that the designation of "New" versus "Old" or "comparator" therapy is arbitrary, here is the proof:  In this trial, the "New" therapy is DMARDs and the comparator is anti-tumour necrosis factor agents for the treatment of rheumatoid arthritis.  The rationale for this trial is that the chronologically newer anti-TNF agents are very costly, and the authors wanted to see if similar improvements in quality of life could be obtained with chronologically older DMARDs.  So what is "new" is certainly in the eye of the beholder.  Imagine colistin 50 years ago, being tested against, say, a newer spectrum penicillin.  The penicillin would have been found to be non-inferior, but with a superior side effect profile.  Fast forward 50 years and now colistin could be the "new" resurrected agent and be tested against what 10 years ago was the standard penicillin but is now "old" because of development of resistance.  Clearly, "new" and "old" are arbitrary and flexible designations.