Wednesday, January 11, 2017

Don't Get Soaked: The Practical Utility of Predicting Fluid Responsiveness

In this article in the September 27th issue of JAMA, the authors discuss the rationale and evidence for predicting fluid responsiveness in hemodynamically unstable patients.  While this is a popular academic topic, its practical importance is not as clear.  Some things, such as predicting performance on an SBT with a Yang-Tobin f/Vt, don't make much sense - just do the SBT if that's the result you're really interested in.  The prediction of whether it will rain today is not very important if the difference in what I do is as small as tucking an umbrella into my bag or not.  Neither the inconvenience of getting a little wet walking from the parking garage nor that of carrying the umbrella is very great.  Similarly, a prediction of whether or not it will rain two months from now when I'm planning a trip to Cancun is not very valuable to me because the confidence intervals about the estimate are too wide to rely upon.  Better to just stick with the base rates:  how much rainfall is there in March in the Caribbean in an average year?

Our letter to the editor was not published in JAMA, so I will post it here:

To the Editor:  A couple of issues relating to the article about predicting responsiveness to a fluid bolus1 deserve attention.  First, the authors made a mathematical error that may cause confusion among readers attempting to duplicate the Bayesian calculations described in the article.  The negative predictive value (NPV) of a test is the proportion of patients with a negative test who do not have the condition – the true negative rate.2  In each of the instances in which NPV is mentioned in the article, the authors mistakenly report the proportion of patients with a negative test who do have the condition.  This value, 1-NPV, is the false negative rate - the posterior probability of the condition in those with a negative test.

Second, in the examples that discuss NPV, the authors use a prior probability of fluid responsiveness of 50%.  A clinician who appropriately uses a threshold approach to decision making3 must determine a probability threshold above which treatment is warranted, considering the net utility of all possible outcomes with and without treatment given that treatment’s risks and benefits.4  Because the risk of fluid administration in judicious quantities is low5, the threshold for fluid administration is correspondingly low, and a fluid bolus may be warranted based on prior probability alone, thus obviating additional testing.  Even if additional testing is negative and suggests a posterior probability of fluid responsiveness of only 10% (with an upper 95% confidence limit of 18%), many clinicians would still judge a trial of fluids to be justified, because fluids are considered to be largely benign and untreated hypovolemia is not4.  (The upper confidence limit will be higher still if the prior probability was underestimated.)  Finally, the posterior probabilities hinge critically on the estimates of prior probabilities, which are notoriously nebulous and subjective.  Clinicians are likely intuitively aware of these quandaries, which may explain why empiric fluid bolus is favored over passive leg raise testing outside of academic treatises6.


1. Bentzer P, Griesdale DE, Boyd J, MacLean K, Sirounis D, Ayas NT. Will this hemodynamically unstable patient respond to a bolus of intravenous fluids? JAMA. 2016;316(12):1298-1309.
2. Fischer JE, Bachmann LM, Jaeschke R. A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med. 2003;29(7):1043-1051.
3. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-1117.
4. Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B. Dual Processing Model for Medical Decision-Making: An Extension to Diagnostic Testing. PLoS One. 2015;10(8):e0134800.
5. The ProCESS Investigators. A Randomized Trial of Protocol-Based Care for Early Septic Shock. N Engl J Med. 2014;370(18):1683-1693.
6. Marik PE, Monnet X, Teboul J-L. Hemodynamic parameters to guide fluid therapy. Ann Intensive Care. 2011;1(1):1.


Scott K Aberegg, MD, MPH
Andrew M Hersh, MD
The University of Utah School of Medicine
Salt Lake City, Utah
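
For readers who want to check the arithmetic themselves, here is a minimal sketch in Python of the Bayesian calculation, and of where NPV and 1-NPV part ways.  The prior and test characteristics are round numbers chosen for illustration, not figures from the JAMA article:

# Posterior probabilities after a fluid-responsiveness test.
# Illustrative inputs: the prior and test characteristics are assumptions
# for demonstration, not figures from the JAMA article.

def posteriors(prior, sensitivity, specificity):
    """Return P(responsive | positive test) and P(responsive | negative test)."""
    # PPV: true positives as a proportion of all positive tests
    ppv = (sensitivity * prior) / (
        sensitivity * prior + (1 - specificity) * (1 - prior))
    # NPV: true negatives as a proportion of all negative tests;
    # the posterior probability of responsiveness after a negative test is 1 - NPV
    npv = (specificity * (1 - prior)) / (
        specificity * (1 - prior) + (1 - sensitivity) * prior)
    return ppv, 1 - npv

ppv, post_neg = posteriors(prior=0.50, sensitivity=0.85, specificity=0.90)
print(f"P(responsive | positive test) = {ppv:.0%}")                 # ~89%
print(f"NPV = {1 - post_neg:.0%}, but P(responsive | negative test) = {post_neg:.0%}")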


Thursday, January 5, 2017

RCT Autopsy: The Differential Diagnosis of a Negative Trial

At many institutions, Journal Clubs meet to dissect a trial after its results are published, looking for flaws, biases, shortcomings, and limitations.  Beyond disseminating the informational content of the articles reviewed, Journal Clubs serve as a reiteration and extension of the limitations section of the article's discussion.  Unless they result in a letter to the editor, or a new peer-reviewed article about the limitations of the trial that was discussed, the debates of Journal Club begin a headlong recession into obscurity soon after the meeting adjourns.

The proliferation and popularity of online media has led to what amounts to a real-time, longitudinally documented Journal Club.  Named “post-publication peer review” (PPPR), it consists of blog posts, podcasts and videocasts, comments on research journal websites, remarks on online media outlets, and websites dedicated specifically to PPPR.  Like a traditional Journal Club, PPPR seeks to redress any deficiencies in the traditional peer review process that lead to shortcomings or errors in the reporting or interpretation of a research study.

PPPR following publication of a “positive” trial, that is, one where the authors conclude that their a priori criteria for rejecting the null hypothesis were met, is oftentimes directed at the identification of a host of biases in the design, conduct, and analysis of the trial that may have led to a “false positive” trial.  False positive trials are those in which either a type I error has occurred (the null hypothesis was rejected even though it is true and no difference between groups exists), or the structure of the experiment was biased in such a way that the experiment and its statistics cannot be informative.  The biases that cause structural problems in a trial are manifold, and I may attempt to delineate them at some point in the future.  Because it is a simpler task, I will here attempt to list a differential diagnosis that people may use in PPPRs of “negative” trials.
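
Before getting to that list, the type I error component is easy to make concrete with a toy simulation.  Every rate and sample size below is an assumption for illustration, not a feature of any particular trial:

# Toy simulation: how often does a trial of two identical treatments
# reject the null at alpha = 0.05 (a type I error), and how often does
# an underpowered trial of a truly effective treatment miss the effect
# (a type II error)?  All rates and sample sizes are illustrative.
import random

def fraction_significant(p_control, p_treated, n_per_arm, sims=5000, z_crit=1.96):
    hits = 0
    for _ in range(sims):
        a = sum(random.random() < p_control for _ in range(n_per_arm))
        b = sum(random.random() < p_treated for _ in range(n_per_arm))
        p1, p2 = a / n_per_arm, b / n_per_arm
        pooled = (a + b) / (2 * n_per_arm)
        se = (2 * pooled * (1 - pooled) / n_per_arm) ** 0.5
        if se > 0 and abs(p1 - p2) / se > z_crit:
            hits += 1
    return hits / sims

# Null is true: ~5% of trials are "positive" by chance alone.
print("false positive rate:", fraction_significant(0.30, 0.30, 300))
# Real but modest effect, small trial: most such trials are "negative."
print("power at n=100/arm, 30% vs 24%:", fraction_significant(0.30, 0.24, 100))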

Thursday, September 8, 2016

Hiding the Evidence in Plain Sight: One-sided Confidence Intervals and Noninferiority Trials

In the last post, I linked a video podcast of me explaining non-inferiority trials and their inherent biases.  In this videocast, I revisit noninferiority trials and the use of one-sided confidence intervals.  I review the Salminen et al noninferiority trial of antibiotics versus appendectomy for the treatment of acute appendicitis in adults.  This trial uses a very large delta of 24%.  The criteria for non-inferiority were not met even with this promiscuous delta.  But the use of a 1-sided 95% confidence interval concealed a more damning revelation in the data.  Watch the 13-minute videocast to learn what was hidden in plain sight!

Erratum:  at 1:36 I say "excludes an absolute risk difference of 1" and I meant to say "excludes an absolute risk difference of ZERO."  Similarly, at 1:42 I say "you can declare non-inferiority".  Well, that's true, you can declare noninferiority if your entire 95% confidence interval falls to the left of an ARD of 0 or an HR of 1, but what I meant to say is that if that is the case "you can declare superiority."

Also, at 7:29, I struggle to remember the numbers (woe is my memory!) and I place the point estimate of the difference, 0.27, to the right of the delta dashed line at .24.  This was a mistake which I correct a few minutes later at 10:44 in the video.  Do not let it confuse you, the 0.27 point estimates were just drawn slightly to the right of delta and they should have been marked slightly to the left of it.  I would re-record the video (labor intensive) or edit it, but I'm a novice with this technological stuff, so please do forgive me.

Finally, at 13:25 I say "within which you can hide evidence of non-inferiority" and I meant "within which you can hide evidence of inferiority."

Again, I apologize for these gaffes.  My struggle (and I think about this stuff a lot) in speaking about and accurately describing these confidence intervals and the conclusions that derive from them results from the arbitrariness of the CONSORT "rules" about interpretation and the arbitrariness of the valences (some articles use a negative valence for differences favoring "new"; some journals use a positive valence).  If I struggle with it, many other readers, I'm sure, also struggle to keep things straight.  This is fodder for the argument that these "rules" ought to be changed and made more uniform, for equity and ease of understanding and interpretation of non-inferiority trials.

It made me feel better to see this diagram in Annals of Internal Medicine (Perkins et al July 3, 2012, online ACLS training) where they incorrectly place the point estimate at slightly less than -6% (to the left of the dashed delta line in Figure 2), when it should have been placed slightly greater than -6% (to the right of the dashed delta line).
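
To see concretely what a one-sided interval leaves unsaid, here is a minimal sketch.  The counts are stand-ins patterned loosely on the appendicitis trial; they are for illustration and should not be taken as the published analysis:

# How a one-sided 95% CI can conceal what a two-sided interval would show.
# The counts below are illustrative stand-ins, not the published data.
import math

def risk_difference_cis(success_new, n_new, success_old, n_old):
    p_new, p_old = success_new / n_new, success_old / n_old
    diff = p_new - p_old            # negative = "new" therapy performs worse
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    two_sided = (diff - 1.96 * se, diff + 1.96 * se)
    one_sided_bound = diff - 1.645 * se   # the lone bound a 1-sided design reports
    return diff, two_sided, one_sided_bound

diff, (lo, hi), osb = risk_difference_cis(186, 256, 272, 273)
print(f"point estimate: {diff:+.1%}")
print(f"two-sided 95% CI: ({lo:+.1%}, {hi:+.1%})")
print(f"one-sided 95% bound: {osb:+.1%}")
# If the upper two-sided bound also falls below zero, the data do not merely
# fail to show noninferiority - they affirmatively show inferiority, a fact
# the one-sided presentation leaves off the page.
if hi < 0:
    print("entire two-sided CI < 0: evidence of actual inferiority")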

Saturday, June 11, 2016

Non-inferiority Trials Are Inherently Biased: Here's Why

Debut VideoCast for the Medical Evidence Blog, explaining non-inferiority trial design and exposing its inherent biases:

In this related blog post, you can find links to the CONSORT statement in the Dec 26, 2012 issue of JAMA and a link to my letter to the editor.

Addendum:  I should have included this in the video.  See the picture below.  In the first example, top left, the entire 95% CI favoring "new" therapy lies in the "zone of indifference", that is, the pre-specified margin of superiority, a mirror image of the pre-specified margin of noninferiority, in this case delta = +/- 0.15.  Next down, the majority of the 95% CI of the point estimate favoring "new" therapy lies in the "margin of superiority" - so even though the lower end of the 95% CI crosses "mirror delta", the best guess is that the effect of therapy falls in the zone of indifference.  In the lowest example, labeled "Truly Superior", the entire 95% confidence interval falls to the left of "mirror delta", thus reasonably excluding all point estimates in the "zone of indifference" (i.e. +/- delta) and all point estimates favoring the "old" therapy.  This would, in my mind, represent "true superiority" in a logical, rational, and symmetrical way that would be very difficult to mount arguments against.
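
To make the figure's geometry concrete, here is a small sketch that classifies a two-sided confidence interval against symmetric margins.  Note the assumptions: I flip the sign convention relative to the figure (here, positive differences favor "new"), and delta = 0.15 as in the picture:

# Classify a 95% CI for a treatment difference against symmetric margins.
# Assumed convention: positive differences favor the "new" therapy, so
# "truly superior" means the whole CI lies beyond +delta, the mirror image
# of the noninferiority margin at -delta.

def classify(lo, hi, delta=0.15):
    if lo > delta:
        return "truly superior: CI excludes the zone of indifference entirely"
    if hi < -delta:
        return "truly inferior: CI lies wholly beyond -delta"
    if -delta < lo and hi < delta:
        return "within the zone of indifference"
    if lo > -delta:
        return "noninferior; CI straddles a margin"
    return "inconclusive: CI spans -delta"

print(classify(0.18, 0.30))    # entire CI beyond +delta
print(classify(-0.05, 0.10))   # wholly within +/- delta
print(classify(0.02, 0.22))    # straddles +delta
print(classify(-0.20, 0.05))   # spans -delta: noninferiority not shown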


Added 9/20/16:  For those who question my assertion that the designation of "New" versus "Old" or "comparator" therapy is arbitrary, here is the proof:  In this trial, the "New" therapy is DMARDs and the comparator is anti-tumour necrosis factor agents for the treatment of rheumatoid arthritis.  The rationale for this trial is that the chronologically newer anti-TNF agents are very costly, and the authors wanted to see if similar improvements in quality of life could be obtained with chronologically older DMARDs.  So what is "new" is certainly in the eye of the beholder.  Imagine colistin 50 years ago, being tested against, say, a newer spectrum penicillin.  The penicillin would have been found to be non-inferior, but with a superior side effect profile.  Fast forward 50 years and now colistin could be the "new" resurrected agent and be tested against what 10 years ago was the standard penicillin but is now "old" because of development of resistance.  Clearly, "new" and "old" are arbitrary and flexible designations.

Wednesday, June 8, 2016

Once Bitten, Twice Try: Failed Trials of Extubation



“When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”  – Clarke’s First Law

It is only fair to follow up my provocative post about a “trial of extubation” by chronicling a case or two that didn’t go as I had hoped.  Reader comments from the prior post described very low re-intubation rates.  As I alluded in that post, decisions regarding extubation represent the classic trade-off between sensitivity and specificity.  If your test for “can breathe spontaneously” has high specificity, you will almost never re-intubate a patient.  But unless the criteria used have correspondingly high sensitivity, patients who can breathe spontaneously will be left on the vent for an extra day or two.  Which you (and your patients) favor, high sensitivity or high specificity (assuming you can’t have both), depends upon the values you ascribe to the various outcomes.  Though these are many, it really comes down to this:  what do you think is worse (or more fearsome), prolonged mechanical ventilation, or reintubation?

What we fear today may not seem so fearsome in the future.  Surgeons classically struggled with the sensitivity and specificity trade-off in the decision to operate for suspected appendicitis.  “If you never have a negative laparotomy, you’re not operating enough” was the heuristic.  But this was based on the notion that failure to operate on a true appendicitis would lead to serious untoward outcomes.  More recent data suggest that this may not be so, and many of those inflamed appendices could have been treated with antibiotics in lieu of surgery.  This is what I’m suggesting with reintubation.  I don’t think the Epstein odds ratio (~4) of mortality for reintubation from 1996 applies today, at least not in my practice.

Tuesday, May 31, 2016

Trial of Extubation: An Informed Empiricist’s Approach to Ventilator Weaning

“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.”    – Clarke’s Second Law

In the first blog post, Dr. Manthous invited Drs. Ely, Brochard, and Esteban to respond to a simple vignette about a patient undergoing weaning from mechanical ventilation.  Each responded with his own variation of a cogent, evidence based, and well-referenced/supported approach.  I trained with experts of similar ilk using the same developing evidence base, but my current approach has evolved to be something of a different animal altogether.  It could best be described as a “trial of extubation”.  This approach recently allowed me to successfully extubate a patient 15 minutes into a trial of spontaneous breathing, not following commands, on CPAP 5, PS 5, FiO2 0.5 with the vital parameters in the image accompanying this post (respiratory rate 38, tidal volume 350, heart rate 129, SpO2 88%, temperature 100.8).  I think that any account of the “best” approach to extubation should offer an explanation as to how I can routinely extubate patients similar to this one, who would fail most or all of the conventional prediction tests, with a very high success rate.

A large part of the problem lies in shortcomings of the data upon which conventional prediction tests rely.  For example, in the landmark Yang and Tobin report and many reports that followed, sensitivity and specificity were calculated considering physicians’ “failure to extubate” a patient as equivalent to an “extubation failure”.  This conflation of two very different endpoints makes estimates of sensitivity and specificity unreliable.  Unless every patient with a prediction test is extubated, the sensitivity of a test for successful extubation is going to be an overestimate, as suggested by Epstein in 1995.  Furthermore, all studies have exclusion criteria for entry, with the implicit assumption that excluded patients would not be extubatable, which likewise inflates the apparent sensitivity of the tests.
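
A toy simulation makes the inflation easy to see.  Every rate below (the true test characteristics, the proportion who could breathe on their own, how often a test-negative patient is extubated anyway) is an assumption for illustration, not an estimate from any study:

# Toy simulation of verification bias in weaning-predictor studies: if
# test-negative patients are almost never extubated, and "not extubated"
# is silently treated as "would have failed," apparent sensitivity inflates.
import random

random.seed(1)
N = 100_000
true_sens, true_spec = 0.75, 0.60   # assumed real test characteristics
p_success = 0.80                    # assumed: 80% could breathe on their own

tp = fn = 0   # tallies for the *apparent* sensitivity for success
for _ in range(N):
    can_breathe = random.random() < p_success
    test_pos = random.random() < (true_sens if can_breathe else 1 - true_spec)
    extubated = test_pos or random.random() < 0.05   # negatives rarely get a chance
    if can_breathe and extubated:
        if test_pos:
            tp += 1      # observed success, test called it
        else:
            fn += 1      # observed success the test missed
    # successes never extubated are invisible to the study

print(f"true sensitivity: {true_sens:.0%}")
print(f"apparent sensitivity: {tp / (tp + fn):.0%}")   # ~98%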

Even if we had reliable estimates of sensitivity and specificity of prediction tests, the utility calculus has traditionally been skewed towards favoring specificity for extubation success, largely on the basis of a single 20-year-old observational study suggesting that patients who fail extubation have higher odds of mortality.  I do not doubt that if patients are allowed to “flail” after it becomes clear that they will not sustain unassisted ventilation, untoward outcomes are likely.  However, in my experience and estimation, this concern can be obviated by bedside vigilance by nurses and physicians in the several hours immediately following extubation (with the caveat that a highly skilled airway manager is present or available to reintubate if necessary).  Furthermore, this period of observation provides invaluable information about the cause of failure in the event failure ensues.  There need be no further guesswork about whether the patient can protect her airway, clear her secretions, maintain her saturations, or handle the work of breathing.  With the tube removed, what would otherwise be a prediction about these abilities becomes an observation, a datapoint that can be applied directly to the management plan for any subsequent attempt at extubation should she fail – that is, the true weak link in the system can be pinpointed after extubation.

The specificity-heavy utility calculus, as I have opined before, will fail patients if I am correct that an expeditious reintubation is not harmful, but each additional day spent on the ventilator confers incremental harm.  Why don’t I think reintubations are harmful?  Because when my patients fail, I am diligent about rapid recognition, I reintubate without observing complications, and often I can extubate successfully the next day, as I did a few months ago in a patient with severe ARDS.  She had marginal performance (i.e., she failed all prediction tests) and was extubated, failed, was reintubated, then successfully extubated the next day.  (I admit that it was psychologically agonizing to extubate her the next day.  They say that a cat that walks across a hot stove will never do so again.  It also will not walk on a cold stove again.  This psychology deserves a post of its own.)

When I tweeted the image attached to this post announcing that the patient (and many like her) had been successfully extubated, there was less incredulity than I expected, but an astute follower asked – “Well, then, how do you decide whom and when to extubate?”  I admit that I do not have an algorithmic answer to this question.  Experts in opposing camps of decision psychology, such as Kahneman and his adherents in the heuristics and biases camp and Gary Klein, Gerd Gigerenzer, and others in the expert intuition camp, could have a field day here, and perhaps some investigation is in order.  I can summarize by saying that it has been an evolution over the past 10 or so years.  I use everything I learned from the conventional, physiologic, algorithmic, protocolized, data-driven, evidence-based approach to evaluate a patient.  But I have gravitated to being more sensitive, to capture those patients that the predictors say should fail, and I give them a chance – a “trial of extubation.”  If they fail, I reintubate quickly.  I pay careful attention to respiratory parameters, mental status, and especially neuromuscular weakness, but I integrate this information into my mental map of the natural history of the disease and the specific patient’s position along that course to judge whether they have even a reasonable modicum of a chance of success.  If they do, I “bite the bullet and pull it.”

I do not eschew data; I love data.  But I am quick to recognize their limitations.  Data are generated for many reasons and have different values to different people with different prerogatives.  From the clinician’s and the patient’s perspective, the data are valuable if they reduce the burden of illness.  I worry that the current data and the protocols predicated on them are failing to capture many patients who are able to breathe spontaneously but are not being given the chance.  Hard-core evidence based medicine proponents and investigators need not worry though, because I have outlined a testable hypothesis:  that a “trial of extubation” in the face of uncertainty is superior to the use of prediction tests and protocols.  The difficult part will be determining the inclusion and exclusion criteria, and no matter what compromise is made, uncertainty will remain, reminding us that science is an iterative, evolving enterprise, with conclusions that are always tentative.

Monday, May 2, 2016

Hope: The Mother of Bias in Research

I realized the other day that underlying every slanted report or overly optimistic interpretation of a trial's results, every contorted post hoc analysis, every Big Pharma obfuscation, is hope.  And while hope is generally a good, positive emotion, it engenders great bias in the interpretation of medical research.  Consider this NYT article from last month:  "Dashing Hopes, Study Shows Cholesterol Drug Had No Effect on Heart Health."  The title itself reinforces my point, as do several quotes in the article.
“All of us would have put money on it,” said Dr. Peter Libby, a Harvard cardiologist. The drug, he said, “was the great hope.”
Again, hope is wonderful, but it blinds people to the truth in everyday life, and I'm afraid researchers are no more immune to its effects than the laity.  In my estimation, three main categories of hope creep into the evaluation of research and foment bias:

  1. Hope for a cure, prevention, or treatment for a disease (on the part of patients, investigators, or both)
  2. Hope for career advancement, funding, notoriety, being right (on the part of investigators) and related sunk cost bias
  3. Hope for financial gain (usually on the part of Big Pharma and related industrial interests)
Consider prone positioning for ARDS.  For over 20 years, investigators have hoped that prone positioning improves not only oxygenation but also outcomes (mostly mortality).  So is it any wonder that after the most recent trial, in spite of the 4 or 5 previous failed trials, the community enthusiastically declared "success!"  "Prone Positioning works!"  Of course it is no wonder - this has been the hope for decades.

But consider what the most recent trial represents through the lens of replicability:  a failure to replicate previous results showing that prone positioning does not improve mortality.  The recent trial is the outlier.  It is the "false positive" rather than the previous trials being the "false negatives."

This way of interpreting the trials of prone positioning in the aggregate should be an obvious one, and it astonishes me that it took me so long to see the results this way - as a single failure to replicate previously replicable negative results.  But it hearkens back to the underlying bias - we view results through the magnifying glass of hope, and it distorts our appraisal of the evidence.
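
A quick back-of-the-envelope calculation shows why the lone positive among repeated trials invites suspicion; the trial count of six here is an assumed round number for illustration:

# If prone positioning truly had no mortality effect, the chance that at
# least one of six independent trials comes up "positive" at alpha = 0.05:
alpha, k = 0.05, 6
p_at_least_one = 1 - (1 - alpha) ** k
print(f"P(at least 1 false positive in {k} trials) = {p_at_least_one:.0%}")  # ~26%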

Indeed, I have been accused of being a nihilist because of my views on this blog, which some see as derogating the work of others or an attempt to dash their hopes.  But these critics engage, or wish me to engage, in a form of outcome bias - the value of the research lies in the integrity of its design, conduct, analysis, and reporting, not in its results.  One can do superlative research and get negative results, or shoddy research and get positive results.  My goal here is and always has been to judge the research on its merits, regardless of the results or the hopes that impel it.

(Aside:  Cholesterol researchers have a faith or hope in the cholesterol hypothesis - that cholesterol is a causal factor in pathways to cardiovascular outcomes.  Statin data corroborate this, and preliminary PCSK9 inhibitor data do, too.  But how quickly we engage in hopeful confirmation bias!  If cholesterol is a causal factor, it should not matter how you manipulate it - lower the cholesterol, lower cardiovascular events.  The fact that it does appear to matter how you lower it suggests that either there is a multiplicity of agent effects (untoward and unknown effects of some agents negate some of their beneficial effects in the cholesterol causal pathway) or that cholesterol levels are epiphenomena - markers of the effects of statins and PCSK9 inhibitors on the real, but as yet undelineated, causal pathways.  Maybe the fact that we can easily measure cholesterol and that it is associated with outcomes in untreated individuals is a convenient accident of history that led us to trial statins, which work in ways that we do not yet understand.)

Tuesday, February 23, 2016

Much Ado About Nothing? The Relevance of New Sepsis Definitions for Clinical Care Versus Research

What's in a name?  That which we call a rose, by any other name would smell as sweet. - Shakespeare, Romeo and Juliet Act II Scene II

The Society of Critical Care Medicine is meeting this week, JAMA devoted an entire issue to sepsis and critical illness, and my twitter feed is ablaze with news of the release of a new consensus definition of sepsis.  Much laudable work has been done to get to this point, even as the work is already generating controversy (Is this a "first world" definition that will be forced upon second and third world countries where it may have less external validity?  Why were no women on the panel?).  Making the definition of sepsis more reliable from a sensitivity and specificity standpoint (more accurate) is a step forward for the sepsis research enterprise, for it will allow improved targeting of inclusion criteria for trials of therapies for sepsis, and better external validity when those therapies are later applied in a population that resembles those enrolled.  But what impact will/should the new definition have on clinical care?  Are the times a-changin'?

Diagnosis, a fundamental goal of clinical medicine is important for several reasons, chief among them:

  1. To identify the underlying cause of symptoms and signs so that treatments specific to that illness can be administered
  2. To provide information on prognosis, natural history, course, etc. for patients with or without treatment
  3. To reassure the physician and patients that there is an understanding of what is going on; information itself has value even if it is not actionable
Thus redefining sepsis (or even defining it in the first place) is valuable if it allows us to institute treatments that would not otherwise be instituted, or provides prognostic or other information that is valuable to patients.  Does it do either of those two things?

Wednesday, February 10, 2016

A Focus on Fees: Why I Practice Evidence Based Medicine Like I Invest for Retirement

"He is the best physician who knows the worthlessness of the most medicines."  - Ben Franklin

This blog has been highly critical of evidence, taking every opportunity to strike at any vulnerability of a trial or research program.  That is because this is serious business.  Lives and limbs hang in the balance, pharmaceutical companies stand to gain billions from "successful" trials, investigators' careers and funding are on the line if chance findings don't pan out in subsequent investigations, sometimes well-meaning convictions blind investigators and others to the truth; in short, the landscape is fertile for bias, manipulation, and even fraud.  To top it off, many of the questions about how to practice or deal with a particular problem have scant or no evidence to bear upon them, and practitioners are left to guesswork, convention, or pathophysiological reasoning - and I'm not sure which among these is most threatening.  So I am often asked, how do you deal with the uncertainty that arises from fallible evidence or paucity of evidence when you practice?

I have ruminated about this question and how to summarize the logic of my minimalist practice style for some time but yesterday the answer dawned on me:  I practice medicine like I invest in stocks, with a strategy that comports with the data, and with precepts of rational decision making.

Investors make numerous well-described and wealth-destroying mistakes when they invest in stocks.  Experts such as John Bogle, Burton Malkiel, David Swensen and others have written influential books on the topic, utilizing data from studies in economics (financial and behavioral).  Key among the mistakes that investors make are trying to select high performers (such as mutual fund or hedge fund managers), chasing performance, and timing the market.  The data suggest that professional stock pickers fare little better than chance over the long run, that you cannot discern who will beat the average over the long run, and that the excess fees you are charged by high performers will negate any benefit they might otherwise have conferred to you.  The experts generally recommend that you stick with strategies that are proven beyond a reasonable doubt: a heavy concentration in stocks with their long track record of superior returns, diversification, and strict minimization of fees.  Fees are the only thing you can guarantee about your portfolio's returns.
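
The arithmetic of fee drag is easy to make concrete.  The gross return and fee levels below are illustrative assumptions, not market data:

# Compound growth of $10,000 over 30 years at an assumed 7% gross annual
# return, under three fee levels. Fees compound against you just as
# relentlessly as returns compound for you.
principal, years, gross = 10_000, 30, 0.07
for fee in (0.001, 0.01, 0.02):    # index fund vs. typical active management
    final = principal * (1 + gross - fee) ** years
    print(f"annual fee {fee:.1%}: ${final:,.0f}")
# The 2% manager must beat the market by two points every year just to tie
# the index fund - and the data say you cannot pick that manager in advance.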

Thursday, February 4, 2016

Diamox Results in Urine: General and Specific Lessons from the DIABOLO Acetazolamide Trial

The trial of acetazolamide to reduce duration of mechanical ventilation in COPD patients was published in JAMA this week.  I will use this trial to discuss some general principles about RCTs and make some comments specific to this trial.

My arguable but strong prior belief, before I even read the trial, is that Diamox (acetazolamide) is ineffectual in acute and chronic respiratory failure, or that it is harmful.  Its use is predicated on a "normalization fallacy" which guides practitioners to attempt to achieve euboxia (normal numbers).  In chronic respiratory acidosis, the kidneys conserve bicarbonate to maintain normal pH.  There was a patient we saw at OSU in about 2008 who had severe COPD with a PaCO2 in the 70s and chronic renal failure with a bicarbonate under 20.  A well-intentioned but misguided resident checked an ABG and the patient's pH was on the order of 7.1.  We (the pulmonary service) were called to evaluate the patient for MICU transfer and intubation, and when we arrived we found him sitting at the bedside comfortably eating breakfast.  So it would appear that even if the kidneys can't conserve enough bicarbonate to maintain normal pH, patients can get along with acidosis, though obviously evolution has created systems to maintain normal pH.  Why you would think that interfering with this highly conserved system, in order to increase minute ventilation in a COPD patient you are trying to wean, is a good idea is beyond the reach of my imagination.  It just makes no sense.
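
For what it's worth, the anecdote's numbers hang together.  Plugging assumed values (bicarbonate 18, PaCO2 75 - approximations of the story above, not the patient's actual labs) into the Henderson-Hasselbalch equation reproduces a pH near 7.1, and shows what renal compensation buys:

# Henderson-Hasselbalch check of the OSU anecdote. The inputs are
# approximations of the story above, not the patient's actual values.
import math

def ph(hco3_meq_l, paco2_mmhg):
    # pH = 6.1 + log10(HCO3- / (0.03 * PaCO2))
    return 6.1 + math.log10(hco3_meq_l / (0.03 * paco2_mmhg))

print(f"HCO3 18, PaCO2 75 (failed compensation): pH {ph(18, 75):.2f}")  # ~7.00
print(f"HCO3 34, PaCO2 75 (intact compensation): pH {ph(34, 75):.2f}")  # ~7.28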

This brings us to a major problem with a sizable proportion of RCTs that I read:  the background/introduction provides woefully insufficient justification for the hypothesis that the RCT seeks to test.  In the background of this paper, we are sent to references 4-14.  Here is a summary of each:

4.)  A review of metabolic alkalosis in a general population of critically ill patients
5.)  An RCT of acetazolamide for weaning COPD patients showing that it doesn't work
6.)  Incidence of alkalosis in hospitalized patients in 1980
7.)  A 1983 translational study to delineate the effect of acetazolamide on acid base parameters in 10 patients
8.)  A 1982 study of hemodynamic parameters after acetazolamide administration in 12 patients
9.)  A study of metabolic and acid base parameters in 14 patients with cystic fibrosis 
10.) A retrospective epidemiological descriptive study of serum bicarbonate in a large cohort of critically ill patients
11.)  A study of acetazolamide in anesthetized cats
12-14.)  Commentary and pharmacodynamic studies of acetazolamide by the authors of the current study

Wednesday, December 23, 2015

Narrated and Abridged: There is (No) Evidence for That: Epistemic Problems in Critical Care Medicine

Below is the narrated video of my PowerPoint presentation on Epistemic Problems in Critical Care Medicine, which provides a framework for understanding why we have both false positives and false negatives in clinical trials in critical care medicine and why we should be circumspect about our "evidence base" and our "knowledge".  This is not trivial stuff, and it is worth the 35 minutes required to watch the narration of the slideshow.  It is a provocative presentation which gives compelling reasons to challenge our "evidence base" in critical care and medicine in general, in ways that are not widely recognized but perhaps should be, with several suggestions about assumptions that need to be challenged and revised to make our models of reality more reliable.  Please contact me if you would like me to give an iteration of this presentation at your institution.


Tuesday, November 10, 2015

Peersnickety Review: Rant on My Recent Battle With Peer Reviewers

I'd like to relate a tale of exasperation with the peer review process that I recently experienced and that is probably all too familiar - but one that most folks are too timid to complain publicly about.

Nevermind that laypersons think that peer review means that your peers are reviewing your actual data for accuracy and fidelity (they are not; they are reviewing only your manuscript, final analyses, and conclusions), which causes them to be perplexed when revelations of fraudulent data published in top journals are reported.  Nevermind that the website Retraction Watch, which began as a small side project, now has daily and twice-daily postings of retracted papers.  Nevermind that some scientists have built entire careers on faked data.  Nevermind that the fact that something has been peer reviewed provides very little in the way of assurance that the report contains anything other than rubbish.  Nevermind that leading investigators publish the same reviews over and over in different journals with the same figures and sometimes the same text.

The entire process is cumbersome, time consuming, frustrating, and of dubious value as currently practiced.

Last year I was invited by the editors of Chest to write a "contemporary review of ionized calcium in the ICU - should it be measured?  should it be treated?"  I am not aware of why I was selected for this, but I infer that someone suggested me as the author because of my prior research in medical decision making and because of the monograph we wrote several years back called Laboratory Testing in the ICU, which applied principles of rational decision making, such as Bayesian methods and back-of-the-envelope cost-benefit analyses, to make a framework for rational laboratory testing in the ICU.  I accepted the invitation, even knowing it would entail a good deal of work for me that would be entirely uncompensated, save for buttressing my fragile ego, he said allegorically.

Now, consider for an instant the extra barriers that I, as a non-academic physician, faced in agreeing to do this.  As a non-academic physician, I do not have access to a medical library, and of course the Chest editors do not have a way to grant me access.  That is, non-academic physicians doing scholarly work such as this are effectively disenfranchised from the infrastructure that they need to do scholarly work.  Fortunately for me, my wife was a student at the University of Utah during this time, so I was able to access the University library with her help.  Whether academic centers and peer-reviewed journals ought to have a monopoly on this information is a matter for debate elsewhere, and not a trivial one.

Sunday, October 11, 2015

When Hell Freezes Over: Trials of Temperature Manipulation in Critical Illness

The bed is on fire
Two articles published online ahead of print in the NEJM last week deal with actual and attempted temperature manipulation to improve outcomes in critically ill patients.

The Eurotherm3235 trial was stopped early because of concerns of harm or futility.  This trial enrolled patients with traumatic brain injury (TBI) and elevated intracranial pressure (ICP) and randomized them to induced hypothermia (which reduces ICP) versus standard care.  There was a suggestion of worse outcomes in the hypothermia group.  I know that the idea that we can help the brain with the simple maneuver of lowering body temperature has great appeal, and what some would call "biological plausibility," a term that I henceforth forsake and strike from my vocabulary.  You can rationalize the effect of an intervention any way you want using theoretical biological reasoning.  So from now on I'm not going to speak of biological plausibility; I will call it biological rationalizing.  A more robust principle, as I have claimed before, is biological precedent - that is, this or that pathway has been successfully manipulated in a similar way in the past.  It is reasonable to believe that interfering with LDL metabolism will improve cardiovascular outcomes because of decades of trials of statins (though agents used to manipulate this pathway are not all created equal).  It is reasonable to believe that interfering with platelet aggregation will improve outcomes from cardiovascular disease because of decades of trials of aspirin and Plavix and others.  It is reasonable to doubt that manipulation of body temperature will improve any outcome because there is no unequivocal precedent for this, save for warming people with hypothermia from exposure - which basically amounts to treating the known cause of their ailment.  This is one causal pathway that we understand beyond a reasonable doubt.  If you get exposure, you freeze to death.  If we find you still alive and warm you, you may well survive.

Wednesday, October 7, 2015

Early Mobility in the ICU: The Trial That Should Not Be

I learned via twitter yesterday that momentum is building to conduct a trial of early mobility in critically ill patients.  While I greatly respect many of the investigators headed down this path, forthwith I will tell you why this trial should not be done, based on principles of rational decision making.

A trial is a diagnostic test of a hypothesis, a complicated and costly test of a hypothesis, and one that entails risk.  Diagnostic tests should not be used indiscriminately.  That the RCT is a "Gold Standard" in the hierarchy of testing hypotheses does not mean that we should hold it sacrosanct, nor does it follow that we need a gold standard in all cases.  Just like in clinical medicine, we should be judicious in our ordering of diagnostic tests.

The first reason that we should not do a trial of early mobility (or any mobility) in the ICU is that, in the opinion of this author, experts in critical care, and many others, early mobility works.  We have a strong prior probability that this is a beneficial thing to be doing (which is why prominent centers have been doing it for years, sans RCT evidence).  When the prior probability is high enough, additional testing has decreasing yield and risks false negative results if people are not attuned to the prior.  Here's my analogy - a 35 year old woman with polycystic kidney disease who is taking birth control presents to the ED after collapsing with syncope.  She had shortness of breath and chest pain for 12 hours prior to syncope.  Her chest x-ray is clear and bedside ultrasound shows a dilated right ventricle.  The prior probability of pulmonary embolism is high enough that we don't really need further testing; we give anticoagulants right away.  Even if a V/Q scan (creatinine precludes CT) is "low probability" for pulmonary embolism, we still think she has it because the prior probability is so high.  Indeed, the prior probability is so high that we're willing to make decisions without further testing, hence we gave heparin.  This process follows the very rational Threshold Approach to Decision Making proposed by Pauker and Kassirer in the NEJM in 1980, which is basically a reformulation of von Neumann and Morgenstern's Expected Utility Theory to adapt it to medical decisions.  Distilled, it states in essence:  "when you get to a threshold probability of disease where the benefits of treatment exceed the risks, you treat."  And so let it be with early mobility.  We already think the benefits exceed the risks, which is why we're doing it.  We don't need an RCT.  As I used to ask the housestaff over and over until I was cyanotic: "How will the results of that test influence what you're going to do?"

Notice that this logical approach to clinical decision making shines a blinding light upon "evidence based medicine" and the entire enterprise of testing hypotheses with frequentist methods that are deaf to prior probabilities.  Can you imagine using V/Q scanning to test for PE without prior probabilities?  Can you imagine what a mess you would find yourself in with regard to false negatives and false positives?  You would be the neophyte medical student who thinks "test positive, disease present; test negative, disease absent."  So why do we continue ad nauseam in critical care medicine to dismiss prior probabilities and decision thresholds and blindly test hypotheses in a purist vacuum?

The next reasons this trial should not be conducted flow from the first.  The trial will not have a high enough likelihood ratio to sway the high prior below the decision threshold; if the trial is "positive" we will have spent millions of dollars to "prove" something we already knew with a probability above our treatment threshold; if the trial is positive, some will squawk "It wasn't blinded" yada yada yada in an attempt to dismiss the results as false positives; if the trial is negative, some will, like the tyro medical student, declare that "there is no evidence for early mobility" and similar hoopla and poppycock; or the worst case:  the trial shows harm from early mobility, which will get the naysayers of early mobility very agitated.  But of course, our prior probability that early mobility is harmful is hopelessly low, making such a result highly likely to be spurious.  When we clamor about "evidence" we are in essence clamoring about "testing hypotheses with RCTs" and eschewing our responsibility to use clinical judgment, recognize the limits of testing, and practice in the face of uncertainty using our "untested" prior probabilities.
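
Treating the trial itself as a diagnostic test makes the point quantitative.  The prior, the likelihood ratio of a negative trial, and the treatment threshold below are all assumptions chosen for illustration:

# Treat an RCT as a diagnostic test of the hypothesis "early mobility works."
# All three numbers below are assumptions for illustration.

def posterior(prior, likelihood_ratio):
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.80              # strong prior from experience and physiologic reasoning
lr_negative_trial = 0.3   # assumed likelihood ratio of a "negative" RCT
treat_threshold = 0.20    # low threshold: mobility is cheap and low-risk

post = posterior(prior, lr_negative_trial)
print(f"posterior after a negative trial: {post:.0%}")   # ~55%
print("keep mobilizing" if post > treat_threshold else "stop mobilizing")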

Consider a trial of exercise on cardiovascular outcomes in community dwelling adults - what good can possibly come of such a trial?  Don't we already know that exercise is good for you?  If so, a positive trial reinforces what we already know (but does little to convince sedentary folks to exercise, as they too already know they should exercise), but a negative trial risks sending the message to people that exercise is of no use to them, or that the number needed to treat is too large to be worth worrying about.

Or consider the recent trials of EGDT which "refuted" the Rivers trial from 14 years ago.  Now, everybody is saying, "Well, we know it works, maybe not the catheters and the ScVO2 and all those minutiae, but in general, rapid early resuscitation works.  And the trials show that we've already incorporated what works into general practice!"

I don't know the solutions to these difficult quandaries that we repeatedly find ourselves in, trial after trial, in critical care medicine.  I'm confused too.  That's why I'm thinking very hard and very critically about the limits of our methods and our models and our routines.  But if we can anticipate not only the results of the trials, but also the community reaction to them, then we have guidance about how to proceed in the future.  Because what value does a mega-trial have, if not to guide care after its completion?  And even if that is not its goal (maybe its goal is just to inform the science), can we turn a blind eye to the fact that it will guide practice after its completion, even if that guidance is premature?

It is my worry that, given the high prior probability that a trial in critical care medicine will be "negative", the most likely result is a negative trial which will embolden those who wish to dismiss the probable benefits of early mobility and give them an excuse to not do it.

Diagnostic tests have risks.  A false negative test is one such risk.

Wednesday, July 22, 2015

There is (No) Evidence For That: Epistemic Problems in Evidence Based Medicine

Below is a PowerPoint presentation that I have delivered several times recently, including one iteration at the SMACC conference in Chicago.  It addresses epistemic problems in our therapeutic knowledge, and calls into question all claims of "there is evidence for ABC" and "there is no evidence for ABC."  Such claims cannot be taken at face value and need deeper consideration and evaluation considering all possible states of reality - gone is the cookbook or algorithmic approach to evidence appraisal as promulgated by the Users' Guides.  Considered in the presentation are therapies for which we have no evidence, but which undoubtedly work (Category 1 - Parachutes), and therapies for which we have evidence of efficacy or lack thereof (Category 2), but that evidence is subject to false positives and false negatives for numerous reasons, including: the Ludic Fallacy, study bias (See: Why Most Published Research Findings Are False), type 1 and 2 errors, the "alpha bet" (the arbitrary and lax standard used for alpha, namely 0.05), Bayesian interpretations, stochastic dominance of the null hypothesis, and inadequate study power in general and that due to delta inflation and subversion of double significance hypothesis testing.  These are all topics that have been previously addressed to some degree on this blog, but this presentation presents them together as a framework for understanding the epistemic problems that arise within our "evidence base."  It also provides insights into why we have a generation of trials in critical care the results of which converge on the null, and why positive studies in this field cannot be replicated.

Tuesday, June 2, 2015

Evolution Based Medicine: A Philosophical Framework for Understanding Why Things Don't Work

An afternoon session at the ATS meeting this year about "de-adoption" of therapies which have been shown to be ineffective was very thought-provoking, and the contrasts between it and the morning session on ARDS are nothing less than ironic.  As I described in the prior post about the baby in the bathwater, physicians seem to have a hard time de-adopting therapies.  Ask your colleagues at the next division conference if you should abandon hypothermia after cardiac arrest and rather just treat fever, based on the TTM trial and the recent pediatric trial, and see what the response is.  Or, suggest that hyperglycemia (at any level in non-diabetic patients) in the ICU be observed rather than treated.  Or float the idea to your surgical colleagues that antibiotics be curtailed after four days in complicated intraabdominal infection, and see how quickly you are ushered out of the SICU.  Tell your dietitian that you're going to begin intentionally underfeeding patients, or not feeding them at all, and see what s/he says.  Propose that you discard sepsis resuscitation bundles, etc.  We have a hard time de-adopting.  We want to take what we have learned about physiology and pharmacology and apply it, to usurp control of and modify biological processes that we think we understand.  We (especially in critical care) are interventionists at heart.

The irony occurred at ATS because in the morning session, we were told that there is incontrovertible (uncontroverted may have been a better word) evidence for the efficacy of prone positioning in ARDS (interestingly, one of the only putative therapies for ARDS that the ARDSnet investigators never trialed), and it was strongly suggested that we begin using esophageal manometry to titrate PEEP in ARDS.  So, in the morning, we are admonished to adopt, and in the afternoon we are chided to de-adopt a host of therapies.  Is this the inevitable cycle in critical care and medical therapeutics?  A headlong rush to adopt, then an uphill battle to de-adopt?

Friday, May 1, 2015

Is There a Baby in That Bathwater? Status Quo Bias in Evidence Appraisal in Critical Care

"But we are not here concerned with hopes and fears, only the truth so far as our reason allows us to discover it."  -  Charles Darwin, The Descent of Man

Status quo bias is a cognitive decision making bias that leads to decision makers' preference for the choice represented by the current status quo, even when the status quo is arbitrary or irrelevant.  Decision makers tend to perceive a change from the status quo as a loss, and therefore their decisions are biased toward the status quo.  This can lead to preference reversals when the status quo reference frame is changed.  Status quo bias can be detected using a reversal test, i.e., manipulating the status quo either experimentally or via thought experiment to consider a change in the opposite direction.  If reluctance to change from the status quo exists in both directions, status quo bias is likely to exist.

My collaborators Peter Terry, Hal Arkes and I reported in a study published in 2006 that physicians were far more likely to abandon a therapy that was status quo or standard therapy based on new evidence of harm than they were to adopt an identical therapy based on the same evidence of benefit from a fictitious RCT (randomized controlled trial) presented in the vignette.  These results suggested that there was an asymmetric status quo bias - physicians showed a strong preference for the status quo in the adoption of new therapies, but a strong preference for abandoning the status quo when a standard of care was shown to be harmful.  Two characteristics of the vignettes used in this intersubject study deserve attention.  First, the vignettes described a standard or status quo therapy that had no support from RCTs prior to the fictitious one described in the vignette.  Second, this study was driven in part by what I perceived at the time was a curious lack of adoption of drotrecogin-alfa (Xigris), with its then purported mortality benefit and associated bleeding risk.  Thus, our vignettes had very significant trade-offs in terms of side effects in both the adopt and abandon reference frames.  Our results seemed to explain s/low uptake of Xigris, and were also consistent with the relatively rapid abandonment of hormone replacement therapy (HRT) after publication of the WHI, the first RCT of HRT.

Thursday, January 29, 2015

The Therapeutic Paradox: What's Right for the Population May Not Be Right for the Patient

Bad for the population, good for me
An article in this week's New York Times called "Will This Treatment Help Me?  There's a Statistic for That" highlights the disconnect between the risks (and risk reductions) that epidemiologists, researchers, guideline writers, the pharmaceutical industry, and policy wonks think are significant and the risks (and risk reductions) patients intuitively think are significant enough to warrant treatment.

The authors, bloggers at The Incidental Economist, begin the article with a sobering look at the number needed to treat (NNT).  For the primary prevention of myocardial infarction (MI), if 2000 people with a 10% or higher risk of MI in the next 10 years take aspirin for 2 years, one MI will be prevented.  1999 people will have gotten no benefit from aspirin, and four will have an MI in spite of taking aspirin.  Aspirin, a very good drug on all accounts, is far from a panacea, and this from a man (me) who takes it in spite of falling far below the risk threshold at which it is recommended.

One problem with the NNT is that, for patients, it is a gratuitous numerical transformation of a simple number that anybody could understand (the absolute risk reduction - "your risk of stroke is reduced 3% by taking coumadin") into a more abstract one (the NNT - "if we treat 33 people with coumadin, we prevent one stroke among them") that requires retransformation into examples that people can understand, as shown in pictograms in the NYT article.  A person trying to understand stroke prevention with coumadin couldn't care less about the other 32 people his doctor is treating with coumadin; he is interested in himself.  And his risk is reduced 3%.  So why do we even use the NNT?  Why not just use the ARR?
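
The transformation and its retransformation are each one line; the figures are the ones from the examples above:

# NNT is just the reciprocal of the absolute risk reduction (ARR).

def nnt_from_arr(arr):
    return 1 / arr

def arr_from_nnt(nnt):
    return 1 / nnt

print(f"coumadin: ARR 3% -> NNT {nnt_from_arr(0.03):.0f}")                  # ~33
print(f"aspirin: NNT 2000 (over 2 years) -> ARR {arr_from_nnt(2000):.3%}")  # 0.050%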

Saturday, January 17, 2015

Clinical Trialists Should Use Economies of Scale to Maximize Profits of Large RCTs

The lever is a powerful tool
I am writing (very slowly) a review article about ionized calcium in the ICU - should it be measured, and should it be treated?  There are several recent large observational studies that look at the association between calcium and outcomes of  critical illness, but being observational, they do not offer guidance as to whether chasing calcium levels with calcium gluconate or chloride will improve outcomes or whether hypo- or hyper-calcemia is simply a marker of severity of illness (the latter is of course my bet.)

Thinking about calcium levels and causation and repletion, one cannot help but think about all sorts of other levels we check in the ICU - potassium, magnesium, phosphate - and many other things we routinely do but about which we have no real inkling as to whether we're doing any patients any good.  (Arterial lines are another example.)  Are we just wasting our time with many of the things we do?  This question becomes more urgent as evidence mounts that much of what we do (in the ICU and elsewhere) is useless, wasteful, or downright harmful.  But who or what agency is going to fund a trial of potassium or calcium replacement in the ICU?  It certainly seems unglamorous.  Don't we have other disease-specific priorities that outweigh such a trial in importance?

I then realized that a good businessman, wanting to maximize the "profit" from a large, randomized controlled trial (and the dollars "invested" in it), would take advantage of economies of scale.  For those who are not business savvy (I do not imply that I am), business costs can be roughly divided into fixed costs and variable costs.  If you have a factory making widgets you have certain costs such as rent, advertising, and widget-making machines.  These costs are "fixed," meaning that they are invariable whether you make 100 widgets or 10,000 widgets.  Variable costs are the costs of materials, electricity, and human resources, which must be scaled up as you make more widgets.  In general, the cost of making each widget goes down as the fixed costs are spread out over more widget units.  Additionally, if you can leverage your infrastructure to make wadgets, a product similar to a widget, you likewise increase profits by lowering costs per unit.
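
In code, the economies-of-scale arithmetic looks like this; all dollar figures are invented for illustration:

# Cost per widget falls as fixed costs are spread over more units.
fixed_cost = 100_000          # rent, machines, advertising
variable_cost_per_unit = 5    # materials, electricity, labor

for units in (100, 1_000, 10_000):
    per_unit = fixed_cost / units + variable_cost_per_unit
    print(f"{units:>6} widgets: ${per_unit:,.2f} each")
# The trialist's analogue of adding a "wadget" line: piggyback secondary
# randomizations (potassium, calcium repletion) on a mega-trial whose
# fixed costs - infrastructure, coordinators, data systems - are already paid.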

Saturday, October 11, 2014

Enrolling Bad Patients After Good: Sunk Cost Bias and the Meta-Analytic Futility Stopping Rule

Four (relatively) large critical care randomized controlled trials were published online ahead of print by the NEJM in the last week.  I was excited to blog on them, but then I realized they're all four old news, so there's nothing to blog about.  But alas, the fact that there is no news is the news.

In the last week, we "learned" that more transfusion is not helpful in septic shock, that EGDT (the ARISE trial) is not beneficial in sepsis, that simvastatin (HARP-2 trial) is not beneficial in ARDS, and that parenteral administration of nutrition is not superior to enteral administration in critical illness.  Any of that sound familiar?

I read the first two articles, then discovered the last two, and I said to myself "I'm not reading these."  At first I felt bad about this decision, but then I realized it is a rational one.  Here's why.

Saturday, July 12, 2014

Better the Devil You Know: Thrombolysis for Pulmonary Embolism

In my view, the task of the expert is to render the complex simple.  And the expert does do this, except when reality goes against his bets and complexity becomes a tool for obfuscating an unwanted result.

In 2002, Konstantinides compared alteplase plus heparin versus heparin alone for submassive pulmonary embolism (PE).  The simple message from this study was "alteplase now saves you from alteplase later" and the simple strategy is to wait until there is hemodynamic deterioration (shock) and then give alteplase.  Would that it were actually viewed so simply - I would not then get calls from stressed providers hemming and hawing about the septum bowing on the echo and the sinus tachycardia and the....

If you're a true believer, you think alteplase works - you want it to work.  So, you do another study, hoping that biomarkers better identify a subset of patients that will benefit from an up-front strategy of thrombolysis.  Thus, the PEITHO study appeared in the April 10th, 2014 issue of the NEJM.  It too showed that fibrinolysis (with tenecteplase) now simply saved you from tenecteplase later.  But fibrinolysis now also causes stroke later, with an increase from 0.2% in the control group to 2.4% in the fibrinolysis group - and most of the strokes were hemorrhagic.  Again, the strategic path is in stark relief - if your patient is dying of shock from PE, give fibrinolysis.  If not, wait - because less than 5% of them are going to deteriorate.

So we have vivid clarity provided by large modern randomized controlled trials guiding us on what to do with that subset of patients with PE that is not in shock.  For those that are in shock, most agree that we should give thrombolysis.

To muddy that clarity, Chatterjee et al report the results of a meta-analysis in the June 18th issue of JAMA in which they combine all trials they could find over the past 45 years (back to 1970!) of all patients with PE, regardless of hemodynamic status.  The result:  fewer patients died but more had bleeding.  We have now made one full revolution, from trying to identify subsets likely to benefit, to combining them all back together - I think I'm getting dizzy.

If the editorialist would look at his numbers as his patients likely would (and dispense with relative risk reductions), he would see that:

                 Death    Bleeding in the brain    Other major bleeding
Blood Thinner    3.89%    0.19%                    3.42%
Clot Buster      2.17%    1.46%                    9.24%
Difference       1.72%    -1.27%                   -5.82%

For almost every life that is saved, there is almost one (0.74) case of bleeding in the brain and there are 3.4 more cases of major bleeding.  And bear in mind that these are the aggregate meta-analysis numbers that include patients in shock and those not in shock - the picture is worse if you exclude those in shock.
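
That arithmetic comes straight from the differences in the table:

# Harms incurred per life saved, computed from the table above.
mortality_arr = 0.0172         # 3.89% - 2.17%: absolute mortality reduction
ich_increase = 0.0127          # 1.46% - 0.19%: added brain bleeds
major_bleed_increase = 0.0582  # 9.24% - 3.42%: added other major bleeds

print(f"brain bleeds per life saved: {ich_increase / mortality_arr:.2f}")                # ~0.74
print(f"other major bleeds per life saved: {major_bleed_increase / mortality_arr:.1f}")  # ~3.4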

Better the devil you know.

Monday, May 19, 2014

Sell Side Bias and Scientific Stockholm Syndrome: A Report from the Annual Meeting of the American Thoracic Society

What secrets lie inside?
Analysts working on Wall Street are sometimes categorized as working on either the "buy side" or the "sell side" depending on whether their firm is placing orders for stocks (buy side, such as institutional investors for mutual funds) or filling orders for stocks (sell side, which makes commissions on stock trades).  Sell side bias refers to any tendency for the sell side to "push" stocks via overly optimistic ratings and analyses.

Well, I'm at the American Thoracic Society (ATS) meeting in San Diego right now, and it certainly does feel like people - everyone - is trying to sell me something.  From the giant industry sponsored banners, to the emblazoned tote bags, to the bags of propaganda left at my hotel room door every morning, to the exhibitor hall filled with every manner of new and fancy gadgets (but closed to cameras), to the investigators themselves, everybody is trying to convince me to buy (or prescribe) something.  Especially ideas.  Investigators have a promotional interest in their ideas.  And they want you and me to buy into their ideas.  I have become convinced that investigators without industry ties (that dying breed) are just about as susceptible to sell side bias as those with industry ties.  Indeed, I have also noted that the potential consumer of many of the ideas himself seems biased - he wants things to work, too, and he has a ready explanation for why some ideas didn't pan out in the data (see below).  It's like an epidemic of scientific Stockholm Syndrome.

The first session I attended was a synopsis of the SAILS trial by the ARDSnet investigators, testing whether use of a statin, rosuvastatin, in patients with sepsis-incited lung injury would influence 60-day mortality.  The basis of this trial was formed by observational associations that patients on statins had better outcomes in this, that, and the other thing, including sepsis.  If you are not already aware of the results, guess whether rosuvastatin was beneficial in this study.