Sunday, April 22, 2018

The Respiratory Rosenhan Experiment on Obese Patients

Normally this post would be on Status Iatrogenicus, but the implications for "evidence" and the EBM movement are too important, so it goes here.

For those not inclined to read about the Rosenhan Experiment, a brief history.  In the early 1970s, Dr. Rosenhan, a Stanford psychologist, was concerned about the validity of psychiatric diagnoses.  So he and a half dozen confederates faked mental illness and presented themselves to several prominent psychiatric facilities.  Their feigned symptoms were sufficient to have them admitted with psychiatric diagnoses, usually paranoid schizophrenia.  After admission, they behaved normally.  Nonetheless, they had been diagnosed, put on psychotropic medications, and not allowed to leave until they signed documents swearing to continue the medications upon their release.  The damning report of this experiment was published in Science in 1973 under the title "On Being Sane in Insane Places."

Psychiatric hospitals, blindsided and humiliated, challenged Rosenhan to "do it again", but the second time they would be vigilant for these dissimulations.  Send us some more fakes, they said.  In the second experiment, they determined that some 40 patients were confederates.  But Rosenhan had the upper RosenHand - he had sent no confederates.  Sensitivity, if it can be called that, was sacrificed to specificity in round two.

Previously I have complained about patients who are intubated but needn't have been (see here, and here.)  Oftentimes, after the fact, it is difficult to determine whether the intubation was necessary, especially with alleged upper airway compromise, and with obese patients.  With the latter, roentgenograms of the chest are difficult to interpret because of "fatelectasis" (atelectasis in obese persons), "flatelectasis" (atelectasis due to recumbency), and fluid loading after paralysis and intubation, all of which may require high PEEP to counteract.  If you were not present prior to intubation, it is very difficult to determine whether respiratory distress preceded the intubation, or whether "won't breathe" was mistaken for "can't breathe".  The differentiation between those two entities is "critical".

A man in his late 40s was sent to us recently for "acute respiratory failure" on the ventilator receiving 100% FiO2 and PEEP of 16.  EMS had responded to a call at his 18-wheeler that he could not breathe.  His SpO2 was in the 50% range and he was admitted to a local hospital.  There were basilar opacities, and oxygen was administered.  He weighed nearly 500 lbs.  The opacities were said to represent pneumonia and he was given antibiotics.  Not long after admission, in the middle of the night, he could not be aroused, an ABG was obtained, and his PaCO2 was 90-something with a pH of 7.10 or thereabouts.  This was interpreted to represent acute hypercapnic respiratory failure on top of his "acute hypoxemic respiratory failure", and an hour-long intubation ensued.  Afterwards he was sent to us on the aforementioned high ventilator settings.

Thursday, February 15, 2018

Ruling Out PE in the ED: Critical Analysis of the PROPER Trial

This post is going to be an in-depth "journal club" style analysis of the PROPER trial.

In this week's JAMA, Freund et al report the results of the PROPER randomized controlled trial of the PERC (pulmonary embolism rule-out criteria) rule for safely excluding pulmonary embolism (PE) in the emergency department (ED) among patients with a "low clinical gestalt" of having PE.  All things pulmonary and all things noninferiority being pet topics of mine, I had to delve deeper into this article because frankly the abstract confused me.

This was a cluster randomized noninferiority trial, but for most purposes, the cluster part can be ignored when looking at the data.  Each of 14 EDs in France was randomized such that during the "PERC period" PE was excluded in patients with a "low gestalt clinical probability" (not yet defined in the abstract) if all 8 items of the PERC rule were negative.  In the "control period" usual procedures for exclusion of PE were followed.  The primary end point was occurrence of a [venous] thromboembolic event (VTE) during 3 months of follow-up.  The delta (pre-specified margin of noninferiority) for the endpoint was 1.5%.  This is a pleasingly low number.  In our meta-research study of 163 noninferiority trials including those in JAMA from 2010-2016, we found that the average delta for those using an absolute risk difference (n=137) was 8.7%, almost 6 times higher!  This is laudable, but it was aided by a low estimated event rate in the control group, which meant that a sample size of ~1900 was feasible given what I assume were relatively low costs of the study.  Kudos to the authors too, for concretely justifying delta in the methods section.
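To see why such a small delta remained feasible, here is a back-of-the-envelope sample size sketch.  The alpha, power, and ~1% control event rate below are my assumptions for illustration, not figures taken from the paper:

```python
from scipy.stats import norm

def ni_sample_size(p_event, delta, alpha=0.025, power=0.80):
    """Approximate per-group n for a noninferiority trial with a binary
    endpoint (normal approximation, equal true event rates in both arms)."""
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    variance = 2 * p_event * (1 - p_event)  # sum of the two binomial variances
    return (z_a + z_b) ** 2 * variance / delta ** 2

# A 1.5% margin with an assumed ~1% 3-month VTE rate:
print(ni_sample_size(0.01, 0.015))  # ~690 per group, ~1400 total
# The 8.7% average margin from our meta-research study:
print(ni_sample_size(0.01, 0.087))  # ~21 per group
```

A low expected event rate shrinks the binomial variance, which is what keeps the required n near ~1400 (before any inflation for the cluster design) despite the small delta.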

Tuesday, September 26, 2017

DIPSHIS: Diprivan Induced Pseudo-Shock & Hypoxic Illness Syndrome

Here's the occasional cross-post from Status Iatrogenicus, as it has relevance to research as well as medical practice.  Original post here:  DIPSHIS

This would be a very informative case report (and it's true and unexaggerated), but I anticipate staunch editorial resistance (even sans puns), so I'll describe it here and have some fun with it.

Background:  The author has anecdotally observed for many years that so-called "septic shock" follows rather than precedes intubation and sedation.  This raises the possibility that some proportion of what we call septic (or other) shock is iatrogenic and induced by sedative agents rather than progression of the underlying disease process.

Methods:  Use of a case report as a counterfactual to the common presumption that shock occurring after intubation and sedation is consequent to the underlying disease process rather than to the associated medical interventions.

Results:  A 20-something man was admitted with pharyngitis, multilobar pneumonia (presumed bacterial) and pneumomediastinum (presumed from coughing).  He met criteria for sepsis with RR=40, HR=120, T=39, BP 130/70.  He was treated with antibiotics and supportive care but remained markedly tachypneic with rapid shallow respirations, despite absence of subjective respiratory distress.  A dialectic between a trainee and the attending sought to predict whether he was "tiring out" and/or "going into ARDS", but yielded equipoise/a stalemate.  A decision was made to intubate the patient and re-evaluate the following day.  After intubation, he required high doses of propofol (Diprivan) for severe agitation, and soon developed wide-pulse-pressure hypotension, which led to administration of several liters of fluids and initiation of a noradrenaline infusion overnight.  He was said to have "gone into shock" and "progressed to ARDS", as his oxygen requirements doubled from 40% to 80% and PEEP had been increased from 8 to 16.  The next morning, out of concern that the "shock" and "ARDS" were iatrogenic complications given their temporality to other interventions, sedation and vasopressors were abruptly discontinued, diuresis of 2 liters was achieved, and the patient was successfully extubated and discharged from the ICU a day later.

Conclusions:  This case provides anecdotal "proof of concept" for the counterfactual that is often unseen:  Patients "go into shock" and "progress to ARDS" not in spite of treatment, but because of it.  The author terms this syndrome, in the context of Diprivan (propofol) in the ICU setting, "DIPSHIS".  The incidence of DIPSHIS is unknown and may be underestimated because of difficulty in detection fostered by cultural biases in the care of critically ill medical patients.  Anesthesiologists have long recognized DIPSHIS but have not needed to name it, because they do not label as "shock" anesthetic-induced hypotension in the operating theater - they just give some ephedrine until the patient recovers.  DIPSHIS has implications for the epidemiological and therapeutic study of "septic shock" as well as for hospital coding and billing.

Sunday, August 27, 2017

Just Do As I (Vaguely) Say: The Folly of Clinical Practice Guidelines

If you didn't care to know anything about finance, and you hired a financial adviser (paid hourly, not through commissions, of course), you would be happy to have him simply tell you to invest all of your assets in a Vanguard life cycle fund.  But you may then be surprised to learn that a different adviser told one of your contemporaries that such an approach is oversimplified and that a portfolio should contain several classes of assets that are not included in the life cycle funds, such as gold or commodities.  In light of the discrepancies, you may conclude that to make the best economic choices for yourself, you need to understand finance and the data upon which the advisers are basing their recommendations.

Making medical decisions optimally is akin to making economic decisions and is founded on a simple framework:  EUT, or Expected Utility Theory.  To determine whether to pursue a course of action versus another one, we add up the benefits of a course multiplied by their probability of accruing (that product is the positive utility of the course of action) and then subtract the product of the costs of the course of action and their probability of accruing (the negative utility).  If utility is positive, we pursue a course of action, and if options are available, we pursue the course with the highest positive utility.  Ideally, anybody helping you navigate such a decision framework would tell you the numbers so you could do the calculus.  Using the finance analogy again, if the adviser told you "Stocks have positive returns.  So do bonds.  Stocks are riskier than bonds" - without any quantification, you may conclude that a portfolio full of bonds is the best course of action - and usually it is not.
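In code, the EUT calculus is nothing more than a probability-weighted sum; here is a toy sketch with invented probabilities and utilities, purely for illustration:

```python
def expected_utility(outcomes):
    """outcomes: (probability, utility) pairs; benefits carry positive
    utility, costs and harms carry negative utility."""
    return sum(p * u for p, u in outcomes)

# Invented numbers on an arbitrary utility scale:
treat   = expected_utility([(0.30, +10), (0.05, -20)])  # 2.0
observe = expected_utility([(0.10, +10), (0.01, -20)])  # 0.8
print(max(("treat", treat), ("observe", observe), key=lambda x: x[1]))
```

The adviser who gives you only the qualitative summary has, in effect, withheld the probabilities and utilities that this calculation requires.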

I regret to report that that is exactly what clinical practice guideline writers do:  provide summary information without any numerical data to support it, leaving the practitioner with two choices:

  1. Just do as the guideline writer says
  2. Go figure it out for herself with a primary data search

Thursday, April 6, 2017

Why Most True Research Findings Are Useless

In his provocative essay in PLOS Medicine over a decade ago, Ioannidis argued that most published research findings are false, owing to a variety of errors such as p-hacking, data dredging, fraud, selective publication, researcher degrees of freedom, and many more.  In my permutation of his essay, I will go a step further and suggest that even if we limit our scrutiny to tentatively true research findings (scientific truth being inherently tentative), most research findings are useless.

My choice of the word "useless" may seem provocative, and even untenable, but it is intended to have an exquisitely specific meaning:  I mean useless in an economic sense of "having zero or negligible net utility", in the tradition of Expected Utility Theory [EUT], for individual decision making.  This does not mean that true findings are useless for the incremental accrual of scientific knowledge and understanding.  True research findings may be very valuable from the perspective of scientific progress, but still useless for individual decision making, whether it is the individual trying to determine what to eat to promote a long healthy life, or the physician trying to decide what to do for a patient in the ICU with delirium.  When evaluating a research finding that is thought to be true, and may at first blush seem important and useful, it is necessary to make a distinction between scientific utility and decisional utility.  Here I will argue that while many "true" research findings may have scientific utility, they have little decisional utility, and thus are "useless".

Tuesday, April 4, 2017

Tipping the Scales of Noninferiority: Abbott's "Emboshield and Xact Carotid Stent System"

I just stumbled across this and think it's worth musing over a bit.  The recently published ACT I trial by Rosenfield et al is a noninferiority trial of an already approved device, the "Emboshield embolic protection system", used in conjunction with the "Xact carotid stent system", both proprietary devices from Abbott.  I'm scrutinizing this trial (and others) to determine whether adequate justification is given for the noninferiority hypothesis around which the trial is designed.  One thing I'm looking for is evidence that there are clear secondary advantages of the novel or experimental therapy that justify accepting some degree of worse efficacy, compared to the active control, which falls within the prespecified margin of noninferiority.  This is what the authors (or their ghosts) write in the introduction:
"Most carotid revascularization procedures in the United States are carotid endarterectomies performed for the treatment of asymptomatic atherosclerotic disease. Revascularization is also performed by means of stenting with devices to capture and remove emboli (“embolic protection” devices).3,4 In the Carotid Revascularization Endarterectomy versus Stenting Trial (CREST), no significant difference was found between carotid endarterectomy and stenting with embolic protection for the treatment of atherosclerotic carotid bifurcation stenosis with regard to the composite end point of stroke, death, or myocardial infarction.5 CREST included both symptomatic and asymptomatic patients, and it was not sufficiently powered to discern whether the carotid endarterectomy and stenting with embolic protection were equivalent according to symptomatic status. The primary aim of the Asymptomatic Carotid Trial (ACT) I was to compare the outcomes of carotid endarterectomy versus stenting with embolic protection in patients with asymptomatic severe carotid-artery stenosis who were at standard risk for surgical complications."
That's a mouthful, to say the least, and probably ought to be expectorated.

Friday, February 10, 2017

The Normalization Fallacy: Why Much of “Critical Care” May Be Neither

Like many starry-eyed medical students, I was drawn to critical care because of the high stakes, its physiological underpinnings, and the apparent fact that you could take control of that physiology and make it serve your goals for the patient.  On my first MICU rotation in 1997, I was so swept away by critical care that I voluntarily stayed on through the Christmas holiday and signed up for another elective MICU rotation at the end of my 4th year.  On the last night of that first rotation, wistful about leaving, I sauntered through the unit a few times thinking how I would miss the smell of the MICU and the distinctive noise of the Puritan Bennett 7200s delivering their [too high] tidal volumes.  By then I could even tell you whether the patient’s peak pressures were high (they often were) by the sound the 7200 made after the exhalation valve released.  I was hooked, irretrievably. 

I still love thinking about physiology, especially in the context of critical illness, but I find that I have grown circumspect about its manipulation as I have reflected on the developments in our field over the past 20 years.  Most – if not all – of these “developments” show us that we were harming patients with a lot of the things we were doing.  Underlying many now-abandoned therapies was a presumption that our understanding of physiology was sufficient that we could manipulate it to beneficial ends.  This presumption hints at an underlying set of hypotheses that we have which guide our thinking in subtle but profound and pervasive ways.  Several years ago we coined the term the “normalization heuristic” (we should have called it the “normalization fallacy”) to describe our tendency to view abnormal laboratory values and physiological parameters as targets for normalization.  This approach is almost reflexive for many values and parameters but on closer reflection it is based on a pivotal assumption:  that the targets for normalization are causally related to bad outcomes rather than just associations or even adaptations.

Wednesday, January 11, 2017

Don't Get Soaked: The Practical Utility of Predicting Fluid Responsiveness

In this article in the September 27th issue of JAMA, the authors discuss the rationale and evidence for predicting fluid responsiveness in hemodynamically unstable patients.  While this is a popular academic topic, its practical importance is not as clear.  Some things, such as predicting performance on an SBT with a Yang-Tobin f/Vt, don't make much sense - just do the SBT if that's the result you're really interested in.  The prediction of whether it will rain today is not very important if the difference in what I do is as small as tucking an umbrella into my bag or not.  Neither the inconvenience of getting a little wet walking from the parking garage nor that of carrying the umbrella is very great.  Similarly, a prediction of whether or not it will rain two months from now when I'm planning a trip to Cancun is not very valuable to me because the confidence intervals about the estimate are too wide to rely upon.  Better to just stick with the base rates:  how much rainfall is there in March in the Caribbean in an average year?

Our letter to the editor was not published in JAMA, so I will post it here:

To the Editor:  A couple of issues relating to the article about predicting responsiveness to fluid bolus1 deserve attention.  First, the authors made a mathematical error that may cause confusion among readers attempting to duplicate the Bayesian calculations described in the article.  The negative predictive value (NPV) of a test is the proportion of patients with a negative test who do not have the condition – the true negative rate.2  In each of the instances in which NPV is mentioned in the article, the authors mistakenly report the proportion of patients with a negative test who do have the condition.  This value, 1-NPV, is the false negative rate - the posterior probability of the condition in those with a negative test.

Second, in the examples that discuss NPV, the authors use a prior probability of fluid responsiveness of 50%.  A clinician who appropriately uses a threshold approach to decision making3 must determine a probability threshold above which treatment is warranted, considering the net utility of all possible outcomes with and without treatment given that treatment’s risks and benefits.4  Because the risk of fluid administration in judicious quantities is low5, the threshold for fluid administration is correspondingly low and fluid bolus may be warranted based on prior probability alone, thus obviating additional testing.  Even if additional testing is negative and suggests a posterior probability of fluid responsiveness of only 10% (with an upper 95% confidence limit of 18%), many clinicians would still judge a trial of fluids to be justified because fluids are considered to be largely benign and untreated hypovolemia is not4.  (The upper confidence limit will be higher still if the prior probability was underestimated.)  Finally, the posterior probabilities hinge critically on the estimates of prior probabilities, which are notoriously nebulous and subjective.  Clinicians are likely intuitively aware of these quandaries, which may explain why empiric fluid bolus is favored over passive leg raise testing outside of academic treatises6.


1. Bentzer P, Griesdale DE, Boyd J, MacLean K, Sirounis D, Ayas NT. Will this hemodynamically unstable patient respond to a bolus of intravenous fluids? JAMA. 2016;316(12):1298-1309.
2. Fischer JE, Bachmann LM, Jaeschke R. A readers' guide to the interpretation of diagnostic test properties: clinical example of sepsis. Intensive Care Med. 2003;29(7):1043-1051.
3. Pauker SG, Kassirer JP. The threshold approach to clinical decision making. N Engl J Med. 1980;302(20):1109-1117.
4. Tsalatsanis A, Hozo I, Kumar A, Djulbegovic B. Dual Processing Model for Medical Decision-Making: An Extension to Diagnostic Testing. PLoS One. 2015;10(8):e0134800.
5. The ProCESS Investigators. A Randomized Trial of Protocol-Based Care for Early Septic Shock. N Engl J Med. 2014;370(18):1683-1693.
6. Marik PE, Monnet X, Teboul J-L. Hemodynamic parameters to guide fluid therapy. Ann Intensive Care. 2011;1:1.


Scott K Aberegg, MD, MPH
Andrew M Hersh, MD
The University of Utah School of Medicine
Salt Lake City, Utah
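For readers who want to verify the letter's first point, here is a minimal sketch of the Bayesian arithmetic; the sensitivity, specificity, and prior below are invented for illustration, not taken from the article:

```python
def npv(sens, spec, prior):
    """Negative predictive value: P(condition absent | negative test)."""
    tn = spec * (1 - prior)        # true negative mass
    fn = (1 - sens) * prior        # false negative mass
    return tn / (tn + fn)

sens, spec, prior = 0.80, 0.60, 0.50   # illustrative values only
print(npv(sens, spec, prior))          # 0.75 = NPV
print(1 - npv(sens, spec, prior))      # 0.25 = 1-NPV, the posterior
                                       # probability of the condition given a
                                       # negative test - the value mislabeled
                                       # as "NPV"
```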


Thursday, January 5, 2017

RCT Autopsy: The Differential Diagnosis of a Negative Trial

At many institutions, Journal Clubs meet to dissect a trial after its results are published to look for flaws, biases, shortcomings, limitations.  Beyond the dissemination of the informational content of the articles that are reviewed, Journal Clubs serve as a reiteration and extension of the limitations part of the article discussion.  Unless they result in a letter to the editor, or a new peer-reviewed article about the limitations of the trial that was discussed, the debates of Journal Club begin a headlong recession into obscurity soon after the meeting adjourns.

The proliferation and popularity of online media has led to what amounts to a real-time, longitudinally documented Journal Club.  Named “post-publication peer review” (PPPR), it consists of blog posts, podcasts and videocasts, comments on research journal websites, remarks on online media outlets, and websites dedicated specifically to PPPR.  Like a traditional Journal Club, PPPR seeks to redress any deficiencies in the traditional peer review process that lead to shortcomings or errors in the reporting or interpretation of a research study.

PPPR following publication of a “positive” trial, that is, one where the authors conclude that their a priori criteria for rejecting the null hypothesis were met, is oftentimes directed at the identification of a host of biases in the design, conduct, and analysis of the trial that may have led to a “false positive” trial.  False positive trials are those in which either a type I error has occurred (the null hypothesis was rejected even though it is true and no difference between groups exists), or the structure of the experiment was biased in such a way that the experiment and its statistics cannot be informative.  The biases that cause structural problems in a trial are manifold, and I may attempt to delineate them at some point in the future.  Because it is a simpler task, I will here attempt to list a differential diagnosis that people may use in PPPRs of “negative” trials.

Thursday, September 8, 2016

Hiding the Evidence in Plain Sight: One-sided Confidence Intervals and Noninferiority Trials

In the last post, I linked a video podcast of me explaining non-inferiority trials and their inherent biases.  In this videocast, I revisit noninferiority trials and the use of one-sided confidence intervals.  I review the Salminen et al noninferiority trial of antibiotics versus appendectomy for the treatment of acute appendicitis in adults.  This trial uses a very large delta of 24%.  The criteria for non-inferiority were not met even with this promiscuous delta.  But the use of a 1-sided 95% confidence interval concealed a more damning revelation in the data.  Watch the 13 minute videocast to learn what was hidden in plain sight!
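To see mechanically how a one-sided interval can conceal the other tail, consider this sketch; the failure rates and sample sizes are invented to resemble the setup, not taken from the trial:

```python
import math

def risk_diff_ci(p1, n1, p2, n2, z):
    """Wald confidence limits for a risk difference (normal approximation)."""
    d = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return d - z * se, d + z * se

# Invented: 27% vs 1% treatment-failure rates, ~250 patients per arm
print(risk_diff_ci(0.27, 250, 0.01, 250, 1.645))  # one-sided 95%: only the
                                                  # lower limit gets reported
print(risk_diff_ci(0.27, 250, 0.01, 250, 1.96))   # two-sided 95%: the upper
                                                  # limit shows how large the
                                                  # difference may plausibly be
```

Reporting only the lower limit against delta leaves the upper limit - the one that speaks to inferiority - hidden in plain sight.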

Erratum:  at 1:36 I say "excludes an absolute risk difference of 1" and I meant to say "excludes an absolute risk difference of ZERO."  Similarly, at 1:42 I say "you can declare non-inferiority".  Well, that's true, you can declare noninferiority if your entire 95% confidence interval falls to the left of an ARD of 0 or a HR of 1, but what I meant to say is that if that is the case "you can declare superiority."

Also, at 7:29, I struggle to remember the numbers (woe is my memory!) and I place the point estimate of the difference, 0.27, to the right of the delta dashed line at .24.  This was a mistake which I correct a few minutes later at 10:44 in the video.  Do not let it confuse you, the 0.27 point estimates were just drawn slightly to the right of delta and they should have been marked slightly to the left of it.  I would re-record the video (labor intensive) or edit it, but I'm a novice with this technological stuff, so please do forgive me.

Finally, at 13:25 I say "within which you can hide evidence of non-inferiority" and I meant "within which you can hide evidence of inferiority."

Again, I apologize for these gaffes.  My struggle (and I think about this stuff a lot) in speaking about and accurately describing these confidence intervals and the conclusions that derive from them results from the arbitrariness of the CONSORT "rules" about interpretation and the arbitrariness of the valences (some journals use a negative valence for differences favoring "new"; some use a positive valence).  If I struggle with it, many other readers, I'm sure, also struggle in keeping things straight.  This is fodder for the argument that these "rules" ought to be changed and made more uniform, for consistency and ease of understanding and interpretation of non-inferiority trials.

It made me feel better to see this diagram in Annals of Internal Medicine (Perkins et al, July 3, 2012, online ACLS training), where they incorrectly place the point estimate at slightly less than -6% (to the left of the dashed delta line in Figure 2), when it should have been placed slightly greater than -6% (to the right of the dashed delta line).






Saturday, June 11, 2016

Non-inferiority Trials Are Inherently Biased: Here's Why

Debut VideoCast for the Medical Evidence Blog, explaining non-inferiority trial design and exposing its inherent biases:

In this related blog post, you can find links to the CONSORT statement in the Dec 26, 2012 issue of JAMA and a link to my letter to the editor.

Addendum:  I should have included this in the video.  See the picture below.  In the first example, top left, the entire 95% CI favoring "new" therapy lies in the "zone of indifference", that is, the pre-specified margin of superiority, a mirror image of the pre-specified margin of noninferiority, in this case delta = +/- 0.15.  Next down, the majority of the 95% CI of the point estimate favoring "new" therapy lies in the "margin of superiority" - so even though the lower end of the 95% CI crosses "mirror delta", the best guess is that the effect of therapy falls in the zone of indifference.  In the lowest example, labeled "Truly Superior", the entire 95% confidence interval falls to the left of "mirror delta", thus reasonably excluding all point estimates in the "zone of indifference" (i.e., +/- delta) and all point estimates favoring the "old" therapy.  This would, in my mind, represent "true superiority" in a logical, rational, and symmetrical way that would be very difficult to mount arguments against.
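Here is one way to code up the symmetric reading sketched above.  The zone labels are my own, and the sign convention (negative differences favor "new") is an assumption of this sketch:

```python
def classify_ci(lo, hi, delta=0.15):
    """Classify a two-sided 95% CI for the new-minus-old difference against
    a symmetric margin at +/- delta (negative values favor 'new')."""
    if hi < -delta:
        return "truly superior (new)"
    if lo > delta:
        return "truly inferior (new)"
    if lo >= -delta and hi <= delta:
        return "entirely within the zone of indifference"
    if hi <= delta:
        return "noninferior by the usual reading"
    return "inconclusive"

for ci in [(-0.30, -0.18), (-0.20, 0.05), (-0.10, 0.10), (-0.05, 0.20)]:
    print(ci, classify_ci(*ci))
```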


Added 9/20/16:  For those who question my assertion that the designation of "New" versus "Old" or "comparator" therapy is arbitrary, here is the proof:  In this trial, the "New" therapy is DMARDs and the comparator is anti-tumour necrosis factor agents for the treatment of rheumatoid arthritis.  The rationale for this trial is that the chronologically newer anti-TNF agents are very costly, and the authors wanted to see if similar improvements in quality of life could be obtained with chronologically older DMARDs.  So what is "new" is certainly in the eye of the beholder.  Imagine colistin 50 years ago, being tested against, say, a newer spectrum penicillin.  The penicillin would have been found to be non-inferior, but with a superior side effect profile.  Fast forward 50 years and now colistin could be the "new" resurrected agent and be tested against what 10 years ago was the standard penicillin but is now "old" because of development of resistance.  Clearly, "new" and "old" are arbitrary and flexible designations.

Wednesday, June 8, 2016

Once Bitten, Twice Try: Failed Trials of Extubation



“When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”                                                                                   – Clarke’s First Law

It is only fair to follow up my provocative post about a “trial of extubation” by chronicling a case or two that didn’t go as I had hoped.  Reader comments from the prior post described very low re-intubation rates.  As I alluded in that post, decisions regarding extubation represent the classic trade-off between sensitivity and specificity.  If your test for “can breathe spontaneously” has high specificity, you will almost never re-intubate a patient.  But unless the criteria used have correspondingly high sensitivity, patients who can breathe spontaneously will be left on the vent for an extra day or two.  Which you (and your patients) favor, high sensitivity or high specificity (assuming you can’t have both), depends upon the values you ascribe to the various outcomes.  Though these are many, it really comes down to this:  what do you think is worse (or more fearsome), prolonged mechanical ventilation or reintubation?

What we fear today may not seem so fearsome in the future.  Surgeons classically struggled with the sensitivity and specificity trade-off in the decision to operate for suspected appendicitis.  “If you never have a negative laparotomy, you’re not operating enough” was the heuristic.  But this was based on the notion that failure to operate on a true appendicitis would lead to serious untoward outcomes.  More recent data suggest that this may not be so, and that many of those inflamed appendices could have been treated with antibiotics in lieu of surgery.  This is what I’m suggesting with reintubation.  I don’t think the Epstein odds ratio (~4) of mortality for reintubation from 1996 applies today, at least not in my practice.

Tuesday, May 31, 2016

Trial of Extubation: An Informed Empiricist’s Approach to Ventilator Weaning

“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.”    –Clarke’s Second Law

In the first blog post, Dr. Manthous invited Drs. Ely, Brochard, and Esteban to respond to a simple vignette about a patient undergoing weaning from mechanical ventilation.  Each responded with his own variation of a cogent, evidence based, and well-referenced/supported approach.  I trained with experts of similar ilk using the same developing evidence base, but my current approach has evolved to be something of a different animal altogether.  It could best be described as a “trial of extubation”.  This approach recently allowed me to successfully extubate a patient 15 minutes into a trial of spontaneous breathing, not following commands, on CPAP 5, PS 5, FiO2 0.5 with the vital parameters in the image accompanying this post (respiratory rate 38, tidal volume 350, heart rate 129, SpO2 88%, temperature 100.8).  I think that any account of the “best” approach to extubation should offer an explanation as to how I can routinely extubate patients similar to this one, who would fail most or all of the conventional prediction tests, with a very high success rate.

A large part of the problem lies in shortcomings of the data upon which conventional prediction tests rely.  For example, in the landmark Yang and Tobin report and many reports that followed, sensitivity and specificity were calculated considering physicians’ “failure to extubate” a patient as equivalent to an “extubation failure”.  This conflation of two very different endpoints makes estimates of sensitivity and specificity unreliable.  Unless every patient with a prediction test is extubated, the sensitivity of a test for successful extubation is going to be an overestimate, as suggested by Epstein in 1995.  Furthermore, all studies have exclusion criteria for entry, with the implicit assumption that excluded patients would not be extubatable - which has the same effect of increasing the apparent sensitivity of the tests.
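A toy calculation, with invented numbers, makes the conflation concrete:

```python
# Invented cohort: 100 patients who would in fact breathe spontaneously
# if given the chance (true extubation successes).
test_pos = 70  # predicted to succeed, extubated, succeeded
test_neg = 30  # predicted to fail -> never extubated, success never observed

# A study can only count observed successes, so the test never "misses":
observed_sensitivity = test_pos / (test_pos + 0)      # 1.00
# Counting the unextubated would-be successes as misses tells the truth:
true_sensitivity = test_pos / (test_pos + test_neg)   # 0.70
print(observed_sensitivity, true_sensitivity)
```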

Even if we had reliable estimates of sensitivity and specificity of prediction tests, the utility calculus has traditionally been skewed towards favoring specificity for extubation success, largely on the basis of a single 20-year-old observational study suggesting that patients who fail extubation have higher odds of mortality.  I do not doubt that if patients are allowed to “flail” after it becomes clear that they will not sustain unassisted ventilation, untoward outcomes are likely.  However, in my experience and estimation, this concern can be obviated by bedside vigilance by nurses and physicians in the several hours immediately following extubation (with the caveat that a highly skilled airway manager is present or available to reintubate if necessary).  Furthermore, this period of observation provides invaluable information about the cause of failure in the event failure ensues.  There need be no further guesswork about whether the patient can protect her airway, clear her secretions, maintain her saturations, or handle the work of breathing.  With the tube removed, what would otherwise be a prediction about these abilities becomes an observation, a datapoint that can be applied directly to the management plan for any subsequent attempt at extubation should she fail – that is, the true weak link in the system can be pinpointed after extubation.

The specificity-heavy utility calculus, as I have opined before, will fail patients if I am correct that an expeditious reintubation is not harmful, but each additional day spent on the ventilator confers incremental harm.  Why don’t I think reintubations are harmful?  Because when my patients fail, I am diligent about rapid recognition, I reintubate without observing complications, and often I can extubate successfully the next day, as I did a few months ago in a patient with severe ARDS.  She had marginal performance (i.e., she failed all prediction tests) and was extubated, failed, was reintubated, then successfully extubated the next day.  (I admit that it was psychologically agonizing to extubate her the next day.  They say that a cat that walks across a hot stove will never do so again.  It also will not walk on a cold stove again.  This psychology deserves a post of its own.)

When I tweeted the image attached to this post announcing that the patient (and many like her) had been successfully extubated, there was less incredulity than I expected, but an astute follower asked – “Well, then, how do you decide whom and when to extubate?”  I admit that I do not have an algorithmic answer to this question.  Experts in opposing camps of decision psychology such as Kahneman and his adherents in the heuristics and biases camp and Gary Klein, Gerd Gigerenzer and others in the expert intuition camp could have a heyday here, and perhaps some investigation is in order.  I can summarize by saying that it has been an evolution over the past 10 or so years.  I use everything I learned from the conventional, physiologic, algorithmic, protocolized, data-driven, evidence-based approach to evaluate a patient.  But I have gravitated to being more sensitive, to capture those patients that the predictors say should fail, and I give them a chance – a “trial of extubation.”  If they fail, I reintubate quickly.  I pay careful attention to respiratory parameters, mental status, and especially neuromuscular weakness, but I integrate this information into my mental map of the natural history of the disease and the specific patient’s position along that course to judge whether they have even a reasonable modicum of a chance of success.  If they do, I “bite the bullet and pull it.”

I do not eschew data, I love data.  But I am quick to recognize their limitations.  Data are generated for many reasons and have different values to different people with different prerogatives.  From the clinician’s and the patient’s perspective, the data are valuable if they reduce the burden of illness.  I worry that the current data and the protocols predicated on them are failing to capture many patients who are able to breathe spontaneously but are not being given the chance.  Hard core evidence based medicine proponents and investigators need not worry though, because I have outlined a testable hypothesis:  that a “trial of extubation” in the face of uncertainty is superior to the use of prediction tests and protocols.  The difficult part will be determining the inclusion and exclusion criteria, and no matter what compromise is made uncertainty will remain, reminding us that science is an iterative, evolving enterprise, with conclusions that are always tentative.

Monday, May 2, 2016

Hope: The Mother of Bias in Research

I realized the other day that underlying every slanted report or overly-optimistic interpretation of a trial's results, every contorted post hoc analysis, every Big Pharma obfuscation, is hope.  And while hope is generally a good, positive emotion, it engenders great bias in the interpretation of medical research.  Consider this NYT article from last month:  "Dashing Hopes, Study Shows Cholesterol Drug Had No Effect on Heart Health."  The title itself reinforces my point, as do several quotes in the article.
“All of us would have put money on it,” said Dr. Peter Libby, a Harvard cardiologist. The drug, he said, “was the great hope.”
Again, hope is wonderful, but it blinds people to the truth in everyday life and I'm afraid researchers are no more immune to its effects than the laity.  In my estimation, three main categories of hope creep into the evaluation of research and foment bias:

  1. Hope for a cure, prevention, or treatment for a disease (on the part of patients, investigators, or both)
  2. Hope for career advancement, funding, notoriety, being right (on the part of investigators) and related sunk cost bias
  3. Hope for financial gain (usually on the part of Big Pharma and related industrial interests)
Consider prone positioning for ARDS.  For over 20 years, investigators have hoped that prone positioning improves not only oxygenation but also outcomes (mostly mortality).  So is it any wonder that after the most recent trial, in spite of the 4 or 5 previous failed trials, the community enthusiastically declared "success!"  "Prone Positioning works!"  Of course it is no wonder - this has been the hope for decades.

But consider what the most recent trial represents through the lens of replicability:  a failure to replicate previous results showing that prone positioning does not improve mortality.  The recent trial is the outlier.  It is the "false positive" rather than the previous trials being the "false negatives."

This way of interpreting the trials of prone positioning in the aggregate should be an obvious one, and it astonishes me that it took me so long to see the results this way - as a single failure to replicate previously replicable negative results.  But it hearkens back to the underlying bias - we view results through the magnifying glass of hope, and it distorts our appraisal of the evidence.
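A rough calculation shows why the lone positive trial is more plausibly the outlier.  I assume six trials, a conventional alpha of 0.05, and (generously) 80% power for each - none of which is strictly true of the actual prone positioning trials:

```python
from math import comb

alpha, power, n_trials = 0.05, 0.80, 6  # assumed, for illustration

# If prone positioning truly has no mortality effect:
p_at_least_one_false_positive = 1 - (1 - alpha) ** n_trials      # ~0.26
# If it truly works and each trial had 80% power:
p_five_of_six_negative = comb(6, 5) * (1 - power) ** 5 * power   # ~0.0015
print(p_at_least_one_false_positive, p_five_of_six_negative)
```

Under these assumptions, one false positive among six trials of a null therapy is unremarkable; five false negatives among six adequately powered trials of an effective therapy would be astonishing.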

Indeed, I have been accused of being a nihilist because of my views on this blog, which some see as derogating the work of others or an attempt to dash their hopes.  But these critics engage, or wish me to engage, in a form of outcome bias - the value of the research lies in the integrity of its design, conduct, analysis, and reporting, not in its results.  One can do superlative research and get negative results, or shoddy research and get positive results.  My goal here is and always has been to judge the research on its merits, regardless of the results or the hopes that impel it.

(Aside:  Cholesterol researchers have a faith or hope in the cholesterol hypothesis - that cholesterol is a causal factor in pathways to cardiovascular outcomes.  Statin data corroborate this, and preliminary PCSK9 inhibitor data do, too.  But how quickly we engage in hopeful confirmation bias!  If cholesterol is a causal factor, it should not matter how you manipulate it - lower the cholesterol, lower cardiovascular events.  The fact that it does appear to matter how you lower it suggests either that there is a multiplicity of agent effects (untoward and unknown effects of some agents negate some of their beneficial effects in the cholesterol causal pathway) or that cholesterol levels are epiphenomena - markers of the effects of statins and PCSK9 inhibitors on the real, but as yet undelineated, causal pathways.  Maybe the fact that we can easily measure cholesterol and that it is associated with outcomes in untreated individuals is a convenient accident of history that led us to trial statins, which work in ways that we do not yet understand.)

Tuesday, February 23, 2016

Much Ado About Nothing? The Relevance of New Sepsis Definitions for Clinical Care Versus Research

What's in a name?  That which we call a rose, by any other name would smell as sweet. - Shakespeare, Romeo and Juliet Act II Scene II

The Society of Critical Care Medicine is meeting this week, JAMA devoted an entire issue to sepsis and critical illness, and my twitter feed is ablaze with news of release of a new consensus definition of sepsis.  Much laudable work has been done to get to this point, even as the work is already generating controversy (Is this a "first world" definition that will be forced upon second and third world countries where it may have less external validity?  Why were no women on the panel?).  Making the definition of sepsis more reliable, from a sensitivity and specificity standpoint (more accurate) is a step forward for the sepsis research enterprise, for it will allow improved targeting of inclusion criteria for trials of therapies for sepsis, and better external validity when those therapies are later applied in a population that resembles those enrolled.  But what impact will/should the new definition have on clinical care?  Are the-times-they-are-a-changing?

Diagnosis, a fundamental goal of clinical medicine is important for several reasons, chief among them:

  1. To identify the underlying cause of symptoms and signs so that treatments specific to that illness can be administered
  2. To provide information on prognosis, natural history, course, etc for patients with or without treatment
  3. To reassure the physician and patients that there is an understanding of what is going on; information itself has value even if it is not actionable
Thus redefining sepsis (or even defining it in the first place) is valuable if it allows us to institute treatments that would not otherwise be instituted, or provides prognostic or other information that is valuable to patients.  Does it do either of those two things?

Wednesday, February 10, 2016

A Focus on Fees: Why I Practice Evidence Based Medicine Like I Invest for Retirement

"He is the best physician who knows the worthlessness of the most medicines."  - Ben Franklin

This blog has been highly critical of evidence, taking every opportunity to strike at any vulnerability of a trial or research program.  That is because this is serious business.  Lives and limbs hang in the balance, pharmaceutical companies stand to gain billions from "successful" trials, investigators' careers and funding are on the line if chance findings don't pan out in subsequent investigations, sometimes well-meaning convictions blind investigators and others to the truth; in short, the landscape is fertile for bias, manipulation, and even fraud.  To top it off, many of the questions about how to practice or deal with a particular problem have scant or no evidence to bear upon them, and practitioners are left to guesswork, convention, or pathophysiological reasoning - and I'm not sure which among these is most threatening.  So I am often asked, how do you deal with the uncertainty that arises from fallible evidence or paucity of evidence when you practice?

I have ruminated about this question and how to summarize the logic of my minimalist practice style for some time but yesterday the answer dawned on me:  I practice medicine like I invest in stocks, with a strategy that comports with the data, and with precepts of rational decision making.

Investors make numerous well-described and wealth-destroying mistakes when they invest in stocks.  Experts such as John Bogle, Burton Malkiel, David Swensen and others have written influential books on the topic, utilizing data from studies in economics (financial and behavioral).  Key among the mistakes that investors make are trying to select high performers (such as mutual funds or hedge fund managers), chasing performance, and timing the market.  The data suggest that professional stock pickers fare little better than chance over the long run, that you cannot discern in advance who will beat the average, and that the excess fees you are charged by high performers will negate any benefit they might otherwise have conferred to you.  The experts generally recommend that you stick with strategies that are proven beyond a reasonable doubt: a heavy concentration in stocks with their long track record of superior returns, diversification, and strict minimization of fees.  Fees are the only thing you can guarantee about your portfolio's returns.

Thursday, February 4, 2016

Diamox Results in Urine: General and Specific Lessons from the DIABOLO Acetazolamide Trial

The trial of acetazolamide to reduce duration of mechanical ventilation in COPD patients was published in JAMA this week.  I will use this trial to discuss some general principles about RCTs and make some comments specific to this trial.

My arguable but strong prior belief, before I even read the trial, was that Diamox (acetazolamide) is ineffectual in acute and chronic respiratory failure, or that it is harmful.  Its use is predicated on a "normalization fallacy" which guides practitioners to attempt to achieve euboxia (normal numbers).  In chronic respiratory acidosis, the kidneys conserve bicarbonate to maintain normal pH.  There was a patient we saw at OSU in about 2008 who had severe COPD with a PaCO2 in the 70s and chronic renal failure with a bicarbonate under 20.  A well-intentioned but misguided resident checked an ABG and the patient's pH was on the order of 7.1.  We (the pulmonary service) were called to evaluate the patient for MICU transfer and intubation, and when we arrived we found him sitting at the bedside comfortably eating breakfast.  So it would appear that even if the kidneys can't conserve enough bicarbonate to maintain normal pH, patients can get along with acidosis - but obviously evolution has created systems to maintain normal pH.  Why you would want to interfere with this highly conserved system in order to increase minute ventilation in a COPD patient you are trying to wean is beyond the reach of my imagination.  It just makes no sense.
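The OSU anecdote is easy to check with the Henderson-Hasselbalch equation; a quick sketch using the approximate numbers recalled above:

```python
import math

def ph(hco3, paco2):
    """Henderson-Hasselbalch for the bicarbonate buffer system."""
    return 6.1 + math.log10(hco3 / (0.03 * paco2))

print(ph(24, 40))  # ~7.40: normal
print(ph(19, 75))  # ~7.03: chronic hypercapnia with renal failure blunting
                   # bicarbonate conservation - roughly this patient
print(ph(34, 75))  # ~7.28: the same PaCO2 with renal compensation intact
```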

This brings us to a major problem with a sizable proportion of RCTs that I read:  the background/introduction provides woefully insufficient justification for the hypothesis that the RCT seeks to test.  In the background of this paper, we are sent to references 4-14.  Here is a summary of each:

4.)  A review of metabolic alkalosis in a general population of critically ill patients
5.)  An RCT of acetazolamide for weaning COPD patients showing that it doesn't work
6.)  Incidence of alkalosis in hospitalized patients in 1980
7.)  A 1983 translational study delineating the effect of acetazolamide on acid-base parameters in 10 patients
8.)  A 1982 study of hemodynamic parameters after acetazolamide administration in 12 patients
9.)  A study of metabolic and acid-base parameters in 14 patients with cystic fibrosis
10.)  A retrospective epidemiological descriptive study of serum bicarbonate in a large cohort of critically ill patients
11.)  A study of acetazolamide in anesthetized cats
12-14.)  Commentary and pharmacodynamic studies of acetazolamide by the authors of the current study

Wednesday, December 23, 2015

Narrated and Abridged: There is (No) Evidence for That: Epistemic Problems in Critical Care Medicine

Below is the narrated video of my powerpoint presentation on Epistemic Problems in Critical Care Medicine, which provides a framework for understanding why we have both false positives and false negatives in clinical trials in critical care medicine and why we should be circumspect about our "evidence base" and our "knowledge".  This is not trivial stuff, and is worth the 35 minutes required to watch the narration of the slideshow.  It is a provocative presentation which gives compelling reasons to challenge our "evidence base" in critical care and medicine in general, in ways that are not widely recognized but perhaps should be, with several suggestions about assumptions that need to be challenged and revised to make our models of reality more reliable.  Please contact me if you would like me to give an iteration of this presentation at your institution.


Tuesday, November 10, 2015

Peersnickety Review: Rant on My Recent Battle With Peer Reviewers

I'd like to relate a tale of exasperation with the peer review process that I recently experienced and that is probably all too familiar - but one that most folks are too timid to complain publicly about.

Nevermind that laypersons think that peer review means that your peers are reviewing your actual data for accuracy and fidelity (they are not; they are reviewing only your manuscript, final analyses, and conclusions), which causes them to be perplexed when revelations of fraudulent data published in top journals are reported.  Nevermind that the website Retraction Watch, which began as a small side project, now has daily and twice-daily postings of retracted papers.  Nevermind that some scientists have built entire careers on faked data.  Nevermind that the fact that something has been peer reviewed provides very little in the way of assurance that the report contains anything other than rubbish.  Nevermind that leading investigators publish the same reviews over and over in different journals with the same figures and sometimes the same text.

The entire process is cumbersome, time consuming, frustrating, and of dubious value as currently practiced.

Last year I was invited by the editors of Chest to write a "contemporary review of ionized calcium in the ICU - should it be measured?  should it be treated?"  I am not aware of why I was selected for this, but I infer that someone suggested me as the author because of my prior research in medical decision making and because of the monograph we wrote several years back called Laboratory Testing in the ICU which applied principles of rational decision making such as Bayesian methods and back-of-the-envelope cost benefit analyses to make a framework of rational laboratory testing in the ICU.  I accepted the invitation, even knowing it would entail a good deal of work for me that would be entirely uncompensated, save for buttressing my fragile ego, he said allegorically.

Now, consider for an instant the extra barriers that I, as a non-academic physician faced in agreeing to do this.  As a non-academic physician, I do not have access to a medical library, and of course the Chest editors do not have a way to grant me access.  That is, non-academic physicians doing scholarly work such as this are effectively disenfranchised from the infrastructure that they need to do scholarly work.  Fortunately for me, my wife was a student at the University of Utah during this time so I was able to access the University library with her help.  Whether academic centers and peer-reviewed journals ought to have a monopoly on this information is a matter for debate elsewhere, and not a trivial one.

Sunday, October 11, 2015

When Hell Freezes Over: Trials of Temperature Manipulation in Critical Illness

The bed is on fire
Two articles published online ahead of print in the NEJM last week deal with actual and attempted temperature manipulation to improve outcomes in critically ill patients.

The Eurotherm3235 trial was stopped early because of concerns of harm or futility.  This trial enrolled patients with traumatic brain injury (TBI) and elevated intracranial pressure (ICP) and randomized them to induced hypothermia (which reduces ICP) versus standard care.  There was a suggestion of worse outcomes in the hypothermia group.  I know that the idea that we can help the brain with the simple maneuver of lowering body temperature has great appeal and what some would call "biological plausibility", a term that I henceforth forsake and strike from my vocabulary.  You can rationalize the effect of an intervention any way you want using theoretical biological reasoning.  So from now on I'm not going to speak of biological plausibility; I will call it biological rationalizing.  A more robust principle, as I have claimed before, is biological precedent - that is, that this or that pathway has been successfully manipulated in a similar way in the past.  It is reasonable to believe that interfering with LDL metabolism will improve cardiovascular outcomes because of decades of trials of statins (though agents used to manipulate this pathway are not all created equal).  It is reasonable to believe that intervening with platelet aggregation will improve outcomes from cardiovascular disease because of decades of trials of aspirin and Plavix and others.  It is reasonable to doubt that manipulation of body temperature will improve any outcome because there is no unequivocal precedent for this, save for warming people with hypothermia from exposure - which basically amounts to treating the known cause of their ailment.  This is one causal pathway that we understand beyond a reasonable doubt.  If you get exposure, you freeze to death.  If we find you still alive and warm you, you may well survive.

Wednesday, October 7, 2015

Early Mobility in the ICU: The Trial That Should Not Be

I learned via twitter yesterday that momentum is building to conduct a trial of early mobility in critically ill patients.  While I greatly respect many of the investigators headed down this path, forthwith I will tell you why this trial should not be done, based on principles of rational decision making.

A trial is a diagnostic test of a hypothesis, a complicated and costly test of a hypothesis, and one that entails risk.  Diagnostic tests should not be used indiscriminately.  That the RCT is a "Gold Standard" in the hierarchy of testing hypotheses does not mean that we should hold it sacrosanct, nor does it follow that we need a gold standard in all cases.  Just like in clinical medicine, we should be judicious in our ordering of diagnostic tests.

The first reason that we should not do a trial of early mobility (or any mobility) in the ICU is because in the opinion of this author, experts in critical care, and many others, early mobility works.  We have a strong prior probability that this is a beneficial thing to be doing (which is why prominent centers have been doing it for years, sans RCT evidence).  When the prior probability is high enough, additional testing has decreasing yield and risks false negative results if people are not attuned to the prior.  Here's my analogy - a 35 year old woman with polycystic kidney disease who is taking birth control presents to the ED after collapsing with syncope.  She had shortness of breath and chest pain for 12 hours prior to syncope.  Her chest x-ray is clear and bedside ultrasound shows a dilated right ventricle.  The prior probability of pulmonary embolism is high enough that we don't really need further testing, we give anticoagulants right away.  Even if a V/Q scan (creatinine precludes CT) is "low probability" for pulmonary embolism, we still think she has it because the prior probability is so high.  Indeed, the prior probability is so high that we're willing to make decisions without further testing, hence we gave heparin.  This process follows the very rational Threshold Approach to Decision Making proposed by Pauker and Kassirer in the NEJM in 1980, which is basically a reformulation of von Neumann and Morgenstern's Expected Utility Theory to adapt it to medical decisions.  Distilled, it states in essence:  "when you get to a threshold probability of disease where the benefits of treatment exceed the risks, you treat."  And so let it be with early mobility.  We already think the benefits exceed the risks, which is why we're doing it.  We don't need an RCT.  As I used to ask the housestaff over and over until I was cyanotic: "How will the results of that test influence what you're going to do?"
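The threshold itself falls out of a one-line formula from the Pauker and Kassirer paper; the harm and benefit magnitudes below are invented for illustration:

```python
def treatment_threshold(harm, benefit):
    """Treat when P(disease) exceeds harm / (harm + benefit), with harm to
    the non-diseased and benefit to the diseased on the same utility scale."""
    return harm / (harm + benefit)

# Invented: treating the non-diseased costs 1 unit, treating the diseased
# gains 9 units -> treat at any probability above 10%
print(treatment_threshold(harm=1, benefit=9))  # 0.1
```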

Notice that this logical approach to clinical decision making shines a blinding light upon "evidence based medicine" and the entire enterprise of testing hypotheses with frequentist methods that are deaf to prior probabilities.  Can you imagine using V/Q scanning to test for PE without prior probabilities?  Can you imagine what a mess you would find yourself in with regard to false negatives and false positives?  You would be the neophyte medical student who thinks "test positive, disease present; test negative, disease absent."  So why do we continue ad nauseam in critical care medicine to dismiss prior probabilities and decision thresholds and blindly test hypotheses in a purist vacuum?

The next reasons this trial should not be conducted flow from the first.  The trial will not have a high enough likelihood ratio to sway the high prior below the decision threshold.  If the trial is "positive", we will have spent millions of dollars to "prove" something we already believed with a probability above our treatment threshold; if the trial is positive, some will squawk "It wasn't blinded" yada yada yada in an attempt to dismiss the results as false positives; if the trial is negative, some will, like the tyro medical student, declare that "there is no evidence for early mobility" and similar hoopla and poppycock; or, the worst case:  the trial shows harm from early mobility, which will get the naysayers of early mobility very agitated.  But of course, our prior probability that early mobility is harmful is hopelessly low, making such a result highly likely to be spurious.  When we clamor about "evidence" we are in essence clamoring about "testing hypotheses with RCTs" and eschewing our responsibility to use clinical judgment, recognize the limits of testing, and practice in the face of uncertainty using our "untested" prior probabilities.
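As a sketch of the Bayesian arithmetic behind the first point, with an assumed prior and an assumed likelihood ratio for a single "negative" trial:

```python
def posterior(prior, likelihood_ratio):
    """Bayes via odds: posterior odds = prior odds x likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

prior = 0.90        # assumed prior that early mobility is beneficial
lr_negative = 0.3   # a generous LR for one negative RCT (assumed)
print(posterior(prior, lr_negative))  # ~0.73, still far above any low
                                      # treatment threshold for a benign therapy
```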

Consider a trial of exercise on cardiovascular outcomes in community dwelling adults - what good can possibly come of such a trial?  Don't we already know that exercise is good for you?  If so, a positive trial reinforces what we already know (but does little to convince sedentary folks to exercise, as they too already know they should exercise), but a negative trial risks sending the message to people that exercise is of no use to them, or that the number needed to treat is too large for them to worry about.

Or consider the recent trials of EGDT which "refuted" the Rivers trial from 14 years ago.  Now everybody is saying, "Well, we know it works - maybe not the catheters and the ScvO2 and all those minutiae, but in general, rapid early resuscitation works.  And the trials show that we've already incorporated what works into general practice!"

I don't know the solutions to these difficult quandaries that we repeatedly find ourselves in, trial after trial, in critical care medicine.  I'm confused too.  That's why I'm thinking very hard and very critically about the limits of our methods and our models and our routines.  But if we can anticipate not only the results of the trials, but also the community reaction to them, then we have guidance about how to proceed in the future.  Because what value does a mega-trial have, if not to guide care after its completion?  And even if that is not its goal (maybe its goal is just to inform the science), can we turn a blind eye to the fact that it will guide practice after its completion, even if that guidance is premature?

It is my worry that, given the high prior probability that a trial in critical care medicine will be "negative", the most likely result is a negative trial which will embolden those who wish to dismiss the probable benefits of early mobility and give them an excuse to not do it.

Diagnostic tests have risks.  A false negative test is one such risk.