Sunday, May 22, 2022

Common Things Are Common, But What is Common? Operationalizing The Axiom

"Prevalence [sic: incidence] is to the diagnostic process as gravity is to the solar system: it has the power of a physical law." - Clifton K Meador, A Little Book of Doctors' Rules


We recently published a paper with the same title as this blog post here. The intent was to operationalize the age-old "common things are common" axiom so that it is practicable to employ it during the differential diagnosis process to incorporate probability information into DDx. This is possible now in a way that it never has been before because there are now troves of epidemiological data that can be used to bring quantitative (e.g., 25 cases/100,000 person-years) rather than mere qualitative (e.g., very common, uncommon, rare, etc) information to bear on the differential diagnosis. I will briefly summarize the main points of the paper and demonstrate how it can be applied to real-world diagnostic decision making.

First is that the proper metric for "commonness" is disease incidence (standardized as cases/100,000 person-years) not disease prevalence. Incidence is the number of new cases per year - those that have not been previously diagnosed - whereas prevalence is the number of already diagnosed cases. It the disease is already present, there is no diagnosis to be made (see article for more discussion of this). Prevalence is approximately equal the product of incidence & disease duration, so it will be higher (oftentimes a lot higher) than incidence for diseases with a chronic component; this will lead to overestimation of the likelihood of diagnosing a new case. Furthermore, your intuitions about disease commonness are mostly based on how frequently you see patients with the disease (e.g., SLE) but most of these are prevalent not incident cases so you will think SLE is more common than it really is, diagnostically. If any of this seems counterintuitive, see our paper for details (email me for pdf copy if you can't access it).

Second is that commonness exists on a continuum spanning 5 or more orders of magnitude, so it is unwise to dichotomize diseases as common or rare as information is lost in doing so. If you need a rule of thumb though, it is this: if the disease you are considering has single-digit (or less) incidence in 100,000 p-y, that disease is unlikely to be the diagnosis out of the gate (before ruling out more common diseases). Consider that you have approximately a 15% chance of ever personally diagnosing a pheochromocytoma (incidence <1/100,000 P-Y) during an entire 40 year career as there are only 2000 cases diagnosed per year in the USA, and nearly one million physicians in a position to initially diagnose them. (Note also that if you're rounding with a team of 10 physicians, and a pheo gets diagnosed, you can't each count this is an incident diagnosis of pheo. If it's a team effort, you each diagnosed 1/10th of a pheochromocytoma. This is why "personally diagnosing" is emphasized above.)  A variant of the common things axiom states "uncommon presentations of common diseases are more common than common presentations of uncommon diseases" - for more on that, see this excellent paper about the range of presentations of common diseases.

Third is that you cannot take a raw incidence figure and use it as a pre-test probability of disease. The incidence in the general population does not represent the incidence of diseases presenting to the clinic or the emergency department. What you can do however, is take what you do know about patients presenting with a clinical scenario,  and about general population incidence, and make an inference about relative likelihoods of disease. For example, suppose a 60-year-old man presents with fever, hemoptysis and a pulmonary opacity that may be a cavity on CXR. (I'm intentionally simplifying the case so that the fastidious among you don't get bogged down in the details.) The most common cause of this presentation hands down is pneumonia. But, it could also represent GPA (formerly Wegener's, every pulmonologist's favorite diagnosis for hemoptysis) or TB (tuberculosis, every medical student's favorite diagnosis for hemoptysis). How could we use incidence data to compare the relative probabilities of these 3 diagnostic possibilities?


Suppose we were willing to posit that 2/3rds of the time we admit a patient with fever and opacities, it's pneumonia. Using that as a starting point, we could then do some back-of-the-envelope calculations. CAP has an incidence on the order of 650/100k P-Y; GPA and TB have incidences on the order of 2 to 3/100k PY respectively - CAP is 200-300x more common than these two zebras. (Refer to our paper for history and references about the "zebra" metaphor.)  If CAP occupies 65% of the diagnostic probability space (see image and this paper for an explication), then it stands to reason that, ceteris paribus (and things are not always ceteris paribus), the TB and GPA occupy on the order of 1/200th of 65%, or about 0.25% of the probability space. From an alternative perspective, a provider will admit 200 cases of pneumonia for every case of TB or GPA she admits - there's just more CAP out there to diagnose! Ask yourself if this passes muster - when you are admitting to the hospital for a day, how many cases of pneumonia do you admit, and when is the last time you yourself admitted and diagnosed a new case of GPA or TB? Pneumonia is more than two orders of magnitude more common than GPA and TB and, barring a selection or referral bias, there just aren't many of the latter to diagnose! If you live in a referral area of one million people, there will only be 20-30 cases of GPA diagnosed in that locale during in a year (spread amongst hospitals/clinics), whereas there will be thousands of cases of pneumonia.

As a parting shot, these are back-of-the-envelope calculations, and their several limitations are described in our paper. Nonetheless, they are grounding for understanding the inertial pull of disease frequency in diagnosis. Thus, the other day I arrived in the AM to hear that a patient was admitted with supposed TTP (thrombotic thrombocytopenic purpura) overnight. With an incidence of about 0.3 per 100,000 PY, that is an extraordinary claim - a needle in the haystack has been found! - so, without knowing anything else, I wagered that the final diagnosis would not be TTP. (Without knowing anything else about the case, I was understandably squeamish about giving long odds against it, so I wagered at even odds, a $10 stake.) Alas, the final diagnosis was vitamin B12 deficiency (with an incidence on the order of triple digits per 100k PY), with an unusual (but well recognized) presentation that mimics TTP & MAHA.

Incidence does indeed have the power of a physical law; and as Hutchison said in an address in 1928, the second  commandment of diagnosis (after "don't be too clever") is "Do not diagnose rarities." Unless of course the evidence demands it - more on that later.

Saturday, October 30, 2021

CANARD: Coronavirus Associated Nonpathogenic Aspergillus Respiratory Disease

Cavitary MSSA disease in COVID

One of the many challenges of pulmonary medicine is unexplained dyspnea which, after an extensive investigation, has no obvious cause excepting "overweight and out of shape" (OWOOS). It can occur in thin and fit people too, and can also have a psychological origin: psychogenic polydyspnea, if you will. This phenomenon has been observed for quite some time. Now, overlay a pandemic and the possibility of "long COVID". Inevitably, people who would have been considered to have unexplained dyspnea in previous eras will come to attention after being infected with (or thinking they may have been infected with) COVID. Without making any comment on long COVID, it is clear that some proportion of otherwise unexplained dyspnea that would have come to attention had there been no pandemic will now come after COVID and thus be blamed on it. (Post hoc non ergo propter hoc.) When something is widespread (e.g., COVID), we should be careful about drawing connections between it and other highly prevalent (and pre-existing) phenomena (e.g., unexplained dyspnea).

Since early in the pandemic, reports have emerged of a purported association between aspergillus pulmonary infection and COVID. There are several lines of biological reasoning being used to support the association, including an analogy with aspergillosis and influenza (an association itself open to debate - I don't believe I've seen a case in 20 years), and sundry arguments predicated upon the immune and epithelial disturbances wrought by COVID. Many of these reports include patients who have both COVID and traditional risk factors for invasive pulmonary aspergillosis (IPA), viz, immunosuppression. The association in these patients is between immunosuppression and aspergillus (already known), in the setting of a pandemic; whether or not COVID adds additional risk would require an epidemiological investigation in a cohort of immunosuppressed patients with and without COVID. To my knowledge, such as study has not been done. Instead, they are like this one, where there were allegedly 42 patients among 279 in the discovery cohort with CAPA (Coronavirus Associated Pulmonary Aspergillosis; see Figure 1) and of those 42, 23 of them were immunosuppressed (Table 1). In addition, 5 had rheumatological disease, 3 had solid organ malignancy. In the validation cohort, there were 21 suspected CAPA among 209 patients; among these 21, there were at least 6 and perhaps 9 who were immunosuppressed (Table 2).

There are other problems (many of which we outlined here: Aspergillosis in the ICU: Hidden Enemy or Bogeyman?). The strikingly high proportion of COVID patients with aspergillosis being reported - a whopping 15% - includes patients with three levels of diagnostic certainty: proven, probable, and possible. I understand the difficulties inherent in the diagnosis of this disease. I also understand that relying on nonspecific tests for diagnosis will result in many false positives, especially in low base rate populations. Therefore it is imperative that we carefully establish the base rate, and that is not what most of these studies are doing. Rather they are - apparently intentionally - comingling traditional risk factors and COVID leading to what is surely a gross overestimation of the incidence of true IPA in patients with COVID. Thus we warned:

We worry that if the immanent methodological limitations of this and similar studies are not adequately acknowledged—they are not listed among the possible explanations for the results enumerated by the editorialist (2)—an avalanche of testing for aspergillosis in ICUs may ensue, resulting in an epidemic of overdiagnosis and overtreatment. We caution readers of this report that it cannot establish the true prevalence of Aspergillus infection in patients with ventilator-associated pneumonia in the ICU, but it does underscore the fact that when tests with imperfect specificity are applied in low-prevalence cohorts, most positive results are false positives (10).

In the discovery cohort of the study linked above, Figure 1 shows that only 6 patients of 279 (2%) had "proven" disease, 4 with "tracheobronchitis" (it is not mentioned how this unusual manifestation of IPA was diagnosed; presumably via biopsy; if not, it should have counted as "probable" disease; see these guidelines), and 2 others meeting the proven definition. The remaining 36 patients had probable CAPA (32) and possible CAPA (4). In the validation cohort, there were 21 of 209 patients with alleged CAPA, all of them in the probable category (2 with probable tracheobronchitis, 19 with probable CAPA). Thus the prevalence (sic: incidence) of IPA in patients with COVID is a hodgepodge of a bunch of different diseases across a wide spectrum of diagnostic certainty.

Future studies should - indeed, must - exclude patients on chronic immunosuppression and those with immunodeficiency from these cohorts and also describe the specific details that form the basis of the diagnosis and diagnostic certainty category. Meanwhile, clinicians should recognize that the purported 15% incidence rate of CAPA/IPA in COVID comprises mostly immunosuppressed patients and patients with probable or possible - not proven - disease. Some proportion of alleged CAPA is actually a CANARD.

Wednesday, April 14, 2021

Bias in Assessing Cognitive Bias in Forensic Pathology: The Dror Nevada Death Certificate "Study"

Following the longest hiatus in the history of the Medical Evidence Blog, I return to issues of forensic medicine, by happenstance alone. In today's issue of the NYT is this article about bias in forensic medicine, spurred by interest in the trial of the murder of George Floyd. Among other things, the article discusses a recently published paper in the Journal of Forensic Sciences for which there were calls for retraction by some forensic pathologists. According to the NYT article, the paper showed that forensic pathologists have racial bias, a claim predicated upon an analysis of death certificates in Nevada, and a survey study of forensic pathologists, using a methodology similar to that I have used in studying physician decisions and bias (viz, randomizing recipients to receiving one of two forms of a case vignette that differ in the independent variable of interest). The remainder of this post will focus on that study, which is sorely in need of some post-publication peer review.

The study was led by Itiel Dror, PhD, a Harvard trained psychologist now at University College London who studies bias, with a frequent focus on forensic medicine, if my cursory search is any guide. The other authors are a forensic pathologist (FP) at University of Alabama Birmingham (UAB), a FP and coroner in San Luis Obispo, California, a lawyer with the Clark County public defender's office in Las Vegas, Nevada, a PhD psychologist from Towson University in Towson, Maryland, an FP proprietor of a Forensics company who is a part time medical examiner for West Virginia, and an FP who is proprietor of a forensics and legal consulting company in San Francisco, California. The purpose of identifying the authors was to try to understand why the analysis of death certificates was restricted to the state of Nevada. Other than one author's residence there, I cannot understand why Nevada was chosen, and the selection is not justified in the paltry methods section of the paper.

Sunday, February 16, 2020

Misunderstanding and Misuse of Basic Clinical Decision Principles among Child Abuse Pediatricians

The previous post about Dr. Cox, ensnared in a CPT (Child Protection Team) witch hunt in Wisconsin, has led me to evaluate several more research reports on child abuse, including SBS (shaken baby syndrome), AHT (abusive head trauma), and sentinel injuries.  These reports are rife with critical assumptions, severe limitations, and gross errors which greatly limit the resulting conclusions in most studies I have reviewed.  However, one study that was pointed out to me today  takes the cake.  I don't know what the prevalence of this degree of misunderstanding is, but CPTs and child abuse pediatricians need make sure they have a proper understanding of sensitivity, specificity, positive and negative predictive value, base rates, etc.  And they should not be testifying about the probability of child abuse at all if they don't have this stuff down cold. And I think this means that some proportion of them needs to go back to school or stop testifying.

The article and associated correspondence at issue is entitled The Positive Predictive Value of Rib Fractures as an Indicator of Nonaccidental Trauma in Children published in 2004.  The authors looked at a series of rib fractures in children at a single Trauma Center in Colorado during a six year period and identified all patients with a rib fracture.  They then restricted their analysis to children less than 3 years of age.  There were 316 rib fractures among just 62 children in the series; the average number of rib fractures per child is ~5.  The proper unit of analysis for a study looking at positive predictive value is children, sorted into those with and without abuse, and with and without rib fracture(s) as seen in the 2x2 tables below.

Tuesday, January 28, 2020

Bad Science + Zealotry = The Wisconsin Witch Hunts. The Case of John Cox, MD

John Cox, MD
I stumbled upon a very disturbing report on NBC News today of a physician couple in Wisconsin accused of abusing their adopted infant daughter.  It is surreal and horrifying and worth a read - not because these physicians abused their daughter, but because they almost assuredly did not.  One driving force behind the case appears to be a well-meaning and perfervid, if misguided and perfidious, pediatrician at University of Wisconsin who, with her group, coined the term "sentinel injuries" (SI) to describe small injuries such as bruises and oral injuries that they posit portend future larger scale abuse.  It was the finding of SI on the adopted infant in the story that in part led to charges of abuse against the father, Dr. Cox, got his child put in protective services, got him arrested, and threatens his career.  Interested readers can reference the link above for sordid and sundry details of the case.

Before delving into the 2013 study in Pediatrics upon which many contentions about SI rest, we should start with the fundamentals.  First, is it plausible that the thesis is correct, that before serious abuse, minor abuse is detectable by small bruises or oral injuries?  Of course it is, and it sounds like a good narrative.  But being a good plausible narrative does not make it true and it is likewise possible that bruises seen in kids who are and are not abused reflect nothing more than accidental injuries from rolling off a support, something falling or dropping on them, somebody dropping them, a sibling jabbing at them with a toy, and a number of things.  To my knowledge, the authors offer no direct evidence that the SIs they or others report have been directly traced to abuse.  They are doing nothing more than inferring that facial bruising is a precursor to Abusive Head Trauma (AHT), and based on their bibliography they have gone out of their way to promote this notion.

Thursday, December 5, 2019

Noninferiority Trials of Reduced Intensity Therapies: SCORAD trial of Radiotherapy for Spinal Metastases


No mets here just my PTX
A trial in JAMA this week by Hoskin et al (the SCORAD trial) compared two different intensities of radiotherapy for spinal metastases.  This is a special kind of noninferiority trial, which we wrote about last year in BMJ Open.  When you compare the same therapy at two intensities using a noninferiority trail, you are in perilous territory.  This is because if the therapy works on a dose response curve, it is almost certain, a priori, that the lower dose is actually inferior - if you consider inferior to represent any statistically significant difference disfavoring a therapy.  (We discuss this position, which goes against the CONSORT grain, here.)  You only need a big enough sample size.  This may be OK, so long as you have what we call in the BMJ Open paper, "a suitably conservative margin of noninferiority."  Most margins of noninferiority (delta) are far from this standard.

The results of the SCORAD trial were consistent with our analysis of 30+ other noninferiority trials of reduced intensity therapies, and the point estimate favored - you guessed it - the more intensive radiotherapy.  This is fine.  It is also fine that the 1-sided 95% confidence interval crossed the 11% prespecified margin of noninferiority (P=0.06).  That just means you can't declare noninferiority.  What is not fine, in my opinion, is that the authors suggest that we look at how little overlap there was, basically an insinuation that we should consider it noninferior anyway.  I crafted a succinct missive to point this out to the editors, but alas I'm too busy to submit it and don't feel like bothering, so I'll post it here for those who like to think about these issues.

To the editor:  Hoskin et al report results of a noninferiority trial comparing two intensities of radiotherapy (single fraction versus multi-fraction) for spinal cord compression from metastatic cancer (the SCORAD trial)1.  In the most common type of noninferiority trial, investigators endeavor to show that a novel agent is not worse than an established one by more than a prespecified margin.  To maximize the chances of this, they generally choose the highest tolerable dose of the novel agent.  Similarly, guidelines admonish against underdosing the active control comparator as this will increase the chances of a false declaration of noninferiority of the novel agent2,3.  In the SCORAD trial, the goal was to determine if a lower dose of radiotherapy was noninferior to a higher dose. Assuming radiotherapy is efficacious and operates on a dose response curve, the true difference between the two trial arms is likely to favor the higher intensity multi-fraction regimen.  Consequently, there is an increased risk of falsely declaring noninferiority of single fraction radiotherapy4.  Therefore, we agree with the authors’ concluding statement that “the extent to which the lower bound of the CI overlapped with the noninferiority margin should be considered when interpreting the clinical importance of this finding.”  The lower bound of a two-sized 95% confidence interval (the trial used a 1-sided 95% confidence interval) extends to 13.1% in favor of multi-fraction radiotherapy.  Because the outcome of the trial was ambulatory status, and there were no differences in serious adverse events, our interpretation is that single fraction radiotherapy should not be considered noninferior to a multi-fraction regimen, without qualifications.

1.            Hoskin PJ, Hopkins K, Misra V, et al. Effect of Single-Fraction vs Multifraction Radiotherapy on Ambulatory Status Among Patients With Spinal Canal Compression From Metastatic Cancer: The SCORAD Randomized Clinical Trial. JAMA. 2019;322(21):2084-2094.
2.            Piaggio G, Elbourne DR, Pocock SJ, Evans SW, Altman DG, f CG. Reporting of noninferiority and equivalence randomized trials: Extension of the consort 2010 statement. JAMA. 2012;308(24):2594-2604.
3.            Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313(7048):36-39.
4.            Aberegg SK, Hersh AM, Samore MH. Do non-inferiority trials of reduced intensity therapies show reduced effects? A descriptive analysis. BMJ open. 2018;8(3):e019494-e019494.

Saturday, November 23, 2019

Pathologizing Lipid Laden Macrophages (LLMs) in Vaping Associated Lung Injury (VALI)

It's time to weigh in on an ongoing debate being waged in the correspondence pages of the NEJM.  To wit, what is the significance of lipid laden macrophages (LLMs) in VALI?  As we stated, quite clearly, in our original research letter,

"Although the pathophysiological significance of these lipid-laden macrophages and their relation to the cause of this syndrome are not yet known, we posit that they may be a useful marker of this disease.3-5 Further work is needed to characterize the sensitivity and specificity of lipid-laden macrophages for vaping-related lung injury, and at this stage they cannot be used to confirm or exclude this syndrome. However, when vaping-related lung injury is suspected and infectious causes have been excluded, the presence of lipid-laden macrophages in BAL fluid may suggest vaping-related lung injury as a provisional diagnosis."
There, we outlined the two questions about their significance:  1.) any relation to the pathogenesis of the syndrome; and 2.) whether, after characterizing their sensitivity and specificity, they can be used in diagnosis.  I am not a lung biologist, so I will ignore the first question and focus on the second, where I actually do know a thing or two.

We still do not know the sensitivity or specificity of LLMs for VALI, but we can make some wagers based on what we do know.  First, regarding sensitivity.  In our ongoing registry at the University of Utah, we have over 30 patients with "confirmed" VALI (if you dont' have a gold standard, how do you "confirm" anything?), and to date all but one patient had LLMs in excess of 20% on BAL.  For the first several months we bronched everybody.  So, in terms of BAL and LLMs, I'm guessing we have the most extensive and consistent experience.  Our sensitivity therefore is over 95%.  In the Layden et al WI/IL series in NEJM, there were 7 BAL samples and all 7 had "lipid Layden macrophages" (that was a pun).  In another Utah series, Blagev et al reported that 8 of 9 samples tested showed LLMs.  Combining those data (ours are not yet published, but soon will be) we can state the following:  "Given the presence of VALI, the probability of LLM on Oil Red O staining (OROS) is 96%."  You may recognize that as a statement of sensitivity.  It is unusual to not find LLMs on OROS of BAL fluid in cases of VALI, and because of that, their absence makes the case atypical, just as does the absence of THC vaping.  Some may go so far as to say their absence calls into question the diagnosis, and I am among them.  But don't read between the lines.  I did not say that bronchoscopy is indicated to look for them.  I simply said that their absence makes the case atypical and calls it into question.

Sunday, September 1, 2019

Pediatrics and Scare Tactics: From Rock-n-Play to Car Safety Seats

Is sleeping in a car seat dangerous?
Earlier this year, the Fisher-Price company relented to pressure from the AAP (American Academy of Pediatrics) and recalled 4.7 million of Rock 'n Play (RnP) baby rockers, which now presumably occupy landfills.  This recommendation stemmed from an "investigation" by consumer reports showing that since 2011, 32 babies died while sleeping in the RnP.  These deaths are tragic, but what does it mean?  In order to make sense of this "statistic" we need to determine a rate, based on the exposure period, something like "the rate of infant death in the RnP is 1 per 10 million RnP occupied hours" or something like that.  Then we would compare it to the rate of infant death sleeping in bed.  If it was higher, we would have a starting point for considering whether ceteris paribus, maybe it's the RnP that is causing the infant deaths.  We would want to know the ratio of observed deaths in the RnP to expected deaths sleeping in some other arrangement for the same amount of time.  Of course, even if we found the observed rate was higher than the expected rate, other possibilities exist, i.e., it's an association, a marker for some other factor, rather than a cause of the deaths.  A more sophisticated study would, through a variety of methods, try to control for those other factors, say, socioeconomic status, infant birth weight, and so on.  The striking thing to me and other evidence minded people was that this recall did not even use the observed versus expected rate, or any rate at all!  Just a numerator!  We could do some back of the envelope calculations with some assumptions about rate ratios, but I won't bother here.  Suffice it to say that we had an infant son at that time and we kept using the RnP until he outgrew it and then we gave it away.

Last week, the AAP was at it again, playing loose with the data but tight with recommendations based upon them.  This time, it's car seats.  In an article in the August, 2019 edition of the journal Pediatrics, Liaw et al present data showing that, in a cohort of 11,779 infant deaths, 3% occurred in "sitting devices", and in 63% of this 3%, the sitting device was a car safety seat (CSS).  In the deaths in CSSs, 51.6% occurred in the child's home rather than in a car.  What was the rate of infant death per hour in the CSS?  We don't know.  What is the expected rate of death for the same amount of time sleeping, you know, in the recommended arrangement?  We don't know!  We're at it again - we have a numerator without a denominator, so no rate and no rate to compare it to.  It could be that 3% of the infant deaths occurred in car seats because infants are sleeping in car seats 3% of the time!

Sunday, July 21, 2019

Move Over Feckless Extubation, Make Room For Reckless Extubation

Following the theme of some recent posts on Status Iatrogenicus (here and here) about testing and treatment thresholds, one of our stellar fellows Meghan Cirulis MD and I wrote a letter to the editor of JAMA about the recent article by Subira et al comparing shorter duration Pressure Support Ventilation to longer duration T-piece trials.  Despite adhering to my well hewn formula for letters to the editor, it was not accepted, so as is my custom, I will publish it here.

Spoiler alert - when the patients you enroll in your weaning trial have a base rate of extubation success of 93%, you should not be doing an SBT - you should be extubating them all, and figuring out why your enrollment criteria are too stringent and how many extubatable patients your enrollment criteria are missing because of low sensitivity and high specificity.

Tuesday, May 7, 2019

Etomidate Succs: Preventing Dogma from Becoming Practice in RSI

The editorial about the PreVent trial in the NEJM a few months back is entitled "Preventing Dogma from Driving Practice".  If we are not careful, we will let the newest dogma replace the old dogma and become practice.

The PreVent trial compared bagging versus no bagging after induction of anesthesia for rapid sequence intubation (RSI).  Careful readers of this and another recent trial testing the dogma of videolaryngoscopy will notice several things that may significantly limit the external validity of the results.
  • The median time from induction to intubation was 130 seconds in the no bag ventilation group, and 158 seconds in the bag ventilation group (NS).  That's 2 to 2.5 minutes.  In the Lascarrou 2017 JAMA trial of direct versus video laryngoscopy, it was three minutes.  Speed matters.  The time that a patient is paralyzed and non-intubated is a very dangerous time and it ought to be as short as possible
  • The induction agent was Etomidate (Amidate) in 80% of the patients in the PreVent trial and 90% of patients in the Larascarrou trial (see supplementary appendix of PreVent trial)
  • The intubations were performed by trainees in approximately 80% of intubations in both trials (see supplementary appendix of PreVent trial)
I don't think these trials are directly relevant to my practice.  Like surgeon Robert Liston who operated in the pre-anesthesia era and learned that speed matters (he could amputate a leg in 2.5 minutes), I have learned that the shorter the time from induction to intubation, the better - it is a vulnerable time and badness occurs during it:  atelectasis, hypoxemia, aspiration, hypotension, secretion accumulation, etc.

Thursday, April 25, 2019

The EOLIA ECMO Bayesian Reanalysis in JAMA

A Hantavirus patient on ECMO, circa 2000
Spoiler alert:  I'm a Bayesian decision maker (although maybe not a Bayesian trialist) and I "believe" in ECMO as documented here.

My letter to the editor of JAMA was published today (and yeah I know, I write too many letters, but hey, I read a lot and regular peer review often doesn't cut it) and even when you come at them like a spider monkey, the authors of the original article still get the last word (and they deserve it - they have done far more work than the post-publication peer review hecklers with their quibbles and their niggling letters.)

But to set some thing clear, I will need some more words to elucidate some points about the study's interpretation.  The authors' response to my letter has five points.
  1. I (not they) committed confirmation bias, because I postulated harm from ECMO.  First, I do not have a personal prior for harm from ECMO, I actually think it is probably beneficial in properly selected patients, as is well documented in the blog post from 2011 describing my history of experience with it in hantavirus, and as well in a book chapter I wrote in Cardiopulmonary Bypass Principles and Practice circa 2006.  There is irony here - I "believe in" ECMO, I just don't think their Bayesian reanalysis supports my (or anybody's) beliefs in a rational way!  The point is that it was a post hoc unregistered Bayesian analysis after a pre-registered frequentist study which was "negative" (for all that's worth and not worth), and the authors clearly believe in the efficacy of ECMO as do I.  In finding shortcomings in their analysis, I seek to disconfirm or at least challenge no only their but my own beliefs.  And I think that if the EOLIA trial had been positive, that we would not be publishing Bayesian reanalyses showing how the frequentist trial may be a type I error.  We know from long experience that if EOLIA had been "positive" that success would have been declared for ECMO as it has been with prone positioning for ARDS.  (I prone patients too.)  The trend is to confirm rather than to disconfirm, but good science relies more on the latter.
  2. That a RR of 1.0 for ECMO is a "strongly skeptical" prior.  It may seem strong from a true believer standpoint, but not from a true nonbeliever standpoint.  Those are the true skeptics (I know some, but I'll not mention names - I'm not one of them) who think that ECMO is really harmful on the net, like intensive insulin therapy (IIT) probably is.  Regardless of all the preceding trials, if you ask the NICE-SUGAR investigators, they are likely to maintain that IIT is harmful.  Importantly, the authors skirt the issue of the emphasis they place on the only longstanding and widely regarded as positive ARDS trial (of low tidal volume).  There are three decades of trials in ARDS patients, scores of them, enrolling tens of thousands of patients, that show no effect of the various therapies.  Why would we give primacy to the the one trial which was positive, and equate ECMO to low tidal volume?  Why not equate it to high PEEP, or corticosteroids for ARDS?  A truly skeptical prior would have been centered on an aggregate point estimate and associated distribution of 30 years of all trials in ARDS of all therapies (the vast majority of them "negative").  The sheer magnitude of their numbers would narrow the width of the prior distribution with RR centered on 1.0 (the "severely skeptical" one), and it would pull the posterior more towards zero benefit, a null result.  Indeed, such a narrow prior distribution may have shown that low tidal volume is an outlier and likely to be a false positive (I won't go any farther down that perilous path).  The point is, even if you think a RR of 1.0 is severely skeptical, the width of the distribution counts for a lot too, and the uninitiated are likely to miss that important point.
  3. Priors are not used to "boost" the effect of ECMO.  (My original letter called it a Bayesian boost, borrowing from Mayo, but the adjective was edited out.) Maybe not always, but that was the effect in this case, and the respondents did not cite any examples of a positive frequentist result that was reanalyzed with Bayesian methods to "dampen" the observed effect.  It seems to only go one way, and that's why I alluded to confirmation bias.  The "data-driven priors" they published were tilted towards a positive result, as described above.
  4. Evidence and beliefs.  But as Russell said "The degree to which beliefs are based on evidence is very much less than believers suppose."  I support Russell's quip with the aforementioned.
  5. Judgment is subjective, etc.  I would welcome a poll, in the spirit of crowdsourcing, as we did here to better understand what the community thinks about ECMO (my guess is it's split ratherly evenly, with a trend, perhaps strong, for the efficacy of ECMO).  The authors' analysis is laudable, but it is not based on information not already available to the crowd; rather it transforms it in ways may not be transparent to the crowd and may magnify it in a biased fashion if people unfamiliar with Bayesian methods do not scrutinize the chosen prior distributions.

Sunday, April 21, 2019

A Finding of Noninferiority Does Not Show Efficacy - It Shows Noninferiority (of short course rifampin for MDR-TB)

An image of two separated curves from Mayo's book SIST
Published in the March 28th, 2019 issue of the NEJM is the STREAM trial of a shorter regimen for Rifampin-resistant TB.  I was interested in this trial because if fits the pattern of a "reduced intensity therapy", a cohort of which we recently analyzed and published last year.  The basic idea is this:  if you want to show efficacy of a therapy, you choose the highest dose of the active drug to compare to placebo, to improve the chances that you will get "separation" of the two populations and statistically significant results.  Sometimes, the choice of the "dose" of something, say tidal volume in ARDS, is so high that you are accused of harming one group rather than helping the other.  The point is if you want positive results, use the highest dose so the response curves will separate further, assuming efficacy.

Conversely, in a noninferiority trial, your null hypothesis is not that there is no difference between the groups as it is in a superiority trial, but rather it is that there is a difference bigger than delta (the pre-specified margin of noninferiority.  Rejection of the null hypothesis a leads you to conclude that there is no difference bigger than delta, and you then conclude noninferiority.  If you are comparing a new antibiotic to vancomycin, and you want to be able to conclude noninferiority, you may intentionally or subconsciously dose vancomycin at the lower end of the therapeutic range, or shorten the course of therapy.  Doing this increases the chances that you will reject the null hypothesis and conclude that there is no difference greater than delta in favor of vancomycin and that your new drug is noninferior.  However, this increases your type 1 error rate - the rate at which you falsely conclude noninferiority.

Sunday, December 23, 2018

Do Doctors and Medical Errors Kill More People than Guns?

Recently released stats showing over 40,000 deaths due to firearms in the US this year have led to the usual hackneyed comparisons between those deaths and deaths due to medical errors, the tired refrain something like "Doctors kill more people than that!"  These claims were spreading among gun aficionados on social media last week, with references to this 2016 BMJ editorial by Makary and Michael, from my alma mater Johns Hopkins Bloomberg SPH, claiming that "Medical Error is the Third Leading Cause of Death."  I have been incredulous about this claim when I have encountered it in the past, because it just doesn't jibe with my 20 years of working in these dangerous slaughterhouses we call hospitals.  I have no intention to minimize medical errors - they certainly occur and are highly undesirable - but I think gross overestimates do a disservice too.  Since this keeps coming up, I decided to delve further.

First, just for the record, I'm going to posit that the 40,000 firearms deaths is a reliable figure because they will be listed as homicides and suicides in the "manner of death" section of death certificates, and they're all going to be medical examiner cases.  So I have confidence in this figure.

Contrarily, the Makary paper has no new primary data.  It is simply an extrapolation of existing data and the source is a paper by James in the Journal of Patient Safety in 2013.  (Consider for a moment whether you may have any biases if your career depended upon publishing articles in the Journal of Patient Safety.)  This paper also has no new primary data but relies on data from 4 published studies, two of them not peer-reviewed but Office of the Inspector General (OIG) reports.  I will go through each of these in turn so we can see where these apocalyptic estimates come from.

OIG pilot study from 2008.  This is a random sample of 278 Medicare beneficiaries hospitalized in 2 unspecified and nonrandom counties.  All extrapolations are made from this small sample which has wide confidence intervals because of its small size (Appendix F, Table F1, page 33).  A harm scale is provided on page 3 of the document where the worst category on the letter scale is "I" which is:
"An error occurred that may have contributed to or resulted in patient death."  [Italics added.]

Thursday, May 24, 2018

You Have No Idea of the Predictive Value of Weaning Parameters for Extubation Success, and You Probably Never Will

As Dr. O'brien eloquently described in this post, many people misunderstand the Yang-Tobin (f/Vt) index as being a "weaning parameter" that is predictive of extubation success.  Far from that, it's sensitivity and specificity and resultant ROC curve relate to the ability of f/Vt after one minute of spontaneous ventilation to predict the success of a prolonged (~ one hour) spontaneous breathing trial.  But why would I want to predict the result of a test (the SBT), and introduce error, when I can just do the test and get the result an hour later?  It makes absolutely no sense.  What we want is a parameter that predicts extubation success.  But we don't have that, and we probably will never have that.

In order to determine the sensitivity and specificity of a test for extubation success, we will need to ascertain the outcome in all patients regardless of their performance on the test of interest.  That means we would have to extubate patients that failed the weaning parameter test.  In the original Yang & Tobin article, their cohort consisted of 100 patients.  60(%) of the 100 were said to have passed the weaning test and were extubated, and 40(%) failed and were not extubated.  (There is some over-simplification here based on how Yang & Tobin classified and reported events - its not at all transparent in their article - the data to resolve the issues are not reported and the differences are likely to be small.  Suffice it to say that about 60% of their patients were successfully weaned and the remainder were not.)  Let's try to construct a 2x2 table to determine the sensitivity and specificity of a weaning parameter using a population like theirs.  The top row of the 2x2 table would look something like this, assuming an 85% extubation success rate - that is, of the 60 patients with a positive or "passing" SBT score (based on whatever parameter), all were extubated and the positive predictive value of the test is 85% (the actual rate of reintubation in patients with a passing weaning test is not reported, so this is a guess):



Thursday, May 17, 2018

Increasing Disparities in Infant Mortality? How a Narrative Can Hinge on the Choice of Absolute and Relative Change

An April, 11th, 2018 article in the NYT entitled "Why America's Black Mothers and Babies are in a Life-or-Death Crisis" makes the following alarming summary statement about racial disparities in infant mortality in America:
Black infants in America are now more than twice as likely to die as white infants — 11.3 per 1,000 black babies, compared with 4.9 per 1,000 white babies, according to the most recent government data — a racial disparity that is actually wider than in 1850, 15 years before the end of slavery, when most black women were considered chattel.
Racial disparities in infant mortality have increased since 15 years before the end of the Civil War?  That would be alarming indeed.  But a few paragraphs before, we are given these statistics:

In 1850, when the death of a baby was simply a fact of life, and babies died so often that parents avoided naming their children before their first birthdays, the United States began keeping records of infant mortality by race. That year, the reported black infant-mortality rate was 340 per 1,000; the white rate was 217 per 1,000.
The white infant mortality rate has fallen 217-4.9 = 212.1 infants per 1000.  The black infant mortality rate has fallen 340-11.3 = 328.7 infants per 1000.  So in absolute terms, the terms that concern babies (how many of us are alive?), the black infant mortality rate has fallen much more than the white infant mortality rate.  In fact, in absolute terms, the disparity is almost gone:  in 1850, the absolute difference was 340-217 = 123 more black infants per 1000 births dying to 11.3-4.9 = 6.4 more black infants per 1000 births dying.

Analyzed a slightly different way, the proportion of white infants dying has been reduced by (217-4.9/217) 97.7%, and the proportion of black infants dying has been reduced by (340-11.3/340)= 96.7%.  So, within 1%, black and white babies shared almost equally in the improvements in infant mortality that have been seen since 15 years before the end of the Civil War.  Or, we could do a simple reference frame change and look at infant survival rather than mortality.  If we did that, the current infant survival rate is 98.87% for black babies and .9951% for white babies.  The rate ratio for black:white survival is .994 - almost parity depending on your sensitivity to variances from unity.

It's easy to see how the author of the article arrived at different conclusions by looking only at the rate ratios in 1850 and contemporaneously.  But doing the math that way makes it seem as if a black baby is worse off today than in 1850!  Nothing could be farther from the truth.

You might say that this is just "fuzzy math" as our erstwhile president did in the debates of 2000.  But there could be important policy implications also.  Suppose that I have an intervention that I could apply across the US population and I estimate that it will save an additional 5 black babies per 1000 and an additional 3 white babies per 1000.  We implement this policy and it works as projected.  The black infant mortality rate is 6.3/1000 and the white infant mortality rate is 1.9/1000.  We have saved far more black babies than white babies.  But the rate ratio for black:white mortality has increased from 2.3 to 3.3!  Black babies are now 3 (three!) times as likely to die as white babies!  The policy has increased disparities even though black babies are far better off after the policy change than before it.

It reminds me of the bias where people would rather take a smaller raise if it increased their standing relative to their neighbor.  Surprisingly, when presented with two choices:

  1. you make $50,000 and your peers make $25,000 per year
  2. You make $100,000 and your peers make $250,000 per year
many people choose 1, as if relative social standing is worth $50,000 per year in income.  (Note that relative social standing is just that, relative, and could change if you arbitrarily change the reference class.)

So, relative social standing has value and perhaps a lot of it.  But as regards the hypothetical policy change above, I'm not sure we should be focusing on relative changes in infant mortality.  We just want as few babies dying as possible.

Wednesday, May 2, 2018

Hollow Hegemony: The Opportunity Costs of Overemphasizing Sepsis


Protocols are to make complex tasks simple, not simple tasks complex. - Scott K Aberegg

Yet here we find ourselves some 16 years after the inauguration of the Surviving Sepsis Campaign, and their influence continues to metastasize, even after the message has been hollowed out like a piece of fallen, old-growth timber.

Surviving sepsis was the brainchild of Eli Lilly, who, in the year after the ill-fated FDA approval of drotrecogin-alfa, worried that the drug would not sell well if clinicians did not have an increased awareness of sepsis. That aside, in those days, there were legitimate questions surrounding the adoption and implementation of several new therapies such as EGDT, corticosteroids for septic shock, Xigris for those with APACHE scores over 25, intensive insulin therapy, etc.

Those questions are mostly answered. Sepsis is now, quite simply, a complex of systemic manifestations of infection almost all of which will resolve with treatment of the infection and general supportive care. The concept of sepsis could vanish entirely, and nothing about the clinical care of the patient would change: an infection would be diagnosed, the cause/source identified and treated, and hemodynamics and laboratory dyscrasias supported meanwhile. There is nothing else to do (because lactic acidosis does not exist.)

But because of the hegemony of the sepsis juggernaut (the spawn of the almighty dollar), we are now threatened with a mandate to treat patients carrying the sepsis label (oftentimes assigned by a hospital coder after the fact) with antibiotics and a fluid bolus within one hour of triage in the ED. Based on what evidence?

Weak recommendation, "Best Practice Statement" and some strong recommendations based on low and moderate quality evidence.  So if we whittle it down to just moderate quality of evidence, what do we have?  Give antibiotics for infections, and give vasopressors if MAP less than 65.  But now we have to hurry up and do the whole kit and caboodle boiler plate style within 60 minutes?

Sepsis need not be treated any differently than a gastrointestinal hemorrhage, or for that matter, any other disease.  You make the diagnosis, determine and control the cause (source), give appropriate treatments, and support the physiology in the meantime, all while prioritizing the sickest patients.  But that counts for all diseases, not just sepsis, and there is only so much time in an hour.  When every little old lady with fever and a UTI suddenly rises atop the priorities of the physician, this creates an opportunity cost/loss for the poor bastard bleeding next door who doesn't have 2 large-bore IVs or a type and cross yet because grandma is being flogged with 2 liters of fluid, and in a hurry.  If only somebody had poured mega-bucks into increased recognition and swift treatment of GI bleeds....


Petition to retire the surviving sepsis campaign guidelines:

(Sign the Petition Here.)

Friends,

Concern regarding the Surviving Sepsis Campaign (SSC) guidelines dates back to their inception.  Guideline development was sponsored by Eli Lilly and Edwards Life Sciences as part of a commercial marketing campaign (1).  Throughout its history, the SSC has a track record of conflicts of interest, making strong recommendations based on weak evidence, and being poorly responsive to new evidence (2-6).

The original backbone of the guidelines was a single-center trial by Rivers defining a protocol for early goal-directed therapy (7).  Even after key elements of the Rivers protocol were disproven, the SSC continued to recommend them.  For example, SSC continued to recommend the use of central venous pressure and mixed venous oxygen saturation after the emergence of evidence that they were nonbeneficial (including the PROCESS and ARISE trials).  These interventions eventually fell out of favor, despite the slow response of SSC that delayed knowledge translation. 

SSC has been sponsored by Eli Lilly, manufacturer of Activated Protein C.  The guidelines continued recommending Activated Protein C until it was pulled from international markets in 2011.  For example, the 2008 Guidelines recommended this, despite ongoing controversy and the emergence of neutral trials at that time (8,9).  Notably, 11 of 24 guideline authors had financial conflicts of interest with Eli Lilly (10).

The Infectious Disease Society of America (IDSA) refused to endorse the SSC because of a suboptimal rating system and industry sponsorship (1).  The IDSA has enormous experience in treating infection and creating guidelines.  Septic patients deserve a set of guidelines that meet the IDSA standards.


Guidelines should summarize evidence and provide recommendations to clinicians.  Unfortunately, the SSC doesn’t seem to trust clinicians to exercise judgement.  The guidelines infantilize clinicians by prescribing a rigid set of bundles which mandate specific interventions within fixed time frames (example above)(10).  These recommendations are mostly arbitrary and unsupported by evidence (11,12).  Nonetheless, they have been adopted by the Centers for Medicare & Medicaid Services as a core measure (SEP-1).  This pressures physicians to administer treatments despite their best medical judgment (e.g. fluid bolus for a patient with clinically obvious volume overload).

We have attempted to discuss these issues with the SSC in a variety of forums, ranging from personal communications to formal publications (13-15).  We have tried to illuminate deficiencies in the SSC bundles and the consequent SEP-1 core measures.  Our arguments have fallen on deaf ears. 

We have waited patiently for years in hopes that the guidelines would improve, but they have not.  The 2018 SSC update is actually worse than prior guidelines, requiring the initiation of antibiotics and 30 cc/kg fluid bolus within merely sixty minutes of emergency department triage (16).  These recommendations are arbitrary and dangerous.  They will likely cause hasty management decisions, inappropriate fluid administration, and indiscriminate use of broad-spectrum antibiotics.  We have been down this path before with other guidelines that required antibiotics for pneumonia within four hours, a recommendation that harmed patients and was eventually withdrawn (17).

It is increasingly clear that the SSC guidelines are an impediment to providing the best possible care to our septic patients.  The rigid framework mandated by SSC doesn’t help experienced clinicians provide tailored therapy to their patients.  Furthermore, the hegemony of these guidelines prevents other societies from developing better guidelines.

We are therefore petitioning for the retirement of the SSC guidelines.  In its place, we would call for the development of separate sepsis guidelines by the United States, Europe, ANZICS, and likely other locales as well.  There has been a monopoly on sepsis guidelines for too long, leading to stagnation and dogmatism.  We would hope that these new guidelines are written by collaborations of the appropriate professional societies, based on the highest evidentiary standards.  The existence of several competing sepsis guidelines could promote a diversity of opinions, regional adaptation, and flexible thinking about different approaches to sepsis. 

We are disseminating an international petition that will allow clinicians to express their displeasure and concern over these guidelines.  If you believe that our septic patients deserve more evidence-based guidelines, please stand with us.  

Sincerely,

Scott Aberegg MD MPH
Jennifer Beck-Esmay MD
Steven Carroll DO MEd
Joshua Farkas MD
Jon-Emile Kenny MD
Alex Koyfman MD
Michelle Lin MD
Brit Long MD
Manu Malbrain MD PhD
Paul Marik MD
Ken Milne MD
Justin Morgenstern MD
Segun Olusanya MD
Salim Rezaie MD
Philippe Rola MD
Manpreet Singh MD
Rory Speigel MD
Reuben Strayer MD
Anand Swaminathan MD
Adam Thomas MD
Lauren Westafer DO MPH
Scott Weingart MD

References
  1. Eichacker PQ, Natanson C, Danner RL.  Surviving Sepsis – Practice guidelines, marketing campaigns, and Eli Lilly.  New England Journal of Medicine  2006; 16: 1640-1642.
  2. Pepper DJ, Jaswal D, Sun J, Welsch J, Natanson C, Eichacker PQ.  Evidence underpinning the Centers for Medicare & Medicaid Services’ Severe Sepsis and Septic Shock Management Bundle (SEP-1): A systematic review.  Annals of Internal Medicine 2018; 168:  558-568. 
  3. Finfer S.  The Surviving Sepsis Campaign:  Robust evaluation and high-quality primary research is still needed.  Intensive Care Medicine  2010; 36:  187-189.
  4. Salluh JIF, Bozza PT, Bozza FA.  Surviving sepsis campaign:  A critical reappraisal.  Shock 2008; 30: 70-72. 
  5. Eichacker PQ, Natanson C, Danner RL.  Separating practice guidelines from pharmaceutical marketing.  Critical Care Medicine 2007; 35:  2877-2878. 
  6. Hicks P, Cooper DJ, Webb S, Myburgh J, Sppelt I, Peake S, Joyce C, Stephens D, Turner A, French C, Hart G, Jenkins I, Burrell A.  The Surviving Sepsis Campaign:  International guidelines for management of severe sepsis and septic shock: 2008.  An assessment by the Australian and New Zealand Intensive Care Society.  Anaesthesia and Intensive Care 2008; 36: 149-151.
  7. Rivers ME et al.  Early goal-directed therapy in the treatment of severe sepsis and septic shock.  New England Journal of Medicine 2001; 345: 1368-1377.
  8. Wenzel RP, Edmond MB.  Septic shock – Evaluating another failed treatment.  New England Journal of Medicine 2012; 366:  2122-2124.  
  9. Savel RH, Munro CL.  Evidence-based backlash:  The tale of drotrecogin alfa.  American Journal of Critical Care  2012; 21: 81-83. 
  10. Dellinger RP, Levy MM, Carlet JM et al.  Surviving sepsis campaign:  International guidelines for management of severe sepsis and septic shock:  2008.  Intensive Care Medicine 2008; 34:  17-60. 
  11. Allison MG, Schenkel SM.  SEP-1:  A sepsis measure in need of resuscitation?  Annals of Emergency Medicine 2018; 71: 18-20.
  12. Barochia AV, Xizhong C, Eichacker PQ.  The Surviving Sepsis Campaign’s revised sepsis bundles.  Current Infectious Disease Reports 2013; 15:  385-393. 
  13. Marik PE, Malbrain MLNG.  The SEP-1 quality mandate may be harmful: How to drown a patient with 30 ml per kg fluid!  Anesthesiology and Intensive Therapy 2017; 49(5) 323-328.
  14. Faust JS, Weingart SD.  The past, present, and future of the centers for Medicare and Medicaid Services quality measure SEP-1:  The early management bundle for severe sepsis/septic shock.  Emergency Medicine Clinics of North America 2017; 35:  219-231.
  15. Marik PE.  Surviving sepsis:  going beyond the guidelines.  Annals of Intensive Care 2011; 1: 17.
  16. Levy MM, Evans LE, Rhodes A.  The surviving sepsis campaign bundle:  2018 update.  Intensive Care Medicine.  Electronic publication ahead of print, PMID 29675566.
  17. Kanwar M, Brar N, Khatib R, Fakih MG.  Misdiagnosis of community-acquired pneumonia and inappropriate utilization of antibiotics: side effects of the 4-h antibiotic administration rule.  Chest 2007; 131: 1865-1869.