Wednesday, April 14, 2021

Bias in Assessing Cognitive Bias in Forensic Pathology: The Dror Nevada Death Certificate "Study"

Following the longest hiatus in the history of the Medical Evidence Blog, I return to issues of forensic medicine, by happenstance alone. In today's issue of the NYT is this article about bias in forensic medicine, spurred by interest in the trial of the murder of George Floyd. Among other things, the article discusses a recently published paper in the Journal of Forensic Sciences for which there were calls for retraction by some forensic pathologists. According to the NYT article, the paper showed that forensic pathologists have racial bias, a claim predicated upon an analysis of death certificates in Nevada, and a survey study of forensic pathologists, using a methodology similar to that I have used in studying physician decisions and bias (viz, randomizing recipients to receiving one of two forms of a case vignette that differ in the independent variable of interest). The remainder of this post will focus on that study, which is sorely in need of some post-publication peer review.

The study was led by Itiel Dror, PhD, a Harvard trained psychologist now at University College London who studies bias, with a frequent focus on forensic medicine, if my cursory search is any guide. The other authors are a forensic pathologist (FP) at University of Alabama Birmingham (UAB), an FP and coroner in San Luis Obispo, California, a lawyer with the Clark County public defender's office in Las Vegas, Nevada, a PhD psychologist from Towson University in Towson, Maryland, an FP proprietor of a forensics company who is a part-time medical examiner for West Virginia, and an FP who is proprietor of a forensics and legal consulting company in San Francisco, California. I identify the authors in order to try to understand why the analysis of death certificates was restricted to the state of Nevada. Other than one author's residence there, I cannot understand why Nevada was chosen, and the selection is not justified in the paltry methods section of the paper.

Sunday, February 16, 2020

Misunderstanding and Misuse of Basic Clinical Decision Principles among Child Abuse Pediatricians

The previous post about Dr. Cox, ensnared in a CPT (Child Protection Team) witch hunt in Wisconsin, has led me to evaluate several more research reports on child abuse, including SBS (shaken baby syndrome), AHT (abusive head trauma), and sentinel injuries.  These reports are rife with critical assumptions, severe limitations, and gross errors which greatly limit the resulting conclusions in most studies I have reviewed.  However, one study that was pointed out to me today takes the cake.  I don't know what the prevalence of this degree of misunderstanding is, but CPTs and child abuse pediatricians need to make sure they have a proper understanding of sensitivity, specificity, positive and negative predictive value, base rates, etc.  And they should not be testifying about the probability of child abuse at all if they don't have this stuff down cold. And I think this means that some proportion of them needs to go back to school or stop testifying.

The article and associated correspondence at issue is entitled The Positive Predictive Value of Rib Fractures as an Indicator of Nonaccidental Trauma in Children published in 2004.  The authors looked at a series of rib fractures in children at a single Trauma Center in Colorado during a six year period and identified all patients with a rib fracture.  They then restricted their analysis to children less than 3 years of age.  There were 316 rib fractures among just 62 children in the series; the average number of rib fractures per child is ~5.  The proper unit of analysis for a study looking at positive predictive value is children, sorted into those with and without abuse, and with and without rib fracture(s) as seen in the 2x2 tables below.

Tuesday, January 28, 2020

Bad Science + Zealotry = The Wisconsin Witch Hunts. The Case of John Cox, MD

John Cox, MD
I stumbled upon a very disturbing report on NBC News today of a physician couple in Wisconsin accused of abusing their adopted infant daughter.  It is surreal and horrifying and worth a read - not because these physicians abused their daughter, but because they almost assuredly did not.  One driving force behind the case appears to be a well-meaning and perfervid, if misguided and perfidious, pediatrician at University of Wisconsin who, with her group, coined the term "sentinel injuries" (SI) to describe small injuries such as bruises and oral injuries that they posit portend future larger scale abuse.  It was the finding of SI on the adopted infant in the story that in part led to charges of abuse against the father, Dr. Cox, got his child put in protective services, got him arrested, and threatens his career.  Interested readers can reference the link above for sordid and sundry details of the case.

Before delving into the 2013 study in Pediatrics upon which many contentions about SI rest, we should start with the fundamentals.  First, is it plausible that the thesis is correct, that before serious abuse, minor abuse is detectable by small bruises or oral injuries?  Of course it is, and it sounds like a good narrative.  But being a good plausible narrative does not make it true and it is likewise possible that bruises seen in kids who are and are not abused reflect nothing more than accidental injuries from rolling off a support, something falling or dropping on them, somebody dropping them, a sibling jabbing at them with a toy, and a number of things.  To my knowledge, the authors offer no direct evidence that the SIs they or others report have been directly traced to abuse.  They are doing nothing more than inferring that facial bruising is a precursor to Abusive Head Trauma (AHT), and based on their bibliography they have gone out of their way to promote this notion.

Thursday, December 5, 2019

Noninferiority Trials of Reduced Intensity Therapies: SCORAD trial of Radiotherapy for Spinal Metastases

No mets here just my PTX
A trial in JAMA this week by Hoskin et al (the SCORAD trial) compared two different intensities of radiotherapy for spinal metastases.  This is a special kind of noninferiority trial, which we wrote about last year in BMJ Open.  When you compare the same therapy at two intensities using a noninferiority trial, you are in perilous territory.  This is because if the therapy works on a dose response curve, it is almost certain, a priori, that the lower dose is actually inferior - if you consider inferior to represent any statistically significant difference disfavoring a therapy.  (We discuss this position, which goes against the CONSORT grain, here.)  You only need a big enough sample size.  This may be OK, so long as you have what we call in the BMJ Open paper, "a suitably conservative margin of noninferiority."  Most margins of noninferiority (delta) are far from this standard.

The results of the SCORAD trial were consistent with our analysis of 30+ other noninferiority trials of reduced intensity therapies, and the point estimate favored - you guessed it - the more intensive radiotherapy.  This is fine.  It is also fine that the 1-sided 95% confidence interval crossed the 11% prespecified margin of noninferiority (P=0.06).  That just means you can't declare noninferiority.  What is not fine, in my opinion, is that the authors suggest that we look at how little overlap there was, basically an insinuation that we should consider it noninferior anyway.  I crafted a succinct missive to point this out to the editors, but alas I'm too busy to submit it and don't feel like bothering, so I'll post it here for those who like to think about these issues.

To the editor:  Hoskin et al report results of a noninferiority trial comparing two intensities of radiotherapy (single fraction versus multi-fraction) for spinal cord compression from metastatic cancer (the SCORAD trial)1.  In the most common type of noninferiority trial, investigators endeavor to show that a novel agent is not worse than an established one by more than a prespecified margin.  To maximize the chances of this, they generally choose the highest tolerable dose of the novel agent.  Similarly, guidelines admonish against underdosing the active control comparator as this will increase the chances of a false declaration of noninferiority of the novel agent2,3.  In the SCORAD trial, the goal was to determine if a lower dose of radiotherapy was noninferior to a higher dose. Assuming radiotherapy is efficacious and operates on a dose response curve, the true difference between the two trial arms is likely to favor the higher intensity multi-fraction regimen.  Consequently, there is an increased risk of falsely declaring noninferiority of single fraction radiotherapy4.  Therefore, we agree with the authors’ concluding statement that “the extent to which the lower bound of the CI overlapped with the noninferiority margin should be considered when interpreting the clinical importance of this finding.”  The lower bound of a two-sided 95% confidence interval (the trial used a 1-sided 95% confidence interval) extends to 13.1% in favor of multi-fraction radiotherapy.  Because the outcome of the trial was ambulatory status, and there were no differences in serious adverse events, our interpretation is that single fraction radiotherapy should not be considered noninferior to a multi-fraction regimen, without qualifications.

1.            Hoskin PJ, Hopkins K, Misra V, et al. Effect of Single-Fraction vs Multifraction Radiotherapy on Ambulatory Status Among Patients With Spinal Canal Compression From Metastatic Cancer: The SCORAD Randomized Clinical Trial. JAMA. 2019;322(21):2084-2094.
2.            Piaggio G, Elbourne DR, Pocock SJ, Evans SW, Altman DG, for the CONSORT Group. Reporting of noninferiority and equivalence randomized trials: Extension of the CONSORT 2010 statement. JAMA. 2012;308(24):2594-2604.
3.            Jones B, Jarvis P, Lewis JA, Ebbutt AF. Trials to assess equivalence: the importance of rigorous methods. BMJ. 1996;313(7048):36-39.
4.            Aberegg SK, Hersh AM, Samore MH. Do non-inferiority trials of reduced intensity therapies show reduced effects? A descriptive analysis. BMJ open. 2018;8(3):e019494-e019494.

Saturday, November 23, 2019

Pathologizing Lipid Laden Macrophages (LLMs) in Vaping Associated Lung Injury (VALI)

It's time to weigh in on an ongoing debate being waged in the correspondence pages of the NEJM.  To wit, what is the significance of lipid laden macrophages (LLMs) in VALI?  As we stated, quite clearly, in our original research letter,

"Although the pathophysiological significance of these lipid-laden macrophages and their relation to the cause of this syndrome are not yet known, we posit that they may be a useful marker of this disease.3-5 Further work is needed to characterize the sensitivity and specificity of lipid-laden macrophages for vaping-related lung injury, and at this stage they cannot be used to confirm or exclude this syndrome. However, when vaping-related lung injury is suspected and infectious causes have been excluded, the presence of lipid-laden macrophages in BAL fluid may suggest vaping-related lung injury as a provisional diagnosis."
There, we outlined the two questions about their significance:  1.) any relation to the pathogenesis of the syndrome; and 2.) whether, after characterizing their sensitivity and specificity, they can be used in diagnosis.  I am not a lung biologist, so I will ignore the first question and focus on the second, where I actually do know a thing or two.

We still do not know the sensitivity or specificity of LLMs for VALI, but we can make some wagers based on what we do know.  First, regarding sensitivity.  In our ongoing registry at the University of Utah, we have over 30 patients with "confirmed" VALI (if you don't have a gold standard, how do you "confirm" anything?), and to date all but one patient had LLMs in excess of 20% on BAL.  For the first several months we bronched everybody.  So, in terms of BAL and LLMs, I'm guessing we have the most extensive and consistent experience.  Our sensitivity therefore is over 95%.  In the Layden et al WI/IL series in NEJM, there were 7 BAL samples and all 7 had "lipid Layden macrophages" (that was a pun).  In another Utah series, Blagev et al reported that 8 of 9 samples tested showed LLMs.  Combining those data (ours are not yet published, but soon will be) we can state the following:  "Given the presence of VALI, the probability of LLM on Oil Red O staining (OROS) is 96%."  You may recognize that as a statement of sensitivity.  It is unusual to not find LLMs on OROS of BAL fluid in cases of VALI, and because of that, their absence makes the case atypical, just as does the absence of THC vaping.  Some may go so far as to say their absence calls into question the diagnosis, and I am among them.  But don't read between the lines.  I did not say that bronchoscopy is indicated to look for them.  I simply said that their absence makes the case atypical and calls it into question.
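For those who want to check the arithmetic, here is a minimal sketch of the pooling.  The registry counts are an assumption: I have taken "over 30" with "all but one" positive to mean 30 of 31, which is only one reading consistent with the text.  The other two series use the counts quoted above.

```python
# Pooled sensitivity of LLMs for VALI across the three series mentioned above.
# ASSUMPTION: the Utah registry is taken as 31 patients, 30 of them LLM-positive;
# the post only says "over 30" patients with "all but one" positive.
series = {
    "Utah registry (assumed counts)": (30, 31),
    "Layden et al": (7, 7),
    "Blagev et al": (8, 9),
}

def pooled_sensitivity(counts):
    """Return total LLM-positive cases divided by total VALI cases."""
    positives = sum(pos for pos, _ in counts.values())
    total = sum(n for _, n in counts.values())
    return positives / total

sens = pooled_sensitivity(series)
print(f"Pooled sensitivity: {sens:.1%}")
```

Under that assumption the pooled figure is 45 of 47, which rounds to the 96% quoted above.  A simple pooled proportion ignores differences between centers, so treat it as a rough wager rather than an estimate with known precision.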

Sunday, September 1, 2019

Pediatrics and Scare Tactics: From Rock-n-Play to Car Safety Seats

Is sleeping in a car seat dangerous?
Earlier this year, the Fisher-Price company relented to pressure from the AAP (American Academy of Pediatrics) and recalled 4.7 million Rock 'n Play (RnP) baby rockers, which now presumably occupy landfills.  This recommendation stemmed from an "investigation" by Consumer Reports showing that since 2011, 32 babies died while sleeping in the RnP.  These deaths are tragic, but what do they mean?  In order to make sense of this "statistic" we need to determine a rate, based on the exposure period, something like "the rate of infant death in the RnP is 1 per 10 million RnP occupied hours" or something like that.  Then we would compare it to the rate of infant death sleeping in bed.  If it was higher, we would have a starting point for considering whether ceteris paribus, maybe it's the RnP that is causing the infant deaths.  We would want to know the ratio of observed deaths in the RnP to expected deaths sleeping in some other arrangement for the same amount of time.  Of course, even if we found the observed rate was higher than the expected rate, other possibilities exist, i.e., it's an association, a marker for some other factor, rather than a cause of the deaths.  A more sophisticated study would, through a variety of methods, try to control for those other factors, say, socioeconomic status, infant birth weight, and so on.  The striking thing to me and other evidence minded people was that this recall did not even use the observed versus expected rate, or any rate at all!  Just a numerator!  We could do some back of the envelope calculations with some assumptions about rate ratios, but I won't bother here.  Suffice it to say that we had an infant son at that time and we kept using the RnP until he outgrew it and then we gave it away.

Last week, the AAP was at it again, playing loose with the data but tight with recommendations based upon them.  This time, it's car seats.  In an article in the August, 2019 edition of the journal Pediatrics, Liaw et al present data showing that, in a cohort of 11,779 infant deaths, 3% occurred in "sitting devices", and in 63% of this 3%, the sitting device was a car safety seat (CSS).  In the deaths in CSSs, 51.6% occurred in the child's home rather than in a car.  What was the rate of infant death per hour in the CSS?  We don't know.  What is the expected rate of death for the same amount of time sleeping, you know, in the recommended arrangement?  We don't know!  We're at it again - we have a numerator without a denominator, so no rate and no rate to compare it to.  It could be that 3% of the infant deaths occurred in car seats because infants are sleeping in car seats 3% of the time!
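To make the missing denominator concrete, here is a rough sketch of the observed-to-expected comparison.  The 3% share of deaths is the figure quoted above; the exposure shares are entirely hypothetical, because nobody has reported what fraction of infant time is spent in these devices.

```python
def observed_to_expected(death_share, time_share):
    """Ratio of the share of deaths in a setting to the share of time spent there.

    A ratio near 1 means deaths occur in the setting about as often as exposure
    time alone would predict. It says nothing about confounding.
    """
    if not 0 < time_share <= 1:
        raise ValueError("time_share must be in (0, 1]")
    return death_share / time_share

death_share = 0.03  # share of infant deaths in sitting devices, as quoted above

# HYPOTHETICAL exposure shares; the real figure is unknown.
for time_share in (0.01, 0.03, 0.06):
    ratio = observed_to_expected(death_share, time_share)
    print(f"time share {time_share:.0%}: observed/expected = {ratio:.2f}")
```

The same 3% numerator is alarming, unremarkable, or reassuring depending entirely on the denominator, which is the point.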

Sunday, July 21, 2019

Move Over Feckless Extubation, Make Room For Reckless Extubation

Following the theme of some recent posts on Status Iatrogenicus (here and here) about testing and treatment thresholds, one of our stellar fellows Meghan Cirulis MD and I wrote a letter to the editor of JAMA about the recent article by Subira et al comparing shorter duration Pressure Support Ventilation to longer duration T-piece trials.  Despite adhering to my well hewn formula for letters to the editor, it was not accepted, so as is my custom, I will publish it here.

Spoiler alert - when the patients you enroll in your weaning trial have a base rate of extubation success of 93%, you should not be doing an SBT - you should be extubating them all, and figuring out why your enrollment criteria are too stringent and how many extubatable patients your enrollment criteria are missing because of low sensitivity and high specificity.

Tuesday, May 7, 2019

Etomidate Succs: Preventing Dogma from Becoming Practice in RSI

The editorial about the PreVent trial in the NEJM a few months back is entitled "Preventing Dogma from Driving Practice".  If we are not careful, we will let the newest dogma replace the old dogma and become practice.

The PreVent trial compared bagging versus no bagging after induction of anesthesia for rapid sequence intubation (RSI).  Careful readers of this and another recent trial testing the dogma of videolaryngoscopy will notice several things that may significantly limit the external validity of the results.
  • The median time from induction to intubation was 130 seconds in the no bag ventilation group, and 158 seconds in the bag ventilation group (NS).  That's 2 to 2.5 minutes.  In the Lascarrou 2017 JAMA trial of direct versus video laryngoscopy, it was three minutes.  Speed matters.  The time that a patient is paralyzed and non-intubated is a very dangerous time and it ought to be as short as possible
  • The induction agent was Etomidate (Amidate) in 80% of the patients in the PreVent trial and 90% of patients in the Lascarrou trial (see supplementary appendix of PreVent trial)
  • The intubations were performed by trainees in approximately 80% of intubations in both trials (see supplementary appendix of PreVent trial)
I don't think these trials are directly relevant to my practice.  Like surgeon Robert Liston who operated in the pre-anesthesia era and learned that speed matters (he could amputate a leg in 2.5 minutes), I have learned that the shorter the time from induction to intubation, the better - it is a vulnerable time and badness occurs during it:  atelectasis, hypoxemia, aspiration, hypotension, secretion accumulation, etc.

Thursday, April 25, 2019

The EOLIA ECMO Bayesian Reanalysis in JAMA

A Hantavirus patient on ECMO, circa 2000
Spoiler alert:  I'm a Bayesian decision maker (although maybe not a Bayesian trialist) and I "believe" in ECMO as documented here.

My letter to the editor of JAMA was published today (and yeah I know, I write too many letters, but hey, I read a lot and regular peer review often doesn't cut it) and even when you come at them like a spider monkey, the authors of the original article still get the last word (and they deserve it - they have done far more work than the post-publication peer review hecklers with their quibbles and their niggling letters.)

But to set some things straight, I will need some more words to elucidate some points about the study's interpretation.  The authors' response to my letter has five points.
  1. I (not they) committed confirmation bias, because I postulated harm from ECMO.  First, I do not have a personal prior for harm from ECMO; I actually think it is probably beneficial in properly selected patients, as is well documented in the blog post from 2011 describing my history of experience with it in hantavirus, and as well in a book chapter I wrote in Cardiopulmonary Bypass Principles and Practice circa 2006.  There is irony here - I "believe in" ECMO, I just don't think their Bayesian reanalysis supports my (or anybody's) beliefs in a rational way!  The point is that it was a post hoc unregistered Bayesian analysis after a pre-registered frequentist study which was "negative" (for all that's worth and not worth), and the authors clearly believe in the efficacy of ECMO as do I.  In finding shortcomings in their analysis, I seek to disconfirm or at least challenge not only their beliefs but my own.  And I think that if the EOLIA trial had been positive, we would not be publishing Bayesian reanalyses showing how the frequentist trial may be a type I error.  We know from long experience that if EOLIA had been "positive" that success would have been declared for ECMO as it has been with prone positioning for ARDS.  (I prone patients too.)  The trend is to confirm rather than to disconfirm, but good science relies more on the latter.
  2. That a RR of 1.0 for ECMO is a "strongly skeptical" prior.  It may seem strong from a true believer standpoint, but not from a true nonbeliever standpoint.  Those are the true skeptics (I know some, but I'll not mention names - I'm not one of them) who think that ECMO is really harmful on the net, like intensive insulin therapy (IIT) probably is.  Regardless of all the preceding trials, if you ask the NICE-SUGAR investigators, they are likely to maintain that IIT is harmful.  Importantly, the authors skirt the issue of the emphasis they place on the only longstanding and widely regarded as positive ARDS trial (of low tidal volume).  There are three decades of trials in ARDS patients, scores of them, enrolling tens of thousands of patients, that show no effect of the various therapies.  Why would we give primacy to the one trial which was positive, and equate ECMO to low tidal volume?  Why not equate it to high PEEP, or corticosteroids for ARDS?  A truly skeptical prior would have been centered on an aggregate point estimate and associated distribution of 30 years of all trials in ARDS of all therapies (the vast majority of them "negative").  The sheer magnitude of their numbers would narrow the width of the prior distribution with RR centered on 1.0 (the "severely skeptical" one), and it would pull the posterior more towards zero benefit, a null result.  Indeed, such a narrow prior distribution may have shown that low tidal volume is an outlier and likely to be a false positive (I won't go any farther down that perilous path).  The point is, even if you think a RR of 1.0 is severely skeptical, the width of the distribution counts for a lot too, and the uninitiated are likely to miss that important point.
  3. Priors are not used to "boost" the effect of ECMO.  (My original letter called it a Bayesian boost, borrowing from Mayo, but the adjective was edited out.) Maybe not always, but that was the effect in this case, and the respondents did not cite any examples of a positive frequentist result that was reanalyzed with Bayesian methods to "dampen" the observed effect.  It seems to only go one way, and that's why I alluded to confirmation bias.  The "data-driven priors" they published were tilted towards a positive result, as described above.
  4. Evidence and beliefs.  But as Russell said "The degree to which beliefs are based on evidence is very much less than believers suppose."  I support Russell's quip with the aforementioned.
  5. Judgment is subjective, etc.  I would welcome a poll, in the spirit of crowdsourcing, as we did here to better understand what the community thinks about ECMO (my guess is it's split rather evenly, with a trend, perhaps strong, for the efficacy of ECMO).  The authors' analysis is laudable, but it is not based on information not already available to the crowd; rather it transforms it in ways that may not be transparent to the crowd and may magnify it in a biased fashion if people unfamiliar with Bayesian methods do not scrutinize the chosen prior distributions.
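The point in item 2 about the width of the prior can be sketched with a simple conjugate normal update on the log relative risk scale.  This is not the authors' model, and the trial summary and both prior widths below are hypothetical values chosen only to show the mechanics.

```python
import math

def posterior_log_rr(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update on the log relative risk scale.

    Returns (posterior_mean, posterior_sd); the mean is a precision-weighted
    average of the prior mean and the observed log RR.
    """
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_se ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, math.sqrt(post_var)

def prob_benefit(mean, sd):
    """P(log RR < 0), i.e. P(RR < 1), under a normal posterior."""
    return 0.5 * (1.0 + math.erf((0.0 - mean) / (sd * math.sqrt(2.0))))

# HYPOTHETICAL trial summary: RR 0.75 with a standard error of 0.16 on the log scale.
data_mean, data_se = math.log(0.75), 0.16

# Both priors are centered on RR = 1.0 (log RR = 0); only the width differs.
for label, prior_sd in (("wide prior", 0.50), ("narrow prior", 0.10)):
    m, s = posterior_log_rr(0.0, prior_sd, data_mean, data_se)
    print(f"{label}: posterior RR = {math.exp(m):.2f}, P(RR < 1) = {prob_benefit(m, s):.2f}")
```

Both priors are "centered on 1.0", yet the narrow one drags the posterior much closer to the null.  That is why a prior cannot be judged skeptical by its center alone.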

Sunday, April 21, 2019

A Finding of Noninferiority Does Not Show Efficacy - It Shows Noninferiority (of short course rifampin for MDR-TB)

An image of two separated curves from Mayo's book SIST
Published in the March 28th, 2019 issue of the NEJM is the STREAM trial of a shorter regimen for Rifampin-resistant TB.  I was interested in this trial because it fits the pattern of a "reduced intensity therapy", a cohort of which we analyzed and published last year.  The basic idea is this:  if you want to show efficacy of a therapy, you choose the highest dose of the active drug to compare to placebo, to improve the chances that you will get "separation" of the two populations and statistically significant results.  Sometimes, the choice of the "dose" of something, say tidal volume in ARDS, is so high that you are accused of harming one group rather than helping the other.  The point is if you want positive results, use the highest dose so the response curves will separate further, assuming efficacy.

Conversely, in a noninferiority trial, your null hypothesis is not that there is no difference between the groups as it is in a superiority trial, but rather it is that there is a difference bigger than delta (the pre-specified margin of noninferiority).  Rejection of the null hypothesis leads you to conclude that there is no difference bigger than delta, and you then conclude noninferiority.  If you are comparing a new antibiotic to vancomycin, and you want to be able to conclude noninferiority, you may intentionally or subconsciously dose vancomycin at the lower end of the therapeutic range, or shorten the course of therapy.  Doing this increases the chances that you will reject the null hypothesis and conclude that there is no difference greater than delta in favor of vancomycin and that your new drug is noninferior.  However, this increases your type 1 error rate - the rate at which you falsely conclude noninferiority.
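Here is a rough sketch of that mechanism, assuming a normally distributed estimate of the difference and a 1-sided 95% bound.  The margin, standard error, and true differences are hypothetical numbers, not taken from any trial.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_declare_noninferior(true_diff, se, delta, z_crit=1.645):
    """Probability that the upper 1-sided 95% bound on (control - new) is below delta.

    true_diff: true advantage of the active control over the new drug.
    se: standard error of the estimated difference.
    delta: prespecified noninferiority margin.
    """
    # Noninferiority is declared when estimate + z_crit * se < delta.
    return normal_cdf((delta - z_crit * se - true_diff) / se)

# HYPOTHETICAL numbers chosen only for illustration.
delta = 0.10  # margin: the control may beat the new drug by up to 10 points
se = 0.04     # standard error of the estimated difference

# Against a properly dosed control the new drug is truly worse by exactly delta.
# Underdosing the control shrinks the apparent difference to 4 points.
for label, true_diff in (("full-dose control", 0.10), ("underdosed control", 0.04)):
    p = prob_declare_noninferior(true_diff, se, delta)
    print(f"{label}: P(declare noninferiority) = {p:.2f}")
```

With the control dosed properly, the chance of declaring noninferiority for a drug sitting right at the margin is the nominal 5% or so.  Handicap the control and that chance rises several-fold, even though nothing about the new drug has changed.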

Sunday, December 23, 2018

Do Doctors and Medical Errors Kill More People than Guns?

Recently released stats showing over 40,000 deaths due to firearms in the US this year have led to the usual hackneyed comparisons between those deaths and deaths due to medical errors, the tired refrain something like "Doctors kill more people than that!"  These claims were spreading among gun aficionados on social media last week, with references to this 2016 BMJ editorial by Makary and Daniel, from my alma mater Johns Hopkins Bloomberg SPH, claiming that "Medical Error is the Third Leading Cause of Death."  I have been incredulous about this claim when I have encountered it in the past, because it just doesn't jibe with my 20 years of working in these dangerous slaughterhouses we call hospitals.  I have no intention to minimize medical errors - they certainly occur and are highly undesirable - but I think gross overestimates do a disservice too.  Since this keeps coming up, I decided to delve further.

First, just for the record, I'm going to posit that the 40,000 firearms deaths is a reliable figure because they will be listed as homicides and suicides in the "manner of death" section of death certificates, and they're all going to be medical examiner cases.  So I have confidence in this figure.

Contrarily, the Makary paper has no new primary data.  It is simply an extrapolation of existing data and the source is a paper by James in the Journal of Patient Safety in 2013.  (Consider for a moment whether you may have any biases if your career depended upon publishing articles in the Journal of Patient Safety.)  This paper also has no new primary data but relies on data from 4 published studies, two of them not peer-reviewed but Office of the Inspector General (OIG) reports.  I will go through each of these in turn so we can see where these apocalyptic estimates come from.

OIG pilot study from 2008.  This is a random sample of 278 Medicare beneficiaries hospitalized in 2 unspecified and nonrandom counties.  All extrapolations are made from this small sample which has wide confidence intervals because of its small size (Appendix F, Table F1, page 33).  A harm scale is provided on page 3 of the document where the worst category on the letter scale is "I" which is:
"An error occurred that may have contributed to or resulted in patient death."  [Italics added.]

Thursday, May 24, 2018

You Have No Idea of the Predictive Value of Weaning Parameters for Extubation Success, and You Probably Never Will

As Dr. O'Brien eloquently described in this post, many people misunderstand the Yang-Tobin (f/Vt) index as being a "weaning parameter" that is predictive of extubation success.  Far from that, its sensitivity and specificity and resultant ROC curve relate to the ability of f/Vt after one minute of spontaneous ventilation to predict the success of a prolonged (~ one hour) spontaneous breathing trial.  But why would I want to predict the result of a test (the SBT), and introduce error, when I can just do the test and get the result an hour later?  It makes absolutely no sense.  What we want is a parameter that predicts extubation success.  But we don't have that, and we probably will never have that.

In order to determine the sensitivity and specificity of a test for extubation success, we will need to ascertain the outcome in all patients regardless of their performance on the test of interest.  That means we would have to extubate patients that failed the weaning parameter test.  In the original Yang & Tobin article, their cohort consisted of 100 patients.  60(%) of the 100 were said to have passed the weaning test and were extubated, and 40(%) failed and were not extubated.  (There is some over-simplification here based on how Yang & Tobin classified and reported events - it's not at all transparent in their article - the data to resolve the issues are not reported and the differences are likely to be small.  Suffice it to say that about 60% of their patients were successfully weaned and the remainder were not.)  Let's try to construct a 2x2 table to determine the sensitivity and specificity of a weaning parameter using a population like theirs.  The top row of the 2x2 table would look something like this, assuming an 85% extubation success rate - that is, of the 60 patients with a positive or "passing" SBT score (based on whatever parameter), all were extubated and the positive predictive value of the test is 85% (the actual rate of reintubation in patients with a passing weaning test is not reported, so this is a guess):
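A minimal sketch of why the bottom row matters, using the numbers above (60 passed, 40 failed, and the guessed 85% success rate).  Because none of the 40 who failed were extubated, the number who would have succeeded is unobserved, so the code simply tries the two extremes.

```python
def sens_spec_bounds(passed, ppv, failed):
    """Bound sensitivity and specificity when test-negative patients are never extubated."""
    tp = round(passed * ppv)  # passed the test, extubation succeeded
    fp = passed - tp          # passed the test, extubation failed
    results = []
    # fn = patients who failed the test but WOULD have succeeded if extubated.
    # It is unobserved, so evaluate the two extremes: none of them, or all of them.
    for fn in (0, failed):
        tn = failed - fn
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        results.append((fn, sensitivity, specificity))
    return results

for fn, sens, spec in sens_spec_bounds(passed=60, ppv=0.85, failed=40):
    print(f"unobserved false negatives = {fn}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Sensitivity could be anywhere from 51/91 to 100% and specificity anywhere from 0 to 40/49.  Without extubating the patients who fail the test, the data cannot narrow that range.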

Thursday, May 17, 2018

Increasing Disparities in Infant Mortality? How a Narrative Can Hinge on the Choice of Absolute and Relative Change

An April 11th, 2018 article in the NYT entitled "Why America's Black Mothers and Babies are in a Life-or-Death Crisis" makes the following alarming summary statement about racial disparities in infant mortality in America:
Black infants in America are now more than twice as likely to die as white infants — 11.3 per 1,000 black babies, compared with 4.9 per 1,000 white babies, according to the most recent government data — a racial disparity that is actually wider than in 1850, 15 years before the end of slavery, when most black women were considered chattel.
Racial disparities in infant mortality have increased since 15 years before the end of the Civil War?  That would be alarming indeed.  But a few paragraphs before, we are given these statistics:

In 1850, when the death of a baby was simply a fact of life, and babies died so often that parents avoided naming their children before their first birthdays, the United States began keeping records of infant mortality by race. That year, the reported black infant-mortality rate was 340 per 1,000; the white rate was 217 per 1,000.
The white infant mortality rate has fallen 217-4.9 = 212.1 infants per 1000.  The black infant mortality rate has fallen 340-11.3 = 328.7 infants per 1000.  So in absolute terms, the terms that concern babies (how many of us are alive?), the black infant mortality rate has fallen much more than the white infant mortality rate.  In fact, in absolute terms, the disparity is almost gone:  in 1850, the absolute difference was 340-217 = 123 more black infants per 1000 births dying; today it is 11.3-4.9 = 6.4 more black infants per 1000 births dying.

Analyzed a slightly different way, the proportion of white infants dying has been reduced by (217-4.9)/217 = 97.7%, and the proportion of black infants dying has been reduced by (340-11.3)/340 = 96.7%.  So, within about 1 percentage point, black and white babies shared almost equally in the improvements in infant mortality that have been seen since 15 years before the end of the Civil War.  Or, we could do a simple reference frame change and look at infant survival rather than mortality.  If we did that, the current infant survival rate is 98.87% for black babies and 99.51% for white babies.  The rate ratio for black:white survival is .994 - almost parity depending on your sensitivity to variances from unity.
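The arithmetic in the last two paragraphs is easy to check.  Here is a minimal Python sketch that uses only the four rates quoted from the NYT article; the variable names are mine, and it is meant as a way to see the absolute, relative, and survival framings side by side, not as a formal analysis.

```python
# Infant deaths per 1000 births, as quoted from the NYT article.
black_1850, white_1850 = 340.0, 217.0
black_now, white_now = 11.3, 4.9

# Absolute change: how many fewer infants die per 1000 births.
black_abs_drop = black_1850 - black_now   # about 328.7
white_abs_drop = white_1850 - white_now   # about 212.1

# Absolute black-white gap, then and now.
gap_1850 = black_1850 - white_1850        # 123 per 1000
gap_now = black_now - white_now           # about 6.4 per 1000

# Relative change: fraction of the 1850 mortality that has been eliminated.
black_rel_drop = black_abs_drop / black_1850
white_rel_drop = white_abs_drop / white_1850

# Mortality rate ratios, the framing the article relies on.
mortality_ratio_1850 = black_1850 / white_1850
mortality_ratio_now = black_now / white_now

# The same data reframed as survival per 1000 births.
survival_ratio_now = (1000 - black_now) / (1000 - white_now)

print(f"absolute drop  black {black_abs_drop:.1f}  white {white_abs_drop:.1f}")
print(f"absolute gap   1850 {gap_1850:.1f}  now {gap_now:.1f}")
print(f"relative drop  black {black_rel_drop:.1%}  white {white_rel_drop:.1%}")
print(f"mortality ratio  1850 {mortality_ratio_1850:.2f}  now {mortality_ratio_now:.2f}")
print(f"survival ratio now {survival_ratio_now:.3f}")
```

Only the mortality rate ratio moves in the alarming direction; every other summary of the same four numbers shows the gap narrowing or nearly closed.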

It's easy to see how the author of the article arrived at different conclusions by looking only at the rate ratios in 1850 and today.  But doing the math that way makes it seem as if a black baby is worse off today than in 1850!  Nothing could be further from the truth.

You might say that this is just "fuzzy math" as our erstwhile president did in the debates of 2000.  But there could be important policy implications also.  Suppose that I have an intervention that I could apply across the US population and I estimate that it will save an additional 5 black babies per 1000 and an additional 3 white babies per 1000.  We implement this policy and it works as projected.  The black infant mortality rate is now 6.3/1000 and the white infant mortality rate is now 1.9/1000.  We have saved far more black babies per 1000 births than white babies.  But the rate ratio for black:white mortality has increased from 2.3 to 3.3!  Black babies are now more than 3 (three!) times as likely to die as white babies!  The policy has increased disparities even though black babies are far better off after the policy change than before it.
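The hypothetical can be run through the same kind of check.  This is a short Python sketch under the stated assumptions (5 and 3 additional babies saved per 1000, applied to the current rates); the helper name rate_ratio is mine and purely illustrative.

```python
def rate_ratio(rate_a, rate_b):
    """Ratio of two rates expressed per 1000 births."""
    return rate_a / rate_b

# Current infant mortality per 1000 births (from the NYT article).
black_before, white_before = 11.3, 4.9

# Hypothetical policy effect: additional babies saved per 1000 births.
black_saved, white_saved = 5.0, 3.0

black_after = black_before - black_saved   # about 6.3 per 1000
white_after = white_before - white_saved   # about 1.9 per 1000

ratio_before = rate_ratio(black_before, white_before)
ratio_after = rate_ratio(black_after, white_after)

gap_before = black_before - white_before   # absolute gap, per 1000
gap_after = black_after - white_after

print(f"rate ratio    before {ratio_before:.1f}  after {ratio_after:.1f}")
print(f"absolute gap  before {gap_before:.1f}  after {gap_after:.1f}")
```

The rate ratio rises while the absolute gap shrinks, which is the whole problem with judging the policy by the ratio alone.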

It reminds me of the bias where people would rather take a smaller raise if it increased their standing relative to their neighbor.  Surprisingly, when presented with two choices:

  1. you make $50,000 and your peers make $25,000 per year
  2. you make $100,000 and your peers make $250,000 per year
many people choose 1, as if relative social standing is worth $50,000 per year in income.  (Note that relative social standing is just that, relative, and could change if you arbitrarily change the reference class.)

So, relative social standing has value and perhaps a lot of it.  But as regards the hypothetical policy change above, I'm not sure we should be focusing on relative changes in infant mortality.  We just want as few babies dying as possible.

Wednesday, May 2, 2018

Hollow Hegemony: The Opportunity Costs of Overemphasizing Sepsis

Protocols are to make complex tasks simple, not simple tasks complex. - Scott K Aberegg

Yet here we find ourselves some 16 years after the inauguration of the Surviving Sepsis Campaign, and its influence continues to metastasize, even after the message has been hollowed out like a piece of fallen, old-growth timber.

Surviving Sepsis was the brainchild of Eli Lilly, who, in the year after the ill-fated FDA approval of drotrecogin alfa, worried that the drug would not sell well if clinicians did not have an increased awareness of sepsis. That aside, in those days, there were legitimate questions surrounding the adoption and implementation of several new therapies such as EGDT, corticosteroids for septic shock, Xigris for those with APACHE scores over 25, intensive insulin therapy, etc.

Those questions are mostly answered. Sepsis is now, quite simply, a complex of systemic manifestations of infection almost all of which will resolve with treatment of the infection and general supportive care. The concept of sepsis could vanish entirely, and nothing about the clinical care of the patient would change: an infection would be diagnosed, the cause/source identified and treated, and hemodynamics and laboratory dyscrasias supported meanwhile. There is nothing else to do (because lactic acidosis does not exist.)

But because of the hegemony of the sepsis juggernaut (the spawn of the almighty dollar), we are now threatened with a mandate to treat patients carrying the sepsis label (oftentimes assigned by a hospital coder after the fact) with antibiotics and a fluid bolus within one hour of triage in the ED. Based on what evidence?

Weak recommendations, "Best Practice Statements" and some strong recommendations based on low and moderate quality evidence.  So if we whittle it down to just moderate quality of evidence, what do we have?  Give antibiotics for infections, and give vasopressors if MAP is less than 65.  But now we have to hurry up and do the whole kit and caboodle, boilerplate-style, within 60 minutes?

Sepsis need not be treated any differently than a gastrointestinal hemorrhage, or for that matter, any other disease.  You make the diagnosis, determine and control the cause (source), give appropriate treatments, and support the physiology in the meantime, all while prioritizing the sickest patients.  But that counts for all diseases, not just sepsis, and there is only so much time in an hour.  When every little old lady with fever and a UTI suddenly rises atop the priorities of the physician, this creates an opportunity cost/loss for the poor bastard bleeding next door who doesn't have 2 large-bore IVs or a type and cross yet because grandma is being flogged with 2 liters of fluid, and in a hurry.  If only somebody had poured mega-bucks into increased recognition and swift treatment of GI bleeds....

Petition to retire the surviving sepsis campaign guidelines:

(Sign the Petition Here.)


Concern regarding the Surviving Sepsis Campaign (SSC) guidelines dates back to their inception.  Guideline development was sponsored by Eli Lilly and Edwards Lifesciences as part of a commercial marketing campaign (1).  Throughout its history, the SSC has had a track record of conflicts of interest, making strong recommendations based on weak evidence, and being poorly responsive to new evidence (2-6).

The original backbone of the guidelines was a single-center trial by Rivers defining a protocol for early goal-directed therapy (7).  Even after key elements of the Rivers protocol were disproven, the SSC continued to recommend them.  For example, SSC continued to recommend the use of central venous pressure and mixed venous oxygen saturation after the emergence of evidence that they were nonbeneficial (including the ProCESS and ARISE trials).  These interventions eventually fell out of favor, despite the slow response of SSC that delayed knowledge translation.

SSC has been sponsored by Eli Lilly, manufacturer of Activated Protein C.  The guidelines continued recommending Activated Protein C until it was pulled from international markets in 2011.  For example, the 2008 Guidelines recommended this, despite ongoing controversy and the emergence of neutral trials at that time (8,9).  Notably, 11 of 24 guideline authors had financial conflicts of interest with Eli Lilly (10).

The Infectious Diseases Society of America (IDSA) refused to endorse the SSC because of a suboptimal rating system and industry sponsorship (1).  The IDSA has enormous experience in treating infection and creating guidelines.  Septic patients deserve a set of guidelines that meet the IDSA standards.

Guidelines should summarize evidence and provide recommendations to clinicians.  Unfortunately, the SSC doesn’t seem to trust clinicians to exercise judgment.  The guidelines infantilize clinicians by prescribing a rigid set of bundles which mandate specific interventions within fixed time frames (example above)(10).  These recommendations are mostly arbitrary and unsupported by evidence (11,12).  Nonetheless, they have been adopted by the Centers for Medicare & Medicaid Services as a core measure (SEP-1).  This pressures physicians to administer treatments against their best medical judgment (e.g. fluid bolus for a patient with clinically obvious volume overload).

We have attempted to discuss these issues with the SSC in a variety of forums, ranging from personal communications to formal publications (13-15).  We have tried to illuminate deficiencies in the SSC bundles and the consequent SEP-1 core measures.  Our arguments have fallen on deaf ears. 

We have waited patiently for years in hopes that the guidelines would improve, but they have not.  The 2018 SSC update is actually worse than prior guidelines, requiring the initiation of antibiotics and 30 cc/kg fluid bolus within merely sixty minutes of emergency department triage (16).  These recommendations are arbitrary and dangerous.  They will likely cause hasty management decisions, inappropriate fluid administration, and indiscriminate use of broad-spectrum antibiotics.  We have been down this path before with other guidelines that required antibiotics for pneumonia within four hours, a recommendation that harmed patients and was eventually withdrawn (17).

It is increasingly clear that the SSC guidelines are an impediment to providing the best possible care to our septic patients.  The rigid framework mandated by SSC doesn’t help experienced clinicians provide tailored therapy to their patients.  Furthermore, the hegemony of these guidelines prevents other societies from developing better guidelines.

We are therefore petitioning for the retirement of the SSC guidelines.  In their place, we would call for the development of separate sepsis guidelines by the United States, Europe, ANZICS, and likely other locales as well.  There has been a monopoly on sepsis guidelines for too long, leading to stagnation and dogmatism.  We would hope that these new guidelines are written by collaborations of the appropriate professional societies, based on the highest evidentiary standards.  The existence of several competing sepsis guidelines could promote a diversity of opinions, regional adaptation, and flexible thinking about different approaches to sepsis.

We are disseminating an international petition that will allow clinicians to express their displeasure and concern over these guidelines.  If you believe that our septic patients deserve more evidence-based guidelines, please stand with us.  


Scott Aberegg MD MPH
Jennifer Beck-Esmay MD
Steven Carroll DO MEd
Joshua Farkas MD
Jon-Emile Kenny MD
Alex Koyfman MD
Michelle Lin MD
Brit Long MD
Manu Malbrain MD PhD
Paul Marik MD
Ken Milne MD
Justin Morgenstern MD
Segun Olusanya MD
Salim Rezaie MD
Philippe Rola MD
Manpreet Singh MD
Rory Spiegel MD
Reuben Strayer MD
Anand Swaminathan MD
Adam Thomas MD
Lauren Westafer DO MPH
Scott Weingart MD

  1. Eichacker PQ, Natanson C, Danner RL.  Surviving Sepsis – Practice guidelines, marketing campaigns, and Eli Lilly.  New England Journal of Medicine 2006; 355(16): 1640-1642.
  2. Pepper DJ, Jaswal D, Sun J, Welsch J, Natanson C, Eichacker PQ.  Evidence underpinning the Centers for Medicare & Medicaid Services’ Severe Sepsis and Septic Shock Management Bundle (SEP-1): A systematic review.  Annals of Internal Medicine 2018; 168:  558-568. 
  3. Finfer S.  The Surviving Sepsis Campaign:  Robust evaluation and high-quality primary research is still needed.  Intensive Care Medicine  2010; 36:  187-189.
  4. Salluh JIF, Bozza PT, Bozza FA.  Surviving sepsis campaign:  A critical reappraisal.  Shock 2008; 30: 70-72. 
  5. Eichacker PQ, Natanson C, Danner RL.  Separating practice guidelines from pharmaceutical marketing.  Critical Care Medicine 2007; 35:  2877-2878. 
  6. Hicks P, Cooper DJ, Webb S, Myburgh J, Seppelt I, Peake S, Joyce C, Stephens D, Turner A, French C, Hart G, Jenkins I, Burrell A.  The Surviving Sepsis Campaign:  International guidelines for management of severe sepsis and septic shock: 2008.  An assessment by the Australian and New Zealand Intensive Care Society.  Anaesthesia and Intensive Care 2008; 36: 149-151.
  7. Rivers ME et al.  Early goal-directed therapy in the treatment of severe sepsis and septic shock.  New England Journal of Medicine 2001; 345: 1368-1377.
  8. Wenzel RP, Edmond MB.  Septic shock – Evaluating another failed treatment.  New England Journal of Medicine 2012; 366:  2122-2124.  
  9. Savel RH, Munro CL.  Evidence-based backlash:  The tale of drotrecogin alfa.  American Journal of Critical Care  2012; 21: 81-83. 
  10. Dellinger RP, Levy MM, Carlet JM et al.  Surviving sepsis campaign:  International guidelines for management of severe sepsis and septic shock:  2008.  Intensive Care Medicine 2008; 34:  17-60. 
  11. Allison MG, Schenkel SM.  SEP-1:  A sepsis measure in need of resuscitation?  Annals of Emergency Medicine 2018; 71: 18-20.
  12. Barochia AV, Xizhong C, Eichacker PQ.  The Surviving Sepsis Campaign’s revised sepsis bundles.  Current Infectious Disease Reports 2013; 15:  385-393. 
  13. Marik PE, Malbrain MLNG.  The SEP-1 quality mandate may be harmful: How to drown a patient with 30 ml per kg fluid!  Anesthesiology and Intensive Therapy 2017; 49(5) 323-328.
  14. Faust JS, Weingart SD.  The past, present, and future of the centers for Medicare and Medicaid Services quality measure SEP-1:  The early management bundle for severe sepsis/septic shock.  Emergency Medicine Clinics of North America 2017; 35:  219-231.
  15. Marik PE.  Surviving sepsis:  going beyond the guidelines.  Annals of Intensive Care 2011; 1: 17.
  16. Levy MM, Evans LE, Rhodes A.  The surviving sepsis campaign bundle:  2018 update.  Intensive Care Medicine.  Electronic publication ahead of print, PMID 29675566.
  17. Kanwar M, Brar N, Khatib R, Fakih MG.  Misdiagnosis of community-acquired pneumonia and inappropriate utilization of antibiotics: side effects of the 4-h antibiotic administration rule.  Chest 2007; 131: 1865-1869.

Sunday, April 22, 2018

The Respiratory Rosenhan Experiment on Obese Patients

Normally this post would be on Status Iatrogenicus, but the implications for "evidence" and the EBM movement are too important, so it goes here.

For those not inclined to read about the Rosenhan Experiment, a brief history.  In the early 1970s, Dr. Rosenhan, a Stanford psychologist, was concerned about the validity of psychiatric diagnoses.  So he and a half dozen confederates faked mental illness and presented themselves to several prominent psychiatric facilities.  Their feigned symptoms were sufficient to have them admitted with psychiatric diagnoses, usually paranoid schizophrenia.  After admission, they behaved normally.  Nonetheless, they had been diagnosed, put on psychotropic medications, and not allowed to leave until they signed documents swearing to continue the medications upon their release.  The damning report of this experiment was published in Science Magazine in 1973 with the title "On Being Sane in Insane Places."

Psychiatric hospitals were blindsided, having been caught unawares and humiliated.  They challenged Rosenhan to "do it again", but the second time they would be vigilant about these dissimulations.  Send us some more fakes, they said.  On the second experiment, they determined that 40-some patients were confederates.  But Rosenhan had the upper RosenHand - he had sent no confederates.  Sensitivity, if it can be called that, was sacrificed to specificity on round two.

Previously I have complained about patients who are intubated but needn't have been (see here, and here.)  Oftentimes, after the fact, it is difficult to determine whether the intubation was necessary, especially with alleged upper airway compromise, and with obese patients.  With the latter, roentgenograms of the chest are difficult to interpret because of "fatelectasis" or atelectasis in obese persons, "flatelectasis" due to recumbency, and fluid loading after paralysis and intubation, all of which may require high PEEP to counteract.  If you were not present prior to intubation, it is very difficult to determine if respiratory distress preceded the intubation, or if "won't breathe" was mistaken for "can't breathe".  The differentiation between those two entities is "critical".

A man in his late 40s was sent to us recently for "acute respiratory failure" on the ventilator receiving 100% FiO2 and PEEP of 16.  EMS responded to a call to his 18-wheeler that he could not breathe.  His SpO2 was in the 50% range and he was admitted to a local hospital.  There were basilar opacities, and oxygen was administered.  He weighed near 500#.  The opacities were said to represent pneumonia and he was given antibiotics.  Not long after admission, in the middle of the night, he could not be aroused, an ABG was obtained, and his PaCO2 was 90-something with a pH of 7.10 or thereabouts.  This was interpreted to represent acute hypercapnic respiratory failure, on top of his "acute hypoxemic respiratory failure" and an hour-long intubation ensued.  Afterwards he was sent to us on the aforementioned high ventilator settings.