Medical Evidence Blog: Therapeutic Agnosticism: Stochastic Dominance of the Null Hypothesis

Here are some more thoughts on the epistemology of medical science and practice that were stimulated by reading three articles this week relating to monitoring interventions: monitoring respiratory muscle function in the ICU (AJRCCM, January 1, 2013); monitoring intracranial pressure in traumatic brain injury (NEJM, December 27, 2012); and monitoring of gastric residual volume in the ICU (JAMA, January 16, 2013).

In my last post about transfusion thresholds, I mused that overconfidence in their understanding of complex pathophysiological phenomena (did I say arrogance?) leads investigators and practitioners to overestimate their ability to discern the value and efficacy of a therapy in medicine. Take, for instance, the vascular biologist studying pulmonary hypertension who, rounding in the ICU, elects to give sildenafil to a patient with acute right heart failure, and who proffers a plethora of complex physiological explanations for this selection. Is there really any way for anyone to know the effects of sildenafil in this scenario?

I start with the idea that very little or nothing we do works. Bedrest? Bunkum. Daily labs? Nonsense. Paralysis? Poppycock. Therapeutic paracentesis? Tripe. Erythropoiesis-stimulating agents? Hoopla. A "balloon pump"? Even the name begs derision. This blog has chronicled countless promising therapies that didn't pan out, beautiful scientific underpinnings notwithstanding. So with everything we do, we should start with a personal null hypothesis and accompanying agnosticism about the effects of our interventions. Regardless of how much we learn about pathophysiology, we should assume ignorance about the effects of our interventions, with two notable categories of exceptions:

Category 1. When the observed effect is so robust, dramatic, predictable, and consistent that denying it would be irrational. Therapies that meet these qualifications do not really need RCTs to prove them, and indeed RCTs may be unethical. The effect size of the intervention is so great that the number needed to treat (NNT) is very small or approaches 1. When I think of such therapies, I'm thinking of cholecystectomy for septic cholecystitis, insulin for DKA, Lasix for CHF, angioplasty for AMI, and mechanical ventilation for respiratory extremis and ARDS. (In the same vein, I might include paradigm shifts in medical practice that have strong experiential support - such as the reduction in sedation and aggressive physical therapy and mobility in the ICU in the last 10 years. But caution is warranted here - we are prone to getting swept up in any cultural current and carried away. I am more confident in these two "interventions" because they are, in reality, a scaling back of harmful interventions (sedation and paralysis) that have been impeding patients' recovery for decades!)

Category 2. When a trial free from bias demonstrates an unequivocal and repeatable/repeated effect on a meaningful outcome. In this category, I'm thinking of low tidal volume ventilation for ARDS; aspirin for AMI; ACE inhibitors for CHF; and statins for cardiovascular disease. Characteristic of this category is the idea that the effect is smaller or takes longer to accrue, so the dramatic obviousness of the first category is absent and longer and closer monitoring is required to document the effect with statistical analyses. The NNT of interventions in this category is larger.
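To make the NNT contrast between the two categories concrete, here is a minimal sketch in Python. It uses the standard definition, NNT = 1 / absolute risk reduction. The function name and every event rate below are hypothetical, invented purely for illustration; none of them comes from the trials or therapies named in this post.

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Return NNT = 1 / absolute risk reduction (ARR).

    Both arguments are proportions between 0 and 1 describing how often the
    bad outcome occurs without and with the therapy.
    """
    arr = control_event_rate - treated_event_rate
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so the NNT is undefined")
    return 1.0 / arr


# Hypothetical "Category 1" therapy: bad outcome falls from 90% to 10%.
dramatic = number_needed_to_treat(0.90, 0.10)

# Hypothetical "Category 2" therapy: bad outcome falls from 10% to 8%.
modest = number_needed_to_treat(0.10, 0.08)

print(f"dramatic effect: NNT is about {dramatic:.2f}")
print(f"modest effect:   NNT is about {modest:.0f}")
```

With these made-up rates, the first NNT is close to 1, which is why the effect is visible at the bedside, while the second is around 50, which is why only a trial with careful counting can detect it.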

Note that in Category 1, scientific knowledge of pharmacodynamics and pathophysiology was necessary to bring Lasix to the armamentarium of agents for the treatment of disease. Lasix was designed by investigators with a scientific rationale. And they may have it right when they wax prolific about its mechanisms of action in the treatment of disease. My point is that if they're right about its effects on disease, I ought to be able to see the effects with my own eyes. When I use it, the urine should flow, the oxygen should wean, the ankles should shrink, the rales should abate, and most importantly, the patient should tell me s/he feels better and should get out of bed and walk. The observations ought to bolster the physiological explanation. If they don't, an RCT is in order.

But too often - and this is my gripe about the pathophysiological approach to clinical practice in internal medicine - we think we understand the physiology so we apply a therapy and nothing discernible happens. But we persist because we're so smitten with what we think we know about pathophysiology. And the next thing you know, medication lists swell, testing is rampant, devices are inserted, complications arise, and... well, that's a different blog [status iatrogenicus].

What I'm talking about here is a different religion - one that worships the null hypothesis, and whose followers believe that nothing works unless we can see it with our own eyes, or unless we have convincing RCT data showing that it works.

A particularly pernicious effect of pathophysiology-based practice is the notion that we can bring laboratory-style monitoring devices to the bedside and apply them to clinical care. It's not that we can't apply them, but rather that their application often does little except create a false sense of certainty for the doctor, and complications for the patient. The best known example is the Swan-Ganz catheter (SGC), which was in vogue in the ICU and elsewhere for several decades, prior to the 1996 landmark study by Connors et al. in which it was shown that SGC use was associated with increased mortality and resource use. Caveats of this observational study notwithstanding, it served as a wakeup call for the medicine and critical care community. We had all sorts of hard-earned physiological knowledge about pressures and flows in the cardiovascular system in health and disease, and all kinds of therapies to manipulate these variables, so why would a device that allows us to measure the pressures and flows not improve outcomes?

First, there are too many leaps of faith that need to be made with monitoring devices. We have to have the pathophysiology right; measurement error in the device and the clinician has to be nominal; the interventions we employ, guided by the monitoring device, have to have net benefit; and the guidance of the device must provide benefit that exceeds what would be achieved by employing the therapies without any guidance. And all of the pieces of this multi-step pathway have to go right or the whole thing falls apart. Plus, I would argue, our ability to see clearly the result of a complex monitoring and intervention scheme will be lacking and we will never achieve the standards I set forth in Category 1. We're almost always going to need an RCT of monitoring devices.
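The "every piece has to go right" argument can be sketched numerically. The Python below multiplies the chance that each link in the chain holds, under the loudly stated assumption that the links are independent (in real patients they almost certainly are not). The function name and the 80% figure given to each link are hypothetical, chosen only to show how quickly confidence erodes across four steps.

```python
import math


def chain_success_probability(step_probabilities):
    """Probability that every step holds, assuming the steps are independent."""
    for p in step_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie between 0 and 1")
    return math.prod(step_probabilities)


# Hypothetical confidence in each leap of faith listed above.
leaps_of_faith = {
    "pathophysiology is right": 0.8,
    "measurement error is nominal": 0.8,
    "guided interventions have net benefit": 0.8,
    "guidance beats unguided therapy": 0.8,
}

overall = chain_success_probability(leaps_of_faith.values())
print(f"chance the whole pathway works: about {overall:.0%}")
```

Even with a generous, made-up 80% for each link, the chance that the whole pathway works drops to roughly 41%, which is one way to read the claim that monitoring devices will almost always need an RCT.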

So, two weeks ago in AJRCCM, scientists interested in respiratory muscles report in a perspective article their enthusiasm for measuring (or attempting to measure) respiratory muscle function (RMF) in the critically ill. The article is a great review of pathophysiological aspects of RMF in the ICU, but for the clinician it is nothing but grandiose wishful thinking about how this information can be applied for the benefit of the patient. The authors tacitly admit this, but persist:

"There is scarce literature that directly demonstrates improved outcome with close monitoring (and action) of the respiratory muscles. However, over the last years, circumstantial evidence suggests that respiratory muscle monitoring can affect clinical care in the ICU.....the absence of sound scientific data for clinical benefit should not discourage clinicians from having a closer look at RMF in critically ill patients."

I beg to differ. The last thing we need are legions of newly minted pulmonologists freshly released from fellowship running around ICUs with M-mode ultrasound machines, divining patients' RMF and making clinical predictions and prognostications on the basis of this information prior to guidance from properly conducted outcomes trials. We do not need another analog of the SGC. We do not need to unleash another runaway train. Please, cease and desist. Those fellows would be better off spending 5 minutes helping physical therapy get patients out of bed and into a chair, or personally conducting awakening and breathing trials.

And just 4 days earlier in the December 27th NEJM, we find an article about intracranial-pressure monitoring in traumatic brain injury that shows that invasive monitoring is not superior to clinical and radiological examination (in Bolivia and Ecuador). So much for the beloved bolt - the null hypothesis wins again.

Then, in the January 16th, 2013 issue of JAMA, we have a very interesting article about the monitoring of gastric residual volume (GRV) in patients receiving enteral feeding in the ICU. Despite all of its limitations including a non-inferiority design (ugh), lack of blinding, an aggressive feeding regimen (that's a whole other post), and a primary outcome with significant risk of ascertainment bias (VAP, the most dodgy outcome known to critical care), this study showed no benefit of monitoring GRV in ICU patients receiving tube feeds in terms of the incidence of VAP. They vomited less, yes, and this in itself may be of value, but the authors and the editorialist concluded that we should rethink GRV monitoring.

So, I'm back to where I started - nothing works. Unless I can see the effect immediately with mine own eyes (Category 1), or unless you show me the results of an unbiased trial demonstrating a direct effect of the intervention on a meaningful outcome (Category 2). And monitoring devices? Forget it, they're always Category 2.

The null hypothesis, in clinical practice as in clinical trials, has stochastic dominance. If you're the betting type, bet on the null.

2 comments:

Scott K. Aberegg, M.D., M.P.H. January 27, 2013 at 4:26 PM
Here's a Category 1 and Category 2 analogy for laypersons who may be tuning in.

Suppose you drive a turbocharged automobile. Among the ways you may improve its performance or its longevity are:

1.) Buying a tuning ECU (or a "chip") to reprogram the function of the turbos to increase power (often at the expense of emissions). After you do this, you will notice an immediate and dramatic increase in power and performance. You need not know anything else and no other data are required to make your assessment. (You may want to check the boards online to see if people are breaking parts from the increased power after they install the device.)
Scott K. Aberegg, M.D., M.P.H. January 27, 2013 at 4:29 PM
2.) A materials scientist advises you to buy and put in the engine some Slick 50 which he says will coat the cylinder walls and pistons with a polymer that will reduce friction, improve performance and gas mileage, and increase engine longevity. You pour in a quart. You notice no difference. You may wish to request more data before you make a habit of treating your vehicle with this substance. It's all a big guess whether it works as advertised.

Sunday, January 27, 2013
