Medical Evidence Blog: December 2007

Monday, December 31, 2007

Is there any place for the f/Vt (the Yang-Tobin index) in today's ICU?

Recently, Tobin and Jubran performed an eloquent re-analysis of the value of “weaning predictor tests” (Crit Care Med 2008; 36: 1). In an accompanying editorial, Dr. MacIntyre does an admirable job of disputing some of the authors’ contentions (Crit Care Med 2008; 36: 329). However, I suspect space limited his ability to defend the recommendations of the guidelines for weaning and discontinuation of ventilatory support.

Tobin and Jubran provide a whirlwind tour of the limitations of meta-analyses. These are important considerations when interpreting the reported results. However, lost in this critique of the presumed approach used by the McMaster group and the joint task force are the limitations of the studies on which the meta-analysis was based. Tobin and Jubran provide excellent points about systematic error limiting the internal validity of the study but, interestingly, do not apply such criticism to studies of f/Vt.

For the sake of simplicity, I will limit my discussion to the original report by Yang and Tobin (New Eng J Med 1991; 324: 1445). As a reminder, this was a single-center study which included 36 subjects in a “training set” and 64 subjects in a “prospective-validation set.” Patients were selected if “clinically stable and whose primary physicians considered them ready to undergo a weaning trial.” The authors then looked at a variety of measures to determine predictors of those “able to sustain spontaneous breathing for ≥24 hours after extubation” versus those “in whom mechanical ventilation was reinstituted at the end of a weaning trial or who required reintubation within 24 hours.” While not explicitly stated, it looks as if all the patients who failed a weaning trial had mechanical ventilation reinstituted, rather than failing extubation.

In determining the internal validity of a diagnostic test, one important consideration is that all subjects have the “gold standard” test performed. In the case of “weaning predictor tests,” what is the condition we are trying to diagnose? I would argue that it is the presence of respiratory failure requiring continued ventilatory support. Alternatively, it is the absence of respiratory failure requiring continued ventilatory support. I would also argue that the gold standard test for this condition is the ability to sustain spontaneous breathing. Therefore, to determine the test performance of “weaning predictor tests,” all subjects should undergo a trial of spontaneous breathing regardless of the results of the predictor tests. Now, some may argue that the self-breathing trial (or spontaneous breathing trial) is, indeed, this gold standard. I would agree if SBTs were perfectly accurate in predicting removal of the endotracheal tube and spontaneous breathing without a ventilator in the room. This is, however, not the case. So, truly, what Yang and Tobin are assessing is the ability of these tests to predict the performance on a subsequent SBT.

Dr. MacIntyre argues that “since the outcome of an SBT is the outcome of interest, why waste time and effort trying to predict it?” I would agree with this within limits. Existing literature supports the use of very basic parameters (e.g., hemodynamic stability, low levels of FiO2 and PEEP, etc.) as screens for identifying patients for whom an SBT is appropriate. Uncertain is the value of daily SBTs in all patients, regardless of passing this screen or not. One might hypothesize that simplifying this step even further might provide incremental benefit. Yang and Tobin, however, must consider a failure on an SBT to have deleterious effects. They consider “weaning trials undertaken either prematurely or after an unnecessary delay…equally deleterious to a patient’s health.” There is no reference supporting this assertion. Recent data suggest that inclusion of “weaning predictor tests” does not save patients from harm due to avoiding SBTs destined to fail (Tanios et al. Crit Care Med, 2006; 34: 2530). On the contrary, inclusion of the f/Vt as the first in Tobin’s and Jubran’s “three diagnostic tests in sequence” resulted in prolonged weaning time.

Tobin and Jubran also note the importance of prior probabilities in determining the performance of a diagnostic test. In the original study, Yang and Tobin selected patients who “were considered ready to undergo a weaning trial” by their primary physicians. Other studies have reported that such clinician assessments are very unreliable with predictive values marginally better than a coin-flip (Stroetz et al, Am J Resp Crit Care Med, 1995; 152: 1034). Perhaps, the clinicians whose patients were in this study are better than this. However, we are not provided with strict clinical rules which define this candidacy for weaning but can probably presume that “readiness” is at least a 50% prior probability of success. Using Yang and Tobin’s sensitivity of 0.97 and specificity of 0.64 for f/Vt, we can generate a range of posterior probabilities of success on a weaning trial:

As one can see, the results of the f/Vt assessment have a dramatic effect on the posterior probabilities of successful SBTs. However, is there a threshold below which one would advocate not performing an SBT if one’s prior probability is 50% or higher? I doubt it. Even with a pre-test probability of successful SBT of 50% and a failed f/Vt, 1 in 25 patients would actually do well on an SBT. I am not willing to forgo an SBT with such data since, in my mind, SBTs are not as dangerous as continued, unneeded mechanical ventilation. I would consider low f/Vt values as completely non-informative since they do not instruct me at all regarding the success of extubation – the outcome in which I am most interested.
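The posterior probabilities discussed above follow directly from Bayes' theorem. What follows is a minimal sketch of that arithmetic, not a reproduction of the original table: it assumes that sensitivity is the probability that f/Vt predicts success in a patient who goes on to succeed, and that specificity is the probability that f/Vt predicts failure in a patient who goes on to fail. The function name and the grid of prior probabilities are illustrative choices of mine.

```python
def posterior_success(prior, sensitivity, specificity, fvt_predicts_success):
    """Posterior probability of a successful weaning trial given the f/Vt result.

    Assumed definitions (not spelled out in the post):
      sensitivity = P(f/Vt predicts success | patient succeeds)
      specificity = P(f/Vt predicts failure | patient fails)
    """
    if fvt_predicts_success:
        true_pos = sensitivity * prior                # succeeds, f/Vt "passed"
        false_pos = (1 - specificity) * (1 - prior)   # fails, f/Vt "passed"
        return true_pos / (true_pos + false_pos)
    false_neg = (1 - sensitivity) * prior             # succeeds, f/Vt "failed"
    true_neg = specificity * (1 - prior)              # fails, f/Vt "failed"
    return false_neg / (false_neg + true_neg)


# Values reported by Yang and Tobin for f/Vt, as quoted in the post.
SENSITIVITY = 0.97
SPECIFICITY = 0.64

# Illustrative prior probabilities of success (an assumption, 50% and up).
for prior in (0.5, 0.6, 0.7, 0.8, 0.9):
    passed = posterior_success(prior, SENSITIVITY, SPECIFICITY, True)
    failed = posterior_success(prior, SENSITIVITY, SPECIFICITY, False)
    print(f"prior {prior:.0%}: f/Vt passed -> {passed:.1%}, f/Vt failed -> {failed:.1%}")
```

Under these assumptions, a 50% prior combined with a failed f/Vt gives a posterior of roughly 4 to 5%, which is in the neighborhood of the "1 in 25" figure quoted above.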

Other studies have used f/Vt to predict extubation failure (rather than SBT failure) and these are nicely outlined in a recent summary by Tobin and Jubran (Intensive Care Medicine 2006; 32: 2002). Even if we ignore different cut-points of f/Vt and provide the most optimistic specificities (96% for f/Vt <100, Uusaro et al, Crit Care Med 2000; 28: 2313) and sensitivities (79% for f/Vt <88, Zeggwagh et al., Intens Care Med 1999; 25:1077), the f/Vt may not help much. As with the prior table, using prior probabilities and the results of the f/Vt testing, we can generate posterior probabilities of successful extubation:

As with the predictions of SBT failure, a high f/Vt lowers the posterior probability of successful extubation greatly. However, one must consider the cutoff in posterior probability below which one would not even attempt an SBT. Even with a 1% posterior probability, 1 in 100 patients will be successfully extubated. This is the rate when the prior probability of successful extubation is only 20% AND the patient has a high f/Vt! What rate of failed extubation is acceptable or, even, preferable? Five percent? Ten percent? If one never reintubates a patient, it is more likely that he is waiting “too long” to extubate rather than possessing perfect discrimination. Furthermore, what is the likelihood that patients with poor performance on an f/Vt will do well on an SBT? I suspect this failure will prohibit extubation and the high f/Vt values will only spare the effort of performing the SBT. Is the incremental effort of performing SBTs on those who are destined to fail such that it requires more time than the added complexity of using the f/Vt to determine if a patient should receive an SBT at all? Presuming that we require an SBT prior to extubation, low f/Vt values remain non-informative. One could argue that with a posterior probability of >95%, we should simply extubate the patient, but I doubt many would take this approach, except in those intubated for reasons not related to respiratory problems (e.g. mechanical ventilation for surgery or drug overdose).
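The 1% figure above can be checked with the same Bayesian arithmetic. The sketch below is hedged: it assumes that the quoted sensitivity (79%) and specificity (96%) describe how well a high f/Vt detects extubation failure, which is my reading of the text rather than something the post states outright. The function name is hypothetical.

```python
def posterior_extubation_success(prior_success, sensitivity, specificity, high_fvt):
    """Posterior probability of successful extubation given the f/Vt result.

    Assumed definitions (my reading, not stated explicitly in the post):
      sensitivity = P(high f/Vt | extubation fails)
      specificity = P(low f/Vt  | extubation succeeds)
    """
    prior_failure = 1 - prior_success
    if high_fvt:
        p_result_given_success = 1 - specificity
        p_result_given_failure = sensitivity
    else:
        p_result_given_success = specificity
        p_result_given_failure = 1 - sensitivity
    numerator = p_result_given_success * prior_success
    return numerator / (numerator + p_result_given_failure * prior_failure)


# Most optimistic values quoted in the post.
SENSITIVITY = 0.79
SPECIFICITY = 0.96

# Prior of 20% with a high f/Vt: the scenario described in the text.
print(f"{posterior_extubation_success(0.20, SENSITIVITY, SPECIFICITY, True):.2%}")
# Prior of 80% with a low f/Vt: approaches the >95% range mentioned in the text.
print(f"{posterior_extubation_success(0.80, SENSITIVITY, SPECIFICITY, False):.2%}")
```

With a 20% prior and a high f/Vt, this reading yields a posterior of about 1%, matching the rate cited in the text; if the sensitivity and specificity were instead defined for predicting success, the numbers would differ.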

Drs. Tobin, Jubran and Marini (who writes an additional, accompanying editorial, Crit Care Med 2008; 36: 328) are master clinicians and physiologists. When they are at the bedside, I do not doubt that their “clinical experience and firm grasp of pathophysiology” (as Dr. Marini mentions), can match or even exceed the performance of protocolized care. Indeed, expert clinicians at Johns Hopkins have demonstrated that protocolized care did not improve the performance of the clinical team (Krishnan et al., Am J Resp Crit Care Med 2004; 169: 673). I have heard Dr. Tobin argue that this indicates that protocols do not provide benefit for assessment of liberation (American Thoracic Society, 2007). I doubt that the authors would strictly agree with his interpretation of their data since several of the authors note in a separate publication that “the regularity of steps enforced by a protocol as executed by nurses or therapists trumps the rarefied individual decisions made sporadically by busy physicians” (Fessler and Brower, Crit Care Med 2005; 33: S224). What happens to the first patient who is admitted after Dr. Tobin leaves service? What if the physician assuming the care of his patients is more interested in sepsis than ventilatory physiology? What about the patient admitted to a small hospital in suburban Chicago rather than one of the Loyola hospitals? Protocols do not intend to set the ceiling on clinical decision-making and performance, but they can raise the floor.

Friday, December 28, 2007

Results of the Poll - Large Trials are preferred

The purpose of the poll that has been running alongside the posts on this blog for some months now was to determine if physicians/researchers (a convenience sample of folks visiting this site) intuitively are Bayesian when they think about clinical trials.

To summarize the results, 43/68 respondents (63%) reported that they preferred the larger 30-center RCT. This differs significantly from the hypothesized value of 50% (p=0.032).
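For readers who want to check a result like this, here is a minimal sketch of an exact two-sided binomial test against a null proportion of 50%. The post does not say which test produced p=0.032, so the exact test used here is an assumption, and the value it returns may differ somewhat from the reported one. The function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, n, p_null=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    under the null hypothesis than the observed count.
    """
    def pmf(k):
        return comb(n, k) * p_null**k * (1 - p_null) ** (n - k)

    observed = pmf(successes)
    # Small relative tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))


# Poll result: 43 of 68 respondents preferred the larger trial.
p_value = binomial_two_sided_p(43, 68)
print(f"proportion = {43 / 68:.1%}, two-sided p = {p_value:.3f}")
```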

From a purely mathematical and Bayesian perspective, physicians should be indifferent to the choice between a large(r) 30-center RCT involving 2100 patients showing a 5% mortality reduction at p=0.0005, and 3 small(er) 10-center RCTs involving 700 patients each showing the same 5% mortality reduction at p=0.04. In essence, unless respondents were reading between the lines somewhere, the choice is between two options with identical posterior probabilities. That is, if the three smaller trials are combined, they are equal to the larger trial and the meta-analytic p-value is 0.0005. Looked at from a different perspective, the large 30-center trial could have been analyzed as 3 10-center trials based on the region of the country in which the centers were located or any other arbitrary classification of centers.
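One way to see why three p=0.04 trials combine to something far smaller is Stouffer's Z-score method. The sketch below is an illustration under stated assumptions, not necessarily the calculation behind the figure in the poll: it assumes two-sided p-values, effects all in the same direction, and equal weights (reasonable here because the three trials are the same size). The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(two_sided_p_values):
    """Combine two-sided p-values from same-direction, equally weighted trials."""
    std_normal = NormalDist()
    # Convert each two-sided p-value to a positive z-score.
    z_scores = [std_normal.inv_cdf(1 - p / 2) for p in two_sided_p_values]
    # Stouffer's method: sum of z-scores divided by sqrt(number of trials).
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Convert the combined z-score back to a two-sided p-value.
    return 2 * (1 - std_normal.cdf(combined_z))


# Three small(er) trials, each reporting p = 0.04.
print(f"combined p = {stouffer_combined_p([0.04, 0.04, 0.04]):.5f}")
```

Under these assumptions the combined p-value comes out at a few ten-thousandths, the same order of magnitude as the 0.0005 attributed to the single large trial.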

Why this result? I obviously can't say based on this simple poll, but here are some guesses: 1.) People are more comfortable with larger multicenter studies, perhaps because they are accustomed to seeing cardiology mega-trials in journals such as NEJM; or 2.) The p-value of 0.04 associated with the small(er) studies seems "marginal" and the combination of the three studies is non-intuitive, and/or it is not possible to see that the combination p-value will be the same. However, I have some (currently unpublished) data which show that [paradoxically] for the same study, physicians are more willing to adopt a therapy with a higher rather than a lower p-value.
Further research is obviously needed to determine how physicians respond to evidence from clinical trials and whether or not their responses are normative. In this poll, it appears that they were not.

Friday, December 21, 2007

Patients and Physicians should BOYCOTT Zetia and Vytorin: Forcing MRK and SGP to come clean with the data

You wouldn't believe it - or would you? The NYT reports today that SGP has data from a number of - go figure - unpublished studies that may contain important data about increased [and previously undisclosed] risks of liver toxicity with Zetia and Vytorin: http://www.nytimes.com/2007/12/21/business/21drug.html Unproven benefits, undisclosed risks? If I were a patient, I would want to be taken off this drug and be put on atorvastatin or simvastatin or a similar agent. If the medical community would get on board and take patients off of this unproven and perhaps risky drug, that might at least force the companies to come clean with their data.

In fact, I'm astonished at the medical community's reluctance to challenge the status quo which is represented by widespread use of drugs such as this and Avandia, for which there is no proof of efficacy save for surrogate endpoints, and for which there is evidence of harm. These drugs are not good bets unless alternatives do not exist, and of course they do. I am astonished in my pulmonary clinic to see many patients referred for dyspnea, with a history of heart disease and/or cardiomyopathy who remain on Avandia. Apparently, protean dyspnea is not a sufficient wake-up call to change the diabetes management of a patient who is receiving an agent of unproven efficacy and which is known to cause fluid retention and CHF. This just goes to show how effective pharmaceutical marketing campaigns are, how out-of-control things have become, and how non-normative physicians' approaches to the data are.

The profit motive impels them forward. The evidence does not support the agents proffered. Evidence of harm is available. Alternatives exist. Why aren't physicians taking patients off drugs such as Vioxx, Avandia, Zetia, and Vytorin, and using alternative agents until the confusion is resolved?

Sunday, December 16, 2007

Dexmedetomidine: a New Standard in Critical Care Sedation?

In last week's JAMA, Wes Ely's group at Vanderbilt report the results of a trial comparing dexmedetomidine to lorazepam for the sedation of critically ill patients:
http://jama.ama-assn.org/cgi/content/short/298/22/2644
This group, along with others, has taken the lead as innovators in research related to sedation and delirium in the ICU (in addition to other topics), and this is a very important article in this area. In short, the authors found that, when compared to lorazepam, dexmed led to better targeted sedation and less time in coma, with a trend toward improved mortality.

One of the most impressive things about this study is stated as a post-script:

“This investigator-initiated study was aided by receipt of study drug and an unrestricted research grant for laboratory and investigational studies from Hospira Inc….Hospira Inc had no role in the design or conduct of the study; in the collection, analysis, and interpretation of the data; in the preparation, review, or approval of this manuscript; or in the publication strategy of the results of this study. These data are not being used to generate FDA label changes for this medication, but rather to advance the science of sedation, analgesia, and brain dysfunction in critically ill patients….”

Investigator-initiated....investigator-controlled design and publication, investigators as stewards of the data.....music to my ears.

But is dexmed going to be the new standard in critical care sedation? For that question, it would appear that it is too early for answers. I have the following observations:
• This study used higher doses of dexmed for longer durations than what the product labeling advises. Should practitioners use the doses studied or the approved doses? My very small experience with this drug so far at the labelled doses is that it is difficult to use in that it does not achieve adequate sedation in the most agitated patients - those receiving the highest doses of benzos and narcotics, in whom lightening of sedation is assigned the highest priority.
• The most impressive primary endpoint achieved by the drug was days alive without delirium or coma, but most of it was driven by coma-free days. Perhaps this is not surprising given two aspects of the study's design:
1. Patients did not have daily interruptions of sedative infusions, a difficult-to-employ, but evidence-based practice to reduce oversedation and coma.
2. Lorazepam was titrated upwards without boluses between dose increases. Given the long half-life of this drug, we would expect overshoot by the time steady state pharmacokinetics were achieved.
So is it surprising that patients in the lorazepam group had fewer coma-free days?
• We are not told about the tracheostomy practices in this study. Getting a trach earlier may lead to both sedation reduction and improved mortality (See http://ccmjournal.org/pt/re/ccm/abstract.00003246-200408000-00009.htm;jsessionid=HlfG93Qfvb113sCpnD10053YzKqMB3zFfDTdbGvgCQPdlMZ3S8kV!1219373867!181195629!8091!-1?index=1&database=ppvovft&results=1&count=10&searchid=1&nav=search).
• We are not told the proportion of patients in each group who had withdrawal of support. Anecdotally, I have found that families have greater willingness to withdraw support for patients who are comatose, regardless of other underlying physiological variables or organ failures. Can the trend towards improved mortality with dexmed be attributed to differences in willingness of families to withdraw support?
• In spite of substantial data that delirium is associated with mortality (http://jama.ama-assn.org/cgi/content/abstract/291/14/1753?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=delirium&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT), and these data showing that there is a TREND towards more delirium-free days with dexmed, the hypothesis that dexmed improves mortality via improvement in delirium is one that can only be tested by a study with mortality as a primary endpoint.
The data from the current study are compelling, and Ely and investigators are to be commended for the important research they are doing (this article is only the tip of that iceberg of research). However, it remains to be seen if one sedative compared to others can lead to improvements in mortality or more rapid recovery from critical illness, or whether limitation of sedation in general with whatever agent is used is primarily responsible for improved outcomes.

Wednesday, December 12, 2007

ENHANCE trial faces congressional scrutiny

Merck and Schering-Plough had better get their houses in order. Congress is on the case:

http://www.nytimes.com/2007/12/12/business/12zetia.html?_r=1&oref=slogin

Apparently, representatives of the US populace, which pays for a substantial portion of the Zetia sold, are not pleased by the delays in release of the data from the ENHANCE trial. The chicanery is going to be harder to sustain.

I certainly hope for everyone's sake (especially patients') that there is no foul play afoot with this trial or ezetimibe - Merck can hardly withstand another round of Vioxx-type suits, can it? Or can it? Merck's stock price (MRK: http://finance.yahoo.com/q/bc?s=MRK&t=5y&l=on&z=m&q=l&c=) is at the same level as it was in Jan, 2004. Some high price to pay for obfuscating the truth, concealing evidence of harm, bilking insurers and the American public and government for billions of $$$ for a prescription painkiller when equivalent non-branded products were available, and causing thousands of heart attacks in the process....

The consequences should be harsher the second time around.....


Tuesday, December 11, 2007

Pronovost, Checklists, and Putting Evidence into Practice

In this week's New Yorker:
http://www.newyorker.com/reporting/2007/12/10/071210fa_fact_gawande
Atul Gawande, a popular physician writer who may be familiar to readers from his columns in the NEJM and the NYT, chronicles the herculean efforts by Peter Pronovost, MD, PhD at Johns Hopkins Hospital to make sure that the mundane but effective does not always take a back seat to the heroic but largely symbolic efforts of critical care doctors.

One of my chronic laments is that evidence is not utilized and that physician efforts do not appear to be rationally apportioned to what counts most. There appears to be too much emphasis on developing evidence and too little emphasis on making sure it is expeditiously adopted and employed; too much emphasis on diagnosis, too little emphasis on evidence-based treatment; too much focus on the "rule of rescue", too little focus on the "power of prevention". Pronovost has demonstrated that simple checklists can have bountiful yields in terms of teamwork, prevention, and delivery of effective care - then why aren't we all familiar with his work? Why doesn't every ICU use his checklists?

My own experience at the Ohio State University Medical Center is emblematic of the challenges of getting an unglamorous thing like a checklist accepted as a routine part of clinical practice in the ICU. In spite of evidence supporting it, its obvious rational basis, and widespread recognition that we often miss things if we aren't rigorous and systematic, adopting an adapted version of Pronovost's checklist at OSUMC has proven challenging (albeit possible). As local champion of a checklist that I largely plagiarized from Pronovost's original, I have been told by colleagues that it is "cumbersome", by RNs that it is "superfluous", by fellows that it is a "pain", by people of all disciplines that they "don't see the point", and have been frustrated that when I do not personally ensure that it is being done daily (by walking through the ICU and checking), it is abandoned as yet another "chore", another piece of bureaucratic red tape that hampers the delivery of more important "patient-centered" care - such as procedures and ordering of tests.

All of these criticisms are delivered despite my admonition that the checklist, like a fishing expedition, is not expected to yield a "catch" on every cast, but that if it is cast enough, things will be caught that would otherwise be missed; despite my reminder that it is an opportunity to improve our communication with our multi-disciplinary ICU team (and to learn the names of its constituents); despite producing evidence of its benefit and evidence of underutilization of evidence-based therapies which the checklist reminds practitioners to consider. If I were not personally committed to making sure that the checklist is photocopied/available and consistently filled out (by our fellows, who deserve great credit for filling it out), it would quickly fall by the wayside, another relic of a well-meaning effort to encourage conscientiousness through bureaucracy and busy-work (think HIPAA here - the intent is noble, but the practical result an abject failure).

So what is the solution? How are we to increase acceptance of Pronovost's checklist and recognition of its utility and its necessity? It could be through fiat, through education, through a variety of means. But it appears that it has survived at Hopkins because of Pronovost's ongoing efforts to promote it and extol its benefits and its virtues and to get "buy-in" from other stake-holders: RNs, patients, administrators, the public, and other physicians. This is not an easy task - but then again, rarely is anything that is worth it. Hopefully other champions of this and other unglamorous innovations will continue to advocate for mundane but effective interventions to improve communication among members of multidisciplinary healthcare teams, the utilization of evidence-based therapies, and outcomes for patients.