Saturday, March 14, 2009

"Statistical Slop": What billiards can teach us about multiple comparisons and the need to assign primary endpoints

Anyone who has played pool knows that you have to call your shots before you make them. This rule is intended to decrease the probability of "getting lucky" from just hitting the cue ball as hard as you can, expecting that the more it bounces around the table, the more likely it is that one of your many balls will fall through chance alone. Sinking a ball without first calling it is referred to colloquially as "slop" or a "slop shot".

The underlying logic is that you know best which shot you're MOST likely to make successfully, so calling it not only increases the prior probability of a skilled versus a lucky shot (especially if it is a complex shot, such as one "off the rail"), but also effectively reduces the number of chances the cue ball has to sink one of your balls without you losing your turn. It reduces those multiple chances to one single chance.

Likewise, a clinical trialist must focus on one "primary outcome" for two reasons: 1.) because preliminary data (if available), background knowledge, and logic allow him to select the variable with the highest "pre-test probability" of causing the null hypothesis to be rejected, meaning that the post-test probability of the alternative hypothesis is enhanced; and 2.) because it reduces the probability of finding "significant" associations among multiple variables through chance alone. Today I came across a cute little experiment that drives this point home quite well. The abstract can be found here on pubmed: http://www.ncbi.nlm.nih.gov/pubmed/16895820?ordinalpos=4&itool=EntrezSystem2.PEntrez.Pubmed.Pubmed_ResultsPanel.Pubmed_DefaultReportPanel.Pubmed_RVDocSum .
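To see how quickly multiplicity inflates the chance of a lucky "significant" hit, here is a minimal back-of-the-envelope sketch in Python. It assumes independent tests at the conventional alpha of 0.05; the numbers of tests are arbitrary illustrations, not figures from any study.

```python
# Probability of at least one "significant" result arising by chance alone,
# assuming k independent tests, each conducted at alpha = 0.05.
alpha = 0.05

for k in (1, 5, 10, 20, 100):
    p_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:3d} tests -> P(at least one false positive) = {p_at_least_one:.2f}")
```

With a single called shot the false positive risk stays at 5%; with 100 uncalled shots, at least one lucky "hit" is a near certainty.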

In it, the authors describe "dredging" a Canadian database and looking for correlations between astrological signs and various diagnoses. Significant associations were found between the Leo sign and gastrointestinal hemorrhage, and between the Sagittarius sign and humerus fracture. With this "analogy of extremes," as I like to call it, you can clearly see how the failure to define a prospective primary endpoint can lead to statistical slop. (Nobody would have been able to predict a priori that it would be THOSE two diagnoses associated with THOSE two signs!) Failure to PROSPECTIVELY identify ONE primary endpoint led to multiple chances for chance associations. Moreover, because there were no preliminary data upon which to base a primary hypothesis, the prior probability of any given alternative hypothesis is markedly reduced, and thus the posterior probability of the alternative hypothesis remains low IN SPITE OF the statistically significant result.
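The same point can be put in rough Bayesian terms. Below is a minimal sketch; the alpha of 0.05, the power of 0.80, and the priors are all illustrative assumptions, not estimates from the astrology study.

```python
# Posterior probability that an association is real, given a "significant"
# result. All inputs are illustrative assumptions.
alpha = 0.05   # P(significant result | no real association)
power = 0.80   # P(significant result | real association)

for prior in (0.001, 0.10, 0.50):  # prior probability the hypothesis is true
    posterior = (power * prior) / (power * prior + alpha * (1 - prior))
    print(f"prior = {prior:.3f} -> posterior = {posterior:.3f}")
```

With a 1-in-1,000 prior (about what a sign-diagnosis association deserves), the posterior probability is under 2% despite the "significant" result; only a well-motivated primary hypothesis, carrying a respectable prior, lets a significant p-value confer real evidential weight.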

It is for this very reason that "positive" or significant associations among non-primary endpoint variables in clinical trials are considered "hypothesis generating" rather than hypothesis confirming. Requiring that these associations be studied again as primary endpoints is like telling your slop-shooting partner in the pool hall, "That's great, but I need to see you make that double rail shot again before I believe it's skill rather than luck."

Reproducibility of results is indeed the hallmark of good science.

Tuesday, March 10, 2009

PCI versus CABG - Superiority is in the heart of the angina sufferer

In the current issue of the NEJM, Serruys et al describe the results of a multicenter RCT comparing PCI with CABG for severe coronary artery disease: http://content.nejm.org/cgi/content/full/360/10/961. The trial, which was designed by the [profiteering] makers of drug-eluting stents, was a non-inferiority trial intended to show the non-inferiority (NOT the equivalence) of PCI (new treatment) to CABG (standard treatment). Alas, the authors appear to misunderstand the design and reporting of non-inferiority trials, and they mistakenly declare CABG superior to PCI as a result of this study. This error will be the subject of a forthcoming letter to the editor of the NEJM.
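For readers unfamiliar with the design, here is a minimal sketch of the (simplified) non-inferiority decision logic. The margin delta and the convention used (difference expressed as new minus standard, higher meaning worse) are assumptions for illustration, not the trial's actual specification.

```python
def noninferiority_verdict(ci_upper: float, delta: float) -> str:
    """Interpret a non-inferiority comparison from the upper confidence
    limit of the event-rate difference (new - standard; higher = worse),
    judged against a prespecified non-inferiority margin delta."""
    if ci_upper < 0:
        return "new treatment superior (entire CI favors it)"
    if ci_upper <= delta:
        return "non-inferiority demonstrated"
    return ("non-inferiority NOT demonstrated -- which, by itself, is not "
            "a demonstration that the standard treatment is superior")

# Illustrative numbers only: a 6-point margin and an 8-point CI upper limit.
print(noninferiority_verdict(ci_upper=8.0, delta=6.0))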

The findings of the study can be summarized as follows: compared to PCI, CABG led to a 5.6% absolute reduction in the combined endpoint of death from any cause, stroke, myocardial infarction, or repeat revascularization (P=0.002). The caveats regarding non-inferiority trials notwithstanding, there are other reasons to call into question the interpretation that CABG is superior to PCI, and I will enumerate some of these below.

1.) The study used a ONE-SIDED 95% confidence interval - shame, shame, shame. See: http://jama.ama-assn.org/cgi/content/abstract/295/10/1152 .
2.) Table 1 is conspicuous for the absence of cost data. The post-procedural hospital stay was 6 days longer for CABG than PCI, and the procedural time was twice as long - both highly statistically and clinically significant. I recognize that it might be misleading to provide mean costs because it was a multinational study and there would likely be substantial dispersion of cost among countries, but to omit the data altogether is a glaring neglect of a very important variable if we are to rationally compare these two procedures.
3.) Numbers needed to treat are mentioned in the text for variables such as death and myocardial infarction that were not individually statistically significant. This is misleading: the significance of the composite endpoint does not allow one to infer that its individual components are significant (they were not), and I don't think it is conventional to report NNTs for non-significant outcomes. (See the sketch after this list.)
4.) Table 2 lists significant deficiencies in, and discrepancies between, the pharmacological management the two groups received at discharge, which are inadequately explained, as noted by the editorialist.
5.) Table 2 also demonstrates a five-fold increase in amiodarone use and a three-fold increase in warfarin use at discharge among patients in the CABG group. I infer this to represent an increased rate of atrial fibrillation in the CABG patients, but because the rates themselves are not reported, I am left wondering.
6.) Neurocognitive functioning and the incidence of neurocognitive deficits (if they were measured), which are known complications of bypass, are not reported.
7.) It is mentioned in the discussion that, after consenting, more patients randomized to CABG than to PCI withdrew consent, a tacit admission of patients' wariness of submitting to the more invasive procedure.
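On the NNT point in item 3, recall that an NNT is simply the reciprocal of the absolute risk reduction. A minimal sketch, using the 5.6% composite difference quoted above; the 1% component difference below is invented purely to show that an NNT can be computed mechanically from ANY risk difference, significant or not.

```python
def nnt(absolute_risk_reduction: float) -> float:
    """Number needed to treat = 1 / absolute risk reduction."""
    return 1.0 / absolute_risk_reduction

# Composite endpoint: 5.6 percentage-point absolute difference (from the trial).
print(f"NNT, composite endpoint: {nnt(0.056):.0f}")  # about 18

# A hypothetical NON-significant 1-point difference (invented for illustration)
# yields an NNT just as easily -- which is exactly why quoting NNTs for
# non-significant component endpoints misleads.
print(f"NNT, hypothetical non-significant difference: {nnt(0.01):.0f}")
```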

In all, what this trial does for me is remind me to be wary of overly simplistic interpretations of complex data and of the tendency toward dichotomous thinking - superior versus inferior, good versus bad, etc.

One interpretation of the data is that a 3.4-hour bypass surgery and 9 days in the hospital !MIGHT! save you from an extra 1.7-hour PCI and another 3 days in the hospital (on top of your initial commitment of 1.7 hours of PCI and 3 days in the hospital) if you wind up requiring revascularization, the primary [only] driver of the composite endpoint. And in payment for this dubiously useful exchange, you must submit to a ~2% increase in the risk of stroke, have your chest cracked, risk a surgical wound infection (the rate of which is also not reported), pay an unknown (but probably large) increased financial cost, accept a probably large increased risk of atrial fibrillation and therefore be discharged on amiodarone and Coumadin with their high rates of side effects and drug-drug interactions, and coincidentally risk being discharged on inadequate pharmacological management.

Looked at from this perspective, one sees that beauty is truly in the eye of the beholder.

Monday, March 9, 2009

Money talks and Chantix (varenicline) walks - the role of financial incentives in inducing healthful behavior

I usually try to keep the posts current, but I missed a WONDERFUL article a few weeks ago in the NEJM, one that is pivotal in its own right, and especially so in the context of good decision making about therapeutic choices and opportunity costs.

The article, by Volpp et al., entitled "A Randomized, Controlled Trial of Financial Incentives for Smoking Cessation," can be found here: http://content.nejm.org/cgi/content/abstract/360/7/699
In summary, smokers at a large US company, where a smoking cessation program existed before the research began, were randomized to receive additional information about the program, versus the same information plus a financial incentive of up to $750 for successfully stopping smoking. At 9-12 months, the smoking cessation rate was about 10 percentage points higher in the financial incentive group (14.7% vs. 5.0%, P<0.001).

In the 2006 JAMA article on varenicline (Chantix) by Gonzales et al (http://jama.ama-assn.org/cgi/reprint/296/1/47.pdf ), the cessation rates at weeks 9-52 were 8.4% for placebo and 21.9% for varenicline, an absolute gain of 13.5 percentage points. (Similar results were reported in the study by Jorenby et al: http://jama.ama-assn.org/cgi/content/abstract/296/1/56?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&fulltext=varenicline&searchid=1&FIRSTINDEX=0&resourcetype=HWCIT ) Now, given that this branded pharmaceutical sells for ~$120 for a 30-day supply, and that, based on the article by Tonstad (http://jama.ama-assn.org/cgi/reprint/296/1/64.pdf ), many patients are continued on varenicline for 24 weeks or more, the cost of a course of treatment with the drug is approximately $720, just about the same as the financial incentive used in the index article.

And all of this raises the question: Is it better to pay $750 for 6 months of treatment with a drug that has [potentially serious] side effects to achieve a ~13.5% absolute increase in quitting, or to pay patients to quit smoking to achieve a ~10% increase in quitting without harmful side effects - and in fact with POSITIVE side effects (money to spend on pleasurable alternatives to smoking, or on other necessities)?
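As a rough cost-effectiveness sketch using only the figures quoted above - and assuming, for illustration, that the full $750 incentive is paid only to successful quitters, while the ~$720 drug course is paid for every treated patient:

```python
# Rough cost per additional quitter, using the figures quoted above.

# Financial incentive (Volpp et al.): $750, assumed paid only upon success.
incentive = 750.0
quit_incentive, quit_control = 0.147, 0.050
expected_cost_per_smoker = incentive * quit_incentive  # ~$110 per participant
cost_per_extra_quitter_incentive = (
    expected_cost_per_smoker / (quit_incentive - quit_control)  # ~$1,140
)

# Varenicline (Gonzales et al.): ~$720 per course, paid for every patient.
drug_course = 720.0
quit_drug, quit_placebo = 0.219, 0.084
cost_per_extra_quitter_drug = drug_course / (quit_drug - quit_placebo)  # ~$5,300

print(f"Incentive:   ~${cost_per_extra_quitter_incentive:,.0f} per additional quitter")
print(f"Varenicline: ~${cost_per_extra_quitter_drug:,.0f} per additional quitter")
```

Under these admittedly crude assumptions, the incentive buys each additional quitter for a fraction of what the drug does - before we even count side effects.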

The choice is clear to me, and, having failed Chantix, I now consider whether I should offer my brother payment to quit smoking. (I expect to receive a call as soon as he reads this, especially since I haven't mentioned the cotinine tests yet.)

And all of this raises the more important question of why we seek drugs to solve behavioral problems when good old-fashioned greenbacks will do the trick just fine. Why bother with Meridia and Rimonabant and all the other weight loss drugs when we might be able to pay people to lose weight? (See: http://jama.ama-assn.org/cgi/content/abstract/300/22/2631 .) Perhaps one part of Obama's stimulus bill can allocate funds to additional such experiments, or better yet, to such a social program.

One answer to this question is that the financial incentive to study financial incentives is not as great as the financial incentive to find another profitable pill to treat social ills. (There is, after all, a "pipeline deficiency" in a number of Big Pharma companies that has led to several mergers and proposed mergers, such as the announcement today of a possible merger of MRK and SGP, two of my personal favorites.) Yet this study sets the stage for more such research. If we are going to pay one way or another, I for one would rather pay people to change their behavior volitionally than pay via a third party to reinforce the notion that there is "a pill for everything". As Ben Franklin said, "S/He is the best physician who knows the worthlessness of the most medicines."

Wednesday, March 4, 2009

The Normalization Heuristic: how an untested hypothesis may misguide medical decisions

Here is an article that may be of interest written by two perspicacious young fellows:
http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6WN2-4VP175C-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&view=c&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=0067dfb6094ecc27303ccd6939257200

In this article, we describe how the general clinical hypothesis that "normalizing" abnormal laboratory values and physiological parameters will improve patient outcomes is not reliably accurate, and we use historical examples of practices such as hormone replacement therapy and the CAST trial to buttress this argument. We further suggest that many ongoing practices that rely on normalizing values should be called into question, because the normalization hypothesis is a fragile one. We also operationally define the "normalization heuristic" and describe four general ways in which it can fail clinical decision makers. Lastly, we make suggestions for empirically testing the existence of this heuristic, and we caution clinicians and medical educators to be wary of reliance on the normalization hypothesis and the normalization heuristic. This paper is an expansion of the idea of the normalization heuristic that was mentioned previously on this blog.