What’s The Score On Surgeon Scorecards?

— Public reporting of surgeon performance remains a "work in progress"

by Hilda Bastian October 2, 2015

Last Updated October 19, 2015

The theory behind a public scorecard on surgeons is simple: Knowing surgeons’ performance scores could improve choice of surgeon and get lower scoring ones to up their game (or hang up their scalpels).

But for a scorecard to improve patient outcomes, there are several hoops it has to jump through. First, it has to be reliable. Then referrers, patients, or influential third parties have to use it. And surgeons have to respond in the way people would like. So what are the odds that ProPublica’s new Surgeon Scorecard can score all that?

Let’s start with the question of reliability. The Surgeon Scorecard is a searchable database billed as “death and complication rates” of 17,000 U.S. surgeons for several procedures. More accurately, though, it’s the rate of deaths and hospital readmissions known to be associated with complications. That’s based on administrative data, not clinical review.

It’s only Medicare data, not necessarily the surgeon’s whole practice, and only a few particular procedures, too. The data isn’t all that timely (it’s getting close to two years old). And since there is too little data to calculate meaningful rates for so many of the surgeons, realistically, it’s not 17,000 surgeons either.

A lot has been said about those aspects of the data. For critical takes on what these limitations mean, and are good places to start. And for more positive takes on what ProPublica aims to achieve, try and . Mark Friedberg and colleagues at the methodology, and say the ProPublica Scorecard shouldn’t be considered a valid or reliable predictor of health outcomes.

But let’s consider a consumer perspective on its usability, and the likelihood of success of the ProPublica Scorecard based on what we know from research on other attempts.

Whether public report cards are understandable and usable to consumers might have an impact on who uses them and whether they have the effect intended. It’s one of the reasons offered for why these interventions typically have little uptake. And the communication part is where the producers have the best chance of achieving the optimum.

The technical presentation of this scorecard is outstanding. But it gets a very low score from me on communication for two reasons. The first is because the language and graphical display are often misleading. (I’ve uploaded of the elements of the display I’m talking about.)

It’s misleading because, for example, within the scorecard it says it’s showing all the times a surgeon has done the procedure and rates of complications. These problems are even there when you click the “Learn more” link from the Scorecard.

The data is displayed with more precision than is warranted, given how fuzzy the data actually is. There are clear lines between low, medium, and high. The data are given to a decimal point: not about 5%, but precisely 5.5%. The high score is a way longer bar than the low or medium scores, and medium may not mean what most of us would think it means. (I’ve written elsewhere about the risks of misleading graphics.) There are confidence intervals, but you won’t see them unless you click on a particular surgeon’s score.

The second reason I give it a low consumer usability score is low understandability. Key explanations are too hard for a tool intended for the general public. “95% Conf. Interval” is not a widely understood concept — and even if you know to hover over the words so an explanation pops up (or do so accidentally), you still won’t be likely to get an accurate understanding. The rest of the language used doesn’t calibrate with the level of uncertainty anyway.
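
To make the uncertainty concrete, here is a minimal sketch of how a 95% confidence interval around a complication rate widens as the number of cases shrinks. It uses the textbook Wilson score interval on raw counts. That is an assumption chosen for illustration, not ProPublica's method, and the function name and the example counts are hypothetical.

```python
import math


def wilson_interval(events, cases, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Illustrative only: works on raw counts and is not the
    risk-adjusted method used by the Surgeon Scorecard.
    """
    if cases <= 0:
        raise ValueError("cases must be positive")
    p = events / cases
    denom = 1 + z**2 / cases
    centre = (p + z**2 / (2 * cases)) / denom
    half = z * math.sqrt(p * (1 - p) / cases + z**2 / (4 * cases**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: the same observed 5% rate at three different volumes.
for events, cases in [(1, 20), (10, 200), (100, 2000)]:
    low, high = wilson_interval(events, cases)
    print(f"{events}/{cases}: observed {events / cases:.1%}, "
          f"95% interval {low:.1%} to {high:.1%}")
```

The observed rate is identical in all three rows, but the interval for the low-volume case is far wider than for the high-volume one. A single figure quoted to a decimal point hides that difference.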

Nor is the confidence interval explained at all in the about how the data come about. That post is daunting. I clocked the readability level at a of nearly 17: that’s roughly equivalent to 17 years of education — not just College level, but post-graduate level.

Which brings us to the question: will it be used, and if so, by whom and how? The launch of the Scorecard got a lot of attention. Within a couple of weeks of being online, there had reportedly been 1.3 million views. But that doesn’t mean it will be widely used.

Some things build in interest over time. Public reporting of health performance data tends to do the reverse, though. There’s a review from 2011 of the few strong comparative studies of public reporting systems. In those studies, public reporting, including of individual practitioners’ results, didn’t have much impact. And when it did, the impact wore off after a couple of months. (Since that review was done, a study comparing states with and without public reporting of percutaneous coronary intervention (PCI) found reduced use of the procedure and increased mortality in public reporting states.)

A review found no strong intervention studies of what happens when you give a patient considering elective surgery individual surgeons’ performance data. But the authors said what evidence there is doesn’t suggest that public report cards shift the numbers of patients between surgeons.

That is in part at least because the overwhelming majority of referrers and patients are either unaware of, or uninterested in, surgeon report cards. After 20 years of in New York state, cardiologists “made little use of this information and rarely discussed it with their patients.”

Primary care practitioners might be even less interested in . The same seems to be so for most patients. A of surgical patients in the U.S. found that only 11% knew of available surgeon report data, and of those, very few used it to make their selection. In of surgical patients in the Netherlands last year and a in the U.S., the number who looked for information like this was 12-13%.

That seems to be true of choosing practitioners generally. As a concluded: “Comparative information seems to have a relatively limited influence on the choices made by many patients.” People weigh a variety of issues when they make these choices, and others’ assumptions about what matters to patients tend to be over-simplified.

If you’re interested in understanding more about this complexity in surgery, I found these studies particularly valuable:

A with very high response rates from representative patients and physicians in fertility care;
Two studies on choices in joint replacement, one published in and another in ; and
One on the views of older people on options.

Immersing myself in this literature left me with two very striking impressions. First, surgical patients and most of those in the performance data reporting world are moving in very different orbits. For all it’s meant to be about consumer rights, this is not a patient-centered field.

Secondly, the principal net effect of these schemes so far might have been to increase inequities. That’s not just because the 10% of patients who might use them are particularly advantaged. There’s enough reason to be very concerned about increasing harm to the already disadvantaged.

The section on unintended clinical outcome consequences of public reporting schemes in a broad is chastening. A that found signs of racial profiling in the first years after introduction of coronary artery bypass graft (CABG) report cards in New York starkly underlines the risks of introducing well-intentioned interventions that haven’t been shown to benefit.

Writing critically about the 2015 Surgeon Scorecard in the , Lisa Rosenbaum said: “ProPublica has…migrated from the realm of data journalism to scientific analysis,” and that required a different approach. But I’d go further.

It’s not just that this moves past data journalism into science. It’s moved into implementing an intervention whose target is clinical outcomes. And it’s an intervention that doesn’t have a good track record. Once you’re in that territory, at this kind of level, then there’s an onus on those intervening to demonstrate that they’re doing more good than harm. If ProPublica has a plan for doing that, they haven’t shared it. In their they concentrated on a single anecdote — and that’s something they wouldn’t find acceptable if, say, a drug company did it.

: “So we began with the view that the taxpayers who pay the costs of Medicare should be able to use its data to make the best possible decisions about their healthcare.” I agree with this principle — if the data can help make the best possible decisions. I was a health consumer advocate for a couple of decades — and chairperson for years of a national health consumer rights task force. The consumer right aspect is not something I take at all lightly. But…. the risk of harm here is not evenly socially distributed. The issue of is always a juggling act: we have the right to information and choice, but also the right to safety. I think the public reporting of surgeon performance has yet to clamber over the “first, do no harm” hurdle.

(Ed: for a more favorable view of the scorecard — from a surgeon — click here. And for an exchange between the author of that piece and Bastian, click here.)

is a senior clinical research scientist. She works at the National Institutes of Health as editor for the clinical effectiveness resource PubMed Health and as , PubMed’s scientific publication commenting system. She is an academic editor at PLOS Medicine, and blogs for PLOS () as well as on a . The thoughts Hilda Bastian expresses here are personal, and do not necessarily reflect the views of the National Institutes of Health or the U.S. Department of Health and Human Services. The cartoon in this post is her own (): more at Statistically Funny.
