Two decades ago, IBM's "Deep Blue" chess-playing algorithm defeated the reigning world champion, Garry Kasparov. Now, something similar appears to have happened in pathology: a machine-learning algorithm outperformed human experts in detecting breast cancer metastasis within sentinel lymph nodes. In this Deep Dive, F. Perry Wilson, MD, examines the data and what makes this machine-learning study so compelling compared to others we've seen recently.
Increasing levels of automation in every industry threaten jobs, but so far physicians have felt relatively comfortable. We'll never be replaced by algorithms, right? But this study should get pathologists, at least, a bit worried.
It is the best demonstration to date of how machine learning is going to transform medical imaging.
The images we're talking about here are sentinel lymph node slides.
Take a look – in that little green box is a tiny area of metastatic breast cancer. Pathologists miss these from time to time. After all, they are only human.
For the first time, computers have done better.
Researchers sponsored a worldwide competition to develop an algorithm that would identify breast cancer cells on scanned lymph node slides.
Teams that signed up were sent 270 slides – 110 with nodal metastases and 160 without – that had been painstakingly hand-labeled to show the computers where the diseased cells were.
After learning from that data, the algorithms were then unleashed on 129 brand new unlabeled slides. The winner was the algorithm that got the most slides right.
But let's start with the humans. Eleven trained pathologists were given 2 hours to look at the 129 test slides – a workflow that is pretty standard, I am told. Of the 49 test slides with metastatic disease, the pathologists found 31 on average – a false negative rate of roughly 37%. One pathologist was allowed to work without time constraints, unrealistic as that is – he or she correctly identified 46 of the 49 slides with cancer and 79 of the 80 without.
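To make those figures concrete, here is a minimal sketch of the arithmetic behind them, using the counts reported above (sensitivity here simply means cancer-containing slides found divided by cancer-containing slides present; the variable names are my own):

```python
# Average performance of the 11 time-limited pathologists
positives_found_avg = 31   # slides with metastasis correctly flagged, on average
positives_total = 49       # test slides that actually contained metastasis

sensitivity = positives_found_avg / positives_total       # ~0.63
false_negative_rate = 1 - sensitivity                     # ~0.37

print(f"Average sensitivity: {sensitivity:.0%}")          # about 63%
print(f"False negative rate: {false_negative_rate:.0%}")  # about 37%

# The single pathologist working without a time limit
unlimited_sensitivity = 46 / 49   # ~94% of cancer slides caught
unlimited_specificity = 79 / 80   # ~99% of cancer-free slides cleared
print(f"Unlimited-time sensitivity: {unlimited_sensitivity:.0%}")
print(f"Unlimited-time specificity: {unlimited_specificity:.0%}")
```

In other words, under realistic time pressure the average pathologist missed more than a third of the cancer-positive slides, while the unconstrained reader missed almost none.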
Thirty-two machine-learning algorithms competed; the best came from a Harvard-MIT collaboration. Its performance on the test images was nearly perfect, distinguishing cancerous from noncancerous slides with almost 100% accuracy and highlighting the areas of concern like this.
This is pretty impressive, but there's something really special about this study that has me excited. In most of these image-classification tasks, the gold standard is human perception. Some human expert, or a group of them, looks at a slide or x-ray or retinal image and says, "yes, this is pulmonary edema." I am always left wondering: okay, but how can we ever beat humans if humans are the gold standard?
In this study, the gold standard was immunohistochemical staining – staining that neither the human pathologists nor the machine algorithms had access to.
In other words, these algorithms were better than humans when held to a completely objective gold standard. That's pretty amazing.
Now pathologists shouldn't be hunting for new jobs quite yet. This was a small study, using slides from only two centers. I wish the researchers had thrown a third center into the test set – would different staining practices have thrown off the computer algorithms, perhaps? Also, the pathologists mostly missed micrometastases – areas of disease less than 2 mm across. With modern breast cancer therapy, it's not clear that missing such small areas would actually have a significant clinical impact.
With all the hype surrounding machine learning, it's easy to think it's just a fad. It's not. Mark my words, studies like this will redefine medical imaging in the near future. And if you don't believe me, ask your local area network.
F. Perry Wilson, MD, is an assistant professor of medicine at the Yale School of Medicine. In addition to his video analyses, he authors a blog.