Ductal carcinoma in situ (DCIS) is a type of preinvasive tumor that sometimes progresses to a highly invasive form of breast cancer. It accounts for about 25 percent of all breast cancer diagnoses.
Because it is difficult for clinicians to determine the type and stage of DCIS, DCIS patients are often overtreated. To address this, an interdisciplinary team of researchers from MIT and ETH Zurich has developed an AI model that can identify the different stages of DCIS from inexpensive and easily obtained breast tissue images. The model shows that the cell state and arrangement of the tissue sample are important in determining the stage of DCIS.
Because these tissue images are so easy to obtain, the researchers were able to build one of the largest datasets of its kind, which they used to train and test their model. When they compared the model's predictions with pathologists' conclusions, the two were in clear agreement in many cases.
In the future, this model may serve as a tool to help clinicians streamline the diagnosis of simpler cases without having to perform labor-intensive work-ups, and may provide more time to evaluate cases where it is unclear whether DCIS will become invasive.
“We’ve taken the first steps toward understanding that when we’re diagnosing DCIS, we need to look at the spatial organization of the cells, and now we have a scalable technique. That’s where the prospective studies really need to go. Being able to work with hospitals to bring this to the clinic would be a major advance,” says Caroline Uhler, a professor in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS) who is also director of the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and a researcher in MIT’s Laboratory for Information and Decision Systems (LIDS).
Uhler, co-senior author of the paper on this research, is joined by lead author Xinyi Zhang, a graduate student in EECS and at the Eric and Wendy Schmidt Center; co-senior author GV Shivashankar, professor of mechanogenomics at ETH Zurich in collaboration with the Paul Scherrer Institute; and other researchers from MIT, ETH Zurich, and the University of Palermo in Italy. The open-access study was published July 20 in Nature Communications.
Combining AI and imaging
Between 30 and 50 percent of patients with DCIS will develop advanced cancer, but researchers don’t know of any biological markers that can tell clinicians which tumors will progress.
Researchers can use techniques such as multiplex staining or single-cell RNA sequencing to determine the stage of DCIS in tissue samples, but these tests are too expensive to be widely available, Shivashankar explains.
In previous studies, these researchers showed that an inexpensive imaging technique called chromatin staining can provide as much information as the much more expensive single-cell RNA sequencing.
In this study, they hypothesized that combining a single-staining technique with a carefully designed machine learning model could provide the same level of cancer staging information as more expensive techniques.
First, they created a dataset containing 560 tissue sample images from 122 patients at three different stages of the disease. They used this dataset to train an AI model that learned a representation of the state of each cell in the tissue sample images, and used this to infer the stage of the patient’s cancer.
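The final inference step can be pictured as an ordinary supervised-learning problem: one feature vector per tissue image, one of three stage labels per image. The sketch below is purely illustrative, using random stand-in features and a logistic-regression classifier rather than the authors' actual model; only the dataset size (560 images, three stages) comes from the article.

```python
# Illustrative sketch, NOT the paper's pipeline: fit a classifier that maps
# per-sample feature vectors to one of three DCIS stages. Features here are
# random stand-ins for representations learned from tissue images.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features = 560, 16          # 560 tissue images, as in the study
X = rng.normal(size=(n_samples, n_features))   # stand-in learned features
y = rng.integers(0, 3, size=n_samples)         # three disease stages, 0..2

# Hold out 20% of samples to check generalization.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)                 # 112 held-out stage predictions
print(pred.shape)
```

In practice one would also split by patient rather than by image, so that images from the same patient never appear in both training and test sets.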
But not all cells are indicative of disease, so the researchers needed to aggregate the cells in a meaningful way.
They designed a model that identifies eight states that are important markers of DCIS by generating clusters of cells of similar states. Some cell states are more indicative of invasive cancer than others. The model determines the percentage of cells of each state in a tissue sample.
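The "cluster cells into states, then summarize each sample by its state proportions" idea can be sketched in a few lines. Everything below is a hypothetical stand-in (random embeddings, plain k-means) rather than the authors' method; only the number of states, eight, comes from the article.

```python
# Hypothetical sketch: cluster per-cell embeddings into eight "cell states"
# and describe each tissue sample by the fraction of its cells in each state.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Stand-in for learned per-cell embeddings: 3 tissue samples, 200 cells each.
samples = [rng.normal(size=(200, 32)) for _ in range(3)]

# Fit eight cluster centers ("cell states") on all cells pooled together.
all_cells = np.vstack(samples)
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(all_cells)

def state_proportions(cells, kmeans, k=8):
    """Fraction of a sample's cells assigned to each of the k states."""
    labels = kmeans.predict(cells)
    return np.bincount(labels, minlength=k) / len(cells)

# One 8-dimensional proportion vector per tissue sample.
features = np.array([state_proportions(s, kmeans) for s in samples])
print(features.shape)  # (3, 8)
```

Each sample's proportion vector sums to 1, giving a compact, comparable summary regardless of how many cells the image contains.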
Organization is important
“But in cancer, cell organization also changes. We found that it’s not enough to just have the right ratio of cells in every condition. We also need to understand how the cells are organized,” says Shivashankar.
Based on these insights, they designed a model that takes into account the ratio and arrangement of cell states, which greatly improved accuracy.
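One simple way to encode "which cells are close to which other cells," in addition to the state ratios, is a neighborhood-composition map: for each cell, tally the states of its nearest neighbors. The sketch below is an illustrative construction under that assumption, not the authors' architecture; coordinates and state labels are random stand-ins.

```python
# Hypothetical spatial feature: for cells of each state, the average state
# mix among their k nearest neighbors -- an 8x8 "who sits near whom" map.
import numpy as np

rng = np.random.default_rng(1)
K_STATES, N_CELLS, K_NEIGHBORS = 8, 150, 5

# Stand-ins for one tissue sample: 2-D cell coordinates and per-cell states.
coords = rng.uniform(0, 1000, size=(N_CELLS, 2))
states = rng.integers(0, K_STATES, size=N_CELLS)

def neighbor_composition(coords, states, k_states, k_neighbors):
    # Brute-force pairwise distances (fine for a few hundred cells).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a cell is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k_neighbors]

    comp = np.zeros((k_states, k_states))
    for i, nbr_idx in enumerate(nbrs):
        # State histogram of cell i's neighborhood, added to its state's row.
        hist = np.bincount(states[nbr_idx], minlength=k_states) / k_neighbors
        comp[states[i]] += hist
    counts = np.bincount(states, minlength=k_states)
    return comp / np.maximum(counts, 1)[:, None]  # average over cells per state

comp = neighbor_composition(coords, states, K_STATES, K_NEIGHBORS)
print(comp.shape)  # (8, 8)
```

Row s of the map is the average neighborhood mix around state-s cells; flattened, it could be concatenated with the state-proportion vector as input to a downstream classifier.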
“What was interesting for us was to see how important spatial organization is. Previous studies have shown that cells close to the mammary ducts are important, but it’s also important to consider which cells are close to which other cells,” Zhang says.
When they compared the model’s results to samples evaluated by pathologists, there was a clear match in many cases. In cases where it wasn’t so clear, the model was able to provide information about features of the tissue sample, such as the cell structure, that the pathologist could use to make decisions.
This versatile model could also be used for other types of cancer or even neurodegenerative diseases, which is one area researchers are currently exploring.
“We’ve shown that this simple staining can be very powerful when used with the right AI techniques. There’s still a lot of work to be done, but more studies should take cell organization into account,” Uhler said.
This research was supported by the Broad Institute’s Eric and Wendy Schmidt Center, ETH Zurich, the Paul Scherrer Institute, the Swiss National Science Foundation, the National Institutes of Health, the U.S. Naval Research Laboratory, the MIT Jameel Clinic for Machine Learning and Health, the MIT-IBM Watson AI Lab, and a Simons Investigator Award.