The first documented case of pancreatic cancer dates back to the 18th century, and researchers have been working to understand this lethal disease ever since. Early detection remains the most effective strategy for combating cancer, but pancreatic cancer is especially hard to catch early because the pancreas sits deep within the abdomen.
To tackle this challenge, scientists from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) teamed up with Limor Appelbaum, a staff scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), to identify patients at high risk. They set out to build two machine-learning models for early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of pancreatic cancer. Through a federated network that aggregates electronic health records (EHR) from institutions across the United States, the team gained access to a large and diverse dataset, which strengthened the models’ reliability and their applicability across demographic groups and geographic regions.
The two models, the PRISM neural network (PrismNN) and the PRISM logistic regression model (PrismLR), outperformed existing methods. Current screening criteria identify about 10 percent of PDAC cases at a fivefold relative-risk threshold; at that same threshold, PRISM detects 35 percent of PDAC cases.
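Comparisons like this one measure sensitivity at a fixed cutoff: the share of actual PDAC cases whose predicted risk clears that cutoff. Below is a minimal sketch, assuming the cutoff is expressed as a relative risk (a patient's predicted risk divided by a single baseline population risk). The function name, the baseline value, and the toy numbers are hypothetical and do not reproduce the study's data or its exact definitions.

```python
from typing import Sequence


def sensitivity_at_relative_risk(
    predicted_risk: Sequence[float],
    has_pdac: Sequence[bool],
    baseline_risk: float,
    rr_threshold: float = 5.0,
) -> float:
    """Share of true PDAC cases flagged as high risk, where a patient is
    flagged when predicted_risk / baseline_risk >= rr_threshold.

    Hypothetical helper for illustration; not taken from the PRISM paper.
    """
    if len(predicted_risk) != len(has_pdac):
        raise ValueError("predicted_risk and has_pdac must have the same length")
    if baseline_risk <= 0:
        raise ValueError("baseline_risk must be positive")
    # Keep only the predicted risks of patients who actually developed PDAC
    case_risks = [r for r, y in zip(predicted_risk, has_pdac) if y]
    if not case_risks:
        raise ValueError("at least one PDAC case is required")
    # Count cases whose relative risk meets or exceeds the threshold
    flagged = sum(1 for r in case_risks if r / baseline_risk >= rr_threshold)
    return flagged / len(case_risks)


# Toy numbers: 3 PDAC cases, 2 of which exceed 5x the assumed baseline risk
risks = [0.02, 0.004, 0.006, 0.0005, 0.01]
labels = [True, True, False, False, True]
print(sensitivity_at_relative_risk(risks, labels, baseline_risk=0.001))  # → 0.6666666666666666
```

Raising the threshold flags fewer patients and lowers sensitivity; the headline comparison holds the threshold fixed and asks which model catches more real cases.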
Applying artificial intelligence to cancer risk detection is not new: algorithms already analyze mammograms, lung cancer CT scans, and even HPV tests. “The PRISM models distinguish themselves by being developed and validated on an expansive dataset of over 5 million patients, which surpasses the scale of most prior research in this area,” says Kai Jia, an MIT PhD candidate in electrical engineering and computer science (EECS) and lead author of an open-access paper describing the work, published in eBioMedicine. “By utilizing standard clinical and laboratory data for risk predictions, we’ve made significant strides by tapping into the diversity of the U.S. population, something many other PDAC models lack, as they often rely on data from specific healthcare systems. Furthermore, employing a unique regularization technique during training has improved the models’ generalizability and interpretability.”
“This report outlines a powerful approach to harnessing big data and artificial intelligence algorithms to enhance cancer risk identification,” says David Avigan, a Harvard Medical School professor and director of the cancer center at BIDMC, who was not involved in the study. “This methodology could lead to new strategies for pinpointing high-risk patients who might benefit from targeted screening and the possibility of early intervention.”
Prismatic Perspectives
Work on the PRISM models began more than six years ago, prompted by firsthand experience with the limitations of traditional diagnostic practice. “Around 80–85 percent of pancreatic cancer patients receive diagnoses at advanced stages, when a cure is no longer feasible,” says Appelbaum, who is also an instructor at Harvard Medical School and a radiation oncologist. “This clinical challenge inspired us to explore the wealth of data hidden within electronic health records.”
The collaboration brought the CSAIL team’s machine-learning expertise together with Appelbaum’s clinical knowledge, which helped produce models that are both more accurate and more transparent. “The premise was that these health records concealed valuable clues: subtle signs that could serve as early warning indicators of pancreatic cancer,” she says. “This insight guided our use of federated EHR networks, which allow risk prediction tools to be deployed at scale within healthcare systems.”
Both PrismNN and PrismLR use EHR information, including patient demographics, diagnoses, medications, and lab results, to estimate PDAC risk. PrismNN uses an artificial neural network to learn complex patterns among features such as age, medical history, and lab results, and outputs a PDAC risk score. PrismLR uses logistic regression to produce a simpler probability score from the same factors. Together, the two models let the researchers compare different approaches to predicting PDAC risk from EHR data.
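The difference between the two model types can be sketched in a few lines. Below is a minimal illustration, assuming numeric, pre-scaled input features: a logistic regression score is the sigmoid of a weighted sum of features, while a small neural network first passes the features through a hidden layer before applying the same sigmoid output. The function names, the three features, and every weight are hypothetical and hand-picked; they are not the trained PRISM parameters or architecture, and real models learn their weights from data.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lr_risk(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression-style score: sigmoid of a weighted sum of features."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mlp_risk(
    x: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    out_weights: Sequence[float],
    out_bias: float,
) -> float:
    """Tiny neural-network-style score: one ReLU hidden layer feeding a sigmoid output."""
    if len(hidden_weights) != len(hidden_biases):
        raise ValueError("need one bias per hidden unit")
    hidden = []
    for row, b in zip(hidden_weights, hidden_biases):
        if len(row) != len(x):
            raise ValueError("each hidden row must match the feature length")
        # ReLU: negative pre-activations are clipped to zero
        hidden.append(max(0.0, sum(w * xi for w, xi in zip(row, x)) + b))
    # The output layer is itself a logistic regression over the hidden units
    return lr_risk(hidden, out_weights, out_bias)


# Hypothetical features: [scaled age, diabetes diagnosis (0/1), scaled visit frequency]
patient = [0.7, 1.0, 0.4]
print("LR score:", lr_risk(patient, [1.2, 0.8, 0.5], -3.0))
print("MLP score:", mlp_risk(patient, [[1.0, 0.5, 0.2], [-0.3, 1.0, 0.8]], [0.0, -0.2], [1.5, 1.0], -3.0))
```

Each logistic regression weight maps directly to one feature, which is why that model is easier to read; the hidden layer lets the network capture interactions between features at the cost of that direct readability.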
Interpretability is essential for earning the trust of healthcare providers. Logistic regression models are generally easier to interpret, but recent advances have also made deep neural networks more explainable. This progress helped the researchers narrow the thousands of potentially predictive features in a single patient’s EHR down to about 85 key indicators. These features, which include age, a diabetes diagnosis, and frequent doctor visits, are selected automatically by the models and align closely with physicians’ understanding of pancreatic cancer risk factors.
The Path Forward
Despite their promise, the PRISM models are still a work in progress. Because they are currently built solely on U.S. data, they will need further testing and adjustment before they can be applied internationally. The researchers plan to extend the models’ reach by incorporating international datasets and integrating additional biomarkers for more precise risk assessment.
“Our next goal is to facilitate the models’ implementation in routine healthcare practice,” Jia remarks. “We envision these models operating seamlessly within healthcare systems, analyzing patient data in the background and notifying physicians of high-risk cases without imposing additional workload. An integrated machine-learning model could empower healthcare providers with timely alerts concerning high-risk patients, potentially enabling interventions long before symptoms appear. We are enthusiastic about deploying our techniques in real-world settings to help people lead healthier, longer lives.”
Jia wrote the paper with Appelbaum and Martin Rinard, an MIT professor of EECS and a CSAIL principal investigator; Appelbaum and Rinard are both senior authors. The work was supported by various organizations, including the Defense Advanced Research Projects Agency, Boeing, and the National Science Foundation, among others. TriNetX contributed resources for the project, and the Prevent Cancer Foundation also supported the work.
Photo credit & article inspired by: Massachusetts Institute of Technology