Machine Learning Model Predicts Risk of Gastric Cancer

Gastric cancer is relatively uncommon, but it is associated with poor survival and with differences in outcomes across patient groups. Although too few people in the U.S. develop gastric cancer to make routine screening practical, early detection significantly improves survival. Being able to identify high-risk patients who would benefit from early screening could therefore dramatically improve survival rates while ensuring that screening resources are used efficiently.


New research from Cleveland Clinic describes how a machine learning model could help clinicians better identify these high-risk patients. The model, which used data from electronic health records (EHR), predicted patients' risk of gastric cancer with surprising accuracy, says senior author Michelle Kang Kim, MD, PhD, Chair of the Gastroenterology, Hepatology & Nutrition Department at Cleveland Clinic.

A need for better screening

In the United States, patients are typically evaluated for gastric cancer only if they have symptoms such as abdominal pain or vomiting blood. The disease generally has poor outcomes, with an overall survival rate of about 30%. In Asia, however, where the condition is more common, gastric cancer is routinely screened for, and survival rates are much higher, notes Dr. Kim.

“When you catch it early, the 5-year survival is as high as 95%,” she says, “so there’s real value to screening and detecting this cancer early when we can.”

Study design

In this study, the research group set out to determine whether EHR information could be used to identify patients who would benefit from screening. They used machine learning to develop a model that predicts which patients are most at risk.

More than 11,000 Ohio-based patients were included in the development set, 567 of whom had gastric cancer. The strongest model achieved an area under the receiver operating characteristic curve (AUC ROC) of 0.78. Notably, the model did not include Helicobacter pylori infection, even though it is a known risk factor, because patients are not routinely tested for it and the data were therefore missing.
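For readers unfamiliar with the metric: AUC ROC can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who has the disease than to a randomly chosen patient who does not. The short Python sketch below computes AUC directly from that pairwise definition. The function name and the toy labels and scores are hypothetical illustrations; they are not the study's code or data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, non-case) pairs in which the case
    receives the higher score; tied pairs count as half-correct.
    labels: 1 = has the disease, 0 = does not (hypothetical toy data)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                correct += 1.0   # case ranked above non-case
            elif c == n:
                correct += 0.5   # tie: counts as half
    return correct / (len(cases) * len(non_cases))


# Toy example: 2 cases and 3 non-cases with made-up risk scores.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
print(roc_auc(labels, scores))  # 5 of 6 pairs ranked correctly
```

On this scale, 0.5 corresponds to ranking patients no better than chance and 1.0 to perfect separation, so an AUC of 0.78 means the model ranks a patient with gastric cancer above a patient without it in roughly 78% of such pairs. The pairwise loop is written for clarity; for large cohorts, a rank-based computation gives the same result more efficiently.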

“To our surprise, the model performed pretty well, even without such an important variable,” says Dr. Kim.

Even more surprising, the researchers were able to validate the model externally using data from Cleveland Clinic patients in Florida.

“We were thrilled with that result because you’re not often able to see a model that has equal performance in a different population from where it was developed,” she says.

Looking ahead

Dr. Kim notes that the model was based only on “relatively crude” demographic data like age, sex and race as well as clinical variables like medical history and laboratory values.

“This demonstrated that relatively simple variables can still give you a reasonably performing model, and I think that’s something that deserves further exploration,” she explains. “In this model, those factors such as age were actually some of the most important ones.”

The group plans to continue refining the model, including working with other medical centers in different regions to expand the patient population.

“Ultimately, one of our biggest takeaways from the study is that this approach is translatable, and similar models could help improve screening for other diseases,” says Dr. Kim. “Our model shows how clinical data can guide screening recommendations for high-risk groups. If we can do this with gastric cancer, we can take this same approach with other relatively uncommon diseases.”

Dr. Kim presented the project, “Development and External Validation of a Machine Learning-Based Gastric Cancer Prediction Model using Electronic Health Record Data,” at the 2024 American College of Gastroenterology Annual Scientific Meeting.