Study Shows Machine Learning’s Potential to Predict Cancer Therapy-Related Cardiac Risk

Models developed with promising accuracy and generalizability to clinical practice

Machine learning-based approaches to risk assessment can be highly effective in predicting various types of cardiac dysfunction among cancer survivors who have received cardiotoxic cancer therapies. That’s the conclusion of a new retrospective longitudinal study from Cleveland Clinic investigators published in the Journal of the American Heart Association (2020;9:e019628).

“The welcome recent improvements in cancer survival rates bring with them the challenge of increasing rates of cardiac dysfunction related to cardiotoxic cancer therapies,” says the study’s co-corresponding author, Patrick Collier, MD, PhD, Co-Director of Cleveland Clinic’s Cardio-Oncology Center. “We hypothesized that supervised machine learning models could reliably predict the risk of developing several cardiovascular outcomes in patients who had been treated for cancer, and we conducted this study to test that idea.”

Study design at a glance

Dr. Collier and colleagues from Cleveland Clinic’s Heart, Vascular & Thoracic Institute, Genomic Medicine Institute and Taussig Cancer Institute extracted clinical data from 1997 to 2018 for 4,309 cancer patients who had laboratory test and echocardiographic results in Cleveland Clinic’s electronic health record database. They developed and evaluated machine learning models to aid in risk assessment of six forms of cancer therapy-related cardiac dysfunction (CTRCD) of interest:

  • Heart failure
  • Atrial fibrillation
  • Coronary artery disease
  • Myocardial infarction
  • Stroke
  • De novo CTRCD

Models were built for each of these six outcomes based on systematic testing of five machine learning classification methods (see figure below) and three feature sets: laboratory tests only, echocardiography only, and lab tests and echocardiography combined. The models’ predictive performance was evaluated using two statistical metrics: area under the receiver operating characteristic curve (AUROC) and area under the precision-recall curve (AUPR).
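
To make the scale of that testing concrete, here is a minimal Python sketch that enumerates the combinations described above. It assumes every outcome was paired with every classification method and every feature set, which the article implies but does not spell out; the identifiers are illustrative and not taken from the study's code.

```python
from itertools import product

# Outcome, method and feature-set names come from the article;
# the identifiers themselves are illustrative.
OUTCOMES = [
    "heart_failure", "atrial_fibrillation", "coronary_artery_disease",
    "myocardial_infarction", "stroke", "de_novo_ctrcd",
]
ALGORITHMS = ["k-NN", "LR", "SVM", "RF", "GB"]
FEATURE_SETS = ["lab", "echo", "lab+echo"]


def experiment_grid():
    """Return every (outcome, algorithm, feature_set) combination."""
    return list(product(OUTCOMES, ALGORITHMS, FEATURE_SETS))


grid = experiment_grid()
print(len(grid))  # 6 outcomes x 5 algorithms x 3 feature sets = 90
```

Under that assumption, each of the 90 configurations would be scored on both AUROC and AUPR before a final model is chosen for each outcome.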

The models’ generalizability was confirmed using a time-based data split strategy, in which models used to predict outcomes for new patients are built only on past data. Specifically, patients receiving cancer therapies before 2017 served as the training set, and those receiving therapies from Jan. 1, 2017, onward served as the test set.
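
A time-based split of this kind can be sketched in a few lines of Python. The cutoff date comes from the article; the record layout, the `therapy_date` field name and the sample patients are hypothetical and serve only as illustration.

```python
from datetime import date

# From the article: the test set starts on Jan. 1, 2017.
CUTOFF = date(2017, 1, 1)


def time_based_split(patients, cutoff=CUTOFF):
    """Split records into (train, test) by therapy date.

    Records dated before the cutoff go to training; records dated on or
    after the cutoff go to testing, so the model never sees the future.
    """
    train = [p for p in patients if p["therapy_date"] < cutoff]
    test = [p for p in patients if p["therapy_date"] >= cutoff]
    return train, test


# Made-up records, for illustration only.
patients = [
    {"id": "A", "therapy_date": date(2009, 5, 14)},
    {"id": "B", "therapy_date": date(2016, 12, 31)},
    {"id": "C", "therapy_date": date(2017, 1, 1)},
    {"id": "D", "therapy_date": date(2018, 3, 2)},
]
train, test = time_based_split(patients)
print([p["id"] for p in train])  # ['A', 'B']
print([p["id"] for p in test])   # ['C', 'D']
```

Unlike a random split, this arrangement mimics deployment, where a model trained on historical patients is applied to patients treated later.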

The final models were inspected to identify clinically relevant variables that were associated with CTRCDs. The study design is summarized in the figure below.

Figure. Overview of the study design. Cardiovascular echocardiographic and laboratory testing variables were integrated from over 4,300 longitudinal cancer patients for the prediction of six outcomes: heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), myocardial infarction (MI), stroke and de novo cancer therapy-related cardiac dysfunction (CTRCD). Five classification methods were systematically tested: k-nearest neighbors (k-NN), logistic regression (LR), support vector machine (SVM), random forest (RF) and gradient tree boosting (GB). Feature sets were tested as follows: laboratory test variables only, echocardiographic variables only, and lab test and echocardiographic variables combined. Reprinted from Zhou et al., J Am Heart Assoc. 2020;9:e019628. ©2020 The Authors. Reprinted under Creative Commons Attribution-NonCommercial License.

Results in brief

Of the 4,309 cancer patients studied, 93% were treated with chemotherapy and 46% with radiation. Within the overall cohort, 1,560 patients (36%) were diagnosed with at least one of the six CTRCDs; 722 of these patients had cardiac disease that predated cancer therapy, while 838 developed de novo CTRCD after cancer therapy.

Based on 100 model iterations, all models achieved moderately high to high AUROC values, ranging from 0.882 for heart failure to 0.660 for stroke. All AUPR values were at least twice the baseline expected from a random classifier, demonstrating moderate to high predictive performance. The time-based data split strategy verified the real-world generalizability of the models for prediction of CTRCD in new patients, with high AUROC and AUPR performance for all six outcomes assessed.
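
For readers less familiar with these metrics, the following Python sketch shows one common way to compute them and why AUPR is compared with a baseline. It rests on assumptions: the article does not say how the study computed AUPR, so average precision is used here as a standard estimator, and the labels and scores are made up. The baseline for a random classifier is taken to be the fraction of positive cases.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Estimate AUPR as the mean precision at each true positive.

    Assumes scores are distinct; tied scores would need grouping.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up labels (1 = outcome occurred) and model scores.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
baseline = sum(labels) / len(labels)  # prevalence of the outcome

print(round(roc, 3))       # 0.667
print(round(ap, 3))        # 0.698
print(baseline)            # 0.375
print(ap >= 2 * baseline)  # False for this toy data
```

Because the AUPR baseline depends on how common an outcome is, a rare outcome can have a low absolute AUPR and still represent a large improvement over chance, which is why the ratio to baseline is informative alongside AUROC.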

Interrogation of the models revealed several clinically relevant variables to be significantly associated with CTRCDs, including age, hypertension, glucose level, left ventricular ejection fraction, creatinine level and aspartate aminotransferase level.

Analysis of model performances using different feature sets showed that combining both laboratory test and echocardiographic variables yielded the highest predictive performance.

Practice implications, and what comes next

“This study represents the first reported large-scale use of a machine learning-based approach for evaluating complications from cancer therapies that can contribute to cardiovascular disease,” observes Dr. Collier. “We have shown that machine learning models can be developed with good accuracy and generalizability for predicting six types of cancer therapy-related cardiac dysfunction, and we have identified and validated several clinically relevant variables associated with these types of dysfunction. These models promise utility as much-needed tools for assessing risk of cardiac dysfunction related to cancer therapy in cardio-oncology practice.”

Moreover, a clear upside of machine learning approaches is that, by nature, they improve over time. “As additional longitudinal clinical data are accumulated for cancer survivors, machine learning can use these data to build and refine predictive models to guide clinical decision-making,” says the study’s other co-corresponding author, Feixiong Cheng, PhD, a researcher in Cleveland Clinic’s Genomic Medicine Institute with specialty interests in machine learning and cardio-oncology. 

In fact, the study authors note that they will continue to improve the models they developed for the present study as more data are gathered. “We also are now incorporating imaging data directly into convolutional neural networks to further enhance the performance of our machine learning models,” adds Dr. Cheng. “As a next step, we are working to develop new risk calculators that integrate our models into Cleveland Clinic’s electronic health record system to help provide cardiovascular care for cancer patients.”

“The findings from this study underscore the promise that machine learning methods hold for cardiac risk assessment for individuals before, during and after cancer treatment,” concludes Dr. Collier.