How Data Science Is Shaping Cardiothoracic Surgery

Its roots run deep, and its reach keeps growing


The uses of data science in cardiothoracic surgery seem to know no bounds:

  • Creation of comparative prediction models of survival with various surgical interventions for ischemic cardiomyopathy
  • Continuously updated estimates of mortality risk for patients on the heart transplant waitlist
  • Selection of variables for esophageal cancer staging

These wide-ranging examples come from Cleveland Clinic alone, and they’re only a sampling of the applications of data science achieved by the multidisciplinary clinical research team directed by cardiac surgeon Eugene Blackstone, MD, Head of Clinical Investigations in Cleveland Clinic’s Miller Family Heart & Vascular Institute.

Seeds planted decades ago

All of these applications can be traced back to a handful of consequential developments several decades ago.

First was the launch of Cleveland Clinic’s Cardiovascular Information Registry, established in 1971 to collect data on every cardiovascular surgery patient at the institution. The result is the oldest and one of the largest computerized databanks of cardiovascular information, which has served ever since as the foundation of Cleveland Clinic’s pioneering cardiovascular surgery outcomes research. For instance, the registry was instrumental in confirming the internal thoracic artery as the conduit of choice for coronary bypass grafting, as reported by Loop and colleagues in a landmark 1986 publication.

Next was Dr. Blackstone’s recruitment to Cleveland Clinic in 1997 as Head of Clinical Research in the Department of Thoracic and Cardiovascular Surgery. He arrived after many years at the University of Alabama at Birmingham, where he collaborated with John Kirklin, MD, a pioneer of data-driven cardiothoracic surgery research and practice who helped put the subspecialty in the vanguard of clinical outcomes analysis. Dr. Blackstone brought extensive statistical expertise and a passion for novel mathematical models and algorithmic approaches, which he was eager to apply to Cleveland Clinic’s treasure trove of cardiovascular data.

Another key development was the start of a long-standing collaboration between Cleveland Clinic and IBM to construct an optimal medical record based on an ontology, a formal scheme for grouping concepts according to their similarities and differences. “The aim was to capture information on patients in terms of values or variables, rather than as long stories, to promote interoperability and use anywhere in the world, regardless of language,” explains Dr. Blackstone. IBM’s efforts in this realm began at the University of Alabama in 1993 when Dr. Blackstone was still working there, but they continued at an accelerated pace starting in 1997 when Dr. Blackstone was able to help pair IBM’s data capabilities with Cleveland Clinic’s Cardiovascular Information Registry.

“Cleveland Clinic was one of only perhaps five medical centers in the world with a serious interest in data science dating back to the early 1970s,” he says. “This institution was ahead of the curve in carving out a role like mine and consistently supporting it for so many years.”

The wide reach of data science in CT surgery

That support has enabled Dr. Blackstone, who is also staff in Cleveland Clinic’s Department of Quantitative Health Sciences, to assemble a team of statisticians, computer scientists, mathematicians and other data experts to support his cardiothoracic surgery colleagues in a wealth of research and outcomes endeavors. They draw on methodologies ranging from machine learning to data management to artificial intelligence to help surgeons improve clinical decision-making and assess appropriateness of care through long-term follow-up. Below is a sampling of just a few projects the group has undertaken in recent years to shape practice.

Prediction models for decision support in surgical management of ischemic cardiomyopathy. About a decade ago, Dr. Blackstone’s team worked with clinical colleagues to develop and validate comparative prediction models of survival following four different surgical intervention strategies for ischemic cardiomyopathy (J Thorac Cardiovasc Surg. 2010;139:283-293). Using a robust, nonparametric algorithmic method called Random Survival Forests, they transformed the models into a computer-based strategic decision aid to facilitate personalized decision-making. “The idea was to quantitatively make decisions with a model that could go down many different paths rather than the norm of going down each path individually,” Dr. Blackstone explains. “These methods have subsequently morphed into a formal set of machine learning tools that allow clinicians to test alternative treatment scenarios to recommend the one that maximizes an outcome, such as length of life.”
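The decision aid described above can be pictured as querying one model under several alternative treatment paths and recommending the path that maximizes predicted survival. The sketch below illustrates that pattern only; the predictor, its coefficients, the intervention names and the patient variables are all invented for illustration and are not the published Random Survival Forests model.

```python
# Toy sketch of scenario-based decision support: score one patient under
# each candidate intervention, then recommend the highest-scoring one.
# The predictor below is a hypothetical stand-in, not the actual model.

def predicted_survival_years(patient, intervention):
    """Hypothetical predictor: expected survival (years) under a given strategy."""
    base = 10.0 - 0.08 * (patient["age"] - 60) + 0.1 * (patient["ef"] - 30)
    bonus = {"medical": 0.0, "cabg": 1.5, "cabg_plus_valve": 1.0, "transplant": 2.5}
    return max(base + bonus[intervention], 0.0)

def recommend(patient, interventions):
    """Evaluate every treatment scenario and pick the one maximizing survival."""
    scores = {tx: predicted_survival_years(patient, tx) for tx in interventions}
    best = max(scores, key=scores.get)
    return best, scores

patient = {"age": 64, "ef": 25}  # age in years, ejection fraction in percent
best, scores = recommend(patient, ["medical", "cabg", "cabg_plus_valve", "transplant"])
print(best, scores)
```

The key design point is that every path is scored by the same model before a decision is made, rather than evaluating each path in isolation.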


Classification of bicuspid aortopathy. In work published just last year (J Thorac Cardiovasc Surg. 2018;155:461-469), Dr. Blackstone and colleagues applied “unsupervised clustering algorithms” to a data set of 656 patients with bicuspid aortic valves who underwent ascending aorta surgery at Cleveland Clinic over a 12-year period. This method finds similarities in uncategorized data and then groups similar data points, allowing processing of a large number of variables to uncover underlying patterns. The result was a new classification of bicuspid aortopathy that for the first time established a statistical relationship between the shapes of bicuspid valves and patterns of aortic aneurysms.
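The mechanics of that approach, finding groups in unlabeled data, can be shown with a minimal k-means sketch. The two made-up "feature" axes below (a valve-shape score and an aneurysm-extent score) and the deterministic initialization are assumptions for illustration, not the clustering algorithm or variables used in the study.

```python
# Minimal k-means: group unlabeled points by similarity, with no outcome labels.
def kmeans(points, k, iters=20):
    # Deterministic init for reproducibility: spread centers across the data.
    centers = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # Move each center to the mean of its assigned points.
        for i, cl in enumerate(clusters):
            if cl:
                centers[i] = (sum(p[0] for p in cl) / len(cl),
                              sum(p[1] for p in cl) / len(cl))
    return centers, clusters

# Two made-up groups of (valve-shape score, aneurysm-extent score) pairs.
group_a = [(1.0 + 0.1 * i, 1.0) for i in range(5)]
group_b = [(5.0 + 0.1 * i, 4.0) for i in range(5)]
centers, clusters = kmeans(group_a + group_b, k=2)
print(sorted(len(c) for c in clusters))  # → [5, 5]
```

With no labels supplied, the algorithm recovers the two underlying groups purely from similarity, which is the sense in which clustering can surface a classification that was not defined in advance.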

Selection of variables for esophageal cancer staging and precision cancer care. Cleveland Clinic thoracic surgeons and data scientists gathered data on patients with esophageal cancer from all six inhabited continents to develop a data-driven approach to staging for the seventh and eighth editions of the cancer staging manual of the American Joint Committee on Cancer. “Survival of individual patients with esophageal cancer is all over the map, which means ‘average survival time’ is not very meaningful,” Dr. Blackstone says. “We used machine learning algorithms — specifically, Random Survival Forests — to enable much more precise prediction of which treatments will promote longer survival at the individual patient level. This approach is generally applicable in many other conditions as well.”
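Dr. Blackstone’s point about “average survival time” can be made concrete with a small worked example. The survival times below are invented: two groups share the same mean, yet describe very different individual experiences, which is why individual-level prediction matters.

```python
import statistics

# Made-up survival times (months) for two hypothetical patient groups.
early = [6, 7, 8, 9, 100]     # most patients die early; one long survivor
steady = [24, 25, 26, 27, 28]  # tightly clustered outcomes

# Both groups have the same mean survival of 26 months...
print(statistics.mean(early), statistics.mean(steady))    # 26 26
# ...but the median exposes how different the typical patient's outcome is.
print(statistics.median(early), statistics.median(steady))  # 8 26
```

When outcomes are this dispersed, a single summary number hides more than it reveals, motivating models that predict survival for the individual patient rather than the average.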

Longitudinal assessment of valve function. In the past, emphasis was on clinical events, like death and stroke, that occur during follow-up after heart surgery. Today, however, fully half of the analyses done by Dr. Blackstone’s group focus on longitudinal data. Examples include serial echocardiographic assessments of the pressure gradient developing across a repaired or artificial heart valve, or of the degree of leakiness of such valves. This information tells clinicians about the durability of the valve repair or replacement device, and it also identifies factors that increase or decrease the speed at which gradients or leakage develop, thereby flagging who is at risk and who needs reintervention. These analytic methods, developed at Cleveland Clinic with National Institutes of Health (NIH) funding, hark back to Dr. Blackstone’s early career at the University of Chicago in the late 1960s, where he helped develop the field of digital signal processing, in which complex longitudinal data are broken down into simple components much as a prism breaks white light into a rainbow of colors.
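The prism analogy maps directly onto a classic signal-processing tool: the discrete Fourier transform, which splits a series of measurements into frequency components. The sketch below is a generic illustration of that decomposition on a synthetic series; it is not the team’s actual longitudinal method, and the signal is invented.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: decompose a series into frequency components."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# A made-up longitudinal series: a slow wave plus a smaller, faster wave.
n = 8
signal = [math.cos(2 * math.pi * 1 * t / n) + 0.5 * math.cos(2 * math.pi * 3 * t / n)
          for t in range(n)]

# Like a prism, the transform separates the mixture into its components:
# energy appears at frequencies 1 and 3 (and their mirror images), nowhere else.
spectrum = [abs(c) for c in dft(signal)]
```

Each peak in `spectrum` corresponds to one simple component of the original series, which is the sense in which complex longitudinal data can be "broken down" for analysis.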

Continuously updated heart transplant waitlist mortality estimation. In a recent publication (J Am Coll Cardiol. 2018;72:650-659), Cleveland Clinic clinicians and statisticians shared initial findings from a dynamic model that continuously updates predicted mortality based on lab values and other clinical measures as the condition of a patient on the heart transplant waitlist changes over time. Using a newly developed method for analyzing time-related mortality, the model recomputes patients’ mortality risk with each new clinical event or change in key lab values or organ function. “A model of this type that uses time-varying mortality risk estimation could reduce mortality on the waitlist and promote better utilization of the limited supply of donor hearts,” Dr. Blackstone explains. “It represents a combination of traditional statistics with machine learning.”
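The essential mechanic of such a dynamic model is that risk is recomputed every time new clinical data arrive, rather than fixed at listing. The sketch below shows only that update loop; the logistic form, the two lab variables and every coefficient are invented for illustration and bear no relation to the published model.

```python
import math

def mortality_risk(labs):
    """Hypothetical logistic risk score; all coefficients are invented."""
    z = -5.0 + 0.9 * labs["creatinine"] + 1.2 * labs["bilirubin"]
    return 1.0 / (1.0 + math.exp(-z))

history = []

def update(labs):
    """Recompute the patient's risk whenever a new set of lab values arrives."""
    risk = mortality_risk(labs)
    history.append(risk)  # keep the trajectory, not just the latest value
    return risk

r1 = update({"creatinine": 1.0, "bilirubin": 0.8})  # at listing
r2 = update({"creatinine": 2.4, "bilirubin": 1.9})  # renal/hepatic worsening
print(r1, r2)
```

A waitlist ranked on the latest value of such a trajectory, rather than on status assigned at listing, is what allows deteriorating patients to be prioritized in time.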

Identification of variables most predictive of death on the heart transplant waitlist. In related work, Dr. Blackstone and Cleveland Clinic heart transplant researchers have joined with colleagues from several other institutions on a project funded by a $2.8 million National Institutes of Health grant to develop new machine learning methods to examine and reduce disparities in survival among heart failure patients before and after transplant. The researchers are applying machine learning to the national Scientific Registry of Transplant Recipients database to identify major risk factors for death on the waitlist and how these factors interact. The group’s first publication (Am J Transplant. 2019 Jan 19 [Epub ahead of print]) identified the most important variables in predicting mortality on the waitlist, including a couple that are not in the current allocation system for donor hearts. “This work involves some new mathematics well suited to handling complex interactions among variables and large amounts of missing data,” Dr. Blackstone observes.
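One common way to rank variables by predictive importance is permutation importance: shuffle one variable and measure how much a model’s accuracy drops. The sketch below demonstrates the idea on synthetic data with a hypothetical stand-in classifier; the variable names and the data are invented, and this is not the method from the publication above.

```python
import random

def model(row):
    """Hypothetical stand-in classifier: predicts death when a 'support' score is low."""
    return 1 if row["support"] < 0.5 else 0

def accuracy(rows):
    return sum(model(r) == r["label"] for r in rows) / len(rows)

rng = random.Random(1)
# Made-up data in which 'support' drives the outcome and 'noise' does not.
rows = [{"support": rng.random(), "noise": rng.random()} for _ in range(200)]
for r in rows:
    r["label"] = 1 if r["support"] < 0.5 else 0

def permutation_importance(rows, feature):
    """Accuracy lost when one variable's values are shuffled across patients."""
    base = accuracy(rows)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return base - accuracy(permuted)

# Shuffling the informative variable hurts accuracy; shuffling noise does not.
print(permutation_importance(rows, "noise"), permutation_importance(rows, "support"))
```

Ranking variables this way, on a model fit to registry data, is one route to spotting predictors that an allocation system might be missing.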

What’s ahead

Dr. Blackstone sees emerging heart valve therapies as one of the most active frontiers for future data science applications in cardiothoracic surgery. “I expect that we’ll be doing a lot of work to support decisions around when to do valve procedures surgically versus percutaneously, such as which approach and which valve type is best for a given patient,” he says.

He notes that Cleveland Clinic can draw on the troves of data it collected as one of the data analysis centers for the PARTNER trial program for transcatheter aortic valve replacement. “We also benefit from the considerable follow-up we have done with so many cardiovascular patients at Cleveland Clinic dating back to the early 1970s,” he adds. “Good data science requires the coupling of short-term data with long-term outcomes.”

Challenges and opportunities

Yet he sees follow-up as the biggest challenge now facing data science. “Patient follow-up today is much more difficult than it was 20-some years ago, in part because of patient privacy laws,” he says. The result is inferior follow-up data, which he notes is exacerbated by a continuing lack of interconnectivity of health data information systems. “Our methods have ended up being better than our data.”


Another challenge for a leader in his role is keeping top data science talent around for the next project. “The people who are good at this work come from a diversity of backgrounds — from traditional statistics to computer science to mathematics to computational fluid dynamics,” he says. “Whoever they are, it can be a challenge to keep good data scientists around, as industry is always looking to recruit them.”

Cleveland Clinic hopes its recent establishment of a cross-disciplinary Center for Clinical Artificial Intelligence may assist in attracting and keeping top talent, and Dr. Blackstone sees the center as a promising new partner.

Meanwhile, he also spends time mentoring other clinician-researchers with a shared interest in championing novel data methods to help shape cardiothoracic surgery. One current protégé is new Cleveland Clinic congenital heart surgeon Tara Karamlou, MD, who trained with him in the past.

“One way Cleveland Clinic tries to share the wealth in terms of data science expertise is through cardiothoracic surgery fellows and residents who train here,” Dr. Blackstone says. “They can go back to their home institution with knowledge of how to apply new tools or methodologies. But the most fundamental step any institution can take toward advanced data techniques is to first get a handle on their data. Start by understanding your data and focusing on the quality of your outcomes. That’s where Cleveland Clinic started many decades ago. Success and sophistication follow from that.”

“Gene Blackstone has made a huge contribution to our understanding of the management of cardiothoracic surgery patients,” says longtime colleague Lars Svensson, MD, PhD, Chair of Cleveland Clinic’s Heart & Vascular Institute. “His work has resulted in considerably better care of patients, not only here but around the world. As we look at new fields of statistical endeavor — such as machine learning, voice recognition and interpolation for the EMR, and ultimately automated artificial intelligence — we know that Gene and his team will be at the forefront of implementing these new methods to better understand what makes for world-class cardiovascular care.”
