New Data Mining Method Offers Easier Access to Epic’s Massive Data Trove

Streamlining the process so it's as painless as possible

To do studies without wasting precious time and money, researchers need to efficiently explore data housed within their institution’s set of electronic health records (EHRs). However, digging into patient data to extract the precise information needed often proves a major headache.

Now a team of Cleveland Clinic scientists is helping their fellow researchers by devising a better way to extract and utilize health data from the Epic EHR. They are using statistical methods that fall under the umbrella of natural language processing to create a wide array of research-ready data tables.

“Researchers want to mine the data to figure out what works in what sort of patient,” says Michael W. Kattan, PhD, MBA, coauthor along with system analyst Alex Milinovich of a recent paper on this subject. “The question becomes how to make that data mining as painless as possible because data collection takes time, and it’s time that is not reimbursed,” he says. “Everyone groans about this.”

The paper, “Extracting and utilizing electronic health data from Epic for research,” appeared in the Annals of Translational Medicine.

Epic … not so research friendly

Cleveland Clinic was an early Epic adopter about 20 years ago and now possesses more than 35 billion individual data points for more than 4 million patients. In fact, Cleveland Clinic is the second-largest installation of Epic; Kaiser Permanente is the largest.

Epic works well for taking care of patients, Dr. Kattan says, but it was not developed with research in mind. In fact, Epic makes life difficult for researchers. “It has a gazillion tables where data is housed,” he notes. “It stores information all over the place.” Even worse, a researcher seeking a specific sort of data in Epic will often find that the needed answers are buried in prose notes dictated by clinicians.

“At Cleveland Clinic, less than 5 percent of the EHR data are codified variables [of the sort needed for research],” the team wrote. Ninety-five percent are identifiers, dates and free-text entries.

“I can’t work with a paragraph of text the doctor types,” Dr. Kattan says. “I am looking for a test result, for a number in that paragraph. I want to know if the patient had asthma, yes or no.”

Extracting gold from the EHR hills

To mine the raw Epic EHR and then use it to build robust datasets for statistical analysis, the team uses a number of statistical techniques to clean, parse and map the data. The cleaned, standardized data is then ready to be deposited into a registry of Cleveland Clinic clinical research data.

The statistical techniques used include calculations of similarity and relationships between terms. “For example, the term of ‘Heart Failure’ (C0018801) has relationships to various medications that may treat heart failure, finding sites of heart and myocardium as well as child diagnoses such as congestive heart failure and left-sided heart failure,” the team wrote. They stated that these relationships make querying the EHR easier. Researchers can, for example, identify top-level terms and then identify any pediatric or related terms that suit their population of interest.
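
The idea of starting from a top-level term and pulling in its child diagnoses can be sketched in a few lines of Python. This is a minimal illustration, not the team's method: the relationship map, the record layout and the function names `expand_term` and `match_records` are all hypothetical, and only the term names come from the passage quoted above.

```python
from collections import deque

# Hypothetical, hand-built parent -> child map for illustration only.
# A real system would draw these relationships from a clinical terminology.
CHILDREN = {
    "Heart Failure": ["Congestive Heart Failure", "Left-Sided Heart Failure"],
    "Congestive Heart Failure": [],
    "Left-Sided Heart Failure": [],
}


def expand_term(term, children=CHILDREN):
    """Return the term followed by all of its descendants, breadth-first."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def match_records(records, term):
    """Keep records whose diagnosis is the term or one of its descendants."""
    wanted = set(expand_term(term))
    return [r for r in records if r["diagnosis"] in wanted]


# Hypothetical records; the field names are assumptions.
records = [
    {"patient": 1, "diagnosis": "Congestive Heart Failure"},
    {"patient": 2, "diagnosis": "Asthma"},
    {"patient": 3, "diagnosis": "Heart Failure"},
]
matches = match_records(records, "Heart Failure")
```

Here a query for “Heart Failure” also picks up the patient coded with “Congestive Heart Failure,” which is the kind of convenience the authors describe when they say the relationships make querying easier.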

“Approximately 185 tables from different data sources are condensed into 18 research-ready tables in the data repository,” the team wrote. These tables are updated automatically, on a weekly basis.

With this approach, the authors state, “Cleveland Clinic can do live population exploration as well as produce datasets for analysis faster than it takes most organizations to simply identify their base population.”

Simplifying fulfillment of the mission

Doing research fulfills one of the three pillars of the Cleveland Clinic’s mission, which includes providing “better care of the sick, investigation into their problems, and further education of those who serve.”

“If research is part of your mission, you have to do it somehow,” says Dr. Kattan, and that involves tapping the immense research resources held within the Epic EHR. “We are constantly developing processes to clean the data up, to define things, and to make rules,” he says. “Epic never sleeps.”