LLM-driven system uses both structured and unstructured data, provides auditable justifications
Image: Male doctor working at a laptop with a high-tech algorithmic overlay.
A medically trained artificial intelligence (AI) system deployed within a health system firewall accurately identified eligible and ineligible patients for a rare disease clinical trial, providing auditable and valid justifications. The findings, from a new study published by Cleveland Clinic researchers, suggest that this approach may enhance the efficiency of chart review and recruitment processes for clinical research.
“Our study showed that an AI system that uses both structured and unstructured data from complex real-world electronic health records [EHRs] was 96% accurate in assessing patients’ trial eligibility across nine prespecified domains of trial criteria,” says M. Trejeeve Martyn, MD, MSc, first author of the investigation, which was presented at the 2026 Technology and Heart Failure Therapeutics meeting and simultaneously published in the Journal of Cardiac Failure, official journal of the Heart Failure Society of America.
“These findings have implications, which we are actively evaluating, for how AI can help us use EHR data more broadly across research beyond clinical trials, with potential applications to retrospective studies, implementation studies and quality reporting for registries,” Dr. Martyn continues. “The number of potential use cases is large.”
Contemporary EHRs are complex, making manual chart review for clinical research time-consuming and costly. Previous efforts to automate chart review using large language models (LLMs) have shown promise but faced limitations such as reliance on simulated data, use of exclusively structured or unstructured data without synthesis, and lack of auditable justifications for decisions.
“Even though there are tremendous amounts of data that live in the EHR, the ability to use these data at scale for purposes like determining clinical trial eligibility has historically been limited,” Dr. Martyn notes.
The new study evaluated an AI system (Synapsis AI, Dyania Health) that has an LLM component and aims to overcome earlier limitations by synthesizing both structured and unstructured data from real-world EHRs and by providing interpretable justifications for its conclusions. The researchers studied the system’s ability to assess eligibility for a transthyretin amyloidosis therapeutics trial.
The AI system was deployed in August 2024 within the firewall of a unified EHR system covering multiple Cleveland Clinic hospitals and clinics in Ohio and Florida. Patients with cardiac amyloidosis-related diagnosis codes were prefiltered before AI system processing.
Cleveland Clinic investigators and the vendor created a “scoping document” based on the DepleTTR-CM phase 3 trial protocol, which served as the rubric for evaluating the AI system’s performance.
The system assessed 32 inclusion/exclusion criteria for the trial: 10 based on structured data alone, 18 combining structured data and LLM outputs (unstructured data) and four relying solely on LLM outputs.
“Structured data come from discrete fields that can readily be pulled, such as ICD-10 codes or lab values,” Dr. Martyn explains. “Unstructured data are things like imaging reports, pathology reports and clinical notes, and around 80% of EHR data is housed there. Unstructured data need to be accurately abstracted and organized, and that’s where the LLM comes in.”
The system could assign one of four labels for each criterion: accept, reject, borderline or missing information. Based on the collective criteria assessments, patients were then categorized as a complete match, partial match, borderline match or rejection.
The process ended with a “human in the loop” review in which a study investigator reviewed all partial, complete and borderline matches. Final calls on eligibility were made by the study investigators.
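The triage flow described above, per-criterion labels rolled up into a patient-level category, with matches escalated to human review, can be sketched as a toy example. The function name, label strings and roll-up rules here are illustrative assumptions, not the Synapsis AI implementation:

```python
# Illustrative sketch of the eligibility triage described above.
# Labels and roll-up rules are simplified assumptions, not the
# vendor's actual logic.

def categorize(criteria: dict[str, str]) -> str:
    """Map per-criterion labels to a patient-level category.

    Each value is one of: 'accept', 'reject', 'borderline', 'missing'.
    """
    labels = set(criteria.values())
    if "reject" in labels:
        return "rejected"
    if labels == {"accept"}:
        return "complete match"    # every criterion satisfied
    if "borderline" in labels:
        return "borderline match"  # deferred pending human review
    return "partial match"         # accepts plus missing information

# Matches are then escalated to a study investigator ("human in the
# loop") for the final eligibility call; only rejections bypass review.
patient = {"age": "accept", "nyha_class": "accept", "egfr": "missing"}
print(categorize(patient))  # -> partial match
```

In the study workflow, everything except a rejection reached an investigator, which is why the sketch funnels all non-rejected patients into match categories rather than making a final eligibility call itself.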
The primary outcome was the AI system’s accuracy in evaluating 77 trial criteria-related questions across nine broad criteria categories in a random sample of 100 patients.
Prefiltering yielded 1,476 patient EHRs with amyloid-related diagnosis codes. The AI system processed these records over six days using two graphics processing units.
In a random sample of 100 patients, the LLM answered 7,409 out of 7,700 questions correctly, achieving an accuracy of 96.2% against physician review. Patient-level accuracy averaged 96%, with a minimum of 86% (in 1 patient) and a maximum of 100% (in 12 patients).
The AI system identified 46 matches, of which 43 were deemed appropriate after human review (93.4% accuracy): 4/4 complete matches and 39/42 partial/borderline matches. The three matches overturned on human review were attributable to LLM errors. Some patients were excluded for non-protocol reasons or deferred because of borderline criteria; after excluding borderline deferrals, 30 patients were deemed immediately recruitable. Among AI-identified matches, 100% of complete matches and 76.5% of partial matches were considered eligible for recruitment.
Justifications provided by the system were judged 100% interpretable without further chart review.
Of the 1,446 patients rejected by the AI system, a random sample of 200 was physician-reviewed; 198 rejections were deemed appropriate, yielding a negative predictive value of 99%.
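The headline figures can be recomputed directly from the counts reported above, a quick sanity check on the question-level accuracy and the negative predictive value:

```python
# Recomputing the reported performance figures from the raw counts.

# Question-level accuracy: 7,409 of 7,700 criteria questions answered
# correctly against physician review.
correct_answers, total_answers = 7_409, 7_700
accuracy = correct_answers / total_answers
print(f"Question-level accuracy: {accuracy:.1%}")  # 96.2%

# Negative predictive value: 198 of 200 sampled rejections confirmed
# appropriate on physician review.
true_rejections, sampled_rejections = 198, 200
npv = true_rejections / sampled_rejections
print(f"Negative predictive value: {npv:.0%}")  # 99%
```

Note that NPV here is estimated from a 200-patient random sample of the 1,446 rejections, not from review of every rejected record.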
Notably, 29 of the 30 patients identified as readily recruitable had not been identified through routine screening processes in the prior 90 days. Exploratory analysis showed that AI-assisted screening identified more patients (30 in 7 days) than routine care (14 patients), with a higher proportion of Black patients and less established connection to heart failure specialists among AI-identified patients.
“This AI system demonstrated rapid processing of structured and unstructured data to provide accurate eligibility assessments with interpretable justifications,” Dr. Martyn observes. “The LLM showed consistent high performance (above 96%) across multiple trial criteria domains.”
He identified several key takeaways from the study.
“Clinical trials are the backbone of evidence generation in cardiology, but we know that they are expensive, time-consuming and often have trouble reaching enrollment goals,” notes study co-author Ashish Sarraju, MD, a Cleveland Clinic preventive cardiologist. “Efforts like this to incorporate auditable AI into clinical trial workflows are crucial opportunities to see if clinical trial conduct can be improved meaningfully with new technologies.”
“Our next steps are to deploy this technology for use in recruitment for more trials, both in rare conditions and more common diseases,” Dr. Martyn says. “Even common-disease trials have extensive inclusion and exclusion criteria, so this could save a lot of time spent screening for ultimately ineligible patients in that setting as well.”