ChatGPT Performs Better Than Expected in Responding to Basic Cardiology Queries

A popular online artificial intelligence (AI) model was able to answer simple questions about cardiovascular disease prevention, possibly pointing the way to future clinical use of the technology, Cleveland Clinic researchers say.


In a study led by Ashish Sarraju, MD, of Cleveland Clinic’s Section of Preventive Cardiology and Rehabilitation, the dialogue-based AI language model Chat Generative Pre-trained Transformer (ChatGPT) gave appropriate responses to 84% of basic questions that patients might search online or ask their clinician in a patient portal. The work was published as a research letter in JAMA (Epub 3 Feb 2023).

ChatGPT essentials

Since its November 2022 launch, ChatGPT has been widely discussed, particularly for its implications for academia and cybersecurity. It is a large language model trained on a vast body of text, much of it drawn from the internet, and it generates conversational answers from patterns learned during that training. It was not developed for medical use, and few studies have so far examined how it performs on medical questions, but that may well change, Dr. Sarraju says.

“ChatGPT just burst on the scene with such media attention that everybody began to use it to query things,” he remarks. “We know that our preventive cardiology patients tend to look up much of the critical information we discuss with them at visits. So we figured that as ChatGPT becomes more popular, our patients might start using it to ask questions. Before that begins to happen, we wanted to see how it performed.”

He and his colleagues also wanted to explore potential uses of ChatGPT in medical practice. “We were interested in whether it might have a place somewhere in the medical workflow where there’s a bottleneck,” Dr. Sarraju says.

The study in brief

For the study, the researchers developed a list of 25 questions related to preventive cardiology that patients often ask, such as “What’s the best diet for the heart?” and “How can I lose weight?” They posed each question to the ChatGPT interface three times. An experienced preventive cardiology clinician graded each question’s set of responses as “appropriate,” “inappropriate” or, if the three responses were inconsistent with one another, “unreliable.”

The grading was done separately for two hypothetical situations: as a response on a patient-facing platform such as a hospital informational website, and as an AI-generated draft reply to a question a patient sends to their clinician through an electronic message.

Of the 25 questions, ChatGPT’s responses to 21 (84%) were deemed appropriate in both hypothetical contexts and its responses to the other four were deemed inappropriate in both contexts; none of the responses were judged unreliable.

Two of the four inappropriate responses pertained to exercise: one to the amount and the other to the type. The AI firmly recommended both cardiovascular activity and weightlifting for everyone, rather than acknowledging that those activities may be harmful for some people.

“Exercise counseling is very individualized,” Dr. Sarraju explains. “An AI model that is trained on publicly available general information won’t be able to provide that level of personalized information.”

The other two inappropriate responses may be easier to correct. In one case, ChatGPT failed to mention familial hypercholesterolemia in response to a question about interpreting an LDL cholesterol level above 200 mg/dL. In the other, its response to a question about the cholesterol-lowering agent inclisiran (Leqvio®) indicated that the drug wasn’t commercially available, when in fact it was approved by the U.S. Food and Drug Administration in December 2021.

“That speaks to training bias,” Dr. Sarraju notes. “Any AI model is only as good as the data it’s trained on.”

Exceeding expectations — but not ready for clinical use

Nonetheless, the investigators were impressed at how well ChatGPT performed overall. “We expected it to do well with basic questions that are more factual in nature, that it presumably would have been trained on during its training timeline,” Dr. Sarraju says. “We found that even with more nuanced questions — like what someone should do if their cholesterol isn’t controlled on a statin — its responses were quite reasonable and nuanced in return. It was surprising. It did better than we expected.”

Of course, much more work is needed before a model like ChatGPT can be considered for use in clinical practice, notes study co-author Leslie Cho, MD, Co-Section Head of Preventive Cardiology and Rehabilitation at Cleveland Clinic. “Where can it enter the workflow, and what level of information can be safely delegated to the AI model before a human needs to step in? Do we need continued quality control? Do we need somebody fact-checking it regularly? These are all things we don’t yet know,” Dr. Cho points out.

There are also regulatory questions to be resolved. “If an AI model is developed for direct patient use, it needs to be regulated,” Dr. Sarraju observes. “But who’s going to do that regulation? How do we assess the quality? I assume this would go through the FDA’s device process, but that needs to be determined and spelled out in a robust manner.”