ChatGPT “Pretty Good” at Basic IBS Info, but Misses Details

The artificial intelligence (AI) language model, ChatGPT, works reasonably well for answering basic queries from the public about irritable bowel syndrome (IBS), but less so for medical professionals seeking detailed and referenced information.

That’s what researchers from Cleveland Clinic’s Digestive Disease Institute concluded when they tested version 4.0 of ChatGPT with 15 commonly asked questions about IBS. Its overall accuracy was 80%, and none of its answers were fully inaccurate, which suggests it is adequate for basic inquiries.

However, the model still missed details and gave some outdated answers. Moreover, physicians seeking medical literature references will be disappointed, explain study co-author Anthony Lembo, MD, the Institute’s Director of Research, and presenting co-author Brian Baggott, MD, a gastroenterologist in Cleveland Clinic’s Department of Gastroenterology, Hepatology & Nutrition.

“For the most part, the model was reasonable. Where it gets lost – and I think this is important to emphasize about ChatGPT – it only has what’s available in the public domain. It can’t find papers that aren’t publicly available,” notes Dr. Baggott.

Regardless of what clinicians might think of ChatGPT, Dr. Lembo says it’s important they understand its advantages and limitations as patients use it to seek information. “ChatGPT is becoming more and more popular among laypeople. It used to be Google, but now they’re turning to ChatGPT because it’s more sophisticated.”

Study methods

For the study, the investigators tested the most current version, ChatGPT4.0, with 15 common questions about IBS that they derived both from ChatGPT itself and from Google Trends. For each question, they asked ChatGPT to provide references from the medical literature along with its answer. Three independent gastroenterologists then assessed ChatGPT’s answers in three ways:

1) An overall assessment as either “accurate” or “inaccurate.”

2) Granular assessments as either “100% accurate,” “accurate with missing information,” “partly inaccurate” or “100% inaccurate.”

3) The references judged as “suitable,” “unsuitable” or “nonexistent.”

Overall, the researchers deemed 80% of the answers “accurate” and 20% “inaccurate.” For the granular assessments, they considered just under two-thirds of ChatGPT’s answers to be 100% accurate, about one-third to be “partly inaccurate,” and a small number “accurate with missing information.” None were considered 100% inaccurate.

Strengths and weaknesses of ChatGPT

The two questions that ChatGPT4.0 answered best were “What causes IBS?” and “What foods should I avoid if I have IBS?” For both, the answers earned overall and granular grades of “accurate,” and the references provided were judged “suitable.”

Just two questions were answered inaccurately overall. These were “How is IBS diagnosed?” and “Can [cannabidiol] improve IBS symptoms?”

However, several more answers were granularly considered “partly inaccurate,” including “Is there a test for IBS?” Two more were deemed “accurate with missing information,” including “What support resources are available for people with IBS?”

Dr. Baggott points out that even though ChatGPT can’t access subscriber-restricted journal text, it should be able to find other sources referencing that content. Yet, that can take time. For example, it didn’t know that recent guidelines advise against using probiotics for IBS.

“It’s not always up to date. Plus, the results in that example may be confounded by the fact that a lot of people write materials saying that probiotics work for them,” Dr. Baggott notes.

He also points out that ChatGPT may miss some subtle points that may or may not make a difference clinically. “So it’s not that it’s completely wrong, but just not the way I would explain to a patient.”

As for the references, they were “nonexistent” for the questions “What are the treatment options for IBS?” and “How to manage IBS during pregnancy?” Eight more were deemed “unsuitable” while just five were “suitable.”

“ChatGPT is pretty good, but it doesn’t get the references. It will eventually, though,” Dr. Lembo predicts.

The takeaway from this study, he says, is to “caution patients that ChatGPT can give you good information about general topics, but for specific answers you should always consult your doctor.”

The study’s first author, post-doctoral research fellow Joseph El Dahdah, MD, is scheduled to present the new findings, titled “ChatGPT-4.0 Answers Common Irritable Bowel Syndrome Patient Queries: Accuracy and References Validity,” at the American College of Gastroenterology meeting in Vancouver in late October.