February 27, 2026/Behavioral Health/Research

When More Isn’t Better: ChatGPT’s Readability Gap in Opioid Use Disorder Education

New study highlights the need for plain-language prompting and human oversight in addiction care communication

As generative artificial intelligence becomes embedded in clinical and patient-facing workflows, questions about accuracy are increasingly joined by concerns about readability and tone. A new comparative analysis of ChatGPT-generated responses and U.S. health organization frequently asked questions (FAQs) on opioid use disorder (OUD) provides timely data on how large language models perform when tasked with patient education in a stigmatized and literacy-sensitive domain.

OUD affects an estimated 16 million people worldwide and has contributed to more than 1.2 million deaths globally between 2014 and 2023, including more than 500,000 opioid-involved overdose deaths in the United States alone. Against this backdrop, accessible and non-stigmatizing communication has become an essential component of treatment, explains Cleveland Clinic psychiatrist Akhil Anand, MD, who coauthored the study.

“When addressing a disorder that has claimed more than a million lives globally in less than a decade, how we communicate is central to care,” he says. “Patients with OUD are often navigating shame, misinformation and ambivalence about treatment. If the information they encounter is overly complex or subtly stigmatizing, we risk reinforcing barriers that can directly influence whether someone seeks treatment.”

Key findings

The study, recently published in The American Journal on Addictions, evaluated 50 OUD-related FAQs drawn from U.S. federal and state public health agencies, academic medical centers and national professional societies. Each question was entered into ChatGPT, and responses were compared with the original organizational FAQ answers. Outcomes included structural measures (word and sentence counts), linguistic complexity (lexical density, syllables and characters per word), six standard readability indices and frequency of stigmatizing terms using the National Institute on Drug Abuse “Words Matter” framework.
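The structural and linguistic measures described above can be computed with simple text heuristics. The sketch below is illustrative only: the tokenization and vowel-group syllable counter are crude approximations, not the study's actual instruments.

```python
import re

VOWEL_GROUPS = re.compile(r"[aeiouy]+", re.I)

def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of vowels, with a floor of one syllable.
    return max(1, len(VOWEL_GROUPS.findall(word)))

def text_measures(text: str) -> dict:
    """Word/sentence counts plus per-word length measures of the kind
    reported in the study (approximate operationalizations)."""
    words = re.findall(r"[A-Za-z']+", text)
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w) for w in words)
    syllables = sum(count_syllables(w) for w in words)
    return {
        "words": len(words),
        "sentences": len(sentences),
        "chars_per_word": chars / len(words),
        "syllables_per_word": syllables / len(words),
        "words_per_sentence": len(words) / len(sentences),
    }
```

Comparing these measures across paired ChatGPT and FAQ answers is essentially what the study's structural analysis did at scale.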

The differences were striking, says Dr. Anand, an addiction specialist at Lutheran Hospital.

ChatGPT responses were substantially longer, with a mean word count of 253.7 compared with 76.6 for organizational FAQs—a mean difference of 177 words (95% CI, 151–203). Sentence counts nearly doubled (18.2 vs. 9.0; mean difference 9.2). Lexical density was higher by 6.5 percentage points (95% CI, 4.0–9.0), and ChatGPT used longer words, with greater characters and syllables per word. Although words per sentence were only modestly higher, the cumulative effect was increased syntactic and informational load.

Readability indices pointed in the same direction across the board. Compared with organizational FAQs, ChatGPT responses scored higher (indicating more difficult reading levels) on the Coleman–Liau Index (+3.43), Gunning Fog (+3.47), SMOG (+2.96), Flesch–Kincaid Grade Level (+3.61), and Automated Readability Index (+4.33). Flesch Reading Ease scores were lower by 20.4 points. All differences were statistically significant (p < .05). Notably, both sources exceeded the recommended sixth- to eighth-grade reading level for patient materials, but ChatGPT deviated further from established health literacy targets.
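For reference, the two Flesch formulas cited here are standard and can be computed directly from word, sentence and syllable counts:

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch Reading Ease formula; higher scores mean easier text.
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    # Standard Flesch-Kincaid Grade Level; higher scores mean harder text.
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

A passage of 100 words in 5 sentences with 150 syllables, for example, scores about 59.6 on Reading Ease and roughly a grade 9.9 level, already above the sixth- to eighth-grade target for patient materials.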

By contrast, stigmatizing language was infrequent in both groups and did not differ significantly. Sentences containing terms flagged by the National Institute on Drug Abuse list occurred in 9.6% of ChatGPT responses versus 6.0% of organizational FAQs (difference 3.57 percentage points; p = .16). The study team emphasized that automated screening was supplemented with human review, underscoring the limits of purely computational approaches to stigma detection.
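Automated stigma screening of this kind can be approximated with a word-boundary match against a flagged-term list. The terms and suggested replacements below are a small illustrative subset chosen for demonstration, not the study's actual NIDA list, which is one reason human review remains essential.

```python
import re

# Illustrative subset only; the study used the full NIDA "Words Matter"
# guidance plus human review. Terms and replacements here are assumptions.
FLAGGED_TERMS = {
    "addict": "person with a substance use disorder",
    "abuser": "person who uses drugs",
    "habit": "substance use disorder",
}

def flag_sentences(text: str) -> list[str]:
    """Return sentences containing any flagged term.
    Word boundaries keep 'addict' from matching inside 'addiction'."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, FLAGGED_TERMS)) + r")\b", re.I
    )
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if pattern.search(s)]
```

Even with careful boundary matching, a keyword screen cannot judge context or tone, which is why the study team paired it with human review.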

Addressing literacy

For physicians, the key takeaway is not that ChatGPT produces problematic content per se, but that its default language may be misaligned with the literacy needs of many patients with OUD.

“Clinicians often assume that more information is better – but in OUD care, cognitive load matters,” Dr. Anand says. “When responses triple in length and jump by three or four grade levels, you risk losing the very patients you’re trying to engage.”

He notes that while ChatGPT’s answers were more comprehensive, they also reflected a more academic, written style — higher lexical density and longer words — that may challenge patients with limited health literacy.

“The model appears to err on the side of completeness and nuance,” he notes. “That’s admirable from a medical standpoint, but it doesn’t necessarily translate into clarity for a patient in crisis.”

Dr. Anand emphasizes that the findings also raise equity concerns: health literacy is unevenly distributed and closely tied to social determinants of health, digital access and educational opportunity. He notes that default outputs that exceed recommended reading levels may disproportionately disadvantage patients with limited literacy, older adults and those with chronic conditions — populations already overrepresented in OUD morbidity and mortality statistics.

Importantly, the study did not evaluate factual accuracy, empathic tone or language consistent with motivational interviewing — factors that are central to addiction care. Nor did it assess how patients interpret or act on chatbot-generated information. The analysis represents a snapshot of a single model version at a single time point, and large language models are evolving rapidly.

Still, the results quantify a trade-off that many clinicians have intuited: scalability and comprehensiveness may come at the cost of readability.

“Large language models can simplify text when explicitly prompted,” Dr. Anand observes. “But this study shows that if you use them ‘out of the box,’ you may get content that’s technically sound yet overly complex.”
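A minimal sketch of what explicit plain-language prompting might look like. The template wording, function name and parameters are illustrative assumptions, not a prompt validated by the study.

```python
def plain_language_prompt(question: str, grade_level: int = 6) -> str:
    """Wrap a patient question in explicit plain-language instructions.
    Hypothetical template for illustration; not drawn from the study."""
    return (
        f"Answer the following question for a patient. "
        f"Use plain language at roughly a grade-{grade_level} reading level, "
        f"short sentences, and person-first, non-stigmatizing terms. "
        f"Keep the answer under 120 words.\n\n"
        f"Question: {question}"
    )
```

In practice, such a wrapper would sit in front of whatever model the clinic uses, with the generated answer still checked against readability targets and reviewed by a human before it reaches patients.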

Looking ahead

For addiction medicine in particular, Dr. Anand says the study’s implications are clear.

“Communication is not neutral; it shapes trust, stigma, and willingness to seek treatment,” he explains. “Although we found no significant increase in stigmatizing terminology, increased complexity alone may constitute a barrier to care.”

As generative AI continues to permeate clinical practice, Dr. Anand notes that physicians will need to evaluate not only whether a model is accurate, but whether it is accessible.

The researchers ultimately support a hybrid approach that leverages AI for scalability and draft generation, but anchors patient education in human judgment, health literacy standards and person-first language.

“In OUD care, where engagement can be fragile and stakes are high, plain language is not a stylistic preference – it is a clinical intervention,” Dr. Anand concludes. “And for now at least, the art of clear communication in addiction care remains a distinctly human responsibility.”
