AI chatbots oversimplify scientific studies and gloss over critical details — the newest models are especially guilty

By Lisa D. Sparks

5 July 2025

More advanced AI chatbots are more likely to oversimplify complex scientific findings based on the way they interpret the data they are trained on, a new study suggests.


(Image credit: Getty Images/peshkov)

Large language models (LLMs) are becoming less “intelligent” in each new version as they oversimplify and, in some cases, misrepresent important scientific and medical findings, a new study has found.

Scientists discovered that versions of ChatGPT, Llama and DeepSeek were five times more likely to oversimplify scientific findings than human experts in an analysis of 4,900 summaries of research papers.
When given a prompt for accuracy, chatbots were twice as likely to overgeneralize findings as when prompted for a simple summary. The testing also revealed an increase in overgeneralizations among newer chatbot versions compared with previous generations.


The researchers published their findings April 30 in the journal Royal Society Open Science.

“I think one of the biggest challenges is that generalization can seem benign, or even helpful, until you realize it’s changed the meaning of the original research,” study author Uwe Peters, a postdoctoral researcher at the University of Bonn in Germany, wrote in an email to Live Science. “What we add here is a systematic method for detecting when models generalize beyond what’s warranted in the original text.”
The effect is like a photocopier with a broken lens that makes each subsequent copy bigger and bolder than the original. LLMs filter information through a series of computational layers, and along the way some information can be lost or change meaning in subtle ways. This is especially true for scientific studies, since scientists must frequently include qualifications, context and limitations in their results, and producing a summary that is both simple and accurate becomes quite difficult.
“Earlier LLMs were more likely to avoid answering difficult questions, whereas newer, larger, and more instructible models, instead of refusing to answer, often produced misleadingly authoritative yet flawed responses,” the researchers wrote.

Related: AI is just as overconfident and biased as humans can be, study shows
In one example from the study, DeepSeek turned a description into a medical recommendation by changing the phrase "was safe and could be performed successfully" to "is a safe and effective treatment option."
Another test in the study showed Llama broadened the scope of effectiveness for a drug treating type 2 diabetes in young people by eliminating information about the dosage, frequency, and effects of the medication.
If published, this chatbot-generated summary could cause medical professionals to prescribe drugs outside of their effective parameters.
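The study itself does not publish a detection tool, but the kind of wording shift in the DeepSeek example can be illustrated with a rough heuristic. The short Python sketch below is purely hypothetical (the regular expressions and the function name are illustrative, not taken from the paper): it flags sentences that state a finding as a generic, present-tense fact while lacking the hedging cues that tie it to a specific trial.

```python
import re

# Hypothetical heuristic, not the authors' method: flag summary sentences that
# state findings as timeless, generic facts ("is a safe and effective treatment")
# rather than as qualified, past-tense results ("was safe in this trial").
GENERIC_PATTERN = re.compile(
    r"\b(is|are)\s+(a\s+)?(safe|effective|beneficial|recommended)\b", re.IGNORECASE
)
HEDGED_PATTERN = re.compile(
    r"\b(was|were|in this (trial|study|cohort)|among \d+ (patients|participants))\b",
    re.IGNORECASE,
)

def flag_possible_overgeneralization(sentence: str) -> bool:
    """Return True if a sentence reads as a generic claim with no hedging cues."""
    return bool(GENERIC_PATTERN.search(sentence)) and not HEDGED_PATTERN.search(sentence)

original = "The procedure was safe and could be performed successfully in this trial."
rewritten = "The procedure is a safe and effective treatment option."

for text in (original, rewritten):
    print(f"{flag_possible_overgeneralization(text)!s:5}  {text}")
# Prints False for the hedged original wording and True for the generic rewrite.
```

A pattern match like this is only a toy; the paper's systematic method instead compares each summary against what the original text actually warrants.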
Unsafe treatment options
In the new study, researchers worked to answer three questions about 10 of the most popular LLMs (four versions of ChatGPT, three versions of Claude, two versions of Llama, and one of DeepSeek).
They wanted to see whether, when presented with a human summary of an academic journal article and prompted to summarize it, the LLM would overgeneralize the summary and, if so, whether asking it for a more accurate answer would yield a better result. The team also aimed to determine whether the LLMs overgeneralize more than humans do.
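In outline, that design can be sketched as a small evaluation loop. The code below is an illustrative skeleton rather than the authors' pipeline: ask_chatbot and counts_as_overgeneralized are hypothetical stand-ins for a real model API call and for the paper's scoring of overgeneralization.

```python
from typing import Callable

# Illustrative skeleton of the study's design, not the authors' code.
# `ask_chatbot` and `counts_as_overgeneralized` are hypothetical stand-ins for
# a real chatbot API call and for the paper's overgeneralization scoring.

PROMPTS = {
    "simple": "Summarize this research abstract.",
    "accuracy": "Summarize this research abstract. Be as accurate as possible.",
}

def overgeneralization_rate(
    model: str,
    abstracts: list[str],
    ask_chatbot: Callable[[str, str], str],
    counts_as_overgeneralized: Callable[[str, str], bool],
) -> dict[str, float]:
    """Fraction of summaries judged overgeneralized, per prompt condition."""
    rates = {}
    for condition, instruction in PROMPTS.items():
        flagged = 0
        for abstract in abstracts:
            summary = ask_chatbot(model, f"{instruction}\n\n{abstract}")
            if counts_as_overgeneralized(abstract, summary):
                flagged += 1
        rates[condition] = flagged / len(abstracts)
    return rates
```

Comparing the two rates for each model, and comparing both against the rate for human-written summaries, mirrors the three questions the researchers set out to answer.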
The findings revealed that, with the exception of Claude, which performed well on all testing criteria, LLMs given a prompt for accuracy were twice as likely to produce overgeneralized results. LLM summaries were nearly five times more likely than human-written summaries to render generalized conclusions.
The researchers also noted that the most common overgeneralizations, and those most likely to lead to unsafe treatment recommendations, occurred when LLMs converted quantified data into generic statements.
These transitions and overgeneralizations have led to biases, according to experts at the intersection of AI and healthcare.
“This study highlights that biases can also take more subtle forms — like the quiet inflation of a claim’s scope,” Max Rollwage, vice president of AI and research at Limbic, a clinical mental health AI technology company, told Live Science in an email. “In domains like medicine, LLM summarization is already a routine part of workflows. That makes it even more important to examine how these systems perform and whether their outputs can be trusted to represent the original evidence faithfully.”
Such discoveries should prompt developers to create workflow guardrails that identify oversimplifications and omissions of critical information before putting findings into the hands of public or professional groups, Rollwage said.
While comprehensive, the study had limitations; future studies would benefit from extending the testing to other scientific tasks and non-English texts, as well as from testing which types of scientific claims are more subject to overgeneralization, said Patricia Thaine, co-founder and CEO of Private AI — an AI development company.
Rollwage also noted that “a deeper prompt engineering analysis might have improved or clarified results,” while Peters sees larger risks on the horizon as our dependence on chatbots grows.
“Tools like ChatGPT, Claude and DeepSeek are increasingly part of how people understand scientific findings,” he wrote. “As their usage continues to grow, this poses a real risk of large-scale misinterpretation of science at a moment when public trust and scientific literacy are already under pressure.”

For other experts in the field, the core challenge is that specialized knowledge and protections are being ignored.
“Models are trained on simplified science journalism rather than, or in addition to, primary sources, inheriting those oversimplifications,” Thaine wrote to Live Science.
“But, importantly, we’re applying general-purpose models to specialized domains without appropriate expert oversight, which is a fundamental misuse of the technology which often requires more task-specific training.”

In December 2024, Future Publishing agreed a deal with OpenAI in which the AI company would bring content from Future’s 200-plus media brands to OpenAI’s users. You can read more about the partnership here.

Lisa D. Sparks

Lisa D. Sparks is a freelance journalist for Live Science and an experienced editor and marketing professional with a background in journalism, content marketing, strategic development, project management, and process automation. She specializes in artificial intelligence (AI), robotics, electric vehicles (EVs) and battery technology, and also covers trends in semiconductors and data centers.
