Study Reveals Inaccuracies in Medical Information Provided by AI Chatbots, ETHealthworld


New Delhi: An analysis of five chatbots’ responses to health and medicine questions has revealed that a substantial amount of medical information is inaccurate and incomplete.

The findings, published in the journal BMJ Open, also show that nearly half of the responses were problematic in aspects such as presenting a false balance between science-based and non-science-based claims.

A problematic response was defined as one that could plausibly direct lay users to potentially ineffective treatment or lead to harm if followed without professional guidance.

Researchers, including those from The Lundquist Institute for Biomedical Innovation at Harbor-University of California Los Angeles (UCLA) Medical Center in the US, said that even as generative AI chatbots are being rapidly adopted across research, marketing and medicine — with people also using them as search engines — a continued deployment without public education and oversight risks amplifying misinformation.

Five publicly available and widely used generative AI chatbots (Google’s Gemini, High-Flyer’s DeepSeek, Meta AI by Meta, OpenAI’s ChatGPT and Grok by xAI) were prompted with 10 open-ended and closed questions in each of five categories: cancer, vaccines, stem cells, nutrition, and athletic performance.

The prompts were designed to resemble common ‘information-seeking’ health and medical queries, as well as language used in online misinformation and in academic discourse.

The prompts were also used to stress-test the AI models and expose behavioural vulnerabilities by ‘straining’ them towards misinformation or contraindicated advice.

The chatbots’ responses were categorised as non-problematic, somewhat problematic, or highly problematic, using objective, pre-defined criteria.

The information in the responses was scored for accuracy and completeness, with particular attention given to whether a chatbot presented a false balance between science-based and non-science-based claims, regardless of the strength of the evidence.

“The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields,” the authors wrote.

“Nearly half (49.6 per cent) of responses were problematic: 30 per cent somewhat problematic and 19.6 per cent highly problematic,” they said.

Grok was found to generate “significantly more highly problematic responses” than would be expected, the researchers said.

Performance of the chatbots was found to be the strongest in topics of cancer and vaccines, and weakest in stem cells, athletic performance and nutrition.

Responses were consistently presented with confidence and certainty, with few caveats or disclaimers, the study found.

Reference quality was noted to be poor, with an average completeness score of 40 per cent. Chatbot hallucinations (creating false information and presenting it as fact) and fabricated citations meant that no chatbot provided a fully accurate reference list, the researchers said.

“Our findings regarding scientific accuracy, reference quality, and response readability highlight important behavioural limitations and the need to re-evaluate how AI chatbots are deployed in public-facing health and medical communication,” the authors said.

“By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences. They do not reason or weigh evidence, nor are they able to make ethical or value-based judgments,” they said.

Published on Apr 15, 2026 at 03:59 PM IST




