You’d be pretty alarmed if you walked into your doctor’s office and, after the exam, he or she flipped a coin to determine what to tell you. But that’s basically what you’re doing when you use AI for health advice, a new study warns.
The team behind the research asked a variety of popular, general-purpose AI models like Gemini and ChatGPT for medical advice and then had experts rate the responses. While chatbots did better on some topics than others, overall your chances of getting accurate health information from AI are no better than 50/50, they found.
Why you shouldn’t trust Dr. AI for health advice
The question of how trustworthy AI is for health advice isn’t theoretical. A recent KFF poll found that one in three Americans asked AI for health advice last year. Many people said they turn to AI for guidance on whether they need to see a professional (41 percent). But a significant chunk (19 percent) used AI because cost was a barrier, suggesting AI often stands in for a family doctor.
What kind of advice were the patients getting? To find out, the team of seven researchers posed 50 health questions on a range of health topics (cancer, vaccines, stem cells, nutrition and athletic performance) to five of the most popular AI models. The results of their experiment were published in BMJ Open.
How did the chatbots do? “Nearly half (49.6 percent) of responses were problematic: 30 percent somewhat problematic and 19.6 percent highly problematic,” the researchers write.
The various models performed roughly the same, though Grok seemed slightly more prone to big errors. Your chance of getting a correct and complete response does vary, however, depending on what you’re asking about.
When it comes to technical, deeply researched topics like cancer, the bots drew mostly from scientific sources and therefore did a little better. Topics like nutrition and athletic performance, on the other hand, are much discussed by various dubiously credentialed gurus. Their plentiful nonsense seemed to lead the AI astray more often, resulting in more frequent wrong answers on these subjects.
The problem is mostly us
Looking at these results it’s tempting to conclude that AI chatbots are just bad at being doctors. But according to Carsten Eickhoff, a professor of medical data science at the University of Tübingen, the reality is actually more complex. Claude, Gemini, and the rest may have flunked this latest test. But there have been several similar studies, and results varied.
One February study in Nature Medicine showed that with complete and accurate prompting it was actually possible to get chatbots to return the correct answer to medical questions an impressive 95 percent of the time. But when non-specialists went looking for the same information via AI tools, “they only got the right answer less than 35 percent of the time—no better than people who didn’t use them at all,” Eickhoff writes in The Conversation.
How is that possible? Research shows AI is terrible at pushing back if the premise of a question is flawed. If you ask, “What alternative treatments cure cancer?” for instance, the AI likely won’t tell you flat out that the question is misguided and that you should listen to your oncologist. And if you use made-up medical terms in your prompt, another study showed, the AI repeats them right back to you.
Chatbots can be parrots and suck-ups. They also can’t factor in information you didn’t include because you didn’t know it was relevant, and they can’t draw on lab tests or physical exams the way human doctors can.
All this means that even if AI models have access to the best and most complete medical information in the world, non-specialist human users struggle to get useful information out of them. We simply don’t know what to ask and what information to include. Nor are we great at understanding the results.
A warning and a business opportunity
For everyday people wondering about that rash on their arm or that new supplement blowing up on TikTok, the takeaway of all this is simple. We are all familiar with the dangers of relying on Dr. Google. Let this research be a warning to you that Dr. Gemini or Dr. Grok isn’t any better. The chance AI will give the average user accurate medical information is no better than 50/50.
As Eickhoff bluntly puts it, chatbots “should not be treated as stand-alone medical authorities.”
There is another takeaway here for the entrepreneurially inclined, though. There is clearly a big gap between how medical AI could perform and how it actually performs today. Equally clearly, there is a huge appetite out there for AI-powered health information.
Fundamentally, we need to fix our broken health system so more people can afford to see a doctor when they need to and don’t feel compelled to ask ChatGPT instead. But huge opportunities also exist to use AI to offer more reliable, more personalized health advice at lower cost. Google just put together a list of 24 global startups doing just that.
General-purpose AI is a terrible source of medical advice. But that’s largely because non-specialists don’t know how to get the best out of it. Bridging the gap between AI’s theoretical ability to return useful health information and its poor real-world performance is a huge business opportunity.

