AINewsWire

Study Finds AI Models Fail at Differential Diagnosis in Over 80% of Cases

A new study suggests that generative AI has yet to develop the level of reasoning required for safe use in clinical settings. While recent systems perform better when given detailed patient information, researchers found they still struggle with one of the most critical steps in medical decision-making: building a list of conditions that could explain a patient’s symptoms.

The analysis, conducted by Mass General Brigham researchers, examined how large language models handle diagnostic tasks. Their findings, published in JAMA Network Open, indicate that these systems fall short when it comes to the complex thinking processes doctors rely on in real-world practice. 

According to co-author Marc Succi, current models are not suitable for independent clinical use. He pointed out that despite steady progress, these tools cannot yet match the reasoning involved in forming a differential diagnosis, a process he described as central to the practice of medicine. 

Differential diagnosis involves identifying and comparing possible conditions that could explain a patient’s symptoms. It is typically the first step clinicians take before narrowing down to a conclusion. 

To assess performance, the research team evaluated 21 language models, including systems from major developers such as OpenAI, Google, and Anthropic. The models were tested using 29 standardized clinical scenarios with the help of a new evaluation framework known as PrIME-LLM. 

The framework measures how well systems perform across multiple stages of clinical reasoning, from forming an initial impression to selecting tests, reaching a diagnosis, and proposing treatment. To mirror real medical cases, the researchers introduced information in stages. Models first received basic patient details, followed by examination findings and lab results.

Even when allowed to continue to later steps after failing early ones, the models showed a consistent weakness in generating appropriate lists of possible conditions. They failed to produce accurate differential diagnoses in over 80% of cases. 
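The staged disclosure and per-stage failure rate described above can be pictured with a minimal sketch. This is not the PrIME-LLM implementation, whose internals the article does not describe. The field names, function names, and toy data below are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Disclosure order as described in the article; the field names are assumptions.
DISCLOSURE_ORDER = ["patient_details", "exam_findings", "lab_results"]


def staged_contexts(case):
    """Yield (step, context) pairs where context accumulates disclosed information."""
    revealed = []
    for step in DISCLOSURE_ORDER:
        revealed.append(case[step])
        yield step, " ".join(revealed)


def failure_rates(graded):
    """Return the fraction of failed attempts for each reasoning stage.

    `graded` is a list of (stage, passed) tuples. Each stage is counted
    on its own, so a failure at an early stage does not remove later
    stages from the tally.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for stage, passed in graded:
        totals[stage] += 1
        if not passed:
            failures[stage] += 1
    return {stage: failures[stage] / totals[stage] for stage in totals}


# Made-up toy data purely to exercise the functions; not results from the study.
toy_case = {
    "patient_details": "Adult with chest pain.",
    "exam_findings": "Normal heart sounds.",
    "lab_results": "Troponin pending.",
}
contexts = list(staged_contexts(toy_case))

toy_grades = [
    ("differential", False), ("final_diagnosis", True),
    ("differential", False), ("final_diagnosis", True),
    ("differential", True), ("final_diagnosis", False),
]
rates = failure_rates(toy_grades)
```

In this sketch a model can fail the differential stage and still be graded on the final diagnosis, which is how the two figures reported in the article can differ so widely for the same system.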

Performance improved significantly when additional data, such as imaging and laboratory results, were provided. Final diagnosis accuracy ranged from roughly 60% to above 90%, depending on the system. 

Among the strongest performers were newer models, including Claude 4.5 Opus, Grok 4, Gemini 3.0 variants, and GPT-5. Despite these advances, the study concluded that even the most capable systems still lack the depth of reasoning needed for safe, unsupervised use. 

The authors emphasized the continued importance of human oversight, warning that AI tools should support, not replace, clinical judgment. 

Experts not involved in the study echoed these concerns. They stressed that AI should not be used to make healthcare decisions without professional oversight and advised the public to seek qualified medical advice when dealing with health issues. 

Developers of cutting-edge software and hardware, such as D-Wave Quantum Inc. (NYSE: QBTS), may not be surprised by these findings, since AI models are only as good as the data they are trained on and tend to improve as more data becomes available.

About AINewsWire

AINewsWire (“AINW”) is a specialized communications platform with a focus on the latest advancements in artificial intelligence (“AI”), including the technologies, trends and trailblazers driving innovation forward. It is one of 75+ brands within the Dynamic Brand Portfolio @ IBN that delivers: (1) access to a vast network of wire solutions via InvestorWire to efficiently and effectively reach a myriad of target markets, demographics and diverse industries; (2) article and editorial syndication to 5,000+ outlets; (3) press release enhancement to ensure maximum impact; (4) social media distribution via IBN to millions of social media followers; and (5) a full array of tailored corporate communications solutions. With broad reach and a seasoned team of contributing journalists and writers, AINW is uniquely positioned to serve private and public companies that want to reach a wide audience of investors, influencers, consumers, journalists, and the general public. By cutting through the overload of information in today’s market, AINW brings its clients unparalleled recognition and brand awareness.

AINW is where breaking news, insightful content and actionable information converge.

To receive SMS alerts from AINewsWire, text “AI” to 888-902-4192 (U.S. Mobile Phones Only)

For more information, please visit www.AINewsWire.com

Please see full terms of use and disclaimers on the AINewsWire website applicable to all content provided by AINW, wherever published or re-published: https://www.AINewsWire.com/Disclaimer

AINewsWire
Los Angeles, CA
www.AINewsWire.com
310.299.1717 Office
Editor@AINewsWire.com

AINewsWire is powered by IBN
