A new study suggests that generative AI has yet to develop the level of reasoning required for safe use in clinical settings. While recent systems show stronger performance when given detailed patient information, researchers found they still struggle with one of the most critical aspects of medical decision-making.
The analysis, conducted by Mass General Brigham researchers, examined how large language models handle diagnostic tasks. Their findings, published in JAMA Network Open, indicate that these systems fall short when it comes to the complex thinking processes doctors rely on in real-world practice.
According to co-author Marc Succi, current models are not suitable for independent clinical use. He pointed out that despite steady progress, these tools cannot yet match the reasoning involved in forming a differential diagnosis, a process he described as central to the practice of medicine.
Differential diagnosis involves identifying and comparing the possible conditions that could explain a patient’s symptoms. It is typically the first step clinicians take before narrowing the possibilities down to a single diagnosis.
To assess performance, the research team evaluated 21 language models, including systems from major developers such as OpenAI, Google, and Anthropic. The models were tested using 29 standardized clinical scenarios with the help of a new evaluation framework known as PrIME-LLM.
The tool measures how well systems perform across multiple stages of clinical reasoning, from forming an initial impression to selecting tests, reaching a diagnosis, and proposing treatment. To mirror real medical cases, the researchers introduced information in stages. Models first received basic patient details, followed by examination findings and lab results.
Even when allowed to continue to later steps after failing early ones, the models showed a consistent weakness in generating appropriate lists of possible conditions. They failed to produce accurate differential diagnoses in over 80% of cases.
Performance improved significantly when additional data, such as imaging and laboratory results, were provided. Final diagnosis accuracy ranged from roughly 60% to above 90%, depending on the system.
Among the strongest performers were newer models, including Claude 4.5 Opus, Grok 4, Gemini 3.0 variants, and GPT-5. Despite these advances, the study concluded that even the most capable systems still lack the depth of reasoning needed for safe, unsupervised use.
The authors emphasized the continued importance of human oversight, warning that AI tools should support, not replace, clinical judgment.
Experts not involved in the study echoed these concerns. They stressed that AI should not be used to make healthcare decisions without professional oversight and advised the public to seek qualified medical advice when dealing with health issues.
Developers of cutting-edge software and hardware, such as D-Wave Quantum Inc. (NYSE: QBTS), may not be surprised by these findings, since AI models are only as good as the data they are trained on and tend to improve as more data becomes available.
About AINewsWire
AINewsWire (“AINW”) is a specialized communications platform with a focus on the latest advancements in artificial intelligence (“AI”), including the technologies, trends and trailblazers driving innovation forward. It is one of 75+ brands within the Dynamic Brand Portfolio @ IBN that delivers: (1) access to a vast network of wire solutions via InvestorWire to efficiently and effectively reach a myriad of target markets, demographics and diverse industries; (2) article and editorial syndication to 5,000+ outlets; (3) press release enhancement to ensure maximum impact; (4) social media distribution via IBN to millions of social media followers; and (5) a full array of tailored corporate communications solutions. With broad reach and a seasoned team of contributing journalists and writers, AINW is uniquely positioned to serve private and public companies that want to reach a wide audience of investors, influencers, consumers, journalists, and the general public. By cutting through the overload of information in today’s market, AINW brings its clients unparalleled recognition and brand awareness.
AINW is where breaking news, insightful content and actionable information converge.
To receive SMS alerts from AINewsWire, text “AI” to 888-902-4192 (U.S. Mobile Phones Only)
For more information, please visit www.AINewsWire.com
Please see full terms of use and disclaimers on the AINewsWire website applicable to all content provided by AINW, wherever published or re-published: https://www.AINewsWire.com/Disclaimer
AINewsWire
Los Angeles, CA
www.AINewsWire.com
310.299.1717 Office
Editor@AINewsWire.com
AINewsWire is powered by IBN