Humans, AI collaborating makes for more accurate medical diagnoses: Study


Human experts who work with artificial intelligence make more accurate medical diagnoses than they do on their own, a new study found. Diagnoses made by human-AI collectives were also more accurate than those from AI alone.

These results come from a research team led by the Max Planck Institute for Human Development in Germany. They partnered with the Human Diagnosis Project in San Francisco, Calif. and the Institute of Cognitive Sciences and Technologies of the Italian National Research Council in Rome.


A press release from the Max Planck Institute said that diagnostic errors are “some of the most serious problems” in the medical field. While AI programs like ChatGPT, Gemini and Claude 3 can be used to support doctors when making diagnoses, they can also be risky to use, researchers noted.

“AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality and equity,” researchers wrote in the study’s abstract. “Yet LLMs hallucinate, lack common sense and are biased — shortcomings that may reflect LLMs’ inherent limitations and thus may not be remedied by more sophisticated architectures, more data or more human feedback.”

Researchers tested the diagnostic accuracy of human experts and AI by analyzing data from the Human Diagnosis Project, which supplied clinical vignettes — short descriptions of medical cases along with their correct diagnoses. For the study, researchers examined more than 2,100 vignettes and compared diagnoses made by medical professionals, by five leading AI models, and by hybrid groups in which human experts worked with AI.

Diagnoses from groups that combined humans and AI were “significantly” more accurate than those from either alone. Hybrid human-AI groups outperformed 85% of individual human diagnosticians, the study found, though there were also many cases where humans alone did a better job. In addition, when AI produced the wrong diagnosis, humans “often” knew the correct one, according to the study.

Adding just one AI model to a group of human experts was enough to improve their results, but the best outcomes usually came from multiple humans using multiple AI tools.

This was especially true for “complex, open-ended diagnostic questions with numerous possible solutions,” the press release said.

“Our results show that cooperation between humans and AI models has great potential to improve patient safety,” lead study author Nikolas Zöller, a postdoctoral researcher at the Max Planck Institute’s Center for Adaptive Rationality, said.

Reasons for the results

Why is this the case? The press release said humans and AI make “systematically different errors” — so they can complement each other.

“It’s not about replacing humans with machines. Rather, we should view artificial intelligence as a complementary tool that unfolds its full potential in collective decision-making,” study co-author Stefan Herzog, a senior research scientist at the Max Planck Institute, said.

Still, the research has some limitations. The study examined vignettes rather than real patients in clinical settings, and it focused on diagnosing patients, not treating them.

The press release said the study was part of the Hybrid Human Artificial Collective Intelligence in Open-Ended Decision Making (HACID) project. HACID’s goal is to promote the development of future clinical decision-support systems by integrating human and machine intelligence.
