Public Health

Comprehensive Summary

Toal et al. compared large language model (LLM) recommendations with nephrologists’ clinical decision-making regarding kidney biopsies. More than 1,000 nephrologists from 83 countries completed a questionnaire (Aug 2023–Jan 2024), and eight LLMs were asked to answer the same cases. Results showed significant variation: OpenAI’s ChatGPT-3.5 and ChatGPT-4 were most aligned with physician responses, while Mistral Hugging Face was least concordant. Risk aversion also varied, with MedLM being the most conservative and Claude 3 the least. The authors conclude that LLMs may support, but cannot yet replace, clinical judgment, reinforcing their role as adjunct tools rather than stand-alone decision-makers.

Outcomes and Implications

As artificial intelligence (AI) becomes more embedded in everyday life, investigating its medical applications becomes critical. Because human decision-making, and therefore physician decision-making, is poorly understood, using evidence-based AI models to help determine the best course of care for a given patient is an important application to consider. The study further suggests that LLMs may help clinicians resolve uncertain decisions. However, the LLMs had difficulty interpreting more complex cases, which limits their real-world clinical applicability.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
