Orthopedics

Comprehensive Summary

This study compares the performance of three large language model (LLM) artificial intelligence (AI) systems, ChatGPT-4o, DeepSeek-V3, and Gemini Pro, with that of orthopedic surgeons in the context of spinal surgery. Two experienced orthopedic surgeons developed 50 multiple-choice questions: 25 examining clinical judgement and 25 examining theoretical understanding. The questions covered anatomy, trauma, tumors, infections, postoperative surgical complications, physical examination, deformities, degenerative spine diseases, and congenital spine diseases. They were administered to two groups: Group 1 comprised the three AI models, and Group 2 comprised ten orthopedic and traumatology surgeons, each with at least 10 years of clinical experience. The surgeons who created the questions scored the answers, and statistical analysis was performed. Group 2 (88.8% accuracy) significantly outperformed Group 1 (44% accuracy) on the clinical judgement questions, but the two groups performed more similarly on the knowledge-based questions. Among the models, DeepSeek-V3 had the highest overall accuracy, followed by ChatGPT-4o and then Gemini Pro. Overall, the research emphasized that while AI models show potential for knowledge-based tasks and for supporting physicians, they are not yet fit for clinical decision-making. AI cannot serve as an independent tool because of limitations such as outdated information and a lack of scientific citations.

Outcomes and Implications

This research is important because it addresses the growing use of AI in medicine to support clinical decision-making in complex surgical fields such as spinal surgery. As AI becomes more integrated into healthcare, understanding its capabilities and limitations is critical to ensuring patient safety and achieving the desired clinical outcomes. The study demonstrates that while AI can be a valuable tool, it lacks the clinical reasoning required for patient-specific decision-making, which limits its independent use. Further development is therefore needed before AI can be considered fully reliable for clinical practice.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
