Pediatrics

Comprehensive Summary

This study evaluated the effectiveness of two large language models (LLMs), OpenAI o1 and Claude 3.5 Sonnet, in analysing dental radiographs. It used an open-source data set of 87 images, and each LLM was independently prompted 5 times to count the primary and permanent teeth, including those that had not yet erupted. Each radiograph contained 52 potential tooth positions (20 primary and 32 permanent), which experts first labelled as present or absent. Each LLM identification was then classified as a true positive (TP), true negative (TN), false positive (FP), or false negative (FN) according to whether it matched the expert label. Accuracy, sensitivity, specificity, precision, and F1 score were calculated for each of the 5 trials and averaged. OpenAI o1 had the higher accuracy, specificity, precision, F1 score, and Fleiss kappa (88.6%, 82.5%, 95.5%, 92.8%, and 0.90 respectively), while Claude had the higher sensitivity (97.0%). The high Fleiss kappa indicates that OpenAI o1 was far more consistent between trials than Claude (0.21). Accuracy for anterior teeth and premolars exceeded 95% in both models. By tooth type, OpenAI o1 was more accurate on molars, while on primary teeth Claude showed higher sensitivity but lower specificity, accuracy, and precision. Claude more often produced at least 1 perfect trial (correctly identifying every present and absent tooth), while OpenAI o1 had a higher rate of perfect analyses across all trials. Both models tended to report anterior teeth and premolars that were not actually present, likely because of the limited data set. Claude also produced many false positives, reflected in its low specificity (29.8%) compared with OpenAI o1 (82.5%). Overall, both models were deemed unsuitable for clinical applications.
In future work, the researchers hope to extend optimization to tasks beyond tooth identification, procure a wider range of tooth abnormalities for model training, and develop additional optimization techniques.
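
The metrics named in this summary can be illustrated with a minimal Python sketch. It assumes each radiograph is encoded as a list of present/absent labels, one per tooth position, and that the repeated trials are treated as the raters for Fleiss kappa. The function names and the encoding are hypothetical and chosen only for illustration; the summary does not describe the study's actual analysis code.

```python
from collections import Counter


def confusion_counts(expert, model):
    """Tally TP, TN, FP, FN over paired labels (True = tooth present)."""
    pairs = list(zip(expert, model))
    tp = sum(e and m for e, m in pairs)
    tn = sum((not e) and (not m) for e, m in pairs)
    fp = sum((not e) and m for e, m in pairs)
    fn = sum(e and (not m) for e, m in pairs)
    return tp, tn, fp, fn


def _ratio(num, den):
    """Divide, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def classification_metrics(tp, tn, fp, fn):
    """Standard definitions of the five metrics listed in the summary."""
    sensitivity = _ratio(tp, tp + fn)   # share of present teeth that were found
    specificity = _ratio(tn, tn + fp)   # share of absent teeth reported absent
    precision = _ratio(tp, tp + fp)     # share of reported teeth actually present
    accuracy = _ratio(tp + tn, tp + tn + fp + fn)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of items, each a list of labels from n raters.

    Assumption: every item has the same number of ratings (here, trials).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for row in ratings:
        counts = Counter(row)
        category_totals.update(counts)
        # Per-item agreement: fraction of rater pairs that agree.
        squares = sum(c * c for c in counts.values())
        agreement_sum += (squares - n_raters) / (n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_items
    # Chance agreement from the overall category proportions.
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        return float("nan")  # undefined when only one category is ever used
    return (p_bar - p_e) / (1 - p_e)
```

In this sketch, `classification_metrics` would be applied once per trial and the five results averaged, while `fleiss_kappa` would receive one row per tooth position holding that position's labels across the 5 trials. Low specificity arises when `fp` is large relative to `tn`, which is the pattern the summary reports for Claude.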

Outcomes and Implications

Dental radiography is important in informing clinicians of possible anomalies in a child’s jaw, bone structure, and teeth. These radiographs contain many easily missed details, and interpreting them correctly requires specific knowledge and experience, so AI assistance could be useful. Though LLMs have potential, their use is limited by possible biases in training data, privacy concerns for the patients whose data are used, the need for approval of medical device data collection, safety concerns, and how they compare with deep learning models. While the LLMs in this study showed promise, their accuracy, consistency, and specificity were not clinically acceptable, so their use in clinical practice will require further testing and optimization.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
