PediaBench: A Comprehensive Chinese Pediatric Dataset for Benchmarking Large Language Models

Pediatrics

arXiv

Research Authors: Qian Zhang, Yanhao Wang

AIIM Authors: Amanuael Yigzaw, Zaid Shehryar

Approved by President Reda Riffi

Publication Date: Feb 28, 2025

Comprehensive Summary

The study by Zhang et al. introduces PediaBench, a dataset designed to evaluate how well large language models (LLMs) answer pediatric medical questions. The dataset covers 12 pediatric disease groups and includes both objective and subjective questions. The researchers tested 20 LLMs, scoring objective questions by accuracy and using GPT-4o to score responses to subjective questions, with human scoring as the point of comparison. Most LLMs failed to reach the passing score of 60, and the highest score was 75.74 out of 100. Medical LLMs underperformed because of inadequate reasoning and writing skills and poor instruction following. Smaller LLMs sometimes outperformed larger ones, possibly because the larger models had insufficient training on Chinese medical content. The study suggests that stronger medical knowledge training, together with techniques such as Retrieval-Augmented Generation (RAG) and Chain-of-Thought (CoT) prompting, can improve LLM performance.
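To make the objective-question scoring concrete, here is a minimal sketch of accuracy on a 100-point scale checked against the passing score of 60. It is an illustration under stated assumptions, not the benchmark's actual scoring code: the function names, the answer format (single letters), and the sample data are hypothetical, and the way PediaBench combines objective and subjective scores into a final score is not described in this summary and is not modeled here.

```python
# Passing threshold on the 100-point scale, as reported in the summary.
PASSING_SCORE = 60.0


def objective_accuracy(predictions, answers):
    """Return the percentage of objective questions answered correctly.

    Assumption: each answer is a single option label such as "A" or "B",
    compared case-insensitively after trimming whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


def passes(score, threshold=PASSING_SCORE):
    """Return True if a 100-point score meets the passing threshold."""
    return score >= threshold


# Hypothetical reference answers and model predictions for five questions.
answers = ["A", "C", "B", "D", "A"]
predictions = ["A", "C", "D", "D", "B"]

score = objective_accuracy(predictions, answers)
print(f"accuracy: {score:.2f}, passes: {passes(score)}")
```

In this toy example the model answers three of five questions correctly, which sits exactly on the passing threshold. The same `passes` check applied to the best reported score, 75.74, returns True.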

Outcomes and Implications

The findings underscore the need for better training of AI models in medical knowledge so that they answer pediatric medical questions accurately. Because pediatric cases are often more complex than adult cases, the study highlights the importance of refining LLMs to prevent inaccurate answers that could jeopardize patient safety. PediaBench, with its clinically relevant dataset sourced from the Chinese National Medical Licensing Examination, serves as a foundation for developing LLMs that can assist medical professionals. Such advances could improve diagnostic accuracy, save time, and improve patient outcomes in the Chinese healthcare system.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.
