Pediatrics

Comprehensive Summary

The purpose of this study was to train AI models to interpret uroflowmetry (UF) curves, evaluate their effectiveness, and examine rates of interobserver agreement between pediatric urologists. The study included uroflowmetry data from 368 patients between the ages of 4 and 17 seen at a clinic at Marmara University in Istanbul, Turkey. Patients were included if they were able to void more than 50% of expected bladder capacity during UF, had a normal curve in their UF graph, fully responded to commands to void, and did not have a neurological disorder. Five models (Decision Tree, Random Forest, CatBoost, XGBoost, and LightGBM) were chosen for their high prediction accuracy and interpretability. They were trained on labels obtained from 3 pediatric urologists, who independently classified the UF curves into the 5 categories recommended by the International Children’s Continence Society (ICCS): bell, staccato, interrupted, plateau, and tower-shaped. When the urologists did not agree independently, they met to reach consensus; 189 of the tests used in the study went to such meetings, and consensus was reached on 187 of them. Of these data, 80% were used to train the AI models and 20% were used for testing. Input features included uroflowmetry measurements (voiding time, volume, maximum flow rate, average flow rate, and other derived features) as well as clinical data such as age, gender, voiding diary, constipation status, history of febrile UTIs, use of anticholinergic treatment, and the score on the Dysfunctional Voiding and Incontinence Scoring System (DVISS).

XGBoost was the most accurate model, with an accuracy of 85.00 ± 2.90%, while Decision Tree was the least accurate, with an accuracy of 81.40 ± 1.47%. All models were most accurate at identifying the interrupted voiding pattern and least accurate on the plateau and tower voiding patterns.
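
The evaluation steps described above (an 80/20 split, accuracy per voiding pattern, and an accuracy reported as mean ± standard deviation) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the split is a plain shuffled split, and the summary does not state how the ± values were obtained, so the sketch assumes they come from repeated runs.

```python
import random
from collections import Counter
from statistics import mean, stdev


def split_80_20(n_samples, seed=0):
    """Shuffle sample indices and return (train, test) with about 20% held out.

    Assumption: a simple random split; the summary does not say whether
    the study's split was stratified by curve category.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * 0.2)
    return indices[n_test:], indices[:n_test]


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly labelled tests within each true curve category."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


def mean_sd_percent(accuracies):
    """Summarize accuracies from repeated runs as (mean %, sample SD %).

    Assumption: the reported "±" is a standard deviation over repeated runs.
    Needs at least two accuracy values.
    """
    return mean(accuracies) * 100, stdev(accuracies) * 100
```

With predictions from any of the five models, `per_class_accuracy` gives the kind of per-pattern breakdown that lets one compare, for example, the interrupted pattern against the plateau and tower patterns.
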
As for interobserver agreement between the pediatric urologists, 62.2% of tests had independent agreement, and Fleiss’ κ was 0.608 (values at or below 0 indicate agreement no better than chance, and 1 indicates perfect agreement), indicating moderate to substantial agreement. Limitations of this study included the limited training data for patterns other than normal and staccato, the lack of an independent observer at consensus meetings between the 3 pediatric urologists, and the use of the ICCS shapes, given that simpler curve classifications have shown better interobserver agreement.
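
For readers unfamiliar with Fleiss’ κ, the statistic compares the observed agreement among a fixed number of raters with the agreement expected by chance. Below is a minimal sketch of the standard formula in Python; the function name and the input layout (one list of labels per test, one label per rater) are illustrative assumptions, and the sample ratings are made up rather than taken from the study.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of subjects, each rated by the same number of raters.

    ratings: list of lists; ratings[i] holds one category label per rater
    for subject i (hypothetical layout chosen for this sketch).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({label for row in ratings for label in row})

    # counts[i][j] = number of raters who put subject i in category j
    counts = [[row.count(c) for c in categories] for row in ratings]

    # Observed agreement: mean over subjects of the share of agreeing rater pairs
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement: sum of squared overall category proportions
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)

    if p_e == 1:
        raise ValueError("kappa is undefined when every rating is the same category")
    return (p_bar - p_e) / (1 - p_e)


# Made-up example: 3 raters classify 4 curves
example = [
    ["bell", "bell", "bell"],
    ["staccato", "staccato", "bell"],
    ["interrupted", "interrupted", "interrupted"],
    ["plateau", "tower", "plateau"],
]
kappa = fleiss_kappa(example)
```

A value near 1 means the raters almost always chose the same category, while a value near 0 means they agreed about as often as chance alone would produce.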

Outcomes and Implications

This study is clinically relevant because many school-aged children are negatively affected by lower urinary tract symptoms (LUTS), and a low-cost, noninvasive test like UF can help diagnose these symptoms and guide their management. Although UF is a common test, voiding patterns are interpreted inconsistently, both by the same observer over time and between observers, which indicates a need for more objective analysis. This study shows that AI can likely improve both accuracy and consistency in analyzing these voiding patterns, thus improving the efficacy of care for pediatric patients with LUTS. No timeline for future implementation is provided. However, the authors note that exploring visual-based techniques and developing more user-friendly and budget-friendly models may further improve AI quality in this context.

Our mission is to

Connect medicine with AI innovation.

No spam. Only the latest AI breakthroughs, simplified and relevant to your field.

AIIM Research

Articles

© 2025 AIIM. Created by AIIM IT Team
