Comprehensive Summary
Natural language processing (NLP) can extract valuable insights from clinical notes, aiding in developing personalized physical rehabilitation plans. Sivarajkumar et al. developed NLP algorithms to extract rehabilitation exercise information from clinical notes of stroke patients. First, these rehabilitation exercises were represented through a comprehensive clinical ontology. Then, 3 clinical experts annotated 300 randomly selected sections of clinical notes, divided into training and testing datasets. Finally, they developed rules-based NLP algorithms, machine learning–based NLP algorithms, and large language model (LLM)–based NLP algorithms, and evaluated their ability to extract exercise-related information. Overall, the rules- based NLP algorithm performed best at retrieving exercise concepts, particularly duration, sets, and reps. Gradient boosting was the most effective machine-learning model, excelling in range of motion and location detection. LLM-based approaches achieved high recall but low precision and rarely outperformed simpler algorithms. Inaccuracies in the training dataset contributed to the algorithm’s misclassifications, which might be mitigated through clinical notes’ pre-processing. More than half of the concepts were omitted due to infrequent annotation, while lack of context, abbreviations, and inconsistent numerical reporting further complicated data extraction. Although refining the rules could address some of these challenges, it also risks overfitting the training set. Future improvements should incorporate more patient variables, expand exercise concepts, diversify data sources accounting for varied styles, and ensure extensive algorithm validation.
Outcomes and Implications
Rehabilitation relies on personalized physical therapy plans, which could be assisted by predictive models evaluating treatment options and outcomes. Extracting relevant information from clinical notes using NLP techniques is a step towards developing such tools. This study reports the development and performance of several NLP algorithms in extracting exercise concepts from stroke patients’ clinical notes. While different approaches excelled in particular concepts, the rules-based NLP algorithm performed best overall. Future developments will include additional patient variables, exercise concepts, and diversifying data sources. The goal is to implement a robust standardized extraction protocol with consistency and completeness checks ensuring reliability and accuracy.