Comprehensive Summary
This study by Hashtarkhani et al. examined how machine learning (ML) models, specifically natural language processing (NLP) models, perform when categorizing cancer diagnoses from electronic health record (EHR) data recorded either as ICD codes or as free text. From the records of 3,456 patients, the researchers extracted 762 unique diagnoses (326 ICD-coded and 436 free-text, i.e., physicians' charting notes), which were assigned to 14 categories. The models tested on the EHR data were GPT-3.5, GPT-4o, Llama 3.2, Gemini 1.5, and BioBERT, and oncology experts evaluated the outputs. For ICD codes, BioBERT achieved the highest F1 score (84.2), a metric that combines precision and recall as their harmonic mean. For free-text diagnoses, GPT-4o outperformed BioBERT (weighted F1 = 71.8 vs. 61.5). BioBERT's accuracy was 90.8% on ICD codes and 81.6% on free text. Hashtarkhani et al. note that although AI/NLP models could be used to categorize cancer diagnoses for administrative tasks or research, the technology is not yet reliable enough for use in clinical practice.
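The metrics quoted above can be made concrete with a small sketch. F1 is the harmonic mean of precision and recall for a class, and a weighted F1 averages the per-class F1 scores in proportion to how many true examples each class has. Because accuracy simply counts correct predictions, the two metrics can diverge, as they do for BioBERT on free text. The Python below is an illustrative sketch only: the labels are hypothetical toy data (not from the study), the function names are hypothetical, and the study's exact averaging scheme may differ.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 for one class: the harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * n / total for c, n in support.items())


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true category."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels for three categories (NOT data from the study).
y_true = ["breast", "breast", "breast", "lung", "lung", "skin"]
y_pred = ["breast", "breast", "lung", "lung", "skin", "skin"]

print(round(weighted_f1(y_true, y_pred), 3))  # 0.678
print(round(accuracy(y_true, y_pred), 3))     # 0.667
```

Here the "breast" class has precision 1.0 and recall 2/3, giving F1 = 0.8, while the smaller classes score lower; weighting by support keeps the larger class from being drowned out by rare ones, which matters when diagnoses are spread unevenly across 14 categories.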
Outcomes and Implications
A reliable tool that helps physicians classify cancer diagnoses would be a valuable contribution to the medical field and could support further research. However, EHR data are often inconsistent, and documentation style varies from physician to physician; as a result, AI/NLP models may not yet be reliable enough for large-scale clinical use.