Comprehensive Summary
This paper examines whether patients’ social determinants of health (SDoH), specifically housing instability and low income, can be inferred from existing primary care electronic medical record (EMR) data using either regular-expression (REGEX) searches or supervised machine learning. The authors used de-identified EMR data from a large academic primary care practice in Toronto, comparing the predictions against patient-reported housing and income status from a standard “Health Equity Questionnaire” (HEQ). Both approaches performed poorly overall. For housing instability, the machine learning model had a sensitivity of only 3.1%; the REGEX method did better, at 29.0%, but its sensitivity was still low. For low income status, machine learning sensitivity was 41.7%. Although the machine learning model had reasonable positive predictive value, the low sensitivity implies that many patients with social needs would be missed. In the discussion, the authors conclude that these methods, as currently implemented, are unlikely to be useful in clinical or research settings and underscore that direct collection of SDoH data remains necessary, while noting potential ethical and practical challenges.
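The gap between positive predictive value and sensitivity described above can be made concrete with a small calculation. The sketch below is illustrative only: the counts are hypothetical and are not taken from the paper, and the function names are assumptions chosen for readability. It shows how a method can be right most of the time when it flags a patient while still missing nearly all patients who have the need.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly positive patients that the method flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of flagged patients who are truly positive: TP / (TP + FP)."""
    return tp / (tp + fp)


# Hypothetical counts for illustration only (NOT from the paper):
# 100 patients report housing instability on the questionnaire,
# and the method flags 4 patients, 3 of them correctly.
tp, fp, fn = 3, 1, 97

sens = sensitivity(tp, fn)
prec = ppv(tp, fp)

print(f"sensitivity = {sens:.1%}, PPV = {prec:.1%}")
# prints "sensitivity = 3.0%, PPV = 75.0%"
```

In this made-up scenario, three of every four flags are correct, yet 97 of the 100 patients with the need go undetected, which is the pattern the summary describes.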
Outcomes and Implications
This research is important because social determinants such as housing and income are critical drivers of health outcomes and health equity, yet are often missing from EMRs, which limits the ability of clinicians and health systems to identify and address patient needs. The failure of both the machine learning and REGEX methods here suggests that existing EMR data alone are insufficient to ascertain these determinants accurately. Clinically, this means that without improved data capture or more sophisticated methods, many patients in need may be overlooked; direct patient-reported SDoH tools and better EMR integration remain essential. The authors imply that even with more advanced approaches, clinical implementation will require careful validation, attention to patient consent, and possibly system-level changes.