Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare

Published: 2024-02-14 Issue: 1 Volume: 24
ISSN: 1472-6947
Container-title: BMC Medical Informatics and Decision Making
Language: en
Short-container-title: BMC Med Inform Decis Mak

Author:

Guo Lin Lawrence, Morse Keith E., Aftandilian Catherine, Steinberg Ethan, Fries Jason, Posada Jose, Fleming Scott Lanyon, Lemmon Joshua, Jessa Karim, Shah Nigam, Sung Lillian

Abstract

Background: Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for seven selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels.

Methods: This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate and severe) based on test results and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen’s Kappa), sensitivity and specificity were calculated for each lab-based severity level.

Results: The numbers of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639) and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds than for SickKids across all outcomes, with odds ratios (99.9% confidence interval) for the abnormal diagnosis-based label ranging from 2.2 (1.7–2.7) for neutropenia to 18.4 (10.1–33.4) for hyperkalemia. Lab-based labels were more similar across institutions. When using lab-based labels as the gold standard, Cohen’s Kappa and sensitivity were lower at SickKids than at StanfordPeds for all severity levels.

Conclusions: Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.
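The agreement measures named in the abstract can be illustrated with a minimal Python sketch. It treats a lab-based label as the reference and scores a diagnosis-based label against it per admission. The function name `agreement_metrics`, the 0/1 encoding, and the toy label vectors are assumptions made for illustration only; they are not the authors' code or data.

```python
from typing import Dict, Sequence


def agreement_metrics(lab: Sequence[int], dx: Sequence[int]) -> Dict[str, float]:
    """Score diagnosis-based labels (dx) against lab-based labels (lab).

    Both inputs are assumed to be 0/1 labels, one entry per admission,
    with the lab-based label treated as the reference standard.
    """
    if len(lab) != len(dx):
        raise ValueError("lab and dx must have the same length")
    if len(lab) == 0:
        raise ValueError("at least one admission is required")

    # Cells of the 2x2 confusion table.
    tp = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 1)
    fn = sum(1 for l, d in zip(lab, dx) if l == 1 and d == 0)
    fp = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 1)
    tn = sum(1 for l, d in zip(lab, dx) if l == 0 and d == 0)
    n = tp + fn + fp + tn

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected) if p_expected != 1 else nan

    return {"sensitivity": sensitivity, "specificity": specificity, "kappa": kappa}


# Toy example with made-up labels for ten admissions (not study data).
lab_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
dx_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = agreement_metrics(lab_labels, dx_labels)
print(metrics)
```

In a setting like the one described, such a function would be applied once per outcome, cohort, and lab-based severity level, so that low sensitivity at one site shows up as abnormal lab results that were never coded as a diagnosis.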

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s12911-024-02449-8.pdf

Cited by 13 articles.

1. Detecting and Remediating Harmful Data Shifts for the Responsible Deployment of Clinical AI Models;JAMA Network Open;2025-06-04

2. Broadening the Net: Overcoming Challenges and Embracing Novel Technologies in Lung Cancer Screening;American Society of Clinical Oncology Educational Book;2025-06

3. Editorial Comment on "Association between Kidney Stones and Subsequent Risk of Upper Tract Urothelial Carcinoma: A Systematic Review and Meta-Analysis";Urology;2025-06

4. Safety of LAIV Vaccination in Asthma or Wheeze: A Systematic Review and GRADE Assessment;Pediatrics;2025-04-24

5. Feasibility of Machine Learning Analysis for the Identification of Patients with Possible Primary Ciliary Dyskinesia;2025-04-20