Evaluating Prognostic Bias of Critical Illness Severity Scores Based on Age, Gender, and Primary Language in the USA: A Retrospective Multicenter Study

Authors:

Liu Xiaoli, Shen Max, Lie Margaret, Zhang Zhongheng, Li Deyu, Liu Chao, Mark Roger, Zhang Zhengbo, Celi Leo Anthony

Abstract

Summary

Background

Although severity scoring systems are used to support decision-making and to assess ICU performance, their potential for bias based on age, gender, and primary language has not been studied. We aimed to identify potential bias in such scores, specifically the Sequential Organ Failure Assessment (SOFA) and the Acute Physiology and Chronic Health Evaluation IVa (APACHE IVa), by evaluating hospital mortality predictions across subgroups defined by age, gender, and primary language in two large intensive care unit (ICU) databases.

Methods

This multicenter, retrospective study used data from the Medical Information Mart for Intensive Care (MIMIC, 2001-2019) database and the electronic ICU Collaborative Research Database (eICU-CRD, 2014-2015). SOFA and APACHE IVa scores were obtained from the first 24 hours of ICU admission. Hospital mortality was the primary outcome. Patients were stratified by age (16-44, 45-64, 65-79, and 80 years or older), gender (female and male), and primary language (English and non-English), and discrimination and calibration were then assessed in every subgroup. Discrimination was evaluated with the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC). Calibration was evaluated with the standardized mortality ratio (SMR) and calibration belt plots.

Findings

A total of 173,930 patient encounters (78,550 from MIMIC and 95,380 from eICU-CRD) were studied. Discrimination was best in the youngest age group and worsened with increasing age (AUROC ranging from 0.812 to 0.673 for SOFA and from 0.882 to 0.754 for APACHE IVa, p < 0.001). Discrimination differed significantly between male and female patients, with worse performance in female patients. In the MIMIC data, the scores performed worse for patients whose primary language was not English than for English-speaking patients (AUROC ranging from 0.771 to 0.709 for SOFA, p < 0.001).
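The subgroup discrimination analysis described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the record fields (`age`, `score`, `died`) and helper names are hypothetical, AUROC is computed with the rank-based Mann-Whitney formula, and AUPRC is summarized as step-wise average precision, which is an assumption because the abstract does not name the estimator used.

```python
from collections import defaultdict

def age_group(age):
    """Map age in years to the study's bins; returns None below 16 (not studied)."""
    if age < 16:
        return None
    if age <= 44:
        return "16-44"
    if age <= 64:
        return "45-64"
    if age <= 79:
        return "65-79"
    return "80+"

def auroc(scores, labels):
    """Rank-based (Mann-Whitney) AUROC; tied scores receive their average rank."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both deaths and survivors")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of tied scores that starts at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one death")
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Admit every patient whose score equals the current threshold at once.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap

def discrimination_by_group(records, key):
    """Return {subgroup: (AUROC, AUPRC)}; records whose key is None are skipped."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        g = key(r)
        if g is None:
            continue
        groups[g][0].append(r["score"])
        groups[g][1].append(r["died"])
    return {g: (auroc(s, y), average_precision(s, y)) for g, (s, y) in groups.items()}
```

Gender and primary-language subgroups use the same function with a different key, for example a hypothetical `lambda r: r["gender"]`. The rank-based AUROC avoids comparing every death with every survivor, which matters at the scale of roughly 174,000 encounters.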
Calibration analysis of SOFA showed statistically significant overestimation of mortality in the youngest patients (SMR 0.55-0.6) and underestimation in the oldest patients (SMR 1.54-1.57). With SOFA, mortality was overestimated for male patients (SMR 0.92-0.97) and underestimated for female patients (SMR 1.05-1.11); it was likewise overestimated for English-speaking patients (SMR 0.85) and greatly underestimated for non-English-speaking patients (SMR 1.4). In contrast, calibration of APACHE IVa showed underestimation of mortality across all age groups and both genders.

Interpretation

The differences in discrimination and calibration with increasing age, female gender, and non-English primary language suggest that illness severity scores are prone to bias in their mortality predictions. Caution is warranted when using these scores for quality benchmarking across ICUs and for clinical decision-making in diverse populations.

Funding

Z.B.Z. was funded by the National Natural Science Foundation of China (62171471).

Research in context

Evidence before this study

We searched PubMed, arXiv, and medRxiv from database inception to July 10, 2022, without language restrictions. The search terms were (illness severity score OR SOFA OR APACHE-II OR APACHE-IV OR SAPS) AND (evaluation OR performance OR bias) AND ((age OR older OR elderly OR 65 years old OR 80 years old OR subgroup) OR (gender OR Female OR male) OR (language speaking OR English speaking)). Multiple studies have examined score performance within specific subgroups, such as patients over 80, older patients with sepsis, and surgical patients, but with limited numbers of patients and hospitals. Although a few studies have reported score performance by age group, they have not systematically examined the differences and bias between younger and older patients in depth. Few articles have analyzed differences between men and women.
No study has compared score performance between non-English and English speakers. We found no studies that comprehensively reported the potential bias of clinical scores across subgroups defined by age, gender, and English-speaking status.

Added value of this study

To the best of our knowledge, this is the first systematic bias analysis of the SOFA and APACHE IVa scores for in-hospital outcomes across age (16-44, 45-64, 65-79, and 80 years or older), gender (male and female), and English-speaking (yes and no) subgroups, using multicenter data from 189 U.S. hospitals and 173,930 patient episodes. The assessment covered discrimination (AUROC and AUPRC) and calibration (SMR and calibration belt plots). We found that the AUROCs of both scores decreased significantly with age. SOFA underestimated mortality for the oldest patients and seriously overestimated it for the youngest patients. Both scores showed slightly better AUROCs for males. For non-English-speaking patients, SOFA showed a large reduction in AUROC and a highly significant underestimation of mortality compared with English speakers. Furthermore, at the same SOFA score, observed mortality was higher for older patients, females, and non-English speakers than for their respective comparison subgroups.

Implications of all the available evidence

ICU populations are aging, with especially rapid growth in the number of patients over 80 years old. These patients have distinctive characteristics, including more comorbidities, greater frailty, worse prognosis, and a need for more humanistic care, which poses a serious challenge for early clinical triage, diagnosis, and treatment. Women are more likely to underreport pain and less likely to be transferred to the ICU for treatment, so they may be more critically ill than men when they are admitted to the ICU.
SOFA and APACHE IVa scores are an important basis and standard for early assessment of illness severity and for decision-making in the ICU. Although these general phenomena have been noticed in clinical practice for the subgroups mentioned above, there is no clear, detailed quantitative analysis of bias in the use of these scores that could help protect these vulnerable populations and prevent unintentional harm to them. The U.S. is a multicultural, multiracial country, and the number of non-English speakers rises every year, reflecting greater socioeconomic and ethnic disparities. Limited communication can also affect patient assessment and treatment. However, the performance of the SOFA score in this group of patients has not been reported to date. In this study, we used a large multicenter sample to identify potential bias in the SOFA and APACHE IVa scores for all of these patient groups.
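The SMR values reported in the Findings follow the usual definition: observed deaths divided by expected deaths, where expected deaths are the sum of predicted probabilities. A ratio below 1 means the score predicted more deaths than occurred (overestimation), and a ratio above 1 means it predicted fewer (underestimation). The sketch below is hypothetical and assumes a per-patient predicted probability of hospital death is already available; APACHE IVa provides one, but the abstract does not state how SOFA points were mapped to probabilities. The field names (`died`, `p_death`, `score`, `gender`) are illustrative. The last helper tabulates observed mortality at each score value, the kind of comparison behind the finding that older patients, females, and non-English speakers had higher observed mortality at the same SOFA score.

```python
from collections import defaultdict

def smr(died, predicted):
    """Standardized mortality ratio: observed deaths / expected deaths.

    `predicted` holds each patient's predicted probability of hospital death
    (assumed input), so its sum is the expected number of deaths in the group.
    """
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return sum(died) / expected

def direction(ratio):
    """Read an SMR the way the abstract does."""
    if ratio < 1:
        return "overestimates"   # fewer deaths observed than predicted
    if ratio > 1:
        return "underestimates"  # more deaths observed than predicted
    return "matches"

def smr_by_group(records, key):
    """Return {subgroup: SMR} using the hypothetical fields `died` and `p_death`."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        died, predicted = groups[key(r)]
        died.append(r["died"])
        predicted.append(r["p_death"])
    return {g: smr(d, p) for g, (d, p) in groups.items()}

def observed_mortality_by_score(records, key):
    """Observed death rate in each (subgroup, score) cell, to compare groups at equal scores."""
    cells = defaultdict(lambda: [0, 0])  # [deaths, patients]
    for r in records:
        cell = cells[(key(r), r["score"])]
        cell[0] += r["died"]
        cell[1] += 1
    return {k: deaths / n for k, (deaths, n) in cells.items()}
```

This sketch reports only point estimates. Judging statistical significance (for example, whether a confidence interval for the SMR excludes 1) and drawing the calibration belt would need additional methods not shown here.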

Publisher

Cold Spring Harbor Laboratory
