Opening the black box: interpretable machine learning for predictor finding of metabolic syndrome

Author:

Zhang Yan,Zhang Xiaoxu,Razbek Jaina,Li Deyang,Xia Wenjun,Bao Liangliang,Mao Hongkai,Daken Mayisha,Cao Mingqin

Abstract

Abstract Objective The internal workings ofmachine learning algorithms are complex and considered as low-interpretation "black box" models, making it difficult for domain experts to understand and trust these complex models. The study uses metabolic syndrome (MetS) as the entry point to analyze and evaluate the application value of model interpretability methods in dealing with difficult interpretation of predictive models. Methods The study collects data from a chain of health examination institution in Urumqi from 2017 ~ 2019, and performs 39,134 remaining data after preprocessing such as deletion and filling. RFE is used for feature selection to reduce redundancy; MetS risk prediction models (logistic, random forest, XGBoost) are built based on a feature subset, and accuracy, sensitivity, specificity, Youden index, and AUROC value are used to evaluate the model classification performance; post-hoc model-agnostic interpretation methods (variable importance, LIME) are used to interpret the results of the predictive model. Results Eighteen physical examination indicators are screened out by RFE, which can effectively solve the problem of physical examination data redundancy. Random forest and XGBoost models have higher accuracy, sensitivity, specificity, Youden index, and AUROC values compared with logistic regression. XGBoost models have higher sensitivity, Youden index, and AUROC values compared with random forest. The study uses variable importance, LIME and PDP for global and local interpretation of the optimal MetS risk prediction model (XGBoost), and different interpretation methods have different insights into the interpretation of model results, which are more flexible in model selection and can visualize the process and reasons for the model to make decisions. The interpretable risk prediction model in this study can help to identify risk factors associated with MetS, and the results showed that in addition to the traditional risk factors such as overweight and obesity, hyperglycemia, hypertension, and dyslipidemia, MetS was also associated with other factors, including age, creatinine, uric acid, and alkaline phosphatase. Conclusion The model interpretability methods are applied to the black box model, which can not only realize the flexibility of model application, but also make up for the uninterpretable defects of the model. Model interpretability methods can be used as a novel means of identifying variables that are more likely to be good predictors.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Subject

General Medicine,Endocrinology, Diabetes and Metabolism

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3