Multi-Dimensional Validation of the Integration of Syntactic and Semantic Distance Measures for Clustering Fibromyalgia Patients in the Rheumatic Monitor Big Data Study

Author:

Goldstein Ayelet1ORCID,Shahar Yuval2ORCID,Weisman Raymond Michal2,Peleg Hagit3,Ben-Chetrit Eldad3,Ben-Yehuda Arie4,Shalom Erez2,Goldstein Chen5,Shiloh Shmuel Shay5,Almoznino Galit56ORCID

Affiliation:

1. Computer Science Department, Hadassah Academic College, Jerusalem 9101001, Israel

2. Medical Informatics Research Center, Department of Software and Information Systems Engineering, Ben Gurion University of the Negev, Beer Sheva 8410501, Israel

3. Rheumatology Unit, Hadassah Medical Center, Jerusalem 9112102, Israel

4. Division of Internal Medicine, Hadassah Medical Center, Jerusalem 9112102, Israel

5. Faculty of Dental Medicine, Hebrew University of Jerusalem, Israel; Big Biomedical Data Research Laboratory, Dean’s Office, Hadassah Medical Center, Jerusalem 91120, Israel

6. Department of Oral Medicine, Sedation & Maxillofacial Imaging, Hadassah Medical Center, Faculty of Dental Medicine, Hebrew University of Jerusalem, Jerusalem 91120, Israel

Abstract

This study primarily aimed at developing a novel multi-dimensional methodology to discover and validate the optimal number of clusters. The secondary objective was to deploy it for the task of clustering fibromyalgia patients. We present a comprehensive methodology that includes the use of several different clustering algorithms, quality assessment using several syntactic distance measures (the Silhouette Index (SI), Calinski–Harabasz index (CHI), and Davies–Bouldin index (DBI)), stability assessment using the adjusted Rand index (ARI), and the validation of the internal semantic consistency of each clustering option via the performance of multiple clustering iterations after the repeated bagging of the data to select multiple partial data sets. Then, we perform a statistical analysis of the (clinical) semantics of the most stable clustering options using the full data set. Finally, the results are validated through a supervised machine learning (ML) model that classifies the patients back into the discovered clusters and is interpreted by calculating the Shapley additive explanations (SHAP) values of the model. Thus, we refer to our methodology as the clustering, distance measures and iterative statistical and semantic validation (CDI-SSV) methodology. We applied our method to the analysis of a comprehensive data set acquired from 1370 fibromyalgia patients. The results demonstrate that the K-means was highly robust in the syntactic and the internal consistent semantics analysis phases and was therefore followed by a semantic assessment to determine the optimal number of clusters (k), which suggested k = 3 as a more clinically meaningful solution, representing three distinct severity levels. the random forest model validated the results by classification into the discovered clusters with high accuracy (AUC: 0.994; accuracy: 0.946). SHAP analysis emphasized the clinical relevance of "functional problems" in distinguishing the most severe condition. In conclusion, the CDI-SSV methodology offers significant potential for improving the classification of complex patients. Our findings suggest a classification system for different profiles of fibromyalgia patients, which has the potential to improve clinical care, by providing clinical markers for the evidence-based personalized diagnosis, management, and prognosis of fibromyalgia patients.

Funder

Israeli Ministry of Innovation, Science and Technology

Publisher

MDPI AG

Subject

Bioengineering

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3