Comparison of the Effectiveness of Various Classifiers for Breast Cancer Detection Using Data Mining Methods

Author:

Al-Qazzaz Noor Kamal (1), Mohammed Iyden Kamil (1), Al-Qazzaz Halah Kamal (2), Ali Sawal Hamid Bin Mohd (3, 4), Ahmad Siti Anom (5, 6)

Affiliation:

1. Department of Biomedical Engineering, Al-Khwarizmi College of Engineering, University of Baghdad, Baghdad 47146, Iraq

2. Department of Biotechnology, College of Science, University of Baghdad, Baghdad 47146, Iraq

3. Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia

4. Centre of Advanced Electronic and Communication Engineering, Department of Electrical, Electronic and Systems Engineering, Universiti Kebangsaan Malaysia, UKM Bangi, Bangi 43600, Selangor, Malaysia

5. Department of Electrical and Electronic Engineering, Faculty of Engineering, Universiti Putra Malaysia, UPM Serdang, Serdang 43400, Selangor, Malaysia

6. Malaysian Research Institute of Ageing (MyAgeing™), Universiti Putra Malaysia, Serdang 43400, Selangor, Malaysia

Abstract

Breast cancer (BC) has claimed the lives of countless women and men worldwide. Although researchers have proposed various diagnostic methods for detecting this disease, there is still room to improve their accuracy and efficiency. This study proposes a novel approach for the early detection of BC that applies data mining techniques to the levels of prolactin (P), testosterone (T), cortisol (C), and human chorionic gonadotropin (HCG) in the blood and saliva of 20 women with histologically confirmed BC, 20 benign subjects, and 20 age-matched control women. In the proposed method, blood and saliva biomarkers were used to categorize cases as normal, benign, or malignant. Ten statistical features were collected, and three classification schemes were evaluated: a decision tree (DT), a support vector machine (SVM), and k-nearest neighbors (KNN). Moreover, dimensionality reduction using factor analysis (FA) and t-distributed stochastic neighbor embedding (t-SNE) was applied to obtain the best hyperparameters. The model was validated using k-fold cross-validation, and metrics for gauging the model's effectiveness were applied. For salivary biomarkers, dimensionality reduction enhanced the results, particularly with the DT, increasing the classification accuracy from 66.67% to 93.3% with t-SNE and to 90% with FA. For blood biomarkers, dimensionality reduction likewise enhanced the results, particularly with the DT, increasing the classification accuracy from 60% to 80% with FA and to 93.3% with t-SNE. These findings point to t-SNE as a potentially useful dimensionality reduction technique for aiding in the identification of patients with BC, as it consistently improves the discrimination of benign, malignant, and healthy control subjects, thereby promising to improve the early detection of breast tumours.
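The validation step described in the abstract (k-fold cross-validation of a classifier over three classes of 20 subjects each) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' pipeline: the data are synthetic stand-ins for the four hormone levels, only a KNN classifier is shown, the fold count and neighbor count are arbitrary, and the FA and t-SNE steps are omitted. The function names (`make_synthetic`, `stratified_folds`, `knn_predict`, `cross_validate`) are hypothetical.

```python
import math
import random
from collections import Counter


def make_synthetic(n_per_class=20, n_features=4, seed=0):
    """Hypothetical stand-in data: 3 classes x 20 subjects x 4 hormone-like features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):  # 0 = control, 1 = benign, 2 = malignant (assumed coding)
        for _ in range(n_per_class):
            # Class means are spaced 3 units apart purely so the toy problem is separable.
            X.append([rng.gauss(3.0 * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with equal class proportions where possible."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors closest training points (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:n_neighbors]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]


def cross_validate(X, y, k=5, n_neighbors=3):
    """Return the accuracy on each held-out fold."""
    accuracies = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i] for i in test_idx
        )
        accuracies.append(correct / len(test_idx))
    return accuracies


X, y = make_synthetic()
fold_acc = cross_validate(X, y)
mean_acc = sum(fold_acc) / len(fold_acc)
print("per-fold accuracy:", fold_acc)
print(f"mean accuracy: {mean_acc:.3f}")
```

Stratifying the folds keeps each held-out fold balanced across the three classes, which matters with only 60 subjects. The accuracy printed here reflects the synthetic data only and says nothing about the figures reported in the abstract.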

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
