Machine-Learning Classification Models to Predict Liver Cancer with Explainable AI to Discover Associated Genes

Published: 2023-05-12 Issue: 2 Volume: 3 Pages: 417-445
ISSN: 2673-9909
Container-title: AppliedMath
Language: en

Author:

Hasan Md Easin¹, Mostafa Fahad², Hossain Md S.², Loftin Jonathon³

Affiliation:

1. Department of Mathematical Sciences, The University of Texas at El Paso, El Paso, TX 79968, USA

2. Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX 79409, USA

3. Department of Mathematics and Computer Sciences, Southern Arkansas University, Magnolia, AR 71730, USA

Abstract

Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer. The risk of developing HCC is highest in people with chronic liver diseases, such as cirrhosis caused by hepatitis B or C infection. Because HCC microarray data are high-dimensional and carry hidden biological information in genes, knowledge-based interpretation is essential for understanding them. When gene expression data contain many genes and few samples, the main problem is to separate disease-related information from a vast quantity of redundant, noisy expression data. Clinicians are interested in identifying the specific genes responsible for HCC in individual patients. These responsible genes may differ between patients, leading to variability in gene selection. Moreover, ML approaches such as classification algorithms behave like black boxes, so it is important to interpret their outcomes. In this paper, we use a reliable pipeline to determine important genes for detecting HCC from microarray analysis. We eliminate redundant and unnecessary genes through gene selection using principal component analysis (PCA). We then detect responsible genes with the random forest algorithm through variable importance ranking calculated from the Gini index. Classification algorithms, namely random forest (RF), naïve Bayes classifier (NBC), logistic regression, and k-nearest neighbor (kNN), are used to classify HCC from the responsible genes. However, classification algorithms produce outcomes based on selected genes for a large group of patients rather than for specific patients. Thus, we apply the local interpretable model-agnostic explanations (LIME) method to explain the AI-generated predictions and to recommend patient-specific responsible genes.
We also present a pathway analysis and a dendrogram of the pathway obtained through hierarchical clustering of the responsible genes. Sixteen responsible genes were found using the Gini index, of which CCT3 and KPNA2 show the highest mean decrease in Gini values. Among the four classification algorithms, random forest achieved 96.53% accuracy with a precision of 97.30%. Five-fold cross-validation was used to collect multiple estimates and assess the variability of the RF model, giving a mean ROC of 0.95 ± 0.2. LIME outcomes, with positive and negative effects, were interpreted for two randomly selected patients. The 16 responsible genes identified can therefore be used to improve HCC diagnosis or treatment. The proposed framework, which combines machine-learning classification algorithms with the LIME method, can be applied to find responsible genes to diagnose and treat HCC patients.
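
The abstract ranks genes by mean decrease in Gini. As a rough illustration of the quantity behind that ranking, the sketch below computes the Gini impurity decrease for a single threshold split on one gene's expression values. A random forest would average such decreases over every split and tree that uses the gene. The expression values, thresholds, and the names `gene_a` and `gene_b` are hypothetical toy inputs, not data or code from the paper.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity decrease obtained by splitting samples at value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Child impurities are weighted by the fraction of samples in each child.
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(labels) - weighted_children


# Hypothetical toy data: six samples, two genes (assumed values for illustration).
labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
gene_a = [9.1, 8.7, 9.4, 5.2, 4.8, 5.5]  # expression separates the classes
gene_b = [6.0, 4.9, 7.1, 6.2, 5.0, 7.0]  # expression is mostly uninformative

dec_a = gini_decrease(gene_a, labels, threshold=7.0)
dec_b = gini_decrease(gene_b, labels, threshold=6.1)

# A larger decrease means the split on that gene yields purer child nodes.
print(f"gene_a decrease: {dec_a:.4f}")
print(f"gene_b decrease: {dec_b:.4f}")
```

In this toy case the split on `gene_a` separates the two classes completely, so it earns the larger decrease. That is the intuition behind ranking a gene highly when its mean decrease in Gini is large.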

Publisher

MDPI AG

Link

https://www.mdpi.com/2673-9909/3/2/22/pdf


Cited by 5 articles.

1. An Integrated Machine Learning Framework Based Liver Disease Diagnosis System;2024 International Conference on Engineering & Computing Technologies (ICECT);2024-05-23

2. Feature extraction-based liver tumor classification using Machine Learning and Deep Learning methods of computed tomography images;Cogent Engineering;2024-04-25

3. A perfectly imperfect engine: Utilizing the digital twin paradigm in pulmonary hypertension;Pulmonary Circulation;2024-04

4. Explainable AI for Discovering Disease Biomarkers: A Survey;EAI/Springer Innovations in Communication and Computing;2024

5. Bibliometric analysis of the global scientific production on machine learning applied to different cancer types;Environmental Science and Pollution Research;2023-08-11