Abstract
This paper presents a study on classifying paclitaxel-resistant cell lines using gene expression analysis and machine learning algorithms. The data were obtained from NCBI GEO and comprise three datasets containing gene expression profiles of four paclitaxel-resistant cell lines: BAS, HS578T, MCF7, and MDA-MB-231. The gene expression data were preprocessed by converting gene identifiers to gene symbols, computing adjusted p-values, t-statistics, B-statistics, and log fold changes (logFC) in R, and annotating each record with its cell line. Subsequently, several machine learning classifiers, namely Random Forest, Support Vector Machine (SVM), Gaussian Naive Bayes, K-Nearest Neighbors (KNN), Decision Tree, and AdaBoost, were used to classify the paclitaxel-resistant cell lines. The performance of the classifiers was evaluated using accuracy scores and confusion matrices. Our results show that Random Forest and SVM achieved the highest accuracy scores, outperforming the other algorithms. These findings suggest that gene expression data combined with machine learning can accurately classify paclitaxel-resistant cell lines, which could aid in predicting drug resistance and developing targeted therapies for breast cancer.
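The classification and evaluation steps described above can be illustrated with a minimal sketch. It assumes scikit-learn and uses a synthetic four-class dataset as a stand-in for the expression-derived features. The data, the train/test split, and all hyperparameters are illustrative assumptions, not the settings used in this study, and the scores it prints say nothing about the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in (NOT the study's data): 400 samples, 20 numeric
# features, and 4 classes playing the role of the four cell lines.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=10,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Hold out 25% of the samples for evaluation, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The six classifier families named in the abstract, with library defaults
# (assumed here; the study's hyperparameters are not stated).
classifiers = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Fit each model and record its accuracy and confusion matrix on the test set.
results = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(f"{name}: accuracy = {r['accuracy']:.3f}")
    print(r["confusion_matrix"])
```

Each confusion matrix has one row per true class and one column per predicted class, so off-diagonal entries show which classes a model confuses with one another.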