Machine Learning Based Efficient Approach for Classification of Cancer Disease
Abstract
The application of machine learning (ML) techniques in healthcare, especially for cancer diagnosis, continues to transform early detection and treatment. Microarray technology measures gene expression profiles, which can be used to classify and predict cancer, discover biomarkers, and predict patient outcomes. This research addresses the challenges posed by high-dimensional microarray data, which provide a detailed genetic profile of cancer tissues but reduce the effectiveness of traditional ML models. To overcome these challenges, this study proposes several novel ML-based methods to improve diagnostic accuracy.
The research begins by addressing the high dimensionality of the dataset through advanced feature selection and optimization techniques. In the first work, we propose a hybrid model for a binary-class dataset that combines Recursive Feature Elimination (RFE), Ant Colony Optimization (ACO), and Random Forest (RF), achieving an accuracy of 97.8%. The second hybrid model combines Principal Component Analysis (PCA) for dimensionality reduction, Maximum Relevance Minimum Redundancy (MRMR) for feature selection, Particle Swarm Optimization (PSO) for feature subset optimization, and Support Vector Machine (SVM) for classification, achieving 98.2% accuracy. This model also incorporates Explainable AI (XAI) techniques, namely SHAP values and LIME explanations, which quantify feature contributions and provide instance-level decision insights. Together these support both high performance and the clinical transparency needed for trustworthy cancer diagnosis.
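The MRMR idea mentioned above (prefer features that are relevant to the class label but not redundant with features already chosen) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes absolute Pearson correlation as a simple stand-in for the mutual information that MRMR normally uses, and the function names and toy "gene" values are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def mrmr_select(columns, labels, k):
    """Greedily pick k feature indices by relevance minus mean redundancy.

    Assumption: |Pearson correlation| approximates both relevance (feature vs.
    label) and redundancy (feature vs. already selected features).
    """
    relevance = [abs(pearson(col, labels)) for col in columns]
    selected = []
    remaining = list(range(len(columns)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = mean(abs(pearson(columns[j], columns[s])) for s in selected)
            return relevance[j] - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: 6 samples, 4 features stored column-wise, binary labels.
labels = [0, 0, 0, 1, 1, 1]
genes = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # g0: strongly tracks the label
    [1.0, 1.2, 1.0, 3.0, 3.1, 2.8],  # g1: near-duplicate of g0 (redundant)
    [5.0, 2.0, 4.0, 3.0, 5.0, 2.0],  # g2: mostly noise
    [0.6, 0.1, 0.9, 1.3, 0.9, 1.7],  # g3: weaker signal, less redundant with g0
]
chosen = mrmr_select(genes, labels, k=2)
print(chosen)  # indices of the selected features
```

On this toy data the near-duplicate feature is skipped in favor of a weaker but less redundant one, which is the behavior that motivates MRMR on microarray data where many genes are strongly co-expressed.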
Further, an ensemble model is proposed for binary microarray data that incorporates Correlation Feature Selection (CFS), the Improved Grey Wolf Optimizer (IGWO), and the classifiers SVM, Multi-layer Perceptron (MLP), and K-Nearest Neighbors (KNN); this model achieves an accuracy of 99.01%. Another ensemble approach is then proposed that incorporates both Ant Lion Optimization and Ant Colony Optimization. In this approach, voting and averaging methods are used to build the ensemble, which attains a maximum accuracy of 99.08%.
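As a rough illustration of the voting and averaging schemes named above, the following sketch combines the outputs of three classifiers on a binary problem. The prediction lists are made-up placeholders rather than results from the thesis, and the helper names are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Hard voting: predictions is a list of per-classifier label lists.

    Returns the most frequent label per sample. With an even number of
    voters a tie is possible; Counter then keeps the label seen first.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def average_probabilities(probabilities, threshold=0.5):
    """Averaging (soft voting) for the binary case.

    probabilities is a list of per-classifier lists of P(class = 1).
    A sample is labeled 1 when the mean probability reaches the threshold.
    """
    return [int(sum(p) / len(p) >= threshold) for p in zip(*probabilities)]

# Placeholder outputs for four samples from three hypothetical classifiers.
svm_labels = [1, 0, 1, 1]
mlp_labels = [1, 1, 0, 1]
knn_labels = [0, 0, 1, 1]
hard = majority_vote([svm_labels, mlp_labels, knn_labels])

svm_probs = [0.9, 0.4, 0.6, 0.8]
mlp_probs = [0.7, 0.6, 0.3, 0.9]
knn_probs = [0.4, 0.2, 0.7, 0.6]
soft = average_probabilities([svm_probs, mlp_probs, knn_probs])

print(hard)  # [1, 0, 1, 1]
print(soft)  # [1, 0, 1, 1]
```

Hard voting uses only the predicted labels, while averaging also takes each classifier's confidence into account, so the two can disagree when one classifier is much more certain than the others.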
To address multiclass datasets, a new ensemble ML model is proposed in which accuracy is the primary criterion for selecting algorithms. The proposed model, MINMAXSALP, leverages the Salp Swarm Optimization Algorithm (SSA), MRMR, and five classifiers (SVM, Random Forest, Extreme Learning Machine, AdaBoost, and XGBoost) with a majority voting ensemble approach. This approach attains an accuracy of 96.59% on multiclass cancer datasets. A further model, BIMSSA, is then proposed for feature selection and classification in high-dimensional microarray data. By incorporating Boruta along with improved MRMR for feature selection and then using SSA for feature set optimization, BIMSSA achieves an accuracy of 97.1%. The proposed model is evaluated on four microarray cancer datasets: Adult Acute Lymphoblastic Leukemia and Acute Myelogenous Leukemia (ALL-AML), Lymphoma, Mixed-Lineage Leukemia (MLL), and Small Round Blue Cell Tumors (SRBCT). Comprehensive evaluations using metrics such as accuracy, specificity, sensitivity, precision, recall, and F1-score demonstrate the effectiveness of these models in classifying complex cancer datasets. This research contributes to the advancement of ML-based diagnosis methods, providing accurate and efficient solutions for cancer detection and setting a benchmark for future developments in automated healthcare systems.
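For reference, the evaluation metrics listed above can be computed from the confusion matrix counts. The sketch below covers the binary case only, with made-up labels; for multiclass data these quantities are typically computed per class and then averaged. Note that sensitivity and recall are the same quantity. The function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute common classification metrics from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)  # identical to recall
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Made-up example: 4 positive and 6 negative samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```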
Keywords: Cancer, Microarray data, SSS, RFE, Ant Colony Optimization, Ant Lion Optimization, Machine Learning, Correlation Feature Selection, Improved Grey Wolf Optimizer