Ensemble Approach for Breast Cancer Prediction and Treatment Recommendation System using Clinico Pathological Parameters
Abstract
Breast cancer is a significant global health concern, accounting for a substantial number of deaths among women. Early detection and accurate diagnosis are essential for improving patient survival rates, and machine learning techniques offer valuable tools for enhancing tumour identification and classification.
This study identifies key datasets and parameters relevant to breast cancer and its treatment, focusing on six datasets: OWBCD, WDBC, Coimbra, BRCA, Haberman and SEER. Feature importance analysis with the Random Forest classifier is used to pinpoint the most influential factors. Six machine learning models (Logistic Regression, Support Vector Machine, K-Nearest Neighbors, Naïve Bayes, Decision Tree and Random Forest) are optimized using Grid Search Hyperparameter Optimization (GSHPO) to enhance their predictive performance. The preprocessing pipeline involves feature elimination, handling of missing values, categorical encoding and data standardization. The findings show that hyperparameter tuning and feature importance analysis significantly boost model accuracy, with Random Forest, SVM and KNN consistently showing the highest performance. These results underscore the critical role of hyperparameter optimization and feature importance in improving predictive outcomes for breast cancer diagnosis and treatment.
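To illustrate the idea behind grid search hyperparameter optimization, the following is a minimal, dependency-free sketch rather than the study's actual pipeline. It tunes only the neighbour count `k` of a small K-Nearest Neighbors classifier using 5-fold cross-validation, with standardization fitted on the training folds. The synthetic two-cluster data, the candidate grid `[1, 3, 5, 7]` and all function names are assumptions made for illustration. In practice a library routine such as scikit-learn's `GridSearchCV` would normally do this job.

```python
import random
import statistics

def standardize(train, test):
    """Scale features with mean/std computed on the training rows only."""
    cols = list(zip(*train))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard against zero variance
    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return scale(train), scale(test)

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)

def cv_accuracy(X, y, k, folds=5):
    """Cross-validated accuracy: each sample is tested exactly once."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X, test_X = standardize([X[i] for i in train_idx],
                                      [X[i] for i in test_idx])
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, x, k) == y[i]
                       for x, i in zip(test_X, test_idx))
    return correct / len(X)

def grid_search(X, y, grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_accuracy(X, y, k) for k in grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Synthetic stand-in data (assumption): two Gaussian clusters, 40 points each.
rng = random.Random(0)
X = [[rng.gauss(c, 1.0), rng.gauss(c, 1.0)] for c in (0.0, 3.0) for _ in range(40)]
y = [label for label in (0, 1) for _ in range(40)]

best_k, scores = grid_search(X, y, grid=[1, 3, 5, 7])
print(best_k, scores)
```

Odd values of `k` are used so that binary votes cannot tie. The same loop extends to several hyperparameters by iterating over the Cartesian product of their candidate values.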
Further, this study explores an ensemble approach to enhance prediction accuracy in breast cancer classification on the WDBC (Wisconsin Diagnostic Breast Cancer) dataset. This approach integrates Principal Component Analysis (PCA) for dimensionality reduction and SMOTE for addressing class imbalance.
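As a rough illustration of how SMOTE addresses class imbalance, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified, standard-library version written for clarity, not the implementation used in the study. The toy points, the neighbour count `k=3` and the target class size are hypothetical. PCA and the ensemble step are not shown here.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Toy minority class (assumption): 4 samples against a majority of 10.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10

new_points = smote(minority, n_new=majority_count - len(minority))
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the evaluation data.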