Identifying And Handling Class Imbalance Problem In Machine Learning

Abstract

This thesis presents a meta-learning approach to ensemble learning that uses FR-SMOTE. A meta-learning ensemble aims to improve predictive accuracy by combining the predictions of multiple heterogeneous classification models. The meta-learning methods used in the thesis are Stacked Generalization (Stacking), Cascade Generalization, Grading, and some variants of Stacking. The thesis focuses mainly on analysing a stacked generalisation algorithm for binary classification data. We also examine how machine learning issues, such as the class imbalance problem and the diversity of base learners, affect Stacking. The results corroborate that Stacking performs better than the base learners and the other meta-learning methods: Stacking improved performance on 8 out of every 10 datasets. We also cover the generalisation of multiple MLP classifiers and Meta-AdaBoost, which uses the Stacking method with AdaBoost as base learners. At the end of the thesis, we describe the effect of the number of base learners by comparing Stacking with different numbers of base learners, using XGBoost and AdaBoost as meta-learners. This comparison indicates that fewer base learners are preferable for implementation and that XGBoost gives better results than AdaBoost. We then implemented FR-SMOTE in the ensemble learning, measured accuracy, precision, ROC, and F1 score, and compared the results with those of the other methods. Data was collected from Kaggle, and the raw dataset was passed through the DCU and DIU. Feature extraction and FR-SMOTE are then used to handle class imbalance. Some traditional machine learning algorithms are used, together with bagging and K-fold cross-validation. Finally, the results are analysed in terms of accuracy, precision, ROC, and F1 score.
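
The abstract does not specify how FR-SMOTE works, so the following is only a minimal sketch of plain SMOTE-style oversampling, the general idea such methods build on: synthetic minority samples are created by interpolating between a minority point and one of its nearest minority neighbours. It is a stand-in for illustration, not the thesis's method. The function names `smote_oversample` and `balance`, the parameter `k`, and the choice of label 1 for the minority class are all assumptions.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    minority: list of equal-length numeric feature lists (minority class only).
    This is plain SMOTE-style interpolation, NOT the FR-SMOTE of the thesis.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (itself excluded)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=3, seed=0):
    """Return copies of X and y with the minority class topped up to parity.

    Assumes binary labels where every non-minority sample is the majority.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(X), list(y)
    new_points = smote_oversample(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * n_new
```

In a pipeline like the one described, oversampling of this kind would normally be applied only to the training portion of each cross-validation fold, so that synthetic points do not leak into the data used for evaluation.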
