Identifying And Handling Class Imbalance Problem In Machine Learning
Abstract
This thesis studies a meta-learning approach to ensemble learning combined with
FR-SMOTE. A meta-learning ensemble aims to improve predictive accuracy by
combining the predictions of multiple heterogeneous classification models. The
meta-learning methods used in the thesis are Stacked Generalization (Stacking),
Cascade Generalization, Grading, and some variants of Stacking. The thesis focuses
mainly on analysing the stacked generalisation algorithm on binary classification
data. We also examine how two machine learning issues affect Stacking: the class
imbalance problem and the diversity of base learners.
The results corroborate that Stacking performs better than the base learners and
the other meta-learning methods. We also show that Stacking improves performance
on 8 out of every 10 datasets. We further include the generalisation of multiple
MLP classifiers and Meta-AdaBoost, which uses the Stacking method with AdaBoost as
base learners. At the end of the thesis, we describe the effect of the number of
base learners by comparing Stacking with varying numbers of base learners, using
XGBoost and AdaBoost as meta-learners. This comparison indicates that fewer base
learners are preferable for implementation and that XGBoost gives better results
than AdaBoost. We then implemented FR-SMOTE in the ensemble learning setup,
measured accuracy, precision, ROC, and F1 score, and compared the results with
those of the other approaches.
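The stacking setup described above can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses its `StackingClassifier`; the synthetic dataset, the three base learners, and the choice of `AdaBoostClassifier` as meta-learner are illustrative assumptions, not the configuration used in the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced binary data (assumption: stands in for the thesis datasets).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A small set of heterogeneous base learners (illustrative choice).
base_learners = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
]

# The meta-learner is fitted on cross-validated predictions of the base
# learners, which is the core idea of stacked generalisation.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=AdaBoostClassifier(random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
stack_accuracy = stack.score(X_test, y_test)
```

Swapping `final_estimator` for another classifier, or changing the length of `base_learners`, is one way to reproduce the kind of comparison the abstract describes.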
Data was collected from Kaggle, and the raw dataset was passed through the DCU and
DIU. Feature extraction and FR-SMOTE were then applied to handle the class
imbalance problem. Some traditional machine learning algorithms were used, along
with bagging and K-fold cross-validation. Finally, the results were analysed in
terms of accuracy, precision, ROC, and F1 score.
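The oversampling and evaluation steps can be sketched in plain Python. The abstract does not define FR-SMOTE, so the function below is an ordinary SMOTE-style interpolation, not FR-SMOTE; the names `smote_like`, `binary_metrics`, and `roc_auc`, and all sample values, are hypothetical and serve only as an illustration.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Plain SMOTE-style oversampling (NOT FR-SMOTE): each synthetic point lies
    on the segment between a minority sample and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base` by squared Euclidean distance
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, other)))
    return synthetic

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1

def roc_auc(y_true, scores):
    """Area under the ROC curve as the fraction of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data: 4 minority points, topped up with 6 synthetic ones.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote_like(minority, n_new=6)

# Hypothetical labels, hard predictions and scores for the metric functions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.5]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

On imbalanced data, accuracy alone can look good while the minority class is misclassified, which is why precision, F1, and ROC are reported alongside it.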