Statistical Comparison of Machine Learning Techniques for Predicting Software Maintainability and Defects
Abstract
Software is one of the key drivers of twenty-first-century business and society. Delivering high-quality software systems is a challenging task for software industry practitioners, managers, and developers. High maintainability and low defect-proneness are among the most desirable software quality attributes. Delivering software systems with these attributes requires accurate and generalizable prediction models, because exhaustive software testing is impossible and the testing phase is costly. The major contribution of this thesis is the investigation of a broad range of machine learning techniques for building maintainability and defect prediction models. Empirical findings regarding the accuracy of machine learning techniques for software maintainability and software defect prediction are secured on the basis of statistical significance tests. It is demonstrated that the instance-based machine learning techniques K-nearest neighbour (IBK) and K-star are accurate and generalizable for predicting software maintainability as well as post-release defects. In this thesis, a statistical comparison of twenty-seven machine learning techniques is performed to find accurate and generalizable techniques for software maintainability prediction. The comparison is based on the Friedman test with Wilcoxon post-hoc tests. Software metrics data of 44 medium- to large-scale open source software systems are utilized for building software defect prediction models. A statistical comparison of the accuracies of seventeen machine learners over the 44 software metrics data sets shows that the Random Forest, K-star, and Logistic Regression classifiers are robust and stable in identifying defect-prone modules. Ensemble learners for classification as well as regression are applied to defect prediction, and suitable ensemble classifiers are recommended for fifteen base classifiers.
Experiments are conducted on large as well as small data sets to avoid any bias of the machine learning algorithms towards data set size.
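The comparison procedure described above — a Friedman omnibus test over classifier accuracies across many data sets, followed by pairwise Wilcoxon signed-rank post-hoc tests — can be sketched as below. This is a minimal illustration using SciPy; the accuracy matrix is synthetic and the three classifier names are taken from the abstract only as labels, not from the thesis's actual results.

```python
# Sketch of the Friedman + Wilcoxon post-hoc comparison scheme.
# The accuracy matrix here is randomly generated for illustration;
# it does NOT reproduce the thesis's experimental data.
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(0)
classifiers = ["IBK", "K-star", "LogReg"]  # illustrative labels

# rows = 44 data sets (blocks), columns = classifiers (treatments)
base = rng.uniform(0.70, 0.85, size=(44, 1))      # per-data-set difficulty
offsets = np.array([0.00, 0.02, -0.03])           # simulated skill differences
acc = np.clip(base + offsets + rng.normal(0, 0.01, (44, 3)), 0.0, 1.0)

# Omnibus Friedman test: do the classifiers differ in accuracy overall?
stat, p = friedmanchisquare(*(acc[:, j] for j in range(acc.shape[1])))
print(f"Friedman chi2 = {stat:.2f}, p = {p:.4g}")

# If the omnibus test rejects, run pairwise Wilcoxon signed-rank tests
# (paired over data sets) to see which classifiers differ from which.
if p < 0.05:
    for i in range(len(classifiers)):
        for j in range(i + 1, len(classifiers)):
            w, pw = wilcoxon(acc[:, i], acc[:, j])
            print(f"{classifiers[i]} vs {classifiers[j]}: p = {pw:.4g}")
```

In practice the pairwise p-values would also be adjusted for multiple comparisons (e.g. Holm or Bonferroni correction) before declaring any classifier superior.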