Prediction of membrane protein types using decision tree based techniques

Abstract

Membrane proteins are an important class of proteins that act as channels, receptors and energy converters in a living cell. The essential functions of these proteins are related to their types, and membrane proteins are attractive targets in drug discovery for many diseases. Predicting membrane protein types is therefore a challenging research field in bioinformatics. Conventional biophysical techniques are tedious, costly and error-prone given the vast number of uncharacterised protein sequences in databases, so building an effective system to predict membrane protein types is essential. Membrane proteins are of eight types. Three benchmark datasets covering all eight membrane protein types are available, and all of them are imbalanced. Decision tree classifiers can handle imbalanced and large datasets well. The performance of various decision tree classifiers is therefore analysed on Dataset 1 with an existing feature set, Composite Protein Sequence Representation (CPSR). A new set of features, named Exchange Group Based Protein Sequence Representation (EGBPSR), is then proposed. The classifiers that performed well with the existing feature set are also analysed with the proposed feature set on Dataset 1. The analysis shows that the Random forest classifier performs well in less time, with accuracies of 96.35% and 96.45% for the existing and proposed feature sets respectively. With fewer feature dimensions than the existing feature set, and therefore reduced complexity, the Random forest classifier with the proposed feature set performs better than with the existing feature set. The accuracies obtained in this work are also 3.6% to 3.7% higher than the highest accuracy reported in existing work.
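
The abstract does not define how EGBPSR features are computed. As a rough illustration of what an exchange-group-based representation can look like, the sketch below maps each residue to one of six amino-acid exchange groups and returns the fraction of the sequence falling in each group. The grouping used here (the commonly cited six Dayhoff-style exchange groups), the names `EXCHANGE_GROUPS` and `exchange_group_composition`, and the choice of plain composition as the feature are all assumptions made for illustration; they are not taken from this work.

```python
from collections import Counter

# Assumption: six commonly used amino-acid exchange groups.
# The grouping actually used for EGBPSR is not given in the abstract.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: residue letter -> exchange group label.
RESIDUE_TO_GROUP = {
    residue: group
    for group, members in EXCHANGE_GROUPS.items()
    for residue in members
}


def exchange_group_composition(sequence):
    """Return the fraction of residues in each exchange group (e1..e6).

    Hypothetical helper: characters outside the 20 standard amino
    acids are ignored rather than raising an error.
    """
    groups = [
        RESIDUE_TO_GROUP[r] for r in sequence.upper() if r in RESIDUE_TO_GROUP
    ]
    if not groups:
        return [0.0] * len(EXCHANGE_GROUPS)
    counts = Counter(groups)
    return [counts[g] / len(groups) for g in EXCHANGE_GROUPS]
```

A vector like this has only six dimensions per sequence, which is consistent in spirit with the abstract's point about a lower-dimensional feature set, and it could be passed to any tree-based classifier. The actual EGBPSR features may be richer than this.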
