Classification of enzyme functional classes and subclasses using support vector machine
Abstract
Enzymes play an important role in metabolism by catalyzing biochemical reactions. Determining enzyme function experimentally is costly and time-consuming, so a computational method for predicting enzyme function is needed. This paper presents a supervised machine learning approach that predicts the functional class and subclass of protein sequences, including enzymes and non-enzymes, from 857 sequence-derived features. The features come from seven sequence-derived properties: amino acid composition, dipeptide composition, correlation features, composition, transition, distribution, and pseudo amino acid composition. Recursive feature elimination (RFE) is used to select the optimal number of features. A support vector machine (SVM) is used to construct a three-level model on the features selected by SVM-RFE: the first (top) level classifies a query protein as an enzyme or non-enzyme, the second level predicts the enzyme functional class, and the third level predicts the functional subclass. The proposed model achieved an overall accuracy of 97.6%, precision of 97.8%, and Matthews correlation coefficient (MCC) of 0.93 at the first level; accuracy of 87.3%, precision of 87.7%, and MCC of 0.84 at the second level; and accuracy of 85.6%, precision of 87.9%, and MCC of 0.86 at the third level. © 2015 IEEE.
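As an illustration of two of the seven feature groups and of the MCC metric named in the abstract, the following is a minimal sketch, not the authors' implementation. The function names (`amino_acid_composition`, `dipeptide_composition`, `binary_mcc`) are hypothetical, and the handling of non-standard residues is an assumption, since the abstract does not specify it. Amino acid composition gives 20 features and dipeptide composition gives 400; the remaining feature groups, SVM-RFE, and the SVM models themselves are not shown. The binary MCC formula applies directly only to the first (enzyme versus non-enzyme) level; how MCC was computed for the multi-class levels is not stated in the abstract.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Fraction of each standard residue in the sequence (20 features).

    Assumption: non-standard letters stay in the denominator but
    contribute to no feature.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each ordered pair of adjacent residues (400 features)."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must have at least two residues")
    counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    total = len(seq) - 1  # number of overlapping adjacent pairs
    return [counts.get(a + b, 0) / total
            for a, b in product(AMINO_ACIDS, repeat=2)]


def binary_mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from a 2x2 confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Concatenate the two groups into one 420-dimensional feature vector
# for a toy sequence (hypothetical input, not data from the paper).
toy_sequence = "ACCA"
feature_vector = (amino_acid_composition(toy_sequence)
                  + dipeptide_composition(toy_sequence))
```

In a full pipeline, vectors like `feature_vector` would be extended with the other five feature groups to reach the 857 features the abstract reports, and then passed to feature selection and the classifier at each level.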