Enhanced Prediction of Animal Toxins using Intuitionistic Fuzzy Rough Feature Selection Technique followed by SMOTE
Abstract
Toxins found in venomous animals are small, disulphide-rich peptides. Owing to their high target specificity, these toxins are widely used in medicine as therapeutic agents and pharmacological tools. Predicting toxin proteins is therefore an active research area for pharmacological and therapeutic researchers, and machine learning techniques offer an efficient and effective way to address it. Three aspects play a vital role in prediction performance: feature selection, handling of class imbalance, and the choice of an appropriate learning algorithm. In this paper, we present a new methodology for improving the prediction of animal toxin proteins that not only selects optimal feature subsets but also prevents misclassification caused by noise. First, an intuitionistic fuzzy rough set based feature selection technique with an atom search heuristic is employed, which fits the data well and prevents misclassification. Then, SMOTE (Synthetic Minority Oversampling Technique) is applied to convert the imbalanced datasets into optimally balanced datasets. Finally, various learning algorithms are applied to the reduced, optimally balanced toxin dataset. RealAdaBoost with a RandomForest classifier achieves an accuracy of 89.2%. The experimental results show that the proposed methodology significantly enhances prediction performance and outperforms the existing models.
Keywords: Feature Selection, Imbalanced Dataset, SMOTE, Intuitionistic Fuzzy Rough Set.
© 2021 IEEE.
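To illustrate the oversampling step named in the abstract, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is created by interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, the toy data, and the 1:1 target ratio are all assumptions made for illustration, since the abstract does not state which balance ratio was considered optimal.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority
    samples and their k nearest minority neighbours (hypothetical helper)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base sample.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 3 minority samples against 8 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
majority_count = 8  # assumed 1:1 target, not taken from the paper

new_points = smote(minority, majority_count - len(minority), k=2)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point lies on a line segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class. In practice, a maintained library implementation would normally be preferred over a hand-written one.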