An efficient method for autoencoder based outlier detection
Abstract
Unsupervised Learning is widely used approach for outlier detection because non-availability of training dataset in various domains (especially, in evolving domains). Approaches like clustering-based, distance-based, density-based outlier detection methods have been proposed over the last several years. Recently, outlier detection using deep learning has drawn attention of researchers. Deep learning-based unsupervised techniques (autoencoder) minimize the reconstruction error using each data instance in the dataset and subsequently, data points with higher reconstruction error are treated as outlier points. However, autoencoder based model overestimates the reconstruction error for normal points whereas it is underestimated for outlier points. As a result, genuine outliers are missed by this approach. We propose two techniques to address the issue of reconstruction error stated earlier. Main idea of our techniques is to compute reconstruction error only using ‘normal points’. In the proposed techniques, we identify probable outliers utilizing the clustering approaches intelligently and subsequently, we do not include them in the minimization process of reconstruction error. We exploit recently recognized clustering approach Density Peak Clustering (DPC) to identify the probable outlier points based on density and distance to the higher density points. However, DPC has inherent drawback of setting threshold which plays important role in deciding density. Therefore, Self Organizing Map (SOM) is exploited as another clustering approach in this article. Subsequently, we conducted experiments on synthetic as well as real world datasets and the results show that the proposed technique outperforms the popular existing deep learning model like RandNet, Boosting-based Autoencoder Ensemble method (BAE), One Class Support Vector Machine (OCSVM) and density-based algorithms like LOF, LDOF, INFLO, and RDOS. © 2022