Combining CNN streams of dynamic image and depth data for action recognition

Singh R.; Khurana R.; Kushwaha A.K.S.; Srivastava R.

doi:https://doi.org/10.1007/s00530-019-00645-5

Abstract

RGB-D sensors are in great demand because they produce large amounts of multimodal data, such as RGB images and depth maps, that are useful for training deep learning models. This paper proposes a deep learning model that recognizes human activities in a video sequence by combining multiple CNN streams. The proposed work uses dynamic images generated from RGB images and depth maps for three different dimensions. The model is trained using these four streams on VGG Net for action recognition. It is then evaluated and compared with other state-of-the-art methods from the literature on three challenging datasets, namely MSR Daily Activity, UTD-MHAD and CAD-60, in terms of accuracy, error, recall, specificity, precision and F-score. The results show that the proposed method outperforms the other methods. © 2020, Springer-Verlag GmbH Germany, part of Springer Nature.
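
The evaluation measures named in the abstract can all be derived from a multi-class confusion matrix. The following is a minimal sketch of that computation, not the authors' evaluation code: the function name `per_class_metrics`, the one-vs-rest treatment of each class, and the small 3-class matrix are assumptions made purely for illustration and do not come from the paper or its datasets.

```python
def per_class_metrics(confusion):
    """Compute accuracy, error, and one-vs-rest per-class scores.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (hypothetical input layout).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    accuracy = correct / total
    report = {"accuracy": accuracy, "error": 1.0 - accuracy, "per_class": []}

    for k in range(n):
        tp = confusion[k][k]                               # class k predicted as k
        fn = sum(confusion[k]) - tp                        # class k predicted as other
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other predicted as k
        tn = total - tp - fn - fp                          # everything else

        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        specificity = tn / (tn + fp) if (tn + fp) else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0

        report["per_class"].append({
            "recall": recall,
            "specificity": specificity,
            "precision": precision,
            "f_score": f_score,
        })
    return report


# Illustrative 3-class confusion matrix (made-up numbers, not results from the paper).
example = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
report = per_class_metrics(example)
print(report["accuracy"], report["per_class"][0])
```

Averaging the per-class values gives macro-averaged scores; whether the paper reports macro, micro, or per-class figures is not stated in the abstract.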