Enhancing Human Action Recognition in High-Resolution Videos using ConvLSTM and LRCN model
Abstract
The human action recognition problem and its solution is the most demanding nowadays. Using deep learning methods like CNN and RNN increases the effectiveness of systems designed for recognizing human actions. RNNs have been specifically tailored for processing sequential data, while CNNs have traditionally found their use in image analysis. LSTM networks are a subset of RNNs specially trained for data sequence analysis and forecasting which is highly proficient. In our research, we have collected video data of High Definition (HD) from multiple source. We have divided the dataset into 7 categories each categories have at least 30 videos. We have used two of the best deep-learning models here specifically ConvLSTM and LRCN. The ConvLSTM method is similar to LSTM and it incorporates convolutional operations so that data processing and calculations can be carried out simultaneously. On the other hand, LRCN integrates convolutional, recurrent, and LSTM layers into a single architecture. Based on our evaluation, the accuracy of both the ConvLSTM and LRCN models and the LRCN model has given better results with 91% accuracy. To test the performance of our model, we have used human action videos from Youtube in high resolution. © 2023 IEEE.