TS-MDA: two-stream multiscale deep architecture for crowd behavior prediction
Abstract
In recent years, crowd behavior prediction (CBP) has gained considerable attention from researchers, as it helps to prevent crowd disasters. CBP has been formulated either as a one-class classification (OCC) or a multi-class classification (MCC) problem. OCC-based CBP models learn normal crowd behavior patterns and treat outliers as anomalies or abnormal crowd behaviors. However, these models do not distinguish between anomaly types and interpret all of them as a single class. MCC-based CBP models overcome this drawback, but only a few such datasets and models have been proposed. The current state-of-the-art MCC-based CBP approaches exploit spatial–temporal features but fail to address two crucial challenges in crowd scenes: (a) human-scale variation due to perspective distortion and (b) the effects of cluttered backgrounds. To this end, an end-to-end trainable two-stream multiscale deep architecture is proposed for MCC-based CBP. The first stream uses a deep convolutional neural network to extract multiscale spatial features from the frames, handling human-scale variation. The second stream extracts multiscale temporal features from de-background frames using a multi-layer dilated convolutional long short-term memory. The effect of the cluttered background is minimized by producing de-background frames with a visual background extractor (ViBe) algorithm. The multiscale features from the two streams are concatenated and used to classify different crowd behaviors. Experiments are conducted on two large-scale crowd behavior datasets, MED and GTA. The experimental results show that the proposed model outperforms the state-of-the-art MCC-based CBP approaches. © 2022, The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.
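The two-stream design described in the abstract can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the authors' implementation: the layer widths, kernel sizes, dilation rate, and the use of parallel kernel sizes to obtain multiscale spatial features are all assumptions, and the de-background frames are assumed to be precomputed (e.g., with a ViBe-style background subtractor) rather than produced inside the model.

```python
import torch
import torch.nn as nn


class DilatedConvLSTMCell(nn.Module):
    """Illustrative ConvLSTM cell whose gate convolutions are dilated.

    With kernel 3 and dilation d, padding d keeps the spatial size unchanged.
    """

    def __init__(self, in_ch: int, hid_ch: int, dilation: int = 2):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size=3,
                               padding=dilation, dilation=dilation)

    def forward(self, x, state):
        h, c = state
        # One conv produces all four LSTM gates over the concatenated input.
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class TwoStreamModel(nn.Module):
    """Two streams -> concatenated multiscale features -> classifier."""

    def __init__(self, num_classes: int = 5, hid_ch: int = 16):
        super().__init__()
        # Spatial stream: parallel branches with different receptive fields
        # stand in for "multiscale spatial features" (an assumption here).
        self.spatial = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, hid_ch, k, padding=k // 2),
                          nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1))
            for k in (3, 5, 7)
        ])
        # Temporal stream: dilated ConvLSTM over de-background frames.
        self.convlstm = DilatedConvLSTMCell(3, hid_ch, dilation=2)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(3 * hid_ch + hid_ch, num_classes)
        self.hid_ch = hid_ch

    def forward(self, frame, debg_seq):
        # frame: (B, 3, H, W) raw frame for the spatial stream.
        # debg_seq: (B, T, 3, H, W) de-background frames for the temporal stream.
        spat = torch.cat([b(frame).flatten(1) for b in self.spatial], dim=1)
        B, T, _, H, W = debg_seq.shape
        h = debg_seq.new_zeros(B, self.hid_ch, H, W)
        c = debg_seq.new_zeros(B, self.hid_ch, H, W)
        for t in range(T):
            h, c = self.convlstm(debg_seq[:, t], (h, c))
        temp = self.pool(h).flatten(1)
        # Concatenate the two streams and classify the crowd behavior.
        return self.fc(torch.cat([spat, temp], dim=1))
```

As a usage sketch, `TwoStreamModel(num_classes=5)` applied to a batch of frames of shape `(B, 3, H, W)` and de-background sequences of shape `(B, T, 3, H, W)` yields class logits of shape `(B, num_classes)`.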