Institutional Digital Repository
Shreenivas Deshpande Library, IIT (BHU), Varanasi

A Transfer Learning-Based Multi-cues Multi-scale Spatial–Temporal Modeling for Effective Video-Based Crowd Counting and Density Estimation Using a Single-Column 2D-Atrous Net

Abstract

Crowd count and density estimation (CCDE) is an emerging research area and a useful tool for crowd analysis and behavior modeling. Existing video-based CCDE approaches rely on spatial–temporal modeling. However, they fail to address some major issues, such as scale variation caused by perspective distortion in individual frames and in volumes of frames, and the minimization of background influence during spatial–temporal modeling. To address these issues, we design a transfer learning-based multi-cue, multi-scale spatial–temporal model for video-based CCDE. The proposed model uses a pre-trained Inception-V3 to extract multi-scale features from four video-frame cues: the color frame, the foreground map of the frame, the volume of frames, and the volume of foreground maps. The foreground maps are obtained with a Gaussian mixture model. The extracted multi-cue, multi-scale features are then concatenated and fed into a single-column 2D-Atrous-Net, which estimates the crowd density by regression on the ground-truth density maps. Experiments are conducted on two datasets, Mall and Venice. The model outperforms state-of-the-art techniques, achieving lower MAE and RMSE. © 2021, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
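To make two of the terms in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It shows a single-channel 2D atrous (dilated) convolution, and how MAE and RMSE are typically computed from per-frame crowd counts obtained by summing a density map. The function names (`atrous_conv2d`, `crowd_count`, `mae_rmse`) and the toy inputs are assumptions made for illustration; the network in the paper applies learned, multi-channel filters to Inception-V3 features.

```python
import math


def atrous_conv2d(image, kernel, rate):
    """Valid-mode 2D atrous (dilated) convolution on nested lists.

    Kernel taps are spaced `rate` pixels apart, so a k x k kernel covers a
    ((k - 1) * rate + 1)-pixel window without adding weights. As in most
    deep learning libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective receptive field per axis
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += kernel[i][j] * image[y + i * rate][x + j * rate]
            row.append(acc)
        out.append(row)
    return out


def crowd_count(density_map):
    """Estimated head count for one frame: the sum of its density map."""
    return sum(sum(row) for row in density_map)


def mae_rmse(pred_counts, true_counts):
    """Mean absolute error and root mean squared error over frames."""
    errors = [p - t for p, t in zip(pred_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    ones = [[1.0] * 5 for _ in range(5)]
    box = [[1.0] * 3 for _ in range(3)]
    # rate=1 sees a 3x3 window; rate=2 sees a 5x5 window with the same 9 weights.
    print(len(atrous_conv2d(ones, box, rate=1)))   # 3 output rows
    print(len(atrous_conv2d(ones, box, rate=2)))   # 1 output row
    print(mae_rmse([10, 20], [12, 17]))
```

Raising `rate` widens the receptive field while the number of kernel weights stays fixed, which is the usual reason for choosing atrous layers in density-map regression.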
