CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting

Keio University¹, NVIDIA²


Overview of CrowdMAC during training. Cube embedding transforms the crowd density maps into a sequence of tokens. At each training step, one of the task masks is sampled and a subset of tokens is masked. The remaining (visible) tokens, together with the space-time position embedding, are fed into a Transformer encoder and decoder to reconstruct the masked maps. Note that multi-task masking is applied only during training; at inference, only the future prediction task mask is used.
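The tokenization and mask sampling described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the CrowdMAC implementation: the cube size (`t`, `p`), the helper names `cube_embed`, `future_prediction_mask`, and `sample_task_mask`, and the use of just two tasks (plain forecasting, or forecasting plus random masking of observed tokens) are all hypothetical. The learned linear projection and position embedding that would follow the cube split are omitted.

```python
import numpy as np


def cube_embed(density: np.ndarray, t: int = 2, p: int = 4) -> np.ndarray:
    """Split a (T, H, W) density sequence into non-overlapping t x p x p cubes
    and flatten each cube into one token (no learned projection applied)."""
    T, H, W = density.shape
    assert T % t == 0 and H % p == 0 and W % p == 0, "shape must divide by cube size"
    cubes = density.reshape(T // t, t, H // p, p, W // p, p)
    # Reorder to (time cube, row cube, col cube, t, p, p): tokens are time-major.
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5)
    return cubes.reshape(-1, t * p * p)


def future_prediction_mask(n_time_cubes: int, n_spatial: int, n_obs_cubes: int) -> np.ndarray:
    """True = masked. Hides every token that belongs to a future time cube."""
    mask = np.zeros((n_time_cubes, n_spatial), dtype=bool)
    mask[n_obs_cubes:] = True
    return mask.reshape(-1)


def sample_task_mask(rng, n_time_cubes, n_spatial, n_obs_cubes, obs_ratio=0.5):
    """Hypothetical multi-task sampler: with probability 0.5 use the plain
    forecasting mask, otherwise also hide a random subset of observed tokens
    so the model must complete the past while forecasting the future."""
    mask = future_prediction_mask(n_time_cubes, n_spatial, n_obs_cubes)
    if rng.random() < 0.5:
        n_obs_tokens = n_obs_cubes * n_spatial
        k = int(obs_ratio * n_obs_tokens)
        mask[rng.choice(n_obs_tokens, size=k, replace=False)] = True
    return mask


# Usage: 8 frames of 16 x 16 maps; the first 4 frames (2 time cubes) are observed.
rng = np.random.default_rng(0)
maps = rng.random((8, 16, 16))
tokens = cube_embed(maps)                          # shape (64, 32)
mask = sample_task_mask(rng, n_time_cubes=4, n_spatial=16, n_obs_cubes=2)
visible = tokens[~mask]                            # tokens passed to the encoder
inference_mask = future_prediction_mask(4, 16, 2)  # inference: forecasting mask only
```

Because tokens are laid out time-major, "observed" versus "future" reduces to a contiguous index split, which keeps every task mask a simple boolean vector over the same token sequence.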

Abstract

Crowd density forecasting aims to predict how crowd density maps will change in the future from observed past crowd density maps. However, the past crowd density maps are often incomplete due to missed pedestrian detections, so it is crucial to develop a crowd density forecasting model that is robust to such missed detections. This paper presents a MAsked crowd density Completion framework for crowd density forecasting (CrowdMAC), which is trained to simultaneously forecast future crowd density maps from partially masked past crowd density maps (i.e., forecasting from past maps with missed detections) and reconstruct the masked observation maps (i.e., imputing past maps with missed detections). Additionally, we propose Temporal-Density-aware Masking (TDM), which non-uniformly masks tokens in the observed crowd density maps, considering both the sparsity of the crowd density maps and the informativeness of the subsequent frames for the forecasting task. Moreover, we introduce multi-task masking to improve training efficiency. In our experiments, CrowdMAC achieves state-of-the-art performance on seven large-scale datasets: SDD, ETH-UCY, inD, JRDB, VSCrowd, FDST, and croHD. We also demonstrate the robustness of the proposed method against both synthetic and realistic missed detections.
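TDM is described here only at a high level, so the following is a loose sketch of what density- and time-aware non-uniform masking could look like, not the paper's formulation. The weighting rule (token mass plus a small `eps`, multiplied by a linear ramp over observed time cubes controlled by `gamma`), the direction of the temporal term, and the function name `tdm_mask` are all assumptions made for illustration.

```python
import numpy as np


def tdm_mask(obs_tokens, n_time_cubes, mask_ratio, rng, eps=1e-3, gamma=1.0):
    """Hypothetical Temporal-Density-aware Masking sketch.

    obs_tokens: (n_time_cubes * n_spatial, d) non-negative tokens of the
    observed maps in time-major order. Returns a boolean mask (True = masked).
    """
    n_tokens = obs_tokens.shape[0]
    n_spatial = n_tokens // n_time_cubes
    # Density term: tokens containing pedestrians are more likely to be masked.
    density = obs_tokens.sum(axis=1)
    # Temporal term (assumed direction): later observed frames get larger weight.
    time_idx = np.repeat(np.arange(n_time_cubes), n_spatial)
    time_w = 1.0 + gamma * time_idx / max(n_time_cubes - 1, 1)
    weights = (density + eps) * time_w
    k = int(round(mask_ratio * n_tokens))
    # Weighted sampling without replacement gives a non-uniform mask.
    idx = rng.choice(n_tokens, size=k, replace=False, p=weights / weights.sum())
    mask = np.zeros(n_tokens, dtype=bool)
    mask[idx] = True
    return mask


# Usage: 2 observed time cubes x 16 spatial cells; only 4 cells contain people.
obs = np.zeros((32, 32))
obs[[3, 10, 20, 27]] = 5.0
tdm = tdm_mask(obs, n_time_cubes=2, mask_ratio=0.125,
               rng=np.random.default_rng(0), eps=1e-9)
```

One plausible motivation for a density term: crowd density maps are sparse, so a uniform mask would mostly hide empty tokens that are trivial to reconstruct, whereas weighting by mass concentrates the masking on occupied regions.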

Qualitative Results

Qualitative results on the SDD and FDST datasets. The density maps are visualized as heatmaps overlaid onto the RGB images.
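A heatmap overlay like the one in these figures is usually a per-pixel alpha blend. The sketch below is an assumption about how such a visualization might be produced, not the authors' plotting code: the function name `overlay_density` is hypothetical, it uses a hand-rolled black-red-yellow colormap to avoid a plotting dependency, it assumes the density map has already been resized to the image resolution, and it leaves zero-density pixels untouched.

```python
import numpy as np


def overlay_density(rgb: np.ndarray, density: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a (H, W) density map onto an (H, W, 3) uint8 image.

    Assumes `density` is non-negative and already at the image resolution.
    """
    d = density.astype(np.float64)
    if d.max() > 0:
        d = d / d.max()  # normalize to [0, 1] for coloring
    # Simple "hot"-style colormap: red ramps up first, then green (black -> red -> yellow).
    heat = np.stack([np.clip(2 * d, 0, 1), np.clip(2 * d - 1, 0, 1), np.zeros_like(d)], axis=-1)
    # Per-pixel blend weight; zero-density pixels keep the original colors.
    w = alpha * (d > 0)[..., None]
    out = (1 - w) * rgb.astype(np.float64) / 255.0 + w * heat
    return np.clip(out * 255, 0, 255).round().astype(np.uint8)


# Usage: a flat gray image with a single occupied pixel.
img = np.full((4, 4, 3), 100, dtype=np.uint8)
dens = np.zeros((4, 4))
dens[0, 0] = 2.0
vis = overlay_density(img, dens)
```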

BibTeX


        @inproceedings{FUJII2025CrowdMAC,
        author = {Ryo Fujii and Ryo Hachiuma and Hideo Saito},
        title = {CrowdMAC: Masked Crowd Density Completion for Robust Crowd Density Forecasting},
        booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
        year = {2025},
        }