HDNet: A Hierarchically Decoupled Network for Crowd Counting

2022-12-12 06:01:26
Chenliang Gu, Changan Wang, Bin-Bin Gao, Jun Liu, Tianliang Zhang

Abstract

Recently, density map regression-based methods have dominated crowd counting owing to their excellent ability to fit density distributions. However, further improvement tends to saturate, mainly because of confusing background noise and large density variation. In this paper, we propose a Hierarchically Decoupled Network (HDNet) to solve these two problems within a unified framework. Specifically, a background classification sub-task is decomposed from the density map prediction task and assigned to a Density Decoupling Module (DDM) to exploit its highly discriminative ability. The remaining foreground prediction sub-task is further hierarchically decomposed by the DDM into several density-specific sub-tasks, which are then solved by regression-based experts in a Foreground Density Estimation Module (FDEM). Although the proposed strategy effectively reduces the hypothesis space and thereby eases optimization for the task-specific experts, it ignores the high correlation among these sub-tasks. Therefore, we introduce three types of interaction strategies to unify the whole framework: Feature Interaction, Gradient Interaction, and Scale Interaction. With these designs integrated, HDNet achieves state-of-the-art performance on several popular counting benchmarks.
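
The decoupling described in the abstract can be illustrated with a rough sketch. The abstract does not specify how the outputs of the DDM and FDEM are combined, so everything below is an assumption made for illustration: the function name `combine_decoupled_predictions`, the array shapes, the hard argmax routing to a single expert per pixel, and the `fg_threshold` parameter are all hypothetical. The three interaction strategies are not modeled.

```python
import numpy as np


def combine_decoupled_predictions(fg_prob, level_logits, expert_maps, fg_threshold=0.5):
    """Merge hypothetical DDM/FDEM outputs into one density map.

    fg_prob:      (H, W) foreground probability per pixel (assumed DDM output).
    level_logits: (K, H, W) scores for K density levels (assumed DDM output).
    expert_maps:  (K, H, W) density predicted by each of K experts (assumed FDEM output).
    Returns the combined (H, W) density map and its sum (the estimated count).
    """
    fg_prob = np.asarray(fg_prob, dtype=float)
    level_logits = np.asarray(level_logits, dtype=float)
    expert_maps = np.asarray(expert_maps, dtype=float)
    if level_logits.shape != expert_maps.shape or level_logits.shape[1:] != fg_prob.shape:
        raise ValueError("fg_prob, level_logits and expert_maps have inconsistent shapes")

    # Step 1 (background classification): pixels below the threshold count as background.
    fg_mask = fg_prob >= fg_threshold

    # Step 2 (density decoupling): pick the most likely density level per pixel.
    level = np.argmax(level_logits, axis=0)

    # Step 3 (expert regression): read each pixel's value from the expert for its level.
    selected = np.take_along_axis(expert_maps, level[None, ...], axis=0)[0]

    # Background pixels contribute zero density.
    density = np.where(fg_mask, selected, 0.0)
    return density, float(density.sum())


# Toy 2x2 example with K = 2 density levels.
fg = [[0.9, 0.1], [0.8, 0.7]]
logits = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]  # top row -> level 0, bottom row -> level 1
experts = [np.full((2, 2), 0.5), np.full((2, 2), 2.0)]
density_map, count = combine_decoupled_predictions(fg, logits, experts)
print(count)  # 4.5
```

Hard routing keeps the sketch short; a soft combination that weights every expert by its level probability would be an equally plausible reading of the abstract.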

URL

https://arxiv.org/abs/2212.05722

PDF

https://arxiv.org/pdf/2212.05722.pdf

