AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

2024-05-04 01:53:22
Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

Abstract

Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) rely heavily on manually annotated detection boxes in training and inference, hindering practical deployment; or 2) directly employ off-the-shelf detectors to detect multiple persons with varying sizes and spatial occlusion in panoramic scenes, which limits the performance gain of PAR. To this end, we consider learning a detector that adapts to varying-size, occluded persons and is optimized along with the recognition module in an all-in-one framework. We therefore propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes as the pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, which achieves size-adapting detection of individuals by comprehensively selecting object-dense sub-regions identified through the original detections and performing fine-grained detection on them. In addition, to mitigate information loss due to inaccurate individual localizations, we introduce a bi-propagation prototyper that promotes closed-loop interaction and informative consistency across different granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the significant performance of AdaFPP and emphasize its powerful applicability for PAR.
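
The abstract's idea of selecting object-dense sub-regions from a first detection pass can be illustrated with a minimal sketch. This is not the authors' adapt-focuser: the grid-based density count, the function names dense_subregions and to_full_image, and the parameters grid and min_count are all assumptions made for illustration. The abstract does not specify how dense regions are chosen. A second, fine-grained detector (not shown) would be run on each returned crop, and its boxes mapped back to full-image coordinates.

```python
from collections import Counter


def dense_subregions(boxes, image_size, grid=(4, 4), min_count=3):
    """Return crop rectangles (x0, y0, x1, y1) for grid cells that contain
    at least `min_count` box centers from a coarse detection pass.

    Hypothetical helper: a fixed grid is a stand-in for whatever region
    selection the paper actually uses.
    """
    width, height = image_size
    cols, rows = grid
    cell_w, cell_h = width / cols, height / rows

    # Count how many coarse detections have their center in each grid cell.
    counts = Counter()
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        col = min(int(cx // cell_w), cols - 1)
        row = min(int(cy // cell_h), rows - 1)
        counts[(row, col)] += 1

    # Keep only the cells that are dense enough to merit a second pass.
    regions = []
    for (row, col), n in sorted(counts.items()):
        if n >= min_count:
            regions.append((col * cell_w, row * cell_h,
                            (col + 1) * cell_w, (row + 1) * cell_h))
    return regions


def to_full_image(box, region):
    """Map a box detected inside a crop back to full-image coordinates."""
    rx0, ry0 = region[0], region[1]
    x0, y0, x1, y1 = box
    return (x0 + rx0, y0 + ry0, x1 + rx0, y1 + ry0)


# Example: three small boxes cluster in the top-left cell of an 800x400 image.
coarse = [(10, 10, 20, 30), (15, 12, 25, 32), (30, 20, 40, 40),
          (700, 300, 720, 340)]
crops = dense_subregions(coarse, image_size=(800, 400), grid=(4, 2))
```

In this sketch only the crowded cell is returned for re-detection, while the isolated person in the bottom-right cell keeps its coarse box.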

URL

https://arxiv.org/abs/2405.02538

PDF

https://arxiv.org/pdf/2405.02538.pdf

