Robust Ensemble Person Re-Identification via Orthogonal Fusion with Occlusion Handling

2024-03-29 18:38:59
Syeda Nyma Ferdous, Xin Li

Abstract

Occlusion remains one of the major challenges in person re-identification (Re-ID) because of the diversity of poses and the variation in appearance. Developing novel architectures that improve the robustness of occlusion-aware person Re-ID requires new insights, especially on low-resolution edge cameras. We propose a deep ensemble model that harnesses both CNN and Transformer architectures to generate robust feature representations. To achieve robust Re-ID without the need to manually label occluded regions, we take an ensemble learning-based approach derived from the analogy between arbitrarily shaped occluded regions and robust feature representation. Using the orthogonality principle, our deep CNN model makes use of a masked autoencoder (MAE) and global-local feature fusion for robust person identification. Furthermore, we present a part occlusion-aware transformer capable of learning a feature space that is robust to occluded regions. Experimental results on several Re-ID datasets show the effectiveness of our ensemble model, named orthogonal fusion with occlusion handling (OFOH). Compared to competing methods, the proposed OFOH approach achieves competitive rank-1 and mAP performance.
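
The abstract does not spell out how the orthogonality principle is applied to global-local fusion. One common reading, sketched below purely as an assumption and not as the authors' implementation, is to remove from each local feature the component that lies along the global feature, and then concatenate the remaining orthogonal part with the global feature. The function names `orthogonal_component` and `orthogonal_fusion`, and the toy vectors, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_component(local: Sequence[float],
                         global_feat: Sequence[float]) -> List[float]:
    """Return the part of `local` that is orthogonal to `global_feat`.

    Assumption: the projection is the ordinary vector projection
    (local . g / g . g) * g, which is subtracted from `local`.
    """
    g_sq = dot(global_feat, global_feat)
    if g_sq == 0.0:
        raise ValueError("global feature must be non-zero")
    scale = dot(local, global_feat) / g_sq
    return [l - scale * g for l, g in zip(local, global_feat)]


def orthogonal_fusion(local_feats: Sequence[Sequence[float]],
                      global_feat: Sequence[float]) -> List[List[float]]:
    """Concatenate each orthogonal local component with the global feature."""
    return [orthogonal_component(l, global_feat) + list(global_feat)
            for l in local_feats]


# Toy example: one 3-d global descriptor and two 3-d local descriptors.
global_feat = [1.0, 0.0, 1.0]
local_feats = [[2.0, 3.0, 0.0], [0.5, -1.0, 0.5]]
fused = orthogonal_fusion(local_feats, global_feat)
```

In this sketch each fused vector has twice the original dimension: the first half carries only information not already present in the global descriptor, and the second half is the global descriptor itself. In a real model the features would be tensors produced by the CNN backbone, and the fused output would typically be pooled before matching.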

URL

https://arxiv.org/abs/2404.00107

PDF

https://arxiv.org/pdf/2404.00107.pdf
