
Parameter Hierarchical Optimization for Visible-Infrared Person Re-Identification

2024-04-11 17:27:39
Zeng YU, Yunxiao Shi

Abstract

Visible-infrared person re-identification (VI-ReID) aims to match cross-modality pedestrian images captured by disjoint visible and infrared cameras. Existing methods alleviate cross-modality discrepancies by designing different kinds of network architectures. In contrast, this paper proposes a novel parameter optimization paradigm, the parameter hierarchical optimization (PHO) method, for VI-ReID. It allows part of the parameters to be optimized directly without any training, which narrows the parameter search space and makes the whole network easier to train. Specifically, we first divide the parameters into different types, and then introduce a self-adaptive alignment strategy (SAS) to automatically align the visible and infrared images through transformation. Considering that features in different dimensions have varying importance, we develop an auto-weighted alignment learning (AAL) module that automatically weights features according to their importance. Importantly, in the alignment process of SAS and AAL, all the parameters are optimized directly using optimization principles rather than by training the whole network, which yields a better way of fitting these parameters. Furthermore, we establish the cross-modality consistent learning (CCL) loss to extract discriminative person representations with translation consistency. We provide both theoretical justification and empirical evidence that the proposed PHO method outperforms existing VI-ReID approaches.
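
The abstract does not give the formulas behind SAS or AAL, so the following is only a minimal sketch of the general idea of fitting some parameters in closed form instead of by gradient training. It assumes paired infrared and visible feature vectors, a per-dimension affine transform, and inverse-residual weighting; the function names `fit_affine_alignment` and `auto_weights` and both modelling choices are hypothetical illustrations, not the paper's method.

```python
from statistics import fmean


def fit_affine_alignment(infrared, visible):
    """Closed-form least squares per dimension: visible ~ a * infrared + b.

    Hypothetical stand-in for an alignment step; no gradient training is used.
    """
    dims = len(infrared[0])
    params = []
    for d in range(dims):
        x = [row[d] for row in infrared]
        y = [row[d] for row in visible]
        mx, my = fmean(x), fmean(y)
        var_x = sum((xi - mx) ** 2 for xi in x)
        cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        # Fall back to the identity scale when a dimension is constant.
        a = cov_xy / var_x if var_x > 0 else 1.0
        b = my - a * mx
        params.append((a, b))
    return params


def auto_weights(infrared, visible, params, eps=1e-8):
    """Weight each dimension by its inverse alignment residual (assumed rule).

    Dimensions that align well receive larger weights; weights sum to 1.
    """
    inverse = []
    for d, (a, b) in enumerate(params):
        residual = sum(
            (a * xi[d] + b - yi[d]) ** 2 for xi, yi in zip(infrared, visible)
        )
        inverse.append(1.0 / (residual + eps))
    total = sum(inverse)
    return [v / total for v in inverse]
```

With toy features where the first dimension is related exactly by `y = 2x + 1` and the second only approximately, the fitted parameters recover the exact map for the first dimension, and that dimension receives the larger weight.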

URL

https://arxiv.org/abs/2404.07930

PDF

https://arxiv.org/pdf/2404.07930.pdf

