A Little Bit Attention Is All You Need for Person Re-Identification

2023-02-28 13:54:31
Markus Eisenbach, Jannik Lübberstedt, Dustin Aganian, Horst-Michael Gross

Abstract

Person re-identification plays a key role in applications where a mobile robot needs to track its users over a long period of time, even if they are partially unobserved for some time, in order to follow them or be available on demand. In this context, deep-learning-based real-time feature extraction on a mobile robot is often performed on special-purpose devices whose computational resources are shared among multiple tasks. Therefore, the inference speed has to be taken into account. In contrast, person re-identification is often improved by architectural changes that come at the cost of significantly slowing down inference. Attention blocks are one such example. We will show that some well-performing attention blocks used in the state of the art are subject to inference costs that are far too high to justify their use for mobile robotic applications. As a consequence, we propose an attention block that only slightly affects the inference speed while keeping up with much deeper networks or more complex attention blocks in terms of re-identification accuracy. We perform extensive neural architecture search to derive rules for the locations at which this attention block should be integrated into the architecture in order to achieve the best trade-off between speed and accuracy. Finally, we confirm that the best performing configuration on a re-identification benchmark also performs well on an indoor robotic dataset.

URL

https://arxiv.org/abs/2302.14574

PDF

https://arxiv.org/pdf/2302.14574.pdf

