Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking

2019-06-09 05:09:05
Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, Hongpeng Wang

Abstract

Existing deep Thermal InfraRed (TIR) trackers only use semantic features to describe the TIR object, which lack sufficient discriminative capacity for handling distractors. This becomes worse when the feature extraction network is only trained on RGB images. To address this issue, we propose a multi-level similarity model under a Siamese framework for robust TIR object tracking. Specifically, we compute different pattern similarities on two convolutional layers using the proposed multi-level similarity network. One of them focuses on the global semantic similarity and the other computes the local structural similarity of the TIR object. These two similarities complement each other and hence enhance the discriminative capacity of the network for handling distractors. In addition, we design a simple yet effective relative-entropy-based ensemble subnetwork to integrate the semantic and structural similarities. This subnetwork can adaptively learn the weights of the semantic and structural similarities at the training stage. To further enhance the discriminative capacity of the tracker, we construct the first large-scale TIR video sequence dataset for training the proposed model. The proposed TIR dataset not only benefits training for TIR tracking but can also be applied to numerous TIR vision tasks. Extensive experimental results on the VOT-TIR2015 and VOT-TIR2017 benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
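
The idea of combining similarities from two feature levels can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `cross_correlate` and `fuse`, the toy feature shapes, and the softmax weighting are all assumptions made for illustration. In particular, the softmax over fixed `logits` is a plain stand-in for the learned, relative-entropy-based ensemble subnetwork described in the abstract, and both response maps are assumed to have the same spatial size.

```python
import numpy as np


def cross_correlate(template, search):
    """Slide a (C, h, w) template over a (C, H, W) search map.

    Returns an (H - h + 1, W - w + 1) similarity (response) map.
    """
    _, h, w = template.shape
    _, H, W = search.shape
    response = np.empty((H - h + 1, W - w + 1))
    for i in range(response.shape[0]):
        for j in range(response.shape[1]):
            response[i, j] = np.sum(template * search[:, i:i + h, j:j + w])
    return response


def fuse(semantic, structural, logits):
    """Weighted sum of two equal-sized response maps.

    The weights are a softmax over `logits`, which stand in for parameters
    that would be learned during training (hypothetical here).
    """
    weights = np.exp(logits - np.max(logits))
    weights /= weights.sum()
    return weights[0] * semantic + weights[1] * structural, weights


# Toy features: a block of ones planted at row 3, column 5 of an empty search map.
deep_template = np.ones((8, 4, 4))      # stand-in for deep-layer (semantic) features
shallow_template = np.ones((2, 4, 4))   # stand-in for shallow-layer (structural) features
deep_search = np.zeros((8, 12, 12))
shallow_search = np.zeros((2, 12, 12))
deep_search[:, 3:7, 5:9] = 1.0
shallow_search[:, 3:7, 5:9] = 1.0

semantic = cross_correlate(deep_template, deep_search)
structural = cross_correlate(shallow_template, shallow_search)
fused, weights = fuse(semantic, structural, logits=np.array([0.0, 0.0]))

# Location of the strongest fused response, as (row, column).
peak = tuple(int(v) for v in np.unravel_index(np.argmax(fused), fused.shape))
print(peak)
```

In a real tracker the two feature maps would come from different layers of a trained network and the fusion weights would be learned; here both are hand-made so that the behaviour of the two steps (per-level similarity, then weighted fusion) is easy to follow.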

URL

https://arxiv.org/abs/1906.03568

PDF

https://arxiv.org/pdf/1906.03568.pdf

