Abstract
Existing deep Thermal InfraRed (TIR) trackers use only semantic features to describe the TIR object, and these features lack sufficient discriminative capacity to handle distractors. The problem worsens when the feature extraction network is trained only on RGB images. To address this issue, we propose a multi-level similarity model under a Siamese framework for robust TIR object tracking. Specifically, we compute different pattern similarities on two convolutional layers using the proposed multi-level similarity network: one focuses on the global semantic similarity, and the other computes the local structural similarity of the TIR object. These two similarities complement each other and hence enhance the discriminative capacity of the network for handling distractors. In addition, we design a simple yet effective relative entropy based ensemble subnetwork to integrate the semantic and structural similarities. This subnetwork adaptively learns the weights of the semantic and structural similarities during training. To further enhance the discriminative capacity of the tracker, we construct the first large-scale TIR video sequence dataset for training the proposed model. The proposed TIR dataset not only benefits training for TIR tracking but can also be applied to numerous TIR vision tasks. Extensive experimental results on the VOT-TIR2015 and VOT-TIR2017 benchmarks demonstrate that the proposed algorithm performs favorably against state-of-the-art methods.
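To make the idea of "similarities on two convolutional layers, fused with learned weights" concrete, here is a minimal sketch in plain Python. It computes a Siamese-style cross-correlation response map for each of two feature levels and combines them with a softmax-weighted sum. Everything here is an illustrative assumption rather than the paper's method: the function names are hypothetical, the weights come from hand-set logits instead of being trained, the plain weighted sum stands in for (and does not reproduce) the relative entropy based ensemble subnetwork, and both response maps are assumed to already have the same spatial size.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Features = List[Matrix]  # channels x height x width


def cross_correlate(search: Features, template: Features) -> Matrix:
    """Slide the template over the search features ("valid" mode, stride 1)
    and sum the channel-wise products at each offset, as a Siamese
    cross-correlation layer does."""
    assert len(search) == len(template), "channel counts must match"
    th, tw = len(template[0]), len(template[0][0])
    sh, sw = len(search[0]), len(search[0][0])
    out = [[0.0] * (sw - tw + 1) for _ in range(sh - th + 1)]
    for c in range(len(template)):
        for i in range(sh - th + 1):
            for j in range(sw - tw + 1):
                out[i][j] += sum(
                    search[c][i + u][j + v] * template[c][u][v]
                    for u in range(th)
                    for v in range(tw)
                )
    return out


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; turns per-level logits into weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(responses: List[Matrix], logits: List[float]) -> Matrix:
    """Weighted sum of same-sized response maps (hypothetical stand-in for
    the paper's learned ensemble; weights here are softmax(logits))."""
    weights = softmax(logits)
    h, w = len(responses[0]), len(responses[0][0])
    return [
        [sum(wk * r[i][j] for wk, r in zip(weights, responses)) for j in range(w)]
        for i in range(h)
    ]


def argmax2d(m: Matrix) -> Tuple[int, int]:
    """Location of the peak response, i.e. the predicted target offset."""
    _, i, j = max((v, i, j) for i, row in enumerate(m) for j, v in enumerate(row))
    return i, j
```

For example, correlating a 4x4 single-channel search map that contains a bright 2x2 blob with a 2x2 all-ones template (a "semantic" level) and with a diagonal template (a "structural" level), then fusing with equal logits `[0.0, 0.0]`, yields a peak at offset `(1, 1)`. In a trained tracker the fusion weights would be learned rather than fixed.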
URL
https://arxiv.org/abs/1906.03568