Abstract
The tracking-by-detection framework has received growing attention through its integration with convolutional neural networks (CNNs). Existing tracking-by-detection methods, however, fail to track objects with severe appearance variations. This is because the traditional convolution operation is performed on a fixed grid, and thus may not find the correct response when the object changes pose or the environmental conditions vary. In this paper, we propose a deformable convolution layer to enrich the target appearance representation in the tracking-by-detection framework. We aim to capture the target's appearance variations via deformable convolution, which adaptively enhances the original features. In addition, we propose a gated fusion scheme to control how the variations captured by the deformable convolution affect the original appearance. The feature representation enriched through deformable convolution helps the CNN classifier discriminate between the target object and the background. Extensive experiments on standard benchmarks show that the proposed tracker performs favorably against state-of-the-art methods.
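The two ideas named in the abstract, sampling off the fixed grid and gating how much the deformed response changes the original features, can be illustrated with a small sketch. This is not the paper's implementation: the function names are hypothetical, the per-tap offsets (which a real layer would predict with a learned branch) are passed in by hand, and the fusion rule `original + sigmoid(gate) * deformed` is an assumption chosen only to show what a gate could do.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a fractional (y, x); outside the map is zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Bilinear weight: closer grid points contribute more.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                value += weight * feat[yi][xi]
    return value


def deformable_conv3x3(feat, kernel, offsets):
    """3x3 convolution whose taps are shifted by per-location offsets.

    offsets[i][j] holds nine (dy, dx) pairs, one per kernel tap.
    With all offsets zero this reduces to an ordinary zero-padded 3x3
    convolution (cross-correlation form) on the fixed grid.
    """
    h, w = len(feat), len(feat[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k in range(9):
                ky, kx = k // 3 - 1, k % 3 - 1  # regular grid position
                dy, dx = offsets[i][j][k]       # displacement for this tap
                acc += kernel[k // 3][k % 3] * bilinear_sample(
                    feat, i + ky + dy, j + kx + dx
                )
            out[i][j] = acc
    return out


def gated_fusion(original, deformed, gate_logits):
    """Assumed fusion rule: original + sigmoid(gate) * deformed, elementwise."""
    fused = []
    for o_row, d_row, g_row in zip(original, deformed, gate_logits):
        fused.append(
            [o + d / (1.0 + math.exp(-g)) for o, d, g in zip(o_row, d_row, g_row)]
        )
    return fused
```

With a strongly negative gate logit the fused feature stays close to the original, and with a strongly positive one the deformed response is added almost in full, which is one way a gate could control how much the captured variation affects the original appearance.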
URL
https://arxiv.org/abs/1809.10417