Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking

2018-07-30 13:46:46
Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

Abstract

With efficient appearance learning models, the Discriminative Correlation Filter (DCF) has proven very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major problems, i.e., the spatial boundary effect and temporal filter degeneration. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method are adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatio-temporal filter learning in a lower-dimensional discriminative manifold. More specifically, we apply structured sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and is updated locally to preserve the global structure in the manifold. Finally, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets, including OTB2013, OTB50, OTB100, Temple-Colour and UAV123. The experimental results demonstrate the superiority of the proposed method over state-of-the-art approaches.
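
To make the combination of structured sparsity and temporal consistency more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified objective, 0.5*||w - g||^2 + (mu/2)*||w - w_prev||^2 + lam * sum over spatial locations of ||w_ij||_2, where `g` stands in for an unconstrained filter estimate, `w_prev` for the previous frame's filter, and `lam` and `mu` are hypothetical weights. Under that assumption the minimiser has a closed form: average `g` and `w_prev`, then apply group soft-thresholding across channels at each spatial location, which zeroes out weak locations and so acts as spatial feature selection. All function and parameter names here are illustrative.

```python
import numpy as np


def group_soft_threshold(v, tau):
    """Proximal operator of the group lasso penalty.

    v has shape (H, W, C); each spatial location's C-channel vector is one
    group. Groups with L2 norm below tau are set to zero, the rest shrink.
    """
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    # Guard against division by zero for all-zero groups.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return v * scale


def select_spatial_features(g, w_prev, lam, mu):
    """Closed-form minimiser of the assumed simplified objective.

    The two quadratic terms merge into (1 + mu)/2 * ||w - v||^2 with
    v = (g + mu * w_prev) / (1 + mu), so the solution is a group
    soft-threshold of v with threshold lam / (1 + mu).
    """
    v = (g + mu * w_prev) / (1.0 + mu)
    return group_soft_threshold(v, lam / (1.0 + mu))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.normal(size=(8, 8, 4))                   # hypothetical current estimate
    w_prev = g + 0.1 * rng.normal(size=g.shape)      # hypothetical previous filter
    w = select_spatial_features(g, w_prev, lam=3.0, mu=1.0)
    active = np.count_nonzero(np.linalg.norm(w, axis=-1))
    print(f"active spatial locations: {active} of {w.shape[0] * w.shape[1]}")
```

A larger `lam` keeps fewer spatial locations, while a larger `mu` pulls the result closer to `w_prev`. In a full augmented Lagrangian scheme a step like this would typically be one sub-problem inside an iterative loop; the loop itself is omitted here.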

URL

https://arxiv.org/abs/1807.11348

PDF

https://arxiv.org/pdf/1807.11348.pdf

