Paper Reading AI Learner

DensSiam: End-to-End Densely-Siamese Network with Self-Attention Model for Object Tracking

2018-09-07 23:41:02
Mohamed H. Abdelpakey, Mohamed S. Shehata, Mostafa M. Mohamed

Abstract

Convolutional Siamese neural networks have recently been used to track objects using deep features. The Siamese architecture can achieve real-time speed; however, it is still difficult to find a Siamese architecture that maintains generalization capability, high accuracy, and speed while decreasing the number of shared parameters, especially when it is very deep. Furthermore, a conventional Siamese architecture usually processes one local neighborhood at a time, which makes the appearance model local and non-robust to appearance changes. To overcome these two problems, this paper proposes DensSiam, a novel convolutional Siamese architecture that uses the concept of dense layers, connecting each dense layer to all layers in a feed-forward fashion and combining them with a similarity-learning function. DensSiam also includes a self-attention mechanism that forces the network to pay more attention to non-local features during offline training. Extensive experiments are performed on tracking benchmarks: OTB2013 and OTB2015 as the validation set, and VOT2015, VOT2016, and VOT2017 as the test set. The results show that DensSiam achieves superior performance on these benchmarks compared to other current state-of-the-art methods.
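The abstract describes three building blocks: a densely connected shared backbone, a non-local self-attention block, and a similarity-learning head applied to a template and a search region. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of how such pieces could fit together, and the channel widths, growth rate, stem stride, and input sizes are assumptions made for the example.

```python
# Minimal PyTorch sketch of the ideas in the abstract (not the authors' code):
# a shared densely-connected backbone, a non-local self-attention block, and
# cross-correlation between template and search embeddings as the similarity
# function. Channel widths, growth rate, stem stride and input sizes are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseBlock(nn.Module):
    """Each conv layer receives the concatenation of all preceding feature maps."""
    def __init__(self, in_ch, growth, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)


class SelfAttention(nn.Module):
    """Non-local self-attention: every spatial position attends to all others."""
    def __init__(self, ch):
        super().__init__()
        self.query = nn.Conv2d(ch, ch // 8, kernel_size=1)
        self.key = nn.Conv2d(ch, ch // 8, kernel_size=1)
        self.value = nn.Conv2d(ch, ch, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                      # B x C' x HW
        attn = F.softmax(q @ k, dim=-1)                 # B x HW x HW
        v = self.value(x).flatten(2)                    # B x C  x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                     # residual connection


class DenseSiamese(nn.Module):
    """Shared backbone applied to template z and search region x; the response
    map is the cross-correlation of their embeddings."""
    def __init__(self):
        super().__init__()
        stem = nn.Conv2d(3, 32, kernel_size=7, stride=4)
        block = DenseBlock(32, growth=32, n_layers=4)
        self.backbone = nn.Sequential(stem, block, SelfAttention(block.out_channels))

    def forward(self, z, x):
        fz, fx = self.backbone(z), self.backbone(x)     # shared weights
        # Template embedding acts as the correlation kernel (batch size 1 assumed).
        return F.conv2d(fx, fz)


# Typical Siamese-tracker crops: 127x127 template, 255x255 search region.
net = DenseSiamese()
response = net(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(response.shape)  # torch.Size([1, 1, 33, 33])
```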

URL

https://arxiv.org/abs/1809.02714

PDF

https://arxiv.org/pdf/1809.02714.pdf

