Paper Reading AI Learner

DASTSiam: Spatio-Temporal Fusion and Discriminative Augmentation for Improved Siamese Tracking

2023-01-22 06:23:53
Yucheng Huang, Eksan Firkat, Ziwang Xiao, Jihong Zhu, Askar Hamdulla

Abstract

Tracking based on deep neural networks has improved greatly with the emergence of Siamese trackers. However, the appearance of a target often changes during tracking, which reduces the tracker's robustness under challenges such as aspect-ratio change, occlusion, and scale variation. In addition, cluttered backgrounds can produce multiple high-response points in the response map, leading to incorrect target localization. In this paper, we introduce two transformer-based modules to improve Siamese tracking, together called DASTSiam: a spatio-temporal (ST) fusion module and a discriminative augmentation (DA) module. The ST module accumulates historical cues via cross-attention to improve robustness against changes in object appearance, while the DA module associates semantic information between the template and the search region to improve target discrimination. Moreover, modifying the label assignment of anchors also improves the reliability of object localization. Our modules can be combined with existing Siamese trackers and show improved performance on several public datasets in comparative and ablation experiments.
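The abstract describes the ST module as accumulating historical cues via cross-attention. As a rough illustration only (the paper's actual architecture is not given here), the core operation can be sketched as scaled dot-product cross-attention in which queries come from the current search-region features and keys/values come from accumulated historical template features; all array shapes and names below are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key, value):
    """Scaled dot-product cross-attention.

    query: (Nq, d) tokens from the current search region
    key/value: (Nk, d) tokens accumulated from historical templates
    Returns (Nq, d) search features fused with historical cues.
    """
    d = query.shape[-1]
    scores = query @ key.T / np.sqrt(d)   # (Nq, Nk) similarity of search tokens to history
    weights = softmax(scores, axis=-1)    # attention distribution over historical tokens
    return weights @ value                # history-weighted feature aggregation

# Hypothetical token counts and feature dimension for illustration.
rng = np.random.default_rng(0)
search_feats = rng.normal(size=(16, 32))    # current-frame search tokens
history_feats = rng.normal(size=(48, 32))   # accumulated template tokens
fused = cross_attention(search_feats, history_feats, history_feats)
print(fused.shape)  # (16, 32)
```

In a real tracker these features would come from the Siamese backbone, and the fused output would feed the correlation/classification heads; this sketch only shows the attention arithmetic.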


URL

https://arxiv.org/abs/2301.09063

PDF

https://arxiv.org/pdf/2301.09063.pdf

