Paper Reading AI Learner

Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

2024-04-30 19:37:58
Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

Abstract

RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information is of sufficient quality. This makes the data unrepresentative of severe imaging conditions, leading to tracking failures in MMW scenarios. To bridge this gap, we present a new benchmark, MV-RGBT, captured specifically in MMW scenarios. In contrast to the existing datasets, MV-RGBT comprises more object categories and scenes, providing a diverse and challenging benchmark. Furthermore, for the severe imaging conditions of MMW scenarios, a new problem is posed, namely "when to fuse", to stimulate the development of fusion strategies for such data. We propose a new method based on a mixture of experts, namely MoETrack, as a baseline fusion strategy. In MoETrack, each expert generates independent tracking results along with a corresponding confidence score, which is used to control the fusion process. Extensive experimental results demonstrate the significant potential of MV-RGBT in advancing RGBT tracking and lead to the conclusion that fusion is not always beneficial, especially in MMW scenarios. Notably, the proposed MoETrack method achieves new state-of-the-art results not only on MV-RGBT, but also on standard benchmarks such as RGBT234, LasHeR, and the short-term split of VTUAV (VTUAV-ST). More information about MV-RGBT and the source code of MoETrack will be released at this https URL.
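The abstract only sketches how MoETrack's confidence scores control fusion. Below is a minimal, hypothetical illustration of that general idea, not the authors' implementation: each expert returns a candidate box and a self-reported confidence, and the most confident prediction is used directly, so that fusion is effectively skipped when one modality is unreliable. All names, the data layout, and the selection rule are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExpertOutput:
    # Hypothetical container: one expert's candidate box and its confidence.
    box: Tuple[int, int, int, int]  # (x, y, w, h) candidate target box
    confidence: float               # self-reported reliability in [0, 1]


def fuse_by_confidence(outputs: List[ExpertOutput]) -> Tuple[int, int, int, int]:
    """Pick the candidate from the most confident expert.

    An illustrative stand-in for a 'when to fuse' decision: if one
    modality's expert clearly dominates in confidence, its result is
    used directly rather than blending all modalities.
    """
    best = max(outputs, key=lambda o: o.confidence)
    return best.box


# Usage sketch with three hypothetical experts (RGB, TIR, joint branch).
rgb = ExpertOutput(box=(10, 20, 50, 80), confidence=0.35)    # degraded RGB at night
tir = ExpertOutput(box=(12, 22, 48, 78), confidence=0.91)    # reliable thermal cue
joint = ExpertOutput(box=(11, 21, 49, 79), confidence=0.60)  # fused-feature expert
print(fuse_by_confidence([rgb, tir, joint]))  # -> (12, 22, 48, 78)
```

This sketch uses a hard argmax over confidences purely for clarity; a soft, confidence-weighted combination is an equally plausible reading of the abstract.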


URL

https://arxiv.org/abs/2405.00168

PDF

https://arxiv.org/pdf/2405.00168.pdf

