Abstract
Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion introduced by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360° tracking framework applicable to both omnidirectional visual object tracking and segmentation tasks. Building upon our previous work on omnidirectional visual object tracking (360VOT), we propose a comprehensive dataset and benchmark that incorporates a new component called omnidirectional video object segmentation (360VOS). The 360VOS dataset includes 290 sequences accompanied by dense pixel-wise masks and covers a broader range of target categories. To support both the development and evaluation of algorithms in this domain, we divide the dataset into a training subset with 170 sequences and a testing subset with 120 sequences. Furthermore, we tailor evaluation metrics for both omnidirectional tracking and segmentation to ensure rigorous assessment. Through extensive experiments, we benchmark state-of-the-art approaches and demonstrate the effectiveness of our proposed 360° tracking framework and training dataset. Homepage: this https URL
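The abstract states that the 290 sequences of 360VOS are partitioned into 170 training and 120 testing sequences. A minimal sketch of such a split, purely illustrative (the sequence names and file layout are assumptions, not the released dataset structure):

```python
# Hypothetical split of 290 sequences into train/test subsets,
# matching the counts reported in the abstract (170 + 120 = 290).
# Sequence names are placeholders, not actual 360VOS identifiers.
TRAIN_SEQS = [f"seq_{i:03d}" for i in range(170)]
TEST_SEQS = [f"seq_{i:03d}" for i in range(170, 290)]

def split_sizes():
    """Return (train, test, total) sequence counts."""
    return len(TRAIN_SEQS), len(TEST_SEQS), len(TRAIN_SEQS) + len(TEST_SEQS)

print(split_sizes())  # (170, 120, 290)
```

The disjointness of the two lists mirrors the standard requirement that no sequence appears in both subsets.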
URL
https://arxiv.org/abs/2404.13953