Video Object Segmentation and Tracking: A Survey

2019-04-19 12:49:22
Rui Yao, Guosheng Lin, Shixiong Xia, Jiaqi Zhao, Yong Zhou

Abstract

Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics struggle with some common challenges, such as occlusion, deformation, motion blur, and scale variation. The former must also contend with heterogeneous objects, interacting objects, edge ambiguity, and shape complexity, while the latter has difficulty handling fast motion, out-of-view targets, and real-time processing. Combining the two problems as video object segmentation and tracking (VOST) can overcome their respective difficulties and improve their performance. VOST can be applied to many practical applications such as video summarization, high-definition video compression, human-computer interaction, and autonomous vehicles. This article aims to provide a comprehensive review of state-of-the-art tracking methods, classify these methods into different categories, and identify new trends. First, we provide a hierarchical categorization of existing approaches, including unsupervised VOS, semi-supervised VOS, interactive VOS, weakly supervised VOS, and segmentation-based tracking methods. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets and provide a variety of evaluation metrics. Finally, we point out a set of interesting directions for future work and draw our own conclusions.

URL

https://arxiv.org/abs/1904.09172

PDF

https://arxiv.org/pdf/1904.09172.pdf

