
Unsupervised Video Object Segmentation using Motion Saliency-Guided Spatio-Temporal Propagation

2018-09-04 17:59:55
Yuan-Ting Hu, Jia-Bin Huang, Alexander G. Schwing

Abstract

Unsupervised video segmentation plays an important role in a wide variety of applications, from object identification to compression. However, to date, fast motion, motion blur, and occlusions pose significant challenges. To address these challenges for unsupervised video segmentation, we develop a novel saliency estimation technique as well as a novel neighborhood graph, based on optical flow and edge cues. Our approach leads to significantly better initial foreground-background estimates and to their robust and accurate diffusion across time. We evaluate our proposed algorithm on the challenging DAVIS, SegTrack v2, and FBMS-59 datasets. Despite using only a standard edge detector trained on 200 images, our method achieves state-of-the-art results, outperforming deep-learning-based methods in the unsupervised setting. We even demonstrate results competitive with deep-learning-based methods in the semi-supervised setting on the DAVIS dataset.
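
The abstract names two ingredients: an initial foreground estimate derived from motion, and diffusion of that estimate over a neighborhood graph. The sketch below is only a toy illustration of those two ideas, not the authors' algorithm. It assumes saliency can be approximated by how far each flow vector lies from the median flow, and it assumes a simple row-normalized propagation rule. The function names `motion_saliency` and `diffuse`, the parameters `alpha` and `iters`, and the affinity matrix are all hypothetical.

```python
import numpy as np

def motion_saliency(flow):
    """Toy motion saliency (an assumption, not the paper's estimator).

    flow: (H, W, 2) optical-flow field.
    Returns an (H, W) map in [0, 1]: distance of each flow vector from
    the median flow, which stands in for the dominant background motion.
    """
    background = np.median(flow.reshape(-1, 2), axis=0)
    dist = np.linalg.norm(flow - background, axis=2)
    peak = dist.max()
    return dist / peak if peak > 0 else np.zeros_like(dist)

def diffuse(scores, weights, alpha=0.5, iters=20):
    """Toy graph diffusion (an assumption, not the paper's graph).

    scores:  (N,) initial foreground scores, one per graph node.
    weights: (N, N) non-negative affinities between nodes.
    Repeats s <- alpha * P @ s + (1 - alpha) * scores, where P is the
    row-normalized affinity matrix.
    """
    weights = np.asarray(weights, dtype=float)
    scores = np.asarray(scores, dtype=float)
    row_sums = weights.sum(axis=1, keepdims=True)
    # Rows with no neighbors stay all-zero instead of dividing by zero.
    P = np.divide(weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0)
    s = scores.copy()
    for _ in range(iters):
        s = alpha * (P @ s) + (1 - alpha) * scores
    return s
```

In this toy setup, a node strongly connected to a salient node ends up with a higher score than a node that is only weakly connected to the rest of the graph, which is the intuition behind propagating an initial estimate through space and time.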


URL

https://arxiv.org/abs/1809.01125

PDF

https://arxiv.org/pdf/1809.01125.pdf

