Paper Reading AI Learner

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation

2018-07-30 08:11:45
Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz

Abstract

Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real-world applications, a dynamic scene is commonly captured by a moving camera (e.g., panning, tilting, or hand-held), which increases the task complexity because the scene is observed from different viewpoints. The main challenge is disambiguating camera motion from scene motion, which becomes harder as the amount of observed rigidity decreases, even when 2D image correspondences are estimated successfully. In contrast to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to *learn* the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and to directly infer a rigidity mask from two sequential images with depth. With the learned network, we show how to effectively estimate camera motion and projected scene flow using the computed 2D optical flow and the inferred rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects composited onto real backgrounds) and an evaluation split that accounts for the percentage of observed non-rigid pixels. Our evaluation shows that the proposed framework outperforms current state-of-the-art scene flow estimation methods in challenging dynamic scenes.
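
To make the abstract's pipeline concrete, the sketch below illustrates the two geometric steps that follow the network's rigidity prediction: fitting camera motion from the 3D correspondences induced by optical flow, restricted to pixels labeled rigid, and subtracting the resulting ego-motion flow to expose the non-rigid flow component. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the paper performs a robust pose refinement, whereas here a closed-form Kabsch alignment stands in for it, and all names (`backproject`, `estimate_camera_motion`, `nonrigid_flow`) as well as the nearest-neighbor depth sampling are hypothetical.

```python
import numpy as np

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

def estimate_camera_motion(depth0, depth1, flow, rigidity, K):
    """Fit (R, t) between frames from flow correspondences at pixels the
    network labels rigid. A closed-form Kabsch alignment stands in for
    the robust pose refinement used in the paper."""
    h, w = depth0.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    u1, v1 = u + flow[..., 0], v + flow[..., 1]
    # keep rigid pixels whose flow target stays inside the image
    valid = rigidity & (u1 >= 0) & (u1 < w) & (v1 >= 0) & (v1 < h)
    p0 = backproject(depth0, K)[valid]
    # nearest-neighbor depth lookup at the corresponding pixel in frame 1
    ui = u1[valid].round().astype(int)
    vi = v1[valid].round().astype(int)
    z1 = depth1[vi, ui]
    p1 = np.stack([(u1[valid] - K[0, 2]) * z1 / K[0, 0],
                   (v1[valid] - K[1, 2]) * z1 / K[1, 1], z1], axis=-1)
    # Kabsch: rigid transform minimizing ||R p0 + t - p1||^2
    c0, c1 = p0.mean(axis=0), p1.mean(axis=0)
    U, _, Vt = np.linalg.svd((p0 - c0).T @ (p1 - c1))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c1 - R @ c0
    return R, t

def nonrigid_flow(depth0, flow, R, t, K):
    """Subtract the camera-induced (ego-motion) flow from the full 2D
    flow, leaving the component caused by independently moving objects."""
    h, w = depth0.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    p0 = backproject(depth0, K).reshape(-1, 3)
    p = (R @ p0.T).T + t                      # rigid prediction in frame 1
    z = np.clip(p[:, 2], 1e-6, None)
    ego_u = (K[0, 0] * p[:, 0] / z + K[0, 2]).reshape(h, w) - u
    ego_v = (K[1, 1] * p[:, 1] / z + K[1, 2]).reshape(h, w) - v
    return flow - np.stack([ego_u, ego_v], axis=-1)
```

Here `depth0`/`depth1` are metric depth maps, `flow` is the (H, W, 2) optical flow, `rigidity` is the boolean mask predicted by the network, and `K` is the 3x3 camera intrinsics matrix.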

URL

https://arxiv.org/abs/1804.04259

PDF

https://arxiv.org/pdf/1804.04259.pdf

