Abstract
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.
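The coupling the abstract describes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every name in it (`rigid_flow`, `moderator_mask`, `collaborative_loss`) is hypothetical. It makes several assumptions. The flow of a static pixel is fully determined by its depth and the camera motion (R, t) through the standard pinhole model. In place of a trained moderator network, a closed-form sigmoid rule softly assigns each pixel to whichever competitor explains it with lower error, which is a rough analogue of an E-step. The generic flow competitor's error is a made-up constant. The intrinsics, depth, motion, and moving patch are toy values.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow that a static scene would show under camera motion (R, t).

    Each pixel p is back-projected to X = depth * K^-1 p, moved to
    X' = R X + t, and re-projected with K; flow = projection(X') - p.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)  # 3 x N homogeneous pixels
    points = (np.linalg.inv(K) @ pix) * depth.ravel()  # 3 x N camera-frame points
    moved = R @ points + t.reshape(3, 1)  # apply camera motion
    proj = K @ moved
    uv = proj[:2] / proj[2:3]  # perspective division
    return (uv - pix[:2]).T.reshape(h, w, 2)  # per-pixel (dx, dy)


def moderator_mask(err_static, err_flow, temperature=0.05):
    """Soft per-pixel assignment: near 1 = static competitor, near 0 = flow competitor.

    Hypothetical closed-form stand-in for a learned moderator: a pixel leans
    static when the rigid (depth + camera motion) explanation has lower error.
    """
    return 1.0 / (1.0 + np.exp(-(err_flow - err_static) / temperature))


def collaborative_loss(mask, err_static, err_flow):
    """Mask-weighted sum: each competitor is charged only for the pixels it claims."""
    return float(np.sum(mask * err_static + (1.0 - mask) * err_flow))


# Toy scene (all values assumed): 4x4 image, constant depth, sideways camera motion.
h, w = 4, 4
K = np.array([[2.0, 0.0, 1.5], [0.0, 2.0, 1.5], [0.0, 0.0, 1.0]])
depth = np.full((h, w), 2.0)
R, t = np.eye(3), np.array([0.2, 0.0, 0.0])

static_flow = rigid_flow(depth, K, R, t)
observed = static_flow.copy()
observed[1:3, 1:3] = [1.0, 0.0]  # hypothetical independently moving 2x2 patch

err_static = np.linalg.norm(observed - static_flow, axis=-1)  # rigid competitor's error
err_flow = np.full((h, w), 0.1)  # assumed constant residual of a generic flow competitor

mask = moderator_mask(err_static, err_flow)
loss = collaborative_loss(mask, err_static, err_flow)
print(np.round(mask, 2))
print(loss)
```

In this toy setup, the moderator hands the moving patch to the flow competitor and the rest to the static competitor. The resulting mixed loss is lower than either competitor would incur by explaining every pixel alone, which illustrates why the two are pushed to specialize.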
URL
https://arxiv.org/abs/1805.09806