Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes

2020-08-09 03:03:00
Juan Luis Gonzalez, Munchurl Kim

Abstract

Self-supervised depth estimators have recently shown results comparable to supervised methods on the challenging single image depth estimation (SIDE) task by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis rather than depth estimation, as they cannot effectively discount occlusions between the target and reference images. Previous works rely on rigid photometric assumptions or on the SIDE network itself to infer depth and occlusions, resulting in limited performance. In contrast, we propose a method to "Forget About the LiDAR" (FAL) when training depth estimators, using Mirrored Exponential Disparity (MED) probability volumes, from which we obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM). Our MOM imposes no additional burden on our FAL-net. Unlike previous methods that learn SIDE from stereo pairs by regressing disparity in linear space, our FAL-net regresses disparity by binning it into exponential space, which allows for better detection of both distant and nearby objects. We define a two-step training strategy for our FAL-net: it is first trained for view synthesis and then fine-tuned for depth estimation with our MOM. Our FAL-net is remarkably lightweight and outperforms the previous state-of-the-art methods on the challenging KITTI dataset with 8x fewer parameters and 3x faster inference. We present extensive experimental results on the KITTI, CityScapes, and Make3D datasets to verify our method's effectiveness. To the best of the authors' knowledge, the presented method performs best among all self-supervised methods to date.
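The exponential disparity binning described in the abstract can be made concrete with a short sketch: instead of regressing disparity directly in linear space, the network outputs a probability volume over N discrete disparity levels spaced geometrically between a minimum and maximum disparity, and the final disparity map is a probability-weighted sum of those levels. The snippet below is a minimal, hypothetical PyTorch illustration of this idea; the function names, the bin count, and the bounds d_min and d_max are assumptions for illustration, not the paper's exact values or released code.

```python
# Hypothetical sketch of exponential disparity binning with soft-argmax
# regression over a probability volume (not the authors' implementation).
import torch
import torch.nn.functional as F

def exp_disparity_levels(d_min=1.0, d_max=300.0, n_levels=49):
    # Geometrically (exponentially) spaced levels between d_min and d_max:
    # bins cluster near small disparities, i.e. finer quantization for
    # distant objects, coarser for nearby ones.
    n = torch.arange(n_levels, dtype=torch.float32)
    return d_max * (d_min / d_max) ** (n / (n_levels - 1))

def disparity_from_volume(logits, levels):
    # Collapse a (B, N, H, W) logit volume into a (B, 1, H, W) disparity
    # map via a softmax-weighted sum over the N disparity levels.
    probs = F.softmax(logits, dim=1)
    return (probs * levels.view(1, -1, 1, 1)).sum(dim=1, keepdim=True)

# Toy usage with a random stand-in for a network's output volume.
levels = exp_disparity_levels()
logits = torch.randn(2, 49, 64, 128)
disp = disparity_from_volume(logits, levels)
print(disp.shape)  # torch.Size([2, 1, 64, 128])
```

The design intuition behind the geometric spacing is that small disparities correspond to distant scene points, so allocating proportionally more resolution to small values lets the estimator distinguish faraway structures that uniform linear bins would lump together.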

URL

https://arxiv.org/abs/2008.03633

PDF

https://arxiv.org/pdf/2008.03633.pdf

