Semi-synthesis: A fast way to produce effective datasets for stereo matching

2021-01-26 14:34:49

Ju He, Enyu Zhou, Liusheng Sun, Fei Lei, Chenyang Liu, Wenxiu Sun

arXiv_CV

arXiv_CV CNN Attention Pose 3D Scene_Text Matching

Abstract
Abstract (translated)
URL
PDF

Abstract

Stereo matching is an important problem in computer vision which has drawn tremendous research attention for decades. Recent years, data-driven methods with convolutional neural networks (CNNs) are continuously pushing stereo matching to new heights. However, data-driven methods require large amount of training data, which is not an easy task for real stereo data due to the annotation difficulties of per-pixel ground-truth disparity. Though synthetic dataset is proposed to fill the gaps of large data demand, the fine-tuning on real dataset is still needed due to the domain variances between synthetic data and real data. In this paper, we found that in synthetic datasets, close-to-real-scene texture rendering is a key factor to boost up stereo matching performance, while close-to-real-scene 3D modeling is less important. We then propose semi-synthetic, an effective and fast way to synthesize large amount of data with close-to-real-scene texture to minimize the gap between synthetic data and real data. Extensive experiments demonstrate that models trained with our proposed semi-synthetic datasets achieve significantly better performance than with general synthetic datasets, especially on real data benchmarks with limited training data. With further fine-tuning on the real dataset, we also achieve SOTA performance on Middlebury and competitive results on KITTI and ETH3D datasets.

Abstract (translated)

URL

https://arxiv.org/abs/2101.10811

PDF

https://arxiv.org/pdf/2101.10811.pdf