Paper Reading AI Learner

Geometry-Aware Symmetric Domain Adaptation for Monocular Depth Estimation

2019-04-03 09:23:03
Shanshan Zhao, Huan Fu, Mingming Gong, Dacheng Tao

Abstract

Supervised depth estimation has achieved high accuracy thanks to advanced deep network architectures. Since ground-truth depth labels are hard to obtain, recent methods try to learn depth estimation networks in an unsupervised way by exploiting unsupervised cues, which are effective but less reliable than true labels. An emerging way to resolve this dilemma is to transfer knowledge from synthetic images with ground-truth depth via domain adaptation techniques. However, these approaches overlook the specific geometric structure of natural images in the target domain (i.e., real data), which is important for high-performing depth prediction. Motivated by this observation, we propose a geometry-aware symmetric domain adaptation framework (GASDA) that jointly exploits the labels in the synthetic data and the epipolar geometry in the real data. Moreover, by training two image style translators and depth estimators symmetrically in an end-to-end network, our model achieves better image style transfer and generates high-quality depth maps. The experimental results demonstrate the effectiveness of the proposed method and its performance comparable to the state of the art. Code will be publicly available at: https://github.com/sshan-zhao/GASDA.
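
The abstract's "epipolar geometry in the real data" refers to the cue available when real images come as rectified stereo pairs: a predicted depth map implies a horizontal disparity, and the left view can be reconstructed by sampling the right view at the shifted column. The sketch below illustrates that photometric-reconstruction idea in NumPy. It is a minimal sketch under stated assumptions, not the authors' code: it assumes grayscale images, a rectified pair with known focal length (in pixels) and baseline (in metres), uses a plain L1 error, and does not reproduce the paper's exact loss terms or weighting. The function names `depth_to_disparity`, `warp_right_to_left`, and `photometric_l1` are hypothetical.

```python
import numpy as np


def depth_to_disparity(depth, focal_px, baseline_m):
    """Rectified stereo relation: disparity (pixels) = focal * baseline / depth."""
    # Guard against division by zero for invalid (zero) depths.
    return focal_px * baseline_m / np.maximum(depth, 1e-6)


def warp_right_to_left(right, disp):
    """Reconstruct the left view from the right view (hypothetical helper).

    For a rectified pair, a point at column x in the left image appears at
    column x - d in the right image, so we sample right[y, x - d[y, x]].
    Sampling uses linear interpolation along each row; coordinates that fall
    outside the image are clamped to the border column.
    """
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disp  # (h, w) source columns
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)          # left neighbour column
    x1 = np.minimum(x0 + 1, w - 1)         # right neighbour column
    frac = xs - x0                         # interpolation weight
    rows = np.arange(h)[:, None]           # broadcasts against (h, w) indices
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]


def photometric_l1(left, right, depth, focal_px, baseline_m):
    """Mean absolute error between the left image and its reconstruction."""
    disp = depth_to_disparity(depth, focal_px, baseline_m)
    recon = warp_right_to_left(right, disp)
    return float(np.mean(np.abs(left - recon)))
```

Usage: with a synthetic pair where the right image is the left image shifted by a constant disparity, the depth consistent with that shift yields a lower `photometric_l1` than an inconsistent depth. A depth network trained on real stereo data can be penalised by such an error without any depth labels. In a training framework the warp would be written with a differentiable sampler so gradients reach the depth estimator; the NumPy version only shows the arithmetic.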

URL

https://arxiv.org/abs/1904.01870

PDF

https://arxiv.org/pdf/1904.01870.pdf
