Paper Reading AI Learner

Deep Cross-modality Adaptation via Semantics Preserving Adversarial Learning for Sketch-based 3D Shape Retrieval

2018-07-04 22:51:44
Jiaxin Chen, Yi Fang

Abstract

Due to the large cross-modality discrepancy between 2D sketches and 3D shapes, retrieving 3D shapes by sketches is a significantly challenging task. To address this problem, we propose a novel framework to learn a discriminative deep cross-modality adaptation model in this paper. Specifically, we first separately adopt two metric networks, following two deep convolutional neural networks (CNNs), to learn modality-specific discriminative features based on an importance-aware metric learning method. Subsequently, we explicitly introduce a cross-modality transformation network to compensate for the divergence between two modalities, which can transfer features of 2D sketches to the feature space of 3D shapes. We develop an adversarial learning based method to train the transformation model, by simultaneously enhancing the holistic correlations between data distributions of two modalities, and mitigating the local semantic divergences through minimizing a cross-modality mean discrepancy term. Experimental results on the SHREC 2013 and SHREC 2014 datasets clearly show the superior retrieval performance of our proposed model, compared to the state-of-the-art approaches.

Abstract (translated)

由于2D草图和3D形状之间存在较大的跨模态差异,因此通过草图检索3D形状是一项非常具有挑战性的任务。为了解决这个问题,我们提出了一个新的框架来学习一个有区别的深度跨模态适应模型。具体而言,我们首先分别采用两个度量网络,遵循两个深度卷积神经网络(CNN),以基于重要性感知度量学习方法学习模态特定的判别特征。随后,我们明确地引入了跨模态转换网络来补偿两种模态之间的差异,这可以将2D草图的特征转移到3D形状的特征空间。我们开发了一种基于对抗性学习的方法来训练转换模型,同时增强两种模态的数据分布之间的整体相关性,并通过最小化交叉模态均值差异项来减轻局部语义差异。 SHREC 2013和SHREC 2014数据集的实验结果清楚地表明,与最先进的方法相比,我们提出的模型具有更高的检索性能。

URL

https://arxiv.org/abs/1807.01806

PDF

https://arxiv.org/pdf/1807.01806.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot