
Deep Cross Modal Learning for Caricature Verification and Identification

2018-07-31 07:19:14
Jatin Garg, Skand Vishwanath Peri, Himanshu Tolani, Narayanan C Krishnan

Abstract

Learning from different modalities is a challenging task. In this paper, we address cross-modal face verification and recognition between the caricature and visual image modalities. Caricatures exaggerate a person's facial features. Because caricatures vary so widely, building vision models that recognize and verify data from this modality is extremely challenging. Visual images, which contain significantly fewer distortions, can act as a bridge for analyzing the caricature modality. We introduce a publicly available large Caricature-VIsual dataset [CaVI] with images from both modalities that captures the rich variations in the caricatures of an identity. This paper presents the first cross-modal architecture that handles the extreme distortions of caricatures using a deep learning network that learns similar representations across the modalities. We use two convolutional networks along with transformations that are subject to orthogonality constraints to capture the shared and modality-specific representations. In contrast to prior research, our approach depends neither on manually extracted facial landmarks for learning the representations nor on the identities of the persons for performing verification. The learned shared representation achieves 91% accuracy for verifying unseen images and 75% accuracy on unseen identities. Further, recognizing the identity in the image by knowledge transfer, using a combination of shared and modality-specific representations, results in an unprecedented 85% rank-1 accuracy for caricatures and 95% rank-1 accuracy for visual images.
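
The abstract mentions orthogonality constraints between shared and modality-specific representations but does not give their exact form. As a minimal sketch, assuming the constraint is expressed as a penalty on the squared Frobenius norm of the cross-correlation between the two feature sets (a common choice in related work, not confirmed by the abstract), it could look like the following. The function name `orthogonality_penalty` and the toy feature values are hypothetical.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    Assumed form of the constraint; the abstract does not state it.
    shared:   list of rows, one per sample, each a list of shared features.
    specific: list of rows, one per sample, each a list of
              modality-specific features.
    The value is 0.0 when every shared feature column is orthogonal to
    every modality-specific feature column over the batch.
    """
    n_shared = len(shared[0])
    n_specific = len(specific[0])
    total = 0.0
    for i in range(n_shared):
        for j in range(n_specific):
            # Entry (i, j) of shared^T @ specific: dot product of two columns.
            dot = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, specific))
            total += dot * dot
    return total


# Toy batch of two samples with two features each (made-up values).
shared_feats = [[1.0, 0.0], [0.0, 0.0]]
specific_feats = [[0.0, 0.0], [0.0, 1.0]]

penalty_disjoint = orthogonality_penalty(shared_feats, specific_feats)
penalty_overlap = orthogonality_penalty(shared_feats, shared_feats)
```

In training, a term like this would typically be added to the verification loss with a weighting factor, so that the shared and modality-specific transformations are pushed toward encoding different information.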

URL

https://arxiv.org/abs/1807.11688

PDF

https://arxiv.org/pdf/1807.11688.pdf
