Paper Reading AI Learner

Isometric Transformation Invariant Graph-based Deep Neural Network

2018-08-21 14:32:52
Renata Khasanova, Pascal Frossard

Abstract

Learning transformation-invariant representations of visual data is an important problem in computer vision. Deep convolutional networks have demonstrated remarkable results on image and video classification tasks, yet they have achieved only limited success in classifying images that undergo geometric transformations. In this work we present a novel Transformation Invariant Graph-based Network (TIGraNet), which learns graph-based features that are inherently invariant to isometric transformations of the input images, such as rotation and translation. In particular, images are represented as signals on graphs, which makes it possible to replace the classical convolution and pooling layers of deep networks with graph spectral convolution and dynamic graph pooling layers that together provide invariance to isometric transformations. Our experiments show that TIGraNet maintains high performance on rotated and translated test images, whereas classical architectures are very sensitive to such transformations of the data. The inherent invariance properties of our framework provide key advantages, such as increased resilience to data variability and sustained performance with limited training sets. Our code is available online.
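The following is a minimal, illustrative sketch (not the authors' implementation) of the core idea the abstract describes: an image is treated as a signal on a grid graph and filtered with a polynomial of the graph Laplacian, i.e. a simple spectral graph convolution. The 8-connected grid, the filter taps alpha, and the helper names are assumptions made for this example; TIGraNet learns its spectral filters and additionally uses dynamic graph pooling, which is not shown here.

import numpy as np
import scipy.sparse as sp

def grid_graph_laplacian(h, w):
    """Symmetric normalized Laplacian of an h x w 8-connected grid graph."""
    idx = np.arange(h * w).reshape(h, w)
    rows, cols = [], []
    # Each offset covers one half of the 8-neighbourhood; edges are symmetrized below.
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:
        src = idx[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        dst = idx[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
        rows += [src.ravel(), dst.ravel()]
        cols += [dst.ravel(), src.ravel()]
    rows, cols = np.concatenate(rows), np.concatenate(cols)
    adj = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(h * w, h * w)).tocsr()
    deg_inv_sqrt = sp.diags(1.0 / np.sqrt(np.asarray(adj.sum(axis=1)).ravel()))
    return sp.eye(h * w) - deg_inv_sqrt @ adj @ deg_inv_sqrt

def spectral_conv(image, alpha):
    """Apply the filter sum_k alpha[k] * L^k to the image seen as a graph signal.
    The filter depends only on the graph structure, not on any pixel ordering,
    which is the property graph spectral convolution exploits."""
    h, w = image.shape
    lap = grid_graph_laplacian(h, w)
    x = image.ravel().astype(float)
    out = np.zeros_like(x)
    lap_k_x = x.copy()                 # L^0 x
    for a in alpha:
        out += a * lap_k_x
        lap_k_x = lap @ lap_k_x        # next power of L applied to x
    return out.reshape(h, w)

# Usage with hypothetical filter taps (a trained network would learn these):
img = np.random.rand(28, 28)
feat = spectral_conv(img, alpha=[0.5, -0.3, 0.1])
print(feat.shape)  # -> (28, 28)

Because the filter is a polynomial of the Laplacian, re-indexing the nodes (for instance, a 90-degree rotation of the grid) only permutes the output values rather than changing them, which hints at why spectral features, combined with suitable pooling, can be made invariant to isometric transformations.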


URL

https://arxiv.org/abs/1808.07366

PDF

https://arxiv.org/pdf/1808.07366.pdf

