Abstract
Learning transformation-invariant representations of visual data is an important problem in computer vision. Deep convolutional networks have achieved remarkable results on image and video classification tasks, but only limited success on images that undergo geometric transformations. In this work we present a novel Transformation Invariant Graph-based Network (TIGraNet), which learns graph-based features that are inherently invariant to isometric transformations of the input images, such as rotation and translation. In particular, images are represented as signals on graphs, which allows us to replace the classical convolution and pooling layers of deep networks with graph spectral convolution and dynamic graph pooling layers that together yield invariance to isometric transformations. Our experiments show high accuracy on rotated and translated test images, whereas classical architectures are very sensitive to such transformations in the data. The inherent invariance properties of our framework provide key advantages, such as increased robustness to data variability and sustained performance with limited training sets. Our code is available online.
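To illustrate the core idea behind spectral graph filtering (not the authors' actual implementation, whose details are in the paper), the sketch below applies a polynomial filter of the graph Laplacian to an image treated as a signal on a grid graph. Because a polynomial of the Laplacian commutes with vertex permutations, a feature such as the global maximum of the filtered signal is unchanged under graph isomorphisms, which is the kind of invariance TIGraNet builds on. All function names and filter coefficients here are hypothetical.

```python
import numpy as np

def grid_laplacian(h, w):
    """Combinatorial Laplacian L = D - A of a 4-neighbour h x w grid graph."""
    n = h * w
    A = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:                 # horizontal edge
                A[i, i + 1] = A[i + 1, i] = 1.0
            if r + 1 < h:                 # vertical edge
                A[i, i + w] = A[i + w, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_filter(x, L, coeffs):
    """Apply a polynomial spectral filter g(L) x = sum_k c_k L^k x.

    Since g(P L P^T) (P x) = P g(L) x for any permutation matrix P,
    permutation-invariant statistics of the output (e.g. a global max)
    are invariant to relabelings of the graph vertices.
    """
    out = np.zeros_like(x)
    Lk_x = x.copy()                       # holds L^k x
    for c in coeffs:
        out += c * Lk_x
        Lk_x = L @ Lk_x
    return out

# Filter a toy 4x4 "image" flattened into a graph signal.
h = w = 4
L = grid_laplacian(h, w)
img = np.arange(h * w, dtype=float)
feat = spectral_filter(img, L, coeffs=[0.5, -0.2, 0.05])
print(feat.max())  # permutation-invariant feature of the filtered signal
```

A rotation of the image grid is one particular vertex permutation, so this invariance covers the isometric transformations discussed in the abstract.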
URL
https://arxiv.org/abs/1808.07366