Abstract
Although convolutional neural networks (CNNs) achieve excellent performance on vision tasks by extracting intra-sample representations, they incur high training costs because they stack numerous convolutional layers. Recently, graph neural networks (GNNs), as bilinear models, have succeeded in exploring the underlying topological relationships among graph data with only a few graph neural layers. Unfortunately, GNNs cannot be applied directly to non-graph data, which lacks a graph structure, and they suffer high inference latency in large-scale scenarios. Inspired by these complementary strengths and weaknesses, \textit{we discuss a natural question: how can these two heterogeneous networks be bridged?} In this paper, we propose a novel framework, CNN2GNN, which unifies CNNs and GNNs via distillation. First, to overcome the limitations of GNNs, we design a differentiable sparse graph learning module as the head of the network to dynamically learn the graph for inductive learning. Then, we introduce response-based distillation to transfer knowledge from the CNN to the GNN and bridge the two heterogeneous networks. Notably, because it simultaneously extracts the intra-sample representation of each instance and the topological relationships across the dataset, the distilled ``boosted'' two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
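The abstract does not give the exact loss, but "response-based distillation" conventionally refers to matching the student's logits to the teacher's temperature-softened outputs (Hinton-style soft targets). Below is a minimal, dependency-free sketch of that idea; the function names, the temperature value, and the use of the standard \(T^2\)-scaled KL divergence are assumptions for illustration, not the paper's exact formulation.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; higher T yields a softer distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def response_distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between the softened teacher and student distributions,
    scaled by T^2 as in standard response-based knowledge distillation.
    Here the teacher would be the CNN and the student the two-layer GNN."""
    p = softmax(teacher_logits, temperature)  # teacher soft labels
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2
```

When the student's logits match the teacher's, the loss is zero; in training this term is typically added to the usual cross-entropy on ground-truth labels with a weighting coefficient.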
URL
https://arxiv.org/abs/2404.14822