Paper Reading AI Learner

CNN2GNN: How to Bridge CNN with GNN

2024-04-23 08:19:08
Ziheng Jiao, Hongyuan Zhang, Xuelong Li

Abstract

Although the convolutional neural network (CNN) achieves excellent performance in vision tasks by extracting intra-sample representations, it incurs a high training expense because it stacks numerous convolutional layers. Recently, as bilinear models, graph neural networks (GNNs) have succeeded in exploring the underlying topological relationships among graph data with only a few graph neural layers. Unfortunately, a GNN cannot be directly applied to non-graph data due to the lack of graph structure, and it suffers high inference latency in large-scale scenarios. Inspired by these complementary strengths and weaknesses, \textit{we discuss a natural question: how can these two heterogeneous networks be bridged?} In this paper, we propose a novel CNN2GNN framework that unifies CNN and GNN via distillation. First, to break the limitations of GNN, a differentiable sparse graph learning module is designed as the head of the network to dynamically learn the graph for inductive learning. Then, a response-based distillation is introduced to transfer knowledge from the CNN to the GNN and bridge the two heterogeneous networks. Notably, because it simultaneously extracts the intra-sample representation of a single instance and the topological relationship across the dataset, the distilled ``boosted'' two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
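The abstract names two standard ingredients: a sparse graph learned over instance features, feeding a two-layer GNN, and a response-based (logit-level) distillation loss from a CNN teacher. The sketch below illustrates both in NumPy under loose assumptions; it is not the paper's method. In particular, the fixed k-NN graph stands in for the paper's differentiable sparse graph learning module, the weight matrices `w1`/`w2` and the temperature `t` are hypothetical, and the KD loss is the classic softened-softmax KL divergence.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / t
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def knn_adjacency(x, k=2):
    """Row-normalized sparse k-NN graph built from instance features.
    A fixed stand-in for the paper's learned differentiable sparse graph."""
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    np.fill_diagonal(d, np.inf)                         # exclude self as neighbor
    a = np.zeros_like(d)
    for i, nb in enumerate(np.argsort(d, axis=1)[:, :k]):
        a[i, nb] = 1.0
    a = a + np.eye(len(x))                              # add self-loops
    return a / a.sum(axis=1, keepdims=True)             # row-normalize

def gnn_logits(x, a, w1, w2):
    """Two-layer GNN student: propagate, transform, ReLU, propagate, transform."""
    h = np.maximum(a @ x @ w1, 0.0)
    return a @ h @ w2

def kd_loss(student_logits, teacher_logits, t=4.0):
    """Response-based distillation: KL(teacher || student) on softened logits,
    scaled by t^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, t)
    q = softmax(student_logits, t)
    return float((p * (np.log(p) - np.log(q))).sum(axis=1).mean() * t * t)
```

In this reading, the teacher CNN produces `teacher_logits` per instance, the student GNN produces logits from the same features plus the learned graph, and minimizing `kd_loss` transfers the teacher's "dark knowledge" to the much shallower student.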

URL

https://arxiv.org/abs/2404.14822

PDF

https://arxiv.org/pdf/2404.14822.pdf
