Paper Reading AI Learner

Delayed Bottlenecking: Alleviating Forgetting in Pre-trained Graph Neural Networks

2024-04-23 11:35:35
Zhe Zhao, Pengkun Wang, Xu Wang, Haibin Wen, Xiaolong Xie, Zhengyang Zhou, Qingfu Zhang, Yang Wang

Abstract

Pre-training GNNs to extract transferable knowledge and apply it to downstream tasks has become the de facto standard of graph representation learning. Recent works have focused on designing self-supervised pre-training tasks to extract useful and universal transferable knowledge from large-scale unlabeled data. However, they face an inevitable question: traditional pre-training strategies, which aim to extract information useful for the pre-training tasks, may not extract all the information useful for the downstream task. In this paper, we reexamine the pre-training process within the traditional pre-training and fine-tuning framework from the perspective of the Information Bottleneck (IB) and confirm that the forgetting phenomenon in the pre-training phase may have detrimental effects on downstream tasks. Therefore, we propose a novel Delayed Bottlenecking Pre-training (DBP) framework, which maintains as much mutual information as possible between latent representations and training data during the pre-training phase by suppressing the compression operation, and delays the compression operation to the fine-tuning phase so that compression can be guided by labeled fine-tuning data and downstream tasks. To achieve this, we design two information control objectives that can be directly optimized and further integrate them into the actual model design. Extensive experiments in both the chemistry and biology domains demonstrate the effectiveness of DBP.
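Read against the classical Information Bottleneck objective, the mechanism described in the abstract can be summarized in a short sketch. The notation below (X for input graphs, Y for downstream labels, Z for latent representations, beta for the compression weight) and the phase-wise split are our own reading of the abstract, not equations taken from the paper.

Classical IB, with prediction and compression traded off in a single stage:
    \mathcal{L}_{\mathrm{IB}} = I(Z;Y) - \beta\, I(Z;X)

DBP-style reading, pre-training phase (no labels available; compression is suppressed so that information potentially relevant to downstream tasks is not forgotten):
    \max_{\theta_{\mathrm{pre}}}\; I(Z;X)

DBP-style reading, fine-tuning phase (the bottleneck is delayed until downstream labels Y can guide what gets compressed away):
    \max_{\theta_{\mathrm{ft}}}\; I(Z;Y) - \beta\, I(Z;X)

Under this reading, the two phase-wise terms would correspond to the "two information control objectives" the abstract says can be directly optimized; in practice, both mutual-information terms would have to be replaced by tractable estimators or bounds.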


URL

https://arxiv.org/abs/2404.14941

PDF

https://arxiv.org/pdf/2404.14941.pdf

