Break It Down: Evidence for Structural Compositionality in Neural Networks

2023-01-26 00:53:11
Michael A. Lepori, Thomas Serre, Ellie Pavlick

Abstract

Many tasks can be described as compositions over subroutines. Though modern neural networks have achieved impressive performance on both vision and language tasks, we know little about the functions that they implement. One possibility is that neural networks implicitly break down complex tasks into subroutines, implement modular solutions to these subroutines, and compose them into an overall solution to a task -- a property we term structural compositionality. Or they may simply learn to match new inputs to memorized representations, eliding task decomposition entirely. Here, we leverage model pruning techniques to investigate this question in both vision and language, across a variety of architectures, tasks, and pretraining regimens. Our results demonstrate that models oftentimes implement solutions to subroutines via modular subnetworks, which can be ablated while maintaining the functionality of other subroutines. This suggests that neural networks may be able to learn to exhibit compositionality, obviating the need for specialized symbolic mechanisms.
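
The ablation idea in the abstract can be illustrated with a toy sketch. It is not the authors' pruning procedure: the weights, the masks, and the two subroutines ("is a positive?" and "is b positive?") are all hypothetical and hand-built so that each subroutine lives in its own pair of hidden units. The sketch only shows what it would mean for a subnetwork to be modular: masking it out drops one subroutine to chance while the other is unaffected.

```python
from itertools import product

def relu(v):
    return v if v > 0 else 0.0

# Hypothetical hand-built weights: hidden units 0-1 read only `a`,
# hidden units 2-3 read only `b`.
W_IN = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
# One readout per subroutine; each readout uses only its own pair of units.
W_OUT = [(1.0, -1.0, 0.0, 0.0), (0.0, 0.0, 1.0, -1.0)]

def forward(a, b, mask):
    """Run the toy network with a binary mask over the hidden units."""
    hidden = [m * relu(wa * a + wb * b) for m, (wa, wb) in zip(mask, W_IN)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]

def accuracies(mask, data):
    """Accuracy of each subroutine readout under the given mask."""
    correct = [0, 0]
    for a, b in data:
        out = forward(a, b, mask)
        targets = (a > 0, b > 0)
        for i in range(2):
            correct[i] += (out[i] > 0) == targets[i]
    return [c / len(data) for c in correct]

# Balanced toy dataset: half the points have a > 0, half have b > 0.
DATA = list(product([-2.0, -1.0, 1.0, 2.0], repeat=2))

FULL = [1, 1, 1, 1]         # keep every hidden unit
ABLATE_SUB1 = [0, 0, 1, 1]  # remove the subnetwork for subroutine 1

full_acc = accuracies(FULL, DATA)
ablated_acc = accuracies(ABLATE_SUB1, DATA)
print("full network:       ", full_acc)
print("subroutine 1 ablated:", ablated_acc)
```

In a trained model the mask would have to be discovered rather than written by hand, and the separation would rarely be this clean. The comparison of per-subroutine accuracy before and after ablation is the part that carries over.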

URL

https://arxiv.org/abs/2301.10884

PDF

https://arxiv.org/pdf/2301.10884.pdf

