Soft Conditional Computation

2019-04-10 01:46:48
Brandon Yang, Gabriel Bender, Quoc V. Le, Jiquan Ngiam

Abstract

Conditional computation aims to increase the size and accuracy of a network, at a small increase in inference cost. Previous hard-routing models explicitly route the input to a subset of experts. We propose soft conditional computation, which, in contrast, utilizes all experts while still permitting efficient inference through parameter routing. Concretely, for a given convolutional layer, we wish to compute a linear combination of $n$ experts $\alpha_1 \cdot (W_1 * x) + \ldots + \alpha_n \cdot (W_n * x)$, where $\alpha_1, \ldots, \alpha_n$ are functions of the input learned through gradient descent. A straightforward evaluation requires $n$ convolutions. We propose an equivalent form of the above computation, $(\alpha_1 W_1 + \ldots + \alpha_n W_n) * x$, which requires only a single convolution. We demonstrate the efficacy of our method, named CondConv, by scaling up the MobileNetV1, MobileNetV2, and ResNet-50 model architectures to achieve higher accuracy while retaining efficient inference. On the ImageNet classification dataset, CondConv improves the top-1 validation accuracy of the MobileNetV1(0.5x) model from 63.8% to 71.6% while only increasing inference cost by 27%. On COCO object detection, CondConv improves the minival mAP of a MobileNetV1(1.0x) SSD model from 20.3 to 22.4 with just a 4% increase in inference cost.
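The equivalence between the two forms follows from the linearity of convolution in its kernel. It can be checked numerically with a minimal sketch in pure Python. This is an illustration under stated assumptions, not the authors' implementation. The single-channel setting, the helper names (`conv2d`, `routing_weights`, `naive_mixture`, `combined_kernel`, `condconv`), and the routing function (a sigmoid of the input's global average times a per-expert scalar) are all hypothetical. The abstract only states that the $\alpha_i$ are learned functions of the input.

```python
import math
import random


def conv2d(x, w):
    """Valid 2D cross-correlation of a single-channel image x with kernel w."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(w[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def routing_weights(x, r):
    """Hypothetical routing: sigmoid(global average of x * r_i) per expert.

    Assumption for illustration only; here r is fixed rather than learned.
    """
    mean = sum(sum(row) for row in x) / (len(x) * len(x[0]))
    return [1.0 / (1.0 + math.exp(-mean * r_i)) for r_i in r]


def naive_mixture(x, experts, alphas):
    """alpha_1 (W_1 * x) + ... + alpha_n (W_n * x): one convolution per expert."""
    outs = [conv2d(x, w) for w in experts]
    rows, cols = len(outs[0]), len(outs[0][0])
    return [
        [sum(a * o[i][j] for a, o in zip(alphas, outs)) for j in range(cols)]
        for i in range(rows)
    ]


def combined_kernel(experts, alphas):
    """alpha_1 W_1 + ... + alpha_n W_n: mix the kernels, not the outputs."""
    kh, kw = len(experts[0]), len(experts[0][0])
    return [
        [sum(a * w[i][j] for a, w in zip(alphas, experts)) for j in range(kw)]
        for i in range(kh)
    ]


def condconv(x, experts, alphas):
    """(alpha_1 W_1 + ... + alpha_n W_n) * x: a single convolution."""
    return conv2d(x, combined_kernel(experts, alphas))


def max_abs_diff(p, q):
    """Largest elementwise absolute difference between two 2D lists."""
    return max(abs(a - b) for rp, rq in zip(p, q) for a, b in zip(rp, rq))


rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(6)]          # 6x6 input
experts = [[[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
           for _ in range(4)]                                           # four 3x3 experts
r = [rng.uniform(-1, 1) for _ in range(4)]                              # stand-in routing params
alphas = routing_weights(x, r)

# The two forms should agree up to floating-point rounding.
diff = max_abs_diff(naive_mixture(x, experts, alphas), condconv(x, experts, alphas))
print("max abs difference:", diff)
```

In this sketch the naive form runs one convolution per expert, while the combined form runs a single convolution after a kernel-sized weighted sum. That sum is cheap relative to a convolution over the full feature map, which is consistent with the small inference-cost increases reported in the abstract.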

URL

https://arxiv.org/abs/1904.04971

PDF

https://arxiv.org/pdf/1904.04971.pdf

