Paper Reading AI Learner

DiCENet: Dimension-wise Convolutions for Efficient Networks

2019-06-08 20:17:06
Sachin Mehta, Hannaneh Hajishirzi, Mohammad Rastegari

Abstract

In this paper, we propose DiCENet, a new CNN model built using: (1) dimension-wise convolutions and (2) efficient channel fusion. The introduced blocks maximize the use of information in the input tensor by learning representations across all of its dimensions, while simultaneously reducing the complexity of the network and achieving high accuracy. Our model shows significant improvements over state-of-the-art models across various visual recognition tasks, including image classification, object detection, and semantic segmentation. It matches or outperforms existing models, including task-specific ones, at fewer FLOPs. Notably, DiCENet delivers performance competitive with neural architecture search-based methods at fewer FLOPs (70-100 MFLOPs). On MS-COCO object detection, DiCENet is 4.5% more accurate and has 5.6 times fewer FLOPs than YOLOv2. On the PASCAL VOC 2012 semantic segmentation dataset, DiCENet is 4.3% more accurate and has 3.2 times fewer FLOPs than ESPNet, a recent efficient semantic segmentation network. Our source code is available at \url{this https URL}
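The core idea of dimension-wise convolution, as described in the abstract, is to apply cheap depthwise-style filters along each of the three dimensions of the input tensor (channels, height, width) rather than along channels alone, before fusing the results. A minimal NumPy sketch of that idea, assuming a naive per-slice 2D convolution (the function names and loop-based implementation are illustrative, not the authors' code):

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Per-slice 2D convolution with 'same' zero padding.
    x: (S, H, W) tensor; kernels: (S, k, k), one independent filter per slice."""
    S, H, W = x.shape
    k = kernels.shape[1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((S, H, W))
    for s in range(S):
        for i in range(H):
            for j in range(W):
                out[s, i, j] = np.sum(xp[s, i:i + k, j:j + k] * kernels[s])
    return out

def dimension_wise_conv(x, kc, kh, kw):
    """Apply independent filters along each dimension of x: (C, H, W).
    Each axis in turn is moved into the 'slice' position via transpose,
    so the same depthwise routine convolves along channels, height, width."""
    yc = depthwise_conv2d(x, kc)                                         # channel-wise
    yh = depthwise_conv2d(x.transpose(1, 0, 2), kh).transpose(1, 0, 2)   # height-wise
    yw = depthwise_conv2d(x.transpose(2, 1, 0), kw).transpose(2, 1, 0)   # width-wise
    # The paper concatenates the three responses before its channel-fusion
    # step; here we just stack them, tripling the number of feature maps.
    return np.concatenate([yc, yh, yw], axis=0)
```

For an input of shape `(C, H, W)` this produces a `(3C, H, W)` tensor, which the paper's efficient channel fusion would then combine; that fusion step is omitted here.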

URL

https://arxiv.org/abs/1906.03516

PDF

https://arxiv.org/pdf/1906.03516.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot