Rethinking Binary Neural Network for Accurate Image Classification and Semantic Segmentation

2018-11-22 05:24:17
Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid

Abstract

In this paper, we propose to train a network with both binary weights and binary activations, designed specifically for mobile devices with limited computation capacity and power. Previous works on quantizing CNNs uncritically assume the same architecture as the full-precision network, an approach we term value approximation. Their objective is to preserve the floating-point information using a set of discrete values. We take a different view: for best performance, a different architecture is likely to be better suited to binary weights and binary activations. We therefore directly design a highly accurate binary network structure, an approach we term structure approximation. In particular, we propose a "network decomposition" strategy in which we divide the network into groups and aggregate a set of homogeneous binary branches to implicitly reconstruct the full-precision intermediate feature maps. We also learn the connections between groups. We further provide a comprehensive comparison among all quantization categories. Experiments on ImageNet classification demonstrate the superior performance of the proposed model, named Group-Net, over various popular architectures. In particular, we outperform the previous best binary neural network in accuracy while also greatly reducing computational complexity. Furthermore, Group-Net can effectively exploit task-specific properties for strong generalization: we extend it to lossless semantic segmentation. This is the first work in the literature to address dense pixel prediction with BNNs. We argue that considering both value and structure approximation should be the future development direction of BNNs.
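The idea of aggregating several homogeneous binary branches can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it uses a fully connected layer instead of a convolution, it averages the branch outputs (the abstract only says the branches are aggregated), and it omits the learned connections between groups. The names `binarize`, `binary_branch`, and `group_forward` are hypothetical.

```python
import numpy as np


def binarize(x):
    # Map every value to +1 or -1 (zero is mapped to +1 by assumption).
    return np.where(x >= 0, 1.0, -1.0)


def binary_branch(x, w):
    # One binary branch: both activations and weights are binarized
    # before the matrix product, so every operand is +1 or -1.
    return binarize(x) @ binarize(w)


def group_forward(x, branch_weights):
    # Aggregate K homogeneous binary branches. Averaging is an assumed
    # aggregation rule chosen for this sketch.
    outputs = [binary_branch(x, w) for w in branch_weights]
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))  # 4 samples, 8 input features
branch_weights = [rng.standard_normal((8, 5)) for _ in range(3)]  # K = 3
out = group_forward(x, branch_weights)
print(out.shape)
```

A single branch can only produce a coarse set of output values (here, even integers between -8 and 8). Combining several branches with different weights gives a finer set of reachable values, which is the intuition behind using a group of binary branches to stand in for one full-precision layer.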

URL

https://arxiv.org/abs/1811.10413

PDF

https://arxiv.org/pdf/1811.10413.pdf

