
Improved training of binary networks for human pose estimation and image recognition

2019-04-11 17:55:06
Adrian Bulat, Georgios Tzimiropoulos, Jean Kossaifi, Maja Pantic

Abstract

Big neural networks trained on large datasets have advanced the state of the art for a large variety of challenging problems, improving performance by a large margin. However, under low memory and limited computational power constraints, the accuracy on the same problems drops considerably. In this paper, we propose a series of techniques that significantly improve the accuracy of binarized neural networks (i.e., networks where both the features and the weights are binary). We evaluate the proposed improvements on two diverse tasks: fine-grained recognition (human pose estimation) and large-scale image recognition (ImageNet classification). Specifically, we introduce a series of novel methodological changes including: (a) more appropriate activation functions, (b) reverse-order initialization, (c) progressive quantization, and (d) network stacking, and show that these additions significantly improve existing state-of-the-art network binarization techniques. Additionally, for the first time, we also investigate the extent to which network binarization and knowledge distillation can be combined. When tested on the challenging MPII dataset, our method shows a performance improvement of more than 4% in absolute terms. Finally, we further validate our findings by applying the proposed techniques for large-scale object recognition on the ImageNet dataset, on which we report a reduction of error rate by 4%.
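
For readers unfamiliar with the term, the following is a minimal sketch of what "both the features and the weights are binary" can mean in practice. It is not the authors' implementation and covers none of the four proposed techniques. It only illustrates the generic idea of mapping real values to +1/-1 with the sign function and then computing a dot product with XNOR and a bit count. The function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical, and the choice to map 0 to +1 is an assumption.

```python
def binarize(values):
    """Map each real value to +1 or -1 with the sign function (0 maps to +1 here)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into one integer, one bit per element (+1 -> 1, -1 -> 0)."""
    word = 0
    for s in signs:
        word = (word << 1) | (1 if s > 0 else 0)
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    # XNOR marks the positions where the two sign bits agree.
    agree = bin(~(a_bits ^ b_bits) & mask).count("1")
    # Each agreement contributes +1 and each disagreement -1.
    return 2 * agree - n


# Illustrative values only; they are not taken from the paper.
weights = [0.7, -1.2, 0.05, -0.3]
features = [-0.4, -2.0, 1.5, 0.9]

w_bin = binarize(weights)
x_bin = binarize(features)

# Reference dot product on the +1/-1 values, and the bitwise equivalent.
reference = sum(w * x for w, x in zip(w_bin, x_bin))
fast = binary_dot(pack_bits(w_bin), pack_bits(x_bin), len(w_bin))
```

Both `reference` and `fast` compute the same quantity. The bitwise form is what makes binarized networks attractive under the memory and compute constraints the abstract describes, since each value occupies a single bit.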

URL

https://arxiv.org/abs/1904.05868

PDF

https://arxiv.org/pdf/1904.05868.pdf

