Paper Reading AI Learner

Mixture of Balanced Information Bottlenecks for Long-Tailed Visual Recognition

2025-09-01 22:14:12
Yifan Lan, Xin Cai, Jun Cheng, Shan Tan

Abstract

Deep neural networks (DNNs) have achieved significant success in various applications with large-scale and balanced data. However, data in real-world visual recognition are usually long-tailed, bringing challenges to efficient training and deployment of DNNs. Information bottleneck (IB) is an elegant approach for representation learning. In this paper, we propose a balanced information bottleneck (BIB) approach, in which loss function re-balancing and self-distillation techniques are integrated into the original IB network. BIB is thus capable of learning a sufficient representation with essential label-related information fully preserved for long-tailed visual recognition. To further enhance the representation learning capability, we also propose a novel structure of mixture of multiple balanced information bottlenecks (MBIB), where different BIBs are responsible for combining knowledge from different network layers. MBIB facilitates an end-to-end learning strategy that trains representation and classification simultaneously from an information theory perspective. We conduct experiments on commonly used long-tailed datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018. Both BIB and MBIB reach state-of-the-art performance for long-tailed visual recognition.
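The abstract names three ingredients: an information bottleneck objective, loss function re-balancing, and self-distillation. The sketch below shows, in NumPy, one common way such terms can be combined into a single training loss. It rests on several assumptions. The IB term is a variational IB compression penalty with a Gaussian encoder. Re-balancing uses logit adjustment as a stand-in scheme. Self-distillation is a temperature-scaled KL divergence from a "teacher" branch. The function names, the teacher branch, and the weights `beta`, `alpha`, `T`, and `tau` are hypothetical. This is not the authors' BIB or MBIB implementation; see the paper for the actual formulation.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def logit_adjusted_ce(logits, labels, class_counts, tau=1.0):
    # Assumed re-balancing scheme (logit adjustment): add tau * log(prior)
    # to every logit, so head classes must win by a larger margin.
    # class_counts must be strictly positive.
    prior = class_counts / class_counts.sum()
    logp = log_softmax(logits + tau * np.log(prior))
    return -logp[np.arange(len(labels)), labels].mean()

def gaussian_kl(mu, logvar):
    # KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch:
    # the compression term of a variational information bottleneck.
    return 0.5 * np.sum(mu**2 + np.exp(logvar) - 1.0 - logvar, axis=-1).mean()

def distill_kl(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    log_p_t = log_softmax(teacher_logits / T)
    log_p_s = log_softmax(student_logits / T)
    return (T**2) * np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1).mean()

def bib_style_loss(logits, teacher_logits, mu, logvar, labels, class_counts,
                   beta=1e-3, alpha=0.5, T=2.0):
    # Hypothetical combination: re-balanced CE + beta * IB compression
    # + alpha * self-distillation. Weights are illustrative only.
    ce = logit_adjusted_ce(logits, labels, class_counts)
    kl = gaussian_kl(mu, logvar)
    sd = distill_kl(logits, teacher_logits, T)
    return ce + beta * kl + alpha * sd

# Toy usage on random data (no gradients in NumPy; in a real framework the
# teacher logits would be detached from the graph).
rng = np.random.default_rng(0)
N, C, D = 8, 5, 16
counts = np.array([500.0, 200.0, 50.0, 10.0, 2.0])  # toy long-tailed counts
labels = rng.integers(0, C, size=N)
mu = rng.normal(size=(N, D))
logvar = rng.normal(scale=0.1, size=(N, D))
student = rng.normal(size=(N, C))  # logits from one branch
teacher = rng.normal(size=(N, C))  # logits from a hypothetical teacher branch
loss = bib_style_loss(student, teacher, mu, logvar, labels, counts)
print(f"total loss: {loss:.4f}")
```

With uniform `class_counts`, the adjustment adds the same constant to every logit and the loss reduces to plain cross-entropy. With long-tailed counts, a rare-class label incurs a larger loss than a head-class label at equal logits, which is the re-balancing effect.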

URL

https://arxiv.org/abs/2509.01804

PDF

https://arxiv.org/pdf/2509.01804.pdf
