EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder

2024-04-21 20:45:18
Hasanul Mahmud, Kevin Desai, Palden Lama, Sushil K. Prasad

Abstract

Image classification is a fundamental task in computer vision, and enhancing DNN accuracy without inflating model size or latency remains a pressing concern. We make two advances in this regard, leading to a novel EncodeNet design and training framework. The first advance involves Converting Autoencoders, a novel approach that transforms each image into an easy-to-classify image of its class. Our prior work, which applied the Converting Autoencoder and a simple classifier in tandem, achieved moderate accuracy on simple datasets such as MNIST and FMNIST. However, on more complex datasets like CIFAR-10, the Converting Autoencoder has a large reconstruction loss, making it unsuitable for enhancing DNN accuracy. To address these limitations, we generalize the design of Converting Autoencoders by leveraging a larger class of DNNs: those with architectures comprising feature extraction layers followed by classification layers. We incorporate a generalized algorithmic design of the Converting Autoencoder and intraclass clustering to identify representative images, leading to optimized image feature learning. Next, we demonstrate the effectiveness of our EncodeNet design and training framework, which improves the accuracy of well-trained baseline DNNs while maintaining the overall model size. EncodeNet's building blocks comprise the trained encoder from our generalized Converting Autoencoders, which transfers knowledge to a lightweight classifier network that is also extracted from the baseline DNN. Our experimental results demonstrate that EncodeNet improves the accuracy of VGG16 from 92.64% to 94.05% on CIFAR-10 and of ResNet20 from 74.56% to 76.04% on CIFAR-100. It outperforms state-of-the-art techniques that rely on knowledge distillation and attention mechanisms, delivering higher accuracy for models of comparable size.
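
The abstract does not spell out how representative images are identified, beyond naming intraclass clustering and describing the autoencoder as entropy-driven. As a rough illustration only, the sketch below assumes one plausible rule: within each class, treat the image whose predicted class distribution has the lowest Shannon entropy as the easy-to-classify target. The function names, the toy probabilities, and the selection rule itself are assumptions made for this example, not the authors' method, and the clustering step is omitted entirely.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_representatives(probs_per_image, labels):
    """Hypothetical selection rule: for each class label, return the index
    of the image whose predicted distribution has the lowest entropy."""
    best = {}  # label -> (entropy, image index)
    for idx, (probs, label) in enumerate(zip(probs_per_image, labels)):
        h = shannon_entropy(probs)
        if label not in best or h < best[label][0]:
            best[label] = (h, idx)
    return {label: idx for label, (h, idx) in best.items()}

# Toy softmax outputs for four images over two classes (made-up values).
probs_per_image = [
    [0.55, 0.45],  # class 0, uncertain prediction
    [0.95, 0.05],  # class 0, confident prediction
    [0.30, 0.70],  # class 1, uncertain prediction
    [0.02, 0.98],  # class 1, confident prediction
]
labels = [0, 0, 1, 1]

representatives = pick_representatives(probs_per_image, labels)
print(representatives)
```

In a setup like this, every image of a class would be paired with that class's low-entropy representative as the reconstruction target when training the autoencoder; how the paper combines such a signal with clustering is not stated in the abstract.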

URL

https://arxiv.org/abs/2404.13770

PDF

https://arxiv.org/pdf/2404.13770.pdf

