Learn More for Food Recognition via Progressive Self-Distillation

2023-03-09 07:11:30
Yaohui Zhu, Linhu Liu, Jiang Tian

Abstract

Food recognition has a wide range of applications, such as health-aware recommendation and self-service restaurants. Most previous food recognition methods first locate informative regions in a weakly supervised manner and then aggregate their features. However, errors in locating these informative regions limit the effectiveness of such methods. Instead of locating multiple regions, we propose a Progressive Self-Distillation (PSD) method, which progressively enhances the ability of the network to mine more details for food recognition. The training of PSD runs multiple self-distillations simultaneously, in each of which a teacher network and a student network share the same embedding network. Because the student network receives a modified image from its teacher network, in which some informative regions are masked, the teacher network outputs stronger semantic representations than the student network. Guided by this teacher network with stronger semantics, the student network is encouraged to mine more useful regions from the modified image by enhancing its own ability. Since the embedding network is shared, the ability of the teacher network is enhanced as well. Through progressive training, the teacher network incrementally improves its ability to mine more discriminative regions. In the inference phase, only the teacher network is used, without the help of the student network. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method and its state-of-the-art performance.
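The teacher/student interplay described in the abstract can be illustrated with a minimal sketch of a single self-distillation pair. This is not the authors' implementation: the abstract does not say how informative regions are scored, how the mask is built, or which loss is used. The saliency grid, the top-k masking rule, the toy linear `shared_embed`, and the KL-divergence loss below are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shared_embed(image, weights):
    """Toy stand-in for the shared embedding network (ASSUMPTION).

    Produces one linear score per class; teacher and student both call
    this same function, mirroring the shared weights in the abstract.
    """
    rows, cols = len(image), len(image[0])
    return [
        sum(w[r][c] * image[r][c] for r in range(rows) for c in range(cols))
        for w in weights
    ]


def mask_most_informative(image, saliency, k):
    """Return a copy of `image` with the k highest-saliency cells zeroed.

    ASSUMPTION: `saliency` is a per-cell score derived from the teacher;
    the abstract does not specify how informative regions are chosen.
    """
    cells = [
        (saliency[r][c], r, c)
        for r in range(len(image))
        for c in range(len(image[0]))
    ]
    cells.sort(reverse=True)
    masked = [row[:] for row in image]
    for _, r, c in cells[:k]:
        masked[r][c] = 0.0
    return masked


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student), a common distillation loss (ASSUMPTION)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# One teacher/student pair on a 2x2 toy "image" with two classes.
image = [[1.0, 2.0], [3.0, 4.0]]
saliency = [[0.1, 0.9], [0.2, 0.3]]          # hypothetical teacher saliency
weights = [[[0.5, 1.0], [0.0, 0.0]],         # class 0 relies on the top row
           [[0.0, 0.0], [0.2, 0.2]]]         # class 1 relies on the bottom row

teacher_logits = shared_embed(image, weights)            # sees the full image
student_input = mask_most_informative(image, saliency, k=1)
student_logits = shared_embed(student_input, weights)    # sees the masked image
loss = distillation_loss(teacher_logits, student_logits)
```

In a real training loop, minimizing `loss` would push the shared embedding to recover the teacher's prediction from the remaining, less salient regions. The progressive scheme in the abstract runs several such pairs at once; only one pair is shown here.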

URL

https://arxiv.org/abs/2303.05073

PDF

https://arxiv.org/pdf/2303.05073.pdf

