Abstract
Food recognition has a wide range of applications, such as health-aware recommendation and self-service restaurants. Most previous food recognition methods first locate informative regions in a weakly supervised manner and then aggregate their features. However, errors in locating these regions limit the effectiveness of such methods. Instead of locating multiple regions, we propose a Progressive Self-Distillation (PSD) method, which progressively enhances the network's ability to mine more details for food recognition. PSD training runs multiple self-distillations simultaneously, in each of which a teacher network and a student network share the same embedding network. Because the student network receives a modified image from its teacher network, in which some informative regions are masked, the teacher network outputs stronger semantic representations than the student network. Guided by this teacher network with stronger semantics, the student network is encouraged to mine more useful regions from the modified image by enhancing its own ability. Because the embedding network is shared, the teacher network's ability is enhanced as well. Through progressive training, the teacher network incrementally improves its ability to mine more discriminative regions. In the inference phase, only the teacher network is used, without the help of the student network. Extensive experiments on three datasets demonstrate the effectiveness of the proposed method and its state-of-the-art performance.
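The masking and distillation steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not state how regions are scored or which distillation loss is used, so the per-region scores, the top-k masking rule, the KL-divergence loss, and all function names below are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mask_top_regions(region_scores, k):
    """Return a 0/1 keep-mask that hides the k highest-scoring regions.

    Assumption: the teacher provides one informativeness score per region.
    """
    ranked = sorted(range(len(region_scores)),
                    key=lambda i: region_scores[i], reverse=True)
    hidden = set(ranked[:k])
    return [0 if i in hidden else 1 for i in range(len(region_scores))]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over class distributions (assumed loss)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy values: four image regions and three food classes.
region_scores = [0.9, 0.1, 0.7, 0.3]   # teacher's per-region informativeness
keep_mask = mask_top_regions(region_scores, k=2)   # student sees regions 1 and 3

teacher_logits = [2.0, 0.5, 0.1]   # teacher output on the full image
student_logits = [1.0, 0.8, 0.3]   # student output on the masked image
loss = distillation_loss(teacher_logits, student_logits)
```

In this sketch the student is penalized for diverging from the teacher even though its most informative regions are hidden, which is the pressure the abstract describes as encouraging the shared embedding network to mine additional regions.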
URL
https://arxiv.org/abs/2303.05073