Muti-Stage Hierarchical Food Classification

2023-09-03 04:45:44
Xinyue Pan, Jiangpeng He, Fengqing Zhu

Abstract

Food image classification is a fundamental step in image-based dietary assessment, facilitating nutrient intake analysis from captured food images. However, existing work in food classification predominantly focuses on predicting 'food types', which do not carry direct nutritional composition information. This limitation arises from the inherent discrepancies in nutrition databases, which are tasked with associating each 'food item' with its respective information. Therefore, in this work we aim to classify food items so that predictions align with nutrition databases. To this end, we first introduce the VFN-nutrient dataset by annotating each food image in VFN with a food item that includes nutritional composition information. Such annotation of food items, being more discriminative than food types, creates a hierarchical structure within the dataset. However, since the food item annotations are based solely on nutritional composition information, they do not always show visual relations with each other, which poses significant challenges when applying deep learning-based techniques for classification. To address this issue, we then propose a multi-stage hierarchical framework for food item classification that iteratively clusters and merges food items during the training process, which allows the deep model to extract image features that are discriminative across labels. Our method is evaluated on the VFN-nutrient dataset and achieves promising results compared with existing work in terms of both food type and food item classification.
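
As a rough illustration of the "clustering and merging" idea mentioned in the abstract, the following Python sketch greedily merges food-item labels whose mean image features lie closest together. It is not the authors' method: the abstract does not specify the clustering criterion, so the centroid-distance rule, the function names (`centroid`, `merge_closest_items`), the item labels, and the toy 2-D features are all assumptions made purely for illustration.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_closest_items(item_features, num_clusters):
    """Greedily merge item labels until only num_clusters groups remain.

    item_features: dict mapping a (hypothetical) food-item label to its
                   mean feature vector, given as a list of floats.
    Returns a dict mapping each item label to a cluster id.
    """
    # Start with every food item in its own cluster.
    clusters = [[label] for label in sorted(item_features)]

    while len(clusters) > num_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            ci = centroid([item_features[l] for l in clusters[i]])
            for j in range(i + 1, len(clusters)):
                cj = centroid([item_features[l] for l in clusters[j]])
                d = math.dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        # Merge the closest pair (assumed criterion: Euclidean centroid distance).
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    return {label: cid for cid, members in enumerate(clusters) for label in members}


# Toy example with made-up labels and 2-D "features".
toy_features = {
    "whole_milk": [0.0, 0.1],
    "skim_milk": [0.1, 0.0],
    "apple_pie": [5.0, 5.0],
    "cherry_pie": [5.1, 4.9],
}
assignment = merge_closest_items(toy_features, num_clusters=2)
```

In this toy run the two milk items end up in one cluster and the two pie items in the other. In a multi-stage setup, such merged groups could serve as coarser training labels at one stage before being refined at the next; how the paper actually schedules its stages is not stated in the abstract.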

URL

https://arxiv.org/abs/2309.01075

PDF

https://arxiv.org/pdf/2309.01075.pdf

