Abstract
Image-based methods for analyzing food images have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge because 3D information is lost when foods are captured as 2D images by smartphone cameras or wearable devices. In this paper, we propose a new framework that estimates both food volume and energy from 2D images by leveraging 3D food models and a physical reference in the eating scene. Our method estimates the poses of the camera and the food object in the input image and recreates the eating occasion by rendering an image of a 3D model of the food under the estimated poses. We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items with annotations including food volume, weight, and energy. Our method achieves an average error of 31.10 kCal (17.67%) on this dataset, outperforming existing portion estimation methods.
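Once a food's volume has been recovered from the rendered 3D model, converting it to energy follows from two standard relations: weight = volume × density, and energy = weight × energy density. A minimal sketch of this conversion is below; the density and energy-density values are illustrative assumptions, not taken from the paper.

```python
def energy_from_volume(volume_cm3: float,
                       density_g_per_cm3: float,
                       kcal_per_g: float) -> float:
    """Convert an estimated food volume (cm^3) to energy (kCal).

    weight (g)    = volume (cm^3) * density (g/cm^3)
    energy (kCal) = weight (g) * energy density (kCal/g)
    """
    weight_g = volume_cm3 * density_g_per_cm3
    return weight_g * kcal_per_g

# Illustrative values only (not from the paper): a 150 cm^3 portion
# with assumed density 0.85 g/cm^3 and energy density 0.52 kCal/g.
print(round(energy_from_volume(150.0, 0.85, 0.52), 2))
```

In practice, the density and energy-density factors would come from a nutrient database keyed by the recognized food type, so the accuracy of the final energy estimate depends on both the volume estimate and those lookup values.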
URL
https://arxiv.org/abs/2404.12257