Abstract
Existing datasets for attribute value extraction (AVE) predominantly focus on explicit attribute values while neglecting implicit ones, lack product images, are often not publicly available, and lack in-depth human inspection across diverse domains. To address these limitations, we present ImplicitAVE, the first publicly available multimodal dataset for implicit attribute value extraction. ImplicitAVE, sourced from the MAVE dataset, is carefully curated and expanded to cover implicit AVE and multimodality, resulting in a refined dataset of 68k training and 1.6k testing examples across five domains. We also explore the application of multimodal large language models (MLLMs) to implicit AVE, establishing a comprehensive benchmark for MLLMs on the ImplicitAVE dataset. Six recent MLLMs with eleven variants are evaluated across diverse settings, revealing that implicit value extraction remains a challenging task for MLLMs. The contributions of this work include the development and release of ImplicitAVE, and the exploration and benchmarking of various MLLMs for implicit AVE, providing valuable insights and potential future research directions. Dataset and code are available at this https URL
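To make the task concrete, here is a minimal sketch of what an implicit-AVE query and its evaluation might look like. The function names, prompt format, and exact-match metric are illustrative assumptions, not the paper's actual benchmark pipeline (which also feeds the product image to the MLLM):

```python
# Hypothetical sketch of an implicit-AVE query and scoring step.
# All names and the prompt format are assumptions for illustration;
# the real benchmark additionally provides a product image to the MLLM.

def build_prompt(product_title: str, attribute: str, candidates: list[str]) -> str:
    """Format a text-only, multiple-choice style prompt for an attribute."""
    options = ", ".join(candidates)
    return (
        f"Product: {product_title}\n"
        f"What is the product's {attribute}? Choose one of: {options}."
    )

def exact_match(prediction: str, gold: str) -> bool:
    """Case-insensitive exact match, a common AVE evaluation metric."""
    return prediction.strip().lower() == gold.strip().lower()

# Example of an *implicit* value: the title never states "Sleeveless",
# so a model must infer it from the image or world knowledge.
prompt = build_prompt(
    "Women's Summer Tank Top",
    "sleeve style",
    ["Sleeveless", "Short Sleeve", "Long Sleeve"],
)
print(exact_match("sleeveless", "Sleeveless"))  # True
```

This contrasts with explicit AVE, where the gold value appears verbatim in the product text and can often be recovered by span extraction alone.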
URL
https://arxiv.org/abs/2404.15592