Abstract
Compositional Zero-Shot Learning (CZSL) aims to learn semantic primitives (attributes and objects) from seen compositions and to recognize unseen attribute-object compositions. Existing CZSL datasets focus on single attributes, even though real-world objects naturally exhibit multiple interrelated attributes. Their narrow attribute scope and single-attribute labeling introduce annotation biases that undermine both model performance and evaluation. To address these limitations, we introduce the Multi-Attribute Composition (MAC) dataset, which comprises 18,217 images and 11,067 compositions with comprehensive, representative, and diverse attribute annotations. MAC includes an average of 30.2 attributes per object and 65.4 objects per attribute, facilitating better multi-attribute composition predictions. Our dataset supports deeper semantic understanding and higher-order attribute associations, providing a more realistic and challenging benchmark for the CZSL task. We also develop solutions for multi-attribute compositional learning and propose the MM-encoder to disentangle attributes and objects.
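To make the dataset statistics concrete, the following is a minimal sketch of how per-object and per-attribute averages could be computed from multi-attribute annotations. The abstract does not define how the 30.2 and 65.4 figures were derived, so the counting rule here (distinct attributes seen with each object, distinct objects seen with each attribute) is an assumption, and the function name, annotation format, and toy labels are hypothetical rather than taken from MAC.

```python
from collections import defaultdict


def composition_stats(annotations):
    """Summarize multi-attribute annotations.

    `annotations` is an iterable of (object_label, attribute_labels) pairs,
    one pair per image. This layout is an assumption for illustration,
    not the MAC file format.
    """
    attrs_of_obj = defaultdict(set)   # object -> distinct attributes seen with it
    objs_of_attr = defaultdict(set)   # attribute -> distinct objects seen with it
    compositions = set()              # distinct (attribute set, object) pairs

    for obj, attrs in annotations:
        compositions.add((frozenset(attrs), obj))
        for attr in attrs:
            attrs_of_obj[obj].add(attr)
            objs_of_attr[attr].add(obj)

    def mean_size(groups):
        # Average set size; 0.0 when there is nothing to average.
        return sum(len(s) for s in groups.values()) / len(groups) if groups else 0.0

    return {
        "num_compositions": len(compositions),
        "avg_attrs_per_object": mean_size(attrs_of_obj),
        "avg_objects_per_attr": mean_size(objs_of_attr),
    }


# Toy annotations (hypothetical labels, not drawn from the dataset).
annotations = [
    ("apple", ["red", "ripe", "round"]),
    ("apple", ["green", "round"]),
    ("car", ["red", "shiny"]),
    ("tomato", ["red", "ripe"]),
]
stats = composition_stats(annotations)
print(stats)
```

In this toy example each image carries a set of attributes instead of a single one, which is the distinction the abstract draws between MAC and single-attribute CZSL datasets.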
URL
https://arxiv.org/abs/2406.12757