Paper Reading AI Learner

Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

2023-01-26 15:25:43
Dong-Jin Kim, Tae-Hyun Oh, Jinsoo Choi, In So Kweon

Abstract

We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is an expensive task in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sentence dataset. We leverage such massive unpaired image and caption data upon standard paired data by learning to associate them. To this end, our proposed semi-supervised learning method assigns pseudo-labels to unpaired samples in an adversarial learning fashion, where the joint distribution of image and caption is learned. Our method trains a captioner to learn from paired data and to progressively associate unpaired data. This approach shows noticeable performance improvement even in challenging scenarios including out-of-task data (i.e., relational captioning, where the target task is different from the unpaired data) and web-crawled data. We also show that our proposed method is theoretically well-motivated and has a favorable global optimal property. Our extensive and comprehensive empirical results both on (1) image-based and (2) dense region-based captioning datasets, followed by comprehensive analysis on the scarcely-paired COCO dataset, demonstrate the consistent effectiveness of our semi-supervised learning method with unpaired data compared to competing methods.
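As a rough illustration of the pseudo-labeling step the abstract describes, the toy sketch below pairs unlabeled images with their best-matching captions and keeps only pairs a "discriminator" accepts. This is a hypothetical simplification, not the paper's actual model: features are plain 2-D vectors, the captioner is nearest-caption retrieval, and a cosine-similarity threshold stands in for the learned discriminator over the joint image-caption distribution.

```python
# Toy sketch of pseudo-labeling unpaired data (hypothetical simplification:
# a similarity threshold plays the role of the adversarially trained
# discriminator that judges whether an (image, caption) pair looks real).

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv)

def pseudo_label(unpaired_images, unpaired_captions, threshold=0.9):
    """Pair each unlabeled image with its best-matching caption,
    keeping only pairs the 'discriminator' (similarity test) accepts."""
    pairs = []
    for img in unpaired_images:
        best = max(unpaired_captions, key=lambda c: cosine(img, c))
        if cosine(img, best) >= threshold:  # discriminator accepts the pair
            pairs.append((img, best))
    return pairs

# Toy unpaired image features and caption features.
images = [(1.0, 0.1), (0.1, 1.0), (0.7, 0.7)]
captions = [(1.0, 0.0), (0.0, 1.0)]

accepted = pseudo_label(images, captions)
print(len(accepted))  # the ambiguous (0.7, 0.7) image is rejected
```

In the actual method, accepted pseudo-pairs would be added to the paired training set so the captioner progressively associates the unpaired data, as the abstract states.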

URL

https://arxiv.org/abs/2301.11174

PDF

https://arxiv.org/pdf/2301.11174.pdf

