Generating Diverse and Meaningful Captions

2018-12-19 18:10:18

Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

arXiv_CV

Abstract

Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state of the art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state of the art on caption diversity and novelty. We make our source code publicly available online.
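The abstract does not define how caption diversity and novelty are measured. As an illustration only, the sketch below assumes two common readings: diversity as the fraction of generated captions that are distinct from one another, and novelty as the fraction of generated captions that do not appear in the training set. The function names, the normalization step, and the example captions are hypothetical and are not taken from the authors' code.

```python
def normalize(caption):
    """Lowercase and collapse whitespace so trivial variants compare equal (an assumption)."""
    return " ".join(caption.lower().split())


def caption_diversity(generated):
    """Assumed definition: share of generated captions that are distinct."""
    normalized = [normalize(c) for c in generated]
    if not normalized:
        return 0.0
    return len(set(normalized)) / len(normalized)


def caption_novelty(generated, training):
    """Assumed definition: share of generated captions not seen in the training captions."""
    seen = {normalize(c) for c in training}
    normalized = [normalize(c) for c in generated]
    if not normalized:
        return 0.0
    return sum(c not in seen for c in normalized) / len(normalized)


# Hypothetical captions, used only to exercise the two functions.
generated = ["A dog runs.", "a dog runs.", "A cat sleeps on a red sofa."]
training = ["a dog runs."]

diversity = caption_diversity(generated)
novelty = caption_novelty(generated, training)
```

Under these assumed definitions, a model that repeats the same generic caption for similar images scores low on both measures even when its n-gram metrics are high, which is the gap the abstract describes.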

URL

https://arxiv.org/abs/1812.08126

PDF

https://arxiv.org/pdf/1812.08126.pdf