Abstract
This paper discusses and demonstrates the outcomes from our experimentation on Image Captioning. Image captioning is a much more involved task than image recognition or classification, because of the additional challenge of recognizing the interdependence between the objects/concepts in the image and the creation of a succinct sentential narration. Experiments on several labeled datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. As a toy application, we apply image captioning to create video captions, and we advance a few hypotheses on the challenges we encountered.
Abstract (translated)
本文讨论并演示了我们对图像字幕进行实验的结果。由于识别图像中物体/概念之间的相互依赖以及创建简洁的句子叙述的额外挑战,图像字幕比图像识别或分类涉及更多的任务。对几个标记数据集的实验显示模型的准确性以及仅从图像描述中学习的语言的流畅性。作为一款玩具应用,我们将图像字幕应用于制作视频标题,并就我们遇到的挑战提出了一些假设。
URL
https://arxiv.org/abs/1805.09137