DeepDiary: Automatic Caption Generation for Lifelogging Image Streams

Abstract

Lifelogging cameras capture everyday life from a first-person perspective, but generate so much data that it is hard for users to browse and organize their image collections effectively. In this paper, we propose to use automatic image captioning algorithms to generate textual representations of these collections. We develop and explore novel techniques based on deep learning to generate captions for both individual images and image streams, using temporal consistency constraints to create summaries that are both more compact and less noisy. We evaluate our techniques with quantitative and qualitative results, and apply captioning to an image retrieval application for finding potentially private images. Our results suggest that our automatic captioning algorithms, while imperfect, may work well enough to help users manage lifelogging photo collections.

URL

https://arxiv.org/abs/1608.03819

PDF

https://arxiv.org/pdf/1608.03819.pdf