Abstract
Human conversation is a complex mechanism with subtle nuances. Developing artificial intelligence agents that can participate fluently in a conversation is therefore an ambitious goal. While we are still far from achieving it, recent progress in visual question answering, image captioning, and visual question generation suggests that dialog systems may be realizable in the not-too-distant future. To this end, a novel visual dialog dataset was recently introduced, and encouraging results were demonstrated on it, particularly for question answering. In this paper, we demonstrate a simple symmetric discriminative baseline that can be applied both to predicting an answer and to predicting a question. We show that this method performs on par with the state of the art, including memory-network-based methods. In addition, for the first time on the visual dialog dataset, we assess the performance of a system that asks questions, and we demonstrate how visual dialog can be generated by combining discriminative question generation and question answering.
URL
https://arxiv.org/abs/1803.11186