Abstract
We introduce an interactive learning framework for the development and testing of intelligent visual systems, called learning-by-asking (LBA). We explore LBA in the context of the Visual Question Answering (VQA) task. LBA differs from standard VQA training in that most questions are not observed at training time, and the learner must ask the questions it wants answered. Thus, LBA more closely mimics natural learning and has the potential to be more data-efficient than the traditional VQA setting. We present a model that performs LBA on the CLEVR dataset, and show that it automatically discovers an easy-to-hard curriculum when learning interactively from an oracle. Our LBA-generated data consistently matches or outperforms the CLEVR training data and is more sample-efficient. We also show that our model asks questions that generalize to state-of-the-art VQA models and to novel test-time distributions.
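The core of the interactive setting described above is that the learner, rather than consuming a fixed question set, chooses which questions to send to the oracle. A minimal sketch of one plausible selection criterion is shown below: score each candidate question by the entropy of the learner's own predicted answer distribution, and ask the question it is most uncertain about. All names here (`select_question`, `predict`) are illustrative assumptions, not the paper's actual model or API.

```python
import math

def entropy(probs):
    # Shannon entropy (in nats) of a discrete distribution;
    # terms with p == 0 are treated as contributing 0.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_question(candidates, predict):
    # Ask the oracle the candidate question whose predicted answer
    # distribution has the highest entropy, i.e. the one the learner
    # is currently most uncertain about.
    return max(candidates, key=lambda q: entropy(predict(q)))

# Toy example: the learner is confident about q1 but unsure about q2.
def predict(question):
    return {"q1": [0.9, 0.1], "q2": [0.5, 0.5]}[question]

chosen = select_question(["q1", "q2"], predict)
print(chosen)  # the uncertain question "q2" is selected
```

In a full LBA loop, the oracle's answer to the chosen question would then be added to the training set and the learner updated, so the question distribution adapts as the model improves.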
URL
https://arxiv.org/abs/1712.01238