Abstract
Retrieving images and videos by their semantic content has been an important and challenging task for years, because it ultimately requires bridging the symbolic/subsymbolic gap. Recent successes in deep learning have enabled the detection of objects from many classes, greatly outperforming traditional computer vision techniques. However, deep learning solutions capable of executing retrieval queries are still not available. We propose a hybrid solution consisting of a deep neural network for object detection and a cognitive architecture for query execution. Specifically, we use YOLOv2 and OpenCog. We implement queries that retrieve video frames containing objects of specified classes in a specified spatial arrangement.
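To make the kind of query described above concrete, the following is a minimal sketch in plain Python, not the YOLOv2/OpenCog implementation. It assumes the detector has already produced labeled bounding boxes per frame; the names `Detection`, `left_of`, and `query_frames`, as well as the sample data, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Detection:
    """One detected object: class label plus bounding box (hypothetical format)."""
    label: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cx(self) -> float:
        # Horizontal center of the bounding box.
        return self.x + self.w / 2


def left_of(a: Detection, b: Detection) -> bool:
    # Assumed definition: a is left of b if its box center lies further left.
    return a.cx < b.cx


def query_frames(
    frames: Dict[int, List[Detection]],
    label_a: str,
    label_b: str,
    relation: Callable[[Detection, Detection], bool],
) -> List[int]:
    """Return ids of frames containing an (a, b) pair of the given classes
    for which the spatial relation holds."""
    hits = []
    for frame_id, dets in sorted(frames.items()):
        found = any(
            relation(a, b)
            for a in dets if a.label == label_a
            for b in dets if b.label == label_b and b is not a
        )
        if found:
            hits.append(frame_id)
    return hits


# Made-up detections for three frames, standing in for detector output.
frames = {
    0: [Detection("person", 10, 20, 40, 100), Detection("car", 200, 50, 120, 80)],
    1: [Detection("car", 10, 50, 120, 80), Detection("person", 300, 20, 40, 100)],
    2: [Detection("person", 50, 20, 40, 100)],
}

result = query_frames(frames, "person", "car", left_of)
print(result)
```

In the hybrid design, the symbolic side (here a simple Python predicate) operates only on the detector's labeled output, so new spatial relations can be added without retraining the network.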
URL
https://arxiv.org/abs/1806.06946