Abstract
Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity --- adding new complexity to search tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their search algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the results of a search query should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. We propose a new framework for image retrieval that models object-level regions using image keypoints retrieved from an image index, which are then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method Needle-Haystack (NH) scoring, and it is optimized for fast matrix operations on CPUs. We show that this method not only performs comparably to state-of-the-art methods in classic CBIR problems, but also outperforms them in fine-grained object- and instance-level retrieval on the Oxford 5K, Paris 6K, Google-Landmarks, and NIST MFC2018 datasets, as well as meme-style imagery from Reddit.
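The core idea the abstract describes, matching query keypoints against an image index with plain matrix operations and letting matched keypoints vote for the images that own them, can be sketched as follows. This is a minimal illustration under assumed toy data, not the authors' NH scoring algorithm; all names and the descriptor setup are invented for the example.

```python
# Hedged sketch: keypoint-vote scoring for image retrieval using only
# NumPy matrix ops on CPU. The paper's Needle-Haystack scoring differs;
# this merely illustrates index-based keypoint matching and per-image voting.
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Toy index: 3 images, each contributing 10 eight-dimensional descriptors.
index_desc = l2_normalize(rng.standard_normal((30, 8)))
index_owner = np.repeat(np.arange(3), 10)   # image id owning each index keypoint

# Toy query: slightly perturbed copies of image 1's keypoints.
query_desc = l2_normalize(index_desc[10:15] + 0.05 * rng.standard_normal((5, 8)))

# One matrix multiply yields all query-vs-index similarities (CPU-friendly).
sims = query_desc @ index_desc.T            # shape (5, 30)
best = sims.argmax(axis=1)                  # nearest index keypoint per query kp

# Vote: each matched keypoint adds its similarity to its owner image's score,
# so even a small object contributing few keypoints still registers.
scores = np.zeros(3)
np.add.at(scores, index_owner[best], sims[np.arange(len(best)), best])

ranking = scores.argsort()[::-1]            # image 1 should rank first here
```

Because the expensive step is a single dense matrix product, this style of scoring avoids any per-result object detection pass, which is the efficiency property the abstract emphasizes.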
URL
https://arxiv.org/abs/1903.10019