Paper Reading AI Learner

Needle in a Haystack: A Framework for Seeking Small Objects in Big Datasets

2019-03-24 16:57:21
Joel Brogan, Aparna Bharati, Daniel Moreira, Kevin Bowyer, Patrick Flynn, Anderson Rocha, Walter Scheirer

Abstract

Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity --- adding new complexity to search tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their search algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the results of a search query should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. We propose a new framework for image retrieval that models object-level regions using image keypoints retrieved from an image index, which are then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method Needle-Haystack (NH) scoring, and it is optimized for fast matrix operations on CPUs. We show that this method not only performs comparably to state-of-the-art methods in classic CBIR problems, but also outperforms them in fine-grained object- and instance-level retrieval on the Oxford 5K, Paris 6K, Google-Landmarks, and NIST MFC2018 datasets, as well as meme-style imagery from Reddit.
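The abstract describes weighting matches at the level of object regions rather than by raw keypoint counts, using only matrix-friendly operations. The paper's actual NH scoring is not specified here, so the following is only a minimal illustrative sketch of the general idea: matched keypoints are grouped into coarse spatial cells (a crude stand-in for the keypoint-derived object regions the paper models), and each region contributes its mean descriptor similarity, so a small but densely matched object is not drowned out by a large background. The function name and the `cell` parameter are hypothetical.

```python
import numpy as np

def nh_style_score(match_xy, match_sims, cell=64):
    """Region-level scoring sketch (NOT the paper's exact algorithm).

    match_xy   : (N, 2) coordinates of matched keypoints in the result image
    match_sims : (N,) descriptor similarities in [0, 1]
    cell       : side length of a spatial bin standing in for an
                 object-level region (hypothetical parameter)

    Each occupied cell contributes the mean similarity of its matches,
    so a small, densely matched object weighs as much as a sparse,
    spread-out background region.
    """
    sims = np.asarray(match_sims, dtype=float)
    if sims.size == 0:
        return 0.0
    xy = np.atleast_2d(np.asarray(match_xy, dtype=float))
    cells = (xy // cell).astype(int)          # bin keypoints into regions
    _, inv = np.unique(cells, axis=0, return_inverse=True)
    inv = inv.ravel()
    counts = np.bincount(inv)                 # matches per region
    sums = np.bincount(inv, weights=sims)     # similarity mass per region
    return float(np.sum(sums / counts))       # sum of per-region means
```

For example, two strong matches clustered in one 64-pixel cell plus one isolated match yield the mean of the cluster plus the lone similarity, rather than a count-dominated total; everything reduces to binning and `bincount`, consistent with the abstract's emphasis on fast CPU matrix operations.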

URL

https://arxiv.org/abs/1903.10019

PDF

https://arxiv.org/pdf/1903.10019.pdf

