Paper Reading AI Learner

DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

2024-04-25 14:07:52
Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Abstract

Low-shot counters estimate the number of objects corresponding to a selected category based on only a few annotated exemplars in the image, or none at all. The current state of the art estimates the total count as the sum over the object-location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in total count MAE, outperforms the most recent detection-based counter by ~20% in detection quality, and sets a new state of the art in zero-shot as well as text-prompt-based counting.
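
The detect-and-verify idea described above can be illustrated with a short sketch: a first stage keeps even low-confidence candidate detections (maximizing recall), a second stage compares each candidate's appearance against the annotated exemplars and discards outliers (restoring precision), and the final count is simply the number of verified detections. The sketch below is a minimal, self-contained illustration under assumed components: the Detection structure, the toy appearance embeddings, the detect_high_recall/verify helpers, and all thresholds are hypothetical placeholders, not DAVE's actual detector or verification model.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Detection:
    box: tuple            # (x, y, w, h) in image coordinates
    score: float          # detector confidence
    feature: np.ndarray   # appearance embedding of the box


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


def detect_high_recall(candidates, score_thr: float = 0.1):
    """Stage 1: keep even low-confidence candidates to maximize recall."""
    return [d for d in candidates if d.score >= score_thr]


def verify(detections, exemplar_features: np.ndarray, sim_thr: float = 0.7):
    """Stage 2: discard detections whose appearance does not match the exemplars."""
    prototype = exemplar_features.mean(axis=0)
    return [d for d in detections if cosine(d.feature, prototype) >= sim_thr]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=32)      # appearance of the counted category
    distractor = rng.normal(size=32)  # appearance of an unrelated category

    # Three annotated exemplars of the target category (few-shot setting).
    exemplars = np.stack([target + 0.05 * rng.normal(size=32) for _ in range(3)])

    # Toy candidate set: 8 true objects and 4 distractor false positives.
    candidates = [Detection((0, 0, 10, 10), 0.4, target + 0.1 * rng.normal(size=32))
                  for _ in range(8)]
    candidates += [Detection((0, 0, 10, 10), 0.3, distractor + 0.1 * rng.normal(size=32))
                   for _ in range(4)]

    high_recall = detect_high_recall(candidates)            # all 12 candidates survive stage 1
    verified = verify(high_recall, exemplars)               # distractors removed in stage 2
    print("count before verification:", len(high_recall))   # 12
    print("count after verification: ", len(verified))      # 8, the final count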
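```

Keeping the stage-1 threshold low is what prevents missed objects, while the exemplar-based verification removes the false positives that would otherwise inflate the count when other object classes are present.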

Abstract (translated)

Low-shot counters estimate the number of objects of a selected category in an image from only a few annotated exemplars, or none at all. The current best methods estimate the total count as the sum over an object-location density map, but they do not provide individual object locations and sizes, which are crucial for many applications. Detection-based counters address this, although they lag behind in total count accuracy. Moreover, both kinds of methods tend to overestimate the count when objects of other classes are present, because of many false detections. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm: it first generates a high-recall detection set and then verifies the detections to identify and remove outliers, thereby avoiding the problems above. This jointly improves recall and precision, leading to accurate counts. DAVE outperforms the best density-based counters by about 20% in total count MAE and the most recent detection-based counter by about 20% in detection quality, and it sets a new state of the art in zero-shot and text-prompt-based counting.

URL

https://arxiv.org/abs/2404.16622

PDF

https://arxiv.org/pdf/2404.16622.pdf

