Paper Reading AI Learner

Dense Distinct Query for End-to-End Object Detection

2023-03-22 17:42:22
Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu, Wenwei Zhang, Ping Luo, Kai Chen

Abstract

One-to-one label assignment in object detection has successfully obviated the need for non-maximum suppression (NMS) as postprocessing and made the pipeline end-to-end. However, it triggers a new dilemma: the widely used sparse queries cannot guarantee a high recall, while dense queries inevitably introduce more similar queries and encounter optimization difficulties. As both sparse and dense queries are problematic, what, then, are the expected queries in end-to-end object detection? This paper shows that the solution is Dense Distinct Queries (DDQ). Concretely, we first lay out dense queries as in traditional detectors and then select distinct ones for one-to-one assignment. DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors, including FCN, R-CNN, and DETRs. Most impressively, DDQ-DETR achieves 52.1 AP on the MS-COCO dataset within 12 epochs using a ResNet-50 backbone, outperforming all existing detectors in the same setting. DDQ also shares the benefit of end-to-end detectors in crowded scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers to consider the complementarity between traditional methods and end-to-end detectors. The source code can be found at \url{this https URL}.
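The core idea in the abstract — lay out dense candidate queries, then keep only distinct ones before one-to-one assignment — can be illustrated with a greedy IoU-based selection. The sketch below is a simplified, hypothetical rendering of that "distinct queries" step, not the paper's actual implementation; the function names, threshold, and top-k value are illustrative assumptions.

```python
# Hypothetical sketch of selecting distinct queries from dense candidates:
# high-scoring boxes are kept greedily, and any candidate that overlaps an
# already-kept box above an IoU threshold is suppressed, so the surviving
# queries are distinct before one-to-one label assignment.

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_distinct_queries(boxes, scores, iou_thr=0.7, topk=300):
    """Greedily keep high-scoring boxes whose overlap with every
    already-kept box is below iou_thr (illustrative values)."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in kept):
            kept.append(i)
        if len(kept) == topk:
            break
    return kept
```

For example, two near-duplicate boxes at the same location collapse to one kept query, while a box elsewhere in the image survives, which is how dense coverage and distinctness can coexist.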

URL

https://arxiv.org/abs/2303.12776

PDF

https://arxiv.org/pdf/2303.12776.pdf
