Paper Reading AI Learner

Improving Performance of Object Detection using the Mechanisms of Visual Recognition in Humans

2023-01-23 19:09:36
Amir Ghasemi, Fatemeh Mottaghian, Akram Bayat

Abstract

Object recognition systems are usually trained and evaluated on high-resolution images. However, in real-world applications, images often have low resolution or small size. In this study, we first track the performance of a state-of-the-art deep object recognition network, Faster-RCNN, as a function of image resolution. The results reveal the negative effect of low-resolution images on recognition performance. They also show that different spatial frequencies convey different information about objects during recognition. This suggests that a multi-resolution recognition system can provide better insight into the optimal selection of features, resulting in better object recognition. This is similar to the mechanisms of the human visual system, which can build multi-scale representations of a visual scene simultaneously. We then propose a multi-resolution object recognition framework rather than a single-resolution network. The proposed framework is evaluated on the PASCAL VOC2007 database. The experimental results show that our adapted multi-resolution Faster-RCNN framework outperforms the single-resolution Faster-RCNN on input images of various resolutions, with an increase in mean Average Precision (mAP) of 9.14% across all resolutions and 1.2% on full-spectrum images. Furthermore, the proposed model maintains robust performance over a wide range of spatial frequencies.
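
The abstract does not say how the low-resolution and frequency-limited inputs were produced. A minimal sketch, assuming two common ways of simulating them with NumPy: an ideal circular low-pass filter applied in the 2D Fourier domain (which removes spatial frequencies above a cutoff) and block-average downsampling followed by pixel repetition back to the original size. The function names `low_pass` and `downsample_upsample` and the cutoff convention (cycles per pixel) are hypothetical illustrations, not the authors' pipeline.

```python
import numpy as np

# Hypothetical helpers for simulating degraded inputs; NOT the paper's code.

def low_pass(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Keep only spatial frequencies whose radius is <= cutoff (cycles/pixel).

    Ideal (hard-edged) circular low-pass filter, applied per channel via the
    2D FFT over the first two axes. cutoff >= ~0.71 keeps every frequency;
    cutoff = 0 keeps only the DC term (the per-channel mean).
    """
    h, w = image.shape[:2]
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles/pixel
    mask = np.sqrt(fx**2 + fy**2) <= cutoff
    if image.ndim == 3:
        mask = mask[:, :, None]  # broadcast the same mask over channels
    spectrum = np.fft.fft2(image, axes=(0, 1))
    return np.real(np.fft.ifft2(spectrum * mask, axes=(0, 1)))


def downsample_upsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Block-average by `factor`, then repeat pixels to restore the size.

    Rows/columns that do not fill a whole block are cropped, so the output
    matches the input size only when both sides are divisible by `factor`.
    """
    h, w = image.shape[:2]
    h2, w2 = h // factor, w // factor
    cropped = image[: h2 * factor, : w2 * factor]
    # Split each spatial axis into (blocks, pixels-per-block) and average.
    small = cropped.reshape(h2, factor, w2, factor, *image.shape[2:]).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 64, 3))
    # A resolution sweep: progressively remove high spatial frequencies.
    for c in (0.5, 0.25, 0.125, 0.0625):
        degraded = low_pass(img, c)
        print(c, degraded.shape)
```

In a resolution study of this kind, each degraded copy would be passed to the detector and mAP recorded per cutoff or downsampling factor; the detector and the VOC evaluation are outside this sketch. The hard-edged mask is the simplest choice but can introduce ringing, so a Gaussian mask is a common alternative.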

URL

https://arxiv.org/abs/2301.09667

PDF

https://arxiv.org/pdf/2301.09667.pdf

