Paper Reading AI Learner

Local Contrast and Global Contextual Information Make Infrared Small Object Salient Again

2023-01-28 05:18:13
Chenyi Wang, Huan Wang, Peiwen Pan

Abstract

Infrared small object detection (ISOS) aims to segment small objects, covering only a few pixels, from the cluttered background in infrared images. It is highly challenging because: 1) small objects lack sufficient intensity, shape, and texture information; 2) small objects are easily lost as detection models, such as deep neural networks, obtain high-level semantic features and image-level receptive fields through successive downsampling. This paper proposes a reliable detection model for ISOS, dubbed UCFNet, which handles both issues well. It builds upon central difference convolution (CDC) and fast Fourier convolution (FFC). On one hand, CDC effectively guides the network to learn the contrast information between small objects and the background, since contrast information is essential to how the human visual system handles the ISOS task. On the other hand, FFC gains image-level receptive fields and extracts global information while preventing small objects from being overwhelmed. Experiments on several public datasets demonstrate that our method significantly outperforms state-of-the-art ISOS models and can provide useful guidelines for designing better ISOS deep models. Code will be available soon.
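Note: the abstract does not give UCFNet's exact configuration, but both building blocks are standard operators. Below is a minimal PyTorch sketch, under the usual formulations of these operators, of (a) a central difference convolution layer, whose theta term blends a vanilla convolution with a central-difference response that emphasizes local contrast, and (b) the Fourier unit at the core of fast Fourier convolution, which mixes features in the frequency domain and therefore has an image-level receptive field in a single layer. Class names, layer sizes, and theta are illustrative assumptions, not the authors' settings.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CentralDifferenceConv2d(nn.Module):
    # y = conv(x) - theta * x(p0) * sum_p w(p)
    #   = theta * sum_p w(p) * (x(p0 + p) - x(p0)) + (1 - theta) * conv(x)
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=padding, bias=False)
        self.theta = theta  # illustrative default; 0 recovers a vanilla convolution

    def forward(self, x):
        out_normal = self.conv(x)
        if self.theta == 0.0:
            return out_normal
        # The central-difference term reduces to a 1x1 convolution with the
        # spatially summed kernel, so contrast awareness adds almost no cost.
        kernel_diff = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        out_diff = F.conv2d(x, kernel_diff, stride=self.conv.stride, padding=0)
        return out_normal - self.theta * out_diff


class FourierUnit(nn.Module):
    # Core of fast Fourier convolution: a pointwise convolution applied in the
    # frequency domain, so every output location depends on all input locations.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels * 2)

    def forward(self, x):
        _, _, h, w = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")           # complex, (B, C, H, W//2 + 1)
        spec = torch.cat([spec.real, spec.imag], dim=1)   # stack real/imag as channels
        spec = F.relu(self.bn(self.conv(spec)))
        real, imag = torch.chunk(spec, 2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")


if __name__ == "__main__":
    x = torch.randn(1, 16, 64, 64)                        # dummy feature map
    print(CentralDifferenceConv2d(16, 32)(x).shape)       # torch.Size([1, 32, 64, 64])
    print(FourierUnit(16)(x).shape)                       # torch.Size([1, 16, 64, 64])

The sketch illustrates the two claims in the abstract: the central-difference term explicitly encodes the local contrast between a pixel and its neighborhood, while the Fourier unit supplies global context in one layer rather than through repeated downsampling, which is what would wash out objects of only a few pixels.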


URL

https://arxiv.org/abs/2301.12093

PDF

https://arxiv.org/pdf/2301.12093.pdf

