Paper Reading AI Learner

Rotated Feature Network for multi-orientation object detection

2019-03-23 16:10:11
Zhixin Zhang, Xudong Chen, Jie Lei, Kaibo Zhou

Abstract

General detectors follow the pipeline in which feature maps extracted from ConvNets are shared between the classification and regression tasks. However, there is an obvious conflict of requirements in multi-orientation object detection: classification should be insensitive to orientation, while regression is quite sensitive to it. To address this issue, we propose an Encoder-Decoder architecture, called the Rotated Feature Network (RFN), which produces rotation-sensitive (RS) feature maps for regression and rotation-invariant (RI) feature maps for classification. Specifically, the Encoder unit assigns weights to rotated feature maps. The Decoder unit extracts RS and RI by applying a resuming operator to the rotated and reweighted feature maps, respectively. To make the rotation-invariant characteristics more reliable, we adopt a metric that quantitatively evaluates rotation-invariance by adding a constraint term to the loss, yielding promising detection performance. Compared with state-of-the-art methods, our method achieves significant improvements on the NWPU VHR-10 and RSOD datasets. We further evaluate the RFN on scene classification in remote sensing images and on object detection in natural images, demonstrating its good generalization ability. The proposed RFN can be integrated into an existing framework, yielding strong performance with only a slight increase in model complexity.
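
The abstract only sketches the Encoder-Decoder design, so below is a minimal PyTorch sketch of how such a module might look. The 90-degree rotations on square feature maps, the softmax-weighted Encoder, the rotate-back "resuming" Decoder, and the variance-based invariance penalty are all illustrative assumptions, not the paper's actual operators.

```python
# Minimal sketch of the RFN idea from the abstract. The choice of 90-degree
# rotations, the pooling-based Encoder, and the variance penalty are my
# assumptions; the paper's exact implementation may differ.
import torch
import torch.nn as nn

class RotatedFeatureNetwork(nn.Module):
    def __init__(self, channels: int, num_orientations: int = 4):
        super().__init__()
        self.k = num_orientations  # orientations as multiples of 90 degrees
        # Encoder unit: assigns a scalar weight to each rotated feature map.
        self.encoder = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, 1),
        )

    def forward(self, feat: torch.Tensor):
        # feat: (N, C, H, W) shared backbone feature map; assumes H == W so
        # that 90-degree rotations preserve the spatial shape.
        rotated = [torch.rot90(feat, i, dims=(2, 3)) for i in range(self.k)]
        # Encoder: one weight per orientation, normalized across orientations.
        logits = torch.stack([self.encoder(r) for r in rotated], dim=1)  # (N, K, 1)
        weights = torch.softmax(logits, dim=1)

        # Decoder: "resume" (rotate back) each reweighted map.
        resumed = [
            torch.rot90(weights[:, i].view(-1, 1, 1, 1) * rotated[i], -i, dims=(2, 3))
            for i in range(self.k)
        ]
        # RI: aggregating resumed maps cancels orientation -> classification branch.
        ri = torch.stack(resumed, dim=1).sum(dim=1)
        # RS: keep orientation information -> regression branch (summing rotated
        # maps is one simple choice; the paper may combine them differently).
        rs = torch.stack(rotated, dim=1).sum(dim=1)

        # Constraint term: variance across resumed maps as a quantitative
        # rotation-invariance metric, to be added to the detection loss.
        invariance_penalty = torch.stack(resumed, dim=1).var(dim=1).mean()
        return rs, ri, invariance_penalty

# Usage: plug between the backbone and the detection heads.
x = torch.randn(2, 64, 32, 32)
rfn = RotatedFeatureNetwork(channels=64)
rs, ri, penalty = rfn(x)
print(rs.shape, ri.shape, penalty.item())
```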

URL

https://arxiv.org/abs/1903.09839

PDF

https://arxiv.org/pdf/1903.09839.pdf

