Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured

2024-04-17 12:21:57
Hanlin Mo, Guoying Zhao

Abstract

Achieving rotation invariance in deep neural networks without relying on data has long been an active research topic. Intrinsic rotation invariance can enhance a model's feature representation capability, enabling better performance in tasks such as multi-orientation object recognition and detection. Based on various types of non-learnable operators, including gradient, sort, local binary pattern, and maximum, this paper designs a set of new convolution operations that are naturally invariant to arbitrary rotations. Unlike most previous studies, these rotation-invariant convolutions (RIConvs) have the same number of learnable parameters and a similar computational process as conventional convolution operations, so the two are interchangeable. Using the MNIST-Rot dataset, we first verify the invariance of these RIConvs under various rotation angles and compare their performance with previous rotation-invariant convolutional neural networks (RI-CNNs). Two types of RIConvs based on gradient operators achieve state-of-the-art results. Subsequently, we combine RIConvs with classic CNN backbones of different types and depths. Using the OuTex_00012, MTARSI, and NWPU-RESISC-45 datasets, we test their performance on texture recognition, aircraft type recognition, and remote sensing image classification tasks. The results show that RIConvs significantly improve the accuracy of these CNN backbones, especially when the training data is limited. Furthermore, we find that even with data augmentation, RIConvs can further enhance model performance.
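
The abstract names sorting as one of the non-learnable operators but does not give the formulation, so the following is only a minimal sketch of the general idea, not the authors' RIConv. It assumes a single 3x3 window whose eight neighbours are sorted before the learnable weights are applied, and it checks invariance only under 90-degree rotations, where a rotation merely permutes the neighbours. The function name `sorted_patch_response` and the weight layout are hypothetical.

```python
import numpy as np

def sorted_patch_response(patch, weights):
    """Weighted sum over one 3x3 window after sorting its 8 neighbours.

    Illustrative assumption: weights[0] multiplies the centre pixel and
    weights[1:] multiply the neighbours in ascending order of value.
    """
    patch = np.asarray(patch, dtype=float)
    center = patch[1, 1]
    # Flatten the window and drop the centre (index 4 in row-major order).
    neighbours = np.delete(patch.ravel(), 4)
    # Sorting discards the spatial arrangement, so any permutation of the
    # neighbours (including a 90-degree rotation) gives the same vector.
    ordered = np.sort(neighbours)
    return float(weights[0] * center + np.dot(weights[1:], ordered))

rng = np.random.default_rng(0)
patch = rng.random((3, 3))
weights = rng.random(9)  # same parameter count as a plain 3x3 kernel

# Responses of the sorted variant for rotations by 0, 90, 180, 270 degrees.
responses = [sorted_patch_response(np.rot90(patch, k), weights) for k in range(4)]

# Responses of an ordinary 3x3 kernel with the same weights, for comparison.
plain = [float(np.sum(np.rot90(patch, k) * weights.reshape(3, 3))) for k in range(4)]

print("sorted variant:", responses)
print("plain kernel:  ", plain)
```

The sorted variant uses nine weights, matching the parameter count of a conventional 3x3 kernel, which is the interchangeability property the abstract describes. Invariance to arbitrary angles involves resampling the window and is not demonstrated by this sketch.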

URL

https://arxiv.org/abs/2404.11309

PDF

https://arxiv.org/pdf/2404.11309.pdf
