Combining Deep Metric Learning Approaches for Aerial Scene Classification

2023-03-20 18:49:39
Fabio A. Faria, Luiz H. Buris, Fábio A. M. Cappabianco, Luis A. M. Pereira

Abstract

Aerial scene classification, which aims to assign remote sensing images to a set of predefined semantic classes (e.g., agricultural, beach, and harbor), is a very challenging task in remote sensing due to high intra-class variability and the different scales and orientations of the objects present in the dataset images. In the remote sensing area, CNN architectures are also an established alternative for scene classification tasks. Generally, these CNNs are used to perform the traditional image classification task. However, a less common way to classify remote sensing images is to use deep metric learning (DML) approaches. In this sense, this work proposes to employ six DML approaches for aerial scene classification tasks, analysing their behavior with four different pre-trained CNNs as well as combining them through an evolutionary computation algorithm (UMDA). In the experiments performed, DML approaches achieve the best classification results when compared to traditional pre-trained CNNs on three well-known remote sensing aerial scene datasets. In addition, the UMDA algorithm proved to be a promising strategy for combining DML approaches when there is diversity among them, improving classification accuracy by at least 5.6% while using almost 50% of the available classifiers to build the final ensemble.
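
The abstract does not say how UMDA (Univariate Marginal Distribution Algorithm) encodes or scores an ensemble, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each candidate ensemble is a binary mask over the available classifiers, that the selected classifiers are fused by majority vote, and that fitness is accuracy on held-out labels. The names `majority_vote`, `fitness`, and `umda_select`, and all parameter defaults, are hypothetical.

```python
import random


def majority_vote(predictions, mask):
    """Majority-vote label per sample, using only classifiers with mask[i] == 1.

    predictions[i][j] is the label classifier i assigns to sample j.
    Returns None when the mask selects no classifier.
    """
    selected = [p for p, m in zip(predictions, mask) if m]
    if not selected:
        return None
    voted = []
    for sample_preds in zip(*selected):
        counts = {}
        for label in sample_preds:
            counts[label] = counts.get(label, 0) + 1
        # Ties are broken in favour of the smallest label (an arbitrary choice).
        voted.append(max(sorted(counts), key=lambda k: counts[k]))
    return voted


def fitness(predictions, labels, mask):
    """Accuracy of the majority-vote ensemble selected by mask (assumed fitness)."""
    voted = majority_vote(predictions, mask)
    if voted is None:
        return 0.0
    return sum(v == y for v, y in zip(voted, labels)) / len(labels)


def umda_select(predictions, labels, pop_size=40, n_keep=10,
                generations=30, seed=0):
    """Search for a classifier subset with UMDA over binary masks."""
    rng = random.Random(seed)
    n = len(predictions)
    probs = [0.5] * n  # independent marginal probability of keeping each classifier
    best_mask = [1] * n  # start from the full ensemble as the baseline
    best_fit = fitness(predictions, labels, best_mask)
    for _ in range(generations):
        # Sample a population from the current univariate marginals.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        population.sort(key=lambda m: fitness(predictions, labels, m),
                        reverse=True)
        elite = population[:n_keep]
        top_fit = fitness(predictions, labels, elite[0])
        if top_fit > best_fit:
            best_mask, best_fit = elite[0], top_fit
        # Re-estimate each marginal from the elite; clamp so no bit freezes.
        probs = [min(0.95, max(0.05, sum(m[i] for m in elite) / n_keep))
                 for i in range(n)]
    return best_mask, best_fit


# Toy usage: 5 classifiers, 4 validation samples (made-up predictions).
toy_predictions = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 1],
                   [1, 0, 1, 1], [0, 1, 1, 0]]
toy_labels = [0, 1, 1, 0]
mask, accuracy = umda_select(toy_predictions, toy_labels)
print(mask, accuracy)
```

Because UMDA models each mask bit independently, the search stays cheap, but it can only benefit from dropping classifiers when their errors differ, which matches the abstract's remark that the combination helps when there is diversity among the DML approaches.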

URL

https://arxiv.org/abs/2303.11389

PDF

https://arxiv.org/pdf/2303.11389.pdf

