Label-free Anomaly Detection in Aerial Agricultural Images with Masked Image Modeling

2024-04-13 08:49:17
Sambal Shikhar, Anupam Sobti

Abstract

Detecting various types of stresses (nutritional, water, nitrogen, etc.) in agricultural fields is critical for farmers to ensure maximum productivity. However, stresses show up in different shapes and sizes across crop types and varieties, so the problem is posed as an anomaly detection task in agricultural images. Accurate anomaly detection in agricultural UAV images is vital for early identification of field irregularities. Traditional supervised learning faces challenges in adapting to diverse anomalies and requires extensive annotated data. In this work, we overcome this limitation with self-supervised learning using a masked image modeling approach. Masked Autoencoders (MAE) extract meaningful normal features from unlabeled image samples, producing high reconstruction errors for abnormal pixels during reconstruction. To remove the need to train only on "normal" data, we use an anomaly suppression loss mechanism that minimizes the reconstruction of anomalous pixels, allowing the model to handle anomalous areas without explicitly separating out "normal" images for training. Evaluation on the Agriculture-Vision data challenge shows an mIoU improvement over the prior state of the art among unsupervised and self-supervised methods. A single model generalizes across all anomaly categories in the Agri-Vision Challenge Dataset.
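The idea sketched below illustrates the two ingredients named in the abstract: using per-pixel reconstruction error from a masked autoencoder as an anomaly map, and down-weighting high-error pixels during training so anomalous regions do not dominate the reconstruction objective. This is a minimal, hypothetical sketch; the tiny convolutional autoencoder, the exponential weighting, and all hyperparameters are assumptions for illustration and are not the authors' architecture or exact anomaly suppression loss.

```python
# Minimal sketch (assumed, not the paper's implementation): MAE-style masked
# reconstruction, an illustrative anomaly-suppression weighting, and an
# anomaly map taken from per-pixel reconstruction error at inference.
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Tiny conv autoencoder standing in for a ViT-based MAE (hypothetical)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def random_patch_mask(x, patch=16, mask_ratio=0.75):
    """Zero out a random subset of patches, MAE-style (illustrative only)."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    return x * keep

def suppression_weights(per_pixel_err, temperature=1.0):
    """Down-weight high-error pixels so suspected anomalies contribute less
    to the loss. This weighting is an assumption, not the paper's exact loss."""
    with torch.no_grad():
        return torch.exp(-per_pixel_err / temperature)

model = TinyAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(4, 3, 64, 64)            # stand-in for UAV image tiles

for step in range(5):                         # toy training loop
    recon = model(random_patch_mask(images))
    err = (recon - images) ** 2               # per-pixel reconstruction error
    weights = suppression_weights(err.mean(dim=1, keepdim=True))
    loss = (weights * err).mean()             # suppressed reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# At inference, the per-pixel reconstruction error serves as the anomaly map.
with torch.no_grad():
    anomaly_map = ((model(images) - images) ** 2).mean(dim=1)   # (B, H, W)
```

Thresholding or normalizing the resulting anomaly map per image would yield the segmentation-style output evaluated with mIoU; the exact post-processing used for the Agriculture-Vision evaluation is not specified here.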

URL

https://arxiv.org/abs/2404.08931

PDF

https://arxiv.org/pdf/2404.08931.pdf
