Abstract
Plant disease classification via imaging is a critical task in precision agriculture. We propose XMACNet, a novel lightweight Convolutional Neural Network (CNN) that integrates self-attention and multi-modal fusion of visible imagery and vegetation indices for chili disease detection. XMACNet uses an EfficientNetV2S backbone enhanced by a self-attention module and a fusion branch that processes both RGB images and computed vegetation index maps (NDVI, NPCI, MCARI). We curated a new dataset of 12,000 chili leaf images across six classes (five disease types plus healthy), augmented synthetically via StyleGAN to mitigate data scarcity. Trained on this dataset, XMACNet achieves high accuracy, F1-score, and AUC, outperforming baseline models such as ResNet-50, MobileNetV2, and a Swin Transformer variant. Crucially, XMACNet is explainable: we use Grad-CAM++ and SHAP to visualize and quantify the model's focus on disease features. The model's compact size and fast inference make it suitable for edge deployment in real-world farming scenarios.
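The abstract does not specify how the vegetation index maps are computed; the sketch below uses the standard textbook definitions of NDVI, NPCI, and MCARI, assuming per-pixel reflectance arrays for the relevant spectral bands (NIR and red for NDVI; approximately 680/430 nm for NPCI; 700/670/550 nm for MCARI). The function names and the `eps` stabilizer are illustrative, not from the paper.

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    return (nir - red) / (nir + red + eps)

def npci(r680: np.ndarray, r430: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalized Pigment Chlorophyll Index: (R680 - R430) / (R680 + R430)."""
    return (r680 - r430) / (r680 + r430 + eps)

def mcari(r700: np.ndarray, r670: np.ndarray, r550: np.ndarray,
          eps: float = 1e-8) -> np.ndarray:
    """Modified Chlorophyll Absorption Ratio Index:
    [(R700 - R670) - 0.2 * (R700 - R550)] * (R700 / R670)."""
    return ((r700 - r670) - 0.2 * (r700 - r550)) * (r700 / (r670 + eps))
```

Each function maps band reflectance images to a single-channel index map of the same spatial size, which could then be stacked with the RGB input for the fusion branch described above.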
URL
https://arxiv.org/abs/2603.06750