Paper Reading AI Learner

A multi-stage augmented multimodal interaction network for fish feeding intensity quantification

2025-06-17 04:09:43
Shulong Zhang, Mingyuan Yao, Jiayin Zhao, Xiao Liu, Haihua Wang

Abstract

In recirculating aquaculture systems, accurate and effective assessment of fish feeding intensity is crucial for reducing feed costs and determining optimal feeding times. However, current studies have limitations in modality selection, feature extraction and fusion, and co-inference for decision making, which restrict further improvement in the accuracy, applicability and reliability of multimodal fusion models. To address this problem, this study proposes a Multi-stage Augmented Multimodal Interaction Network (MAINet) for quantifying fish feeding intensity. First, a general feature extraction framework is proposed to efficiently extract feature information from input image, audio and water-wave data. Second, an Auxiliary-modality Reinforcement Primary-modality Mechanism (ARPM), consisting of a Channel Attention Fusion Network (CAFN) and a Dual-mode Attention Fusion Network (DAFN), is designed to enable inter-modal interaction and generate enhanced features. Finally, an Evidence Reasoning (ER) rule is introduced to fuse the outputs of each modality and make decisions, thereby completing the quantification of fish feeding intensity. Experimental results show that MAINet reaches 96.76% accuracy, 96.78% precision, 96.79% recall and 96.79% F1-Score, significantly outperforming the comparison models. It also shows clear advantages over single-modality models, dual-modality fusion models and models using different decision-fusion methods. Ablation experiments further verify the key role of the proposed improvement strategies in enhancing the model's robustness and feature-utilization efficiency, which effectively improves the accuracy of the quantified fish feeding intensity.
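The decision stage fuses each modality's class probabilities with an Evidence Reasoning rule. The abstract does not give the exact ER formulation, so the sketch below is a hedged illustration only: it uses Dempster's rule of combination restricted to singleton (Bayesian) masses, where the fused belief is the normalized elementwise product of the per-modality probability vectors. The four feeding-intensity levels and the example probability values are assumptions for illustration, not taken from the paper.

```python
# Illustrative sketch: combining per-modality class probabilities with an
# evidence-combination rule (Dempster's rule on singleton masses), as a
# simplified stand-in for the paper's ER-rule decision fusion.

def fuse_modalities(prob_vectors):
    """Combine probability vectors (one per modality, same class order).

    For Bayesian masses, Dempster's rule reduces to the normalized
    elementwise product of the input distributions.
    """
    if not prob_vectors:
        raise ValueError("need at least one modality output")
    fused = [1.0] * len(prob_vectors[0])
    for p in prob_vectors:
        fused = [f * pi for f, pi in zip(fused, p)]
    total = sum(fused)
    if total == 0:
        raise ValueError("total conflict between modality outputs")
    return [f / total for f in fused]

# Hypothetical example: image, audio and water-wave branches each output
# probabilities over four feeding-intensity levels (none/weak/medium/strong).
image = [0.10, 0.20, 0.60, 0.10]
audio = [0.05, 0.15, 0.70, 0.10]
wave = [0.20, 0.20, 0.50, 0.10]
print(fuse_modalities([image, audio, wave]))
```

Because agreeing modalities multiply, this style of fusion sharpens the consensus class: here all three branches favor "medium" intensity, so the fused distribution concentrates most of its mass there.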


URL

https://arxiv.org/abs/2506.14170

PDF

https://arxiv.org/pdf/2506.14170.pdf

