Paper Reading AI Learner

TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices

2024-04-04 16:38:49
Hasib-Al Rashid, Argho Sarkar, Aryya Gangopadhyay, Maryam Rahnemoonfar, Tinoosh Mohsenin

Abstract

Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to the added complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Knowledge distilled from the supervised attention-based VQA model is used to train the memory-aware compact TinyVQA model, and a low bit-width quantization technique further compresses the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor. Deployed on the tiny drone, TinyVQA achieved a low latency of 56 ms while consuming 693 mW, showcasing its suitability for resource-constrained embedded systems.
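The abstract names two compression steps (distilling a supervised attention-based teacher into a compact student, then low bit-width quantization) but does not give the loss, temperature, or bit width. The following is a minimal, self-contained sketch assuming a standard Hinton-style distillation loss and uniform symmetric quantization; it is not the authors' implementation, and every function name and default value (`distillation_loss`, `temperature=4.0`, `alpha=0.5`, `quantize_symmetric`) is hypothetical. It also works out energy per inference from the reported figures, assuming the 693 mW is drawn steadily for the full 56 ms.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      label: int,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> float:
    """Assumed Hinton-style KD loss: hard-label cross-entropy plus
    T^2-scaled KL(teacher || student) on temperature-softened outputs."""
    hard_ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft_kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * hard_ce + (1.0 - alpha) * temperature ** 2 * soft_kl


def quantize_symmetric(weights: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Uniform symmetric quantization to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [v * scale for v in q]


def energy_per_inference_mj(latency_ms: float, power_mw: float) -> float:
    """mW * ms = microjoules; divide by 1000 for millijoules (constant power assumed)."""
    return power_mw * latency_ms / 1000.0


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]   # hypothetical teacher logits for 3 answers
    student = [1.5, 0.7, -0.8]   # hypothetical student logits
    print(distillation_loss(student, teacher, label=0))
    q, s = quantize_symmetric([0.42, -1.3, 0.05, 0.9], bits=4)
    print(q)  # [2, -7, 0, 5]
    print(round(energy_per_inference_mj(56, 693), 1))  # 38.8
```

Under these assumptions, 56 ms at 693 mW corresponds to roughly 38.8 mJ per answered question. The T^2 factor keeps the soft-target gradients on a comparable scale to the hard-label term as the temperature changes.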

URL

https://arxiv.org/abs/2404.03574

PDF

https://arxiv.org/pdf/2404.03574.pdf