Deep Neural Network Architecture Search for Accurate Visual Pose Estimation aboard Nano-UAVs

2023-03-03 14:02:09
Elia Cereda, Luca Crupi, Matteo Risso, Alessio Burrello, Luca Benini, Alessandro Giusti, Daniele Jahier Pagliari, Daniele Palossi

Abstract

Miniaturized autonomous unmanned aerial vehicles (UAVs) are an emerging research topic. With a form factor as small as the palm of a hand, they can reach spots otherwise inaccessible to bigger robots and operate safely in human surroundings. The simple electronics aboard such robots (sub-100 mW) make them particularly cheap and attractive but pose significant challenges in enabling sophisticated onboard intelligence. In this work, we leverage a novel neural architecture search (NAS) technique to automatically identify several Pareto-optimal convolutional neural networks (CNNs) for a visual pose estimation task. Our work demonstrates how real-life, field-tested robotics applications can concretely leverage NAS technologies to automatically and efficiently optimize CNNs for the specific hardware constraints of small UAVs. We deploy several NAS-optimized CNNs and run them in closed loop aboard a 27-g Crazyflie nano-UAV equipped with a parallel ultra-low-power System-on-Chip. Our results improve on the state of the art by reducing the in-field control error by 32% while achieving a real-time onboard inference rate of ~10 Hz at 10 mW and ~50 Hz at 90 mW.
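
As a rough illustration of what "Pareto-optimal" means for the CNNs mentioned in the abstract, the sketch below filters a set of candidate networks down to those that no other candidate beats on both error and cost. It is a minimal sketch, not the paper's NAS method: the `Candidate` type, the candidate names, and their error/cost values are hypothetical and chosen only for illustration. The helper `energy_per_inference_mj` simply divides the power and inference-rate figures quoted in the abstract, which are approximate.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate network (all values are illustrative)."""
    name: str
    error: float  # task error, lower is better
    cost: float   # inference cost (e.g. operations or latency), lower is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that are not dominated by any other candidate.

    A candidate is dominated if another one is no worse on both
    objectives and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o.error <= c.error and o.cost <= c.cost
            and (o.error < c.error or o.cost < c.cost)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return sorted(front, key=lambda c: c.cost)


def energy_per_inference_mj(power_mw: float, rate_hz: float) -> float:
    """Energy per inference in millijoules (mW divided by Hz gives mJ)."""
    return power_mw / rate_hz


# Hypothetical candidates: "net_c" is worse than "net_b" on both objectives.
CANDIDATES = [
    Candidate("net_a", error=0.30, cost=1.0),
    Candidate("net_b", error=0.25, cost=2.0),
    Candidate("net_c", error=0.28, cost=2.5),
    Candidate("net_d", error=0.20, cost=4.0),
]

if __name__ == "__main__":
    for c in pareto_front(CANDIDATES):
        print(c.name, c.error, c.cost)
    # Operating points quoted in the abstract (approximate figures).
    print(energy_per_inference_mj(10, 10), "mJ at ~10 Hz and 10 mW")
    print(energy_per_inference_mj(90, 50), "mJ at ~50 Hz and 90 mW")
```

Under these assumptions, the two operating points quoted in the abstract correspond to roughly 1 mJ and 1.8 mJ per inference, so the higher frame rate comes at a higher energy cost per frame.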

URL

https://arxiv.org/abs/2303.01931

PDF

https://arxiv.org/pdf/2303.01931.pdf

