Paper Reading AI Learner

Exosense: A Vision-Centric Scene Understanding System For Safe Exoskeleton Navigation

2024-03-21 11:41:39
Jianeng Wang, Matias Mattamala, Christina Kassab, Lintong Zhang, Maurice Fallon

Abstract

Exoskeletons for daily use by people with mobility impairments are under development, and they will require accurate and robust scene understanding systems. Current research has used vision to identify immediate terrain and geometric obstacles; however, these approaches are constrained to detections directly in front of the user and are limited to classifying a finite range of terrain types (e.g., stairs, ramps, and level ground). This paper presents Exosense, a vision-centric scene understanding system capable of generating rich, globally consistent elevation maps that incorporate both semantic and terrain-traversability information. It features an elastic Atlas mapping framework associated with a visual SLAM pose graph, embedded with open-vocabulary room labels from a Vision-Language Model (VLM). The device's design includes a wide field-of-view (FoV) fisheye multi-camera system to mitigate the challenges introduced by the exoskeleton's walking pattern. We demonstrate the system's robustness to the challenges of typical periodic walking gaits and its ability to construct accurate, semantically rich maps in indoor settings. Additionally, we showcase its potential for motion planning -- providing a step towards safe navigation for exoskeletons.
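To make the idea of a semantic, traversability-aware elevation map concrete, here is a minimal sketch of such a data structure. This is an illustrative toy only, not the paper's Atlas/SLAM-coupled implementation; the class name, grid layout, and `is_safe` threshold are all assumptions for the example. Each 2.5D grid cell stores a height, a room-label id (which an open-vocabulary model could supply), and a traversability score in [0, 1].

```python
import numpy as np

class SemanticElevationMap:
    """Toy 2.5D map: per-cell height, semantic label, and traversability."""

    def __init__(self, size=100, resolution=0.05):
        self.resolution = resolution                     # metres per cell
        self.height = np.full((size, size), np.nan)      # elevation (m), NaN = unseen
        self.label = np.full((size, size), -1, dtype=int)  # room-label id, -1 = unknown
        self.traversability = np.zeros((size, size))     # 0 = blocked, 1 = safe

    def _cell(self, x, y):
        # World coordinates (m) -> integer grid indices.
        return int(x / self.resolution), int(y / self.resolution)

    def update(self, x, y, z, label_id, trav):
        """Fuse one observation at world position (x, y)."""
        i, j = self._cell(x, y)
        self.height[i, j] = z
        self.label[i, j] = label_id
        self.traversability[i, j] = trav

    def is_safe(self, x, y, min_trav=0.5):
        """A planner could query this before committing to a footstep."""
        i, j = self._cell(x, y)
        return bool(self.traversability[i, j] >= min_trav)

m = SemanticElevationMap()
m.update(1.0, 2.0, z=0.1, label_id=3, trav=0.9)
print(m.is_safe(1.0, 2.0))  # True: observed cell with high traversability
print(m.is_safe(0.0, 0.0))  # False: unobserved cell defaults to traversability 0
```

A real system would additionally deform this map elastically as the SLAM pose graph is optimised, which is the role the Atlas framework plays in the paper.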


URL

https://arxiv.org/abs/2403.14320

PDF

https://arxiv.org/pdf/2403.14320.pdf
