
GRADE: Generating Realistic Animated Dynamic Environments for Robotics Research

2023-03-08 09:36:47
Elia Bonetto, Chenghao Xu, Aamir Ahmad


Simulation engines like Gazebo, Unity, and Webots are widely adopted in robotics. However, they lack either full simulation control, ROS integration, realistic physics, or photorealism. Recently, synthetic data generation and realistic rendering have advanced tasks like target tracking and human pose estimation. However, work focused on vision applications usually lacks information such as sensor measurements (e.g., IMU, LiDAR, joint state) or time continuity. On the other hand, simulations for most robotics applications are obtained in (semi)static environments, with specific sensor settings and low visual fidelity. In this work, we present a solution to these issues with a fully customizable framework for generating realistic animated dynamic environments (GRADE) for robotics research. The data produced can be post-processed, e.g., to add noise, and easily expanded with new information using the tools that we provide. To demonstrate GRADE, we use it to generate an indoor dynamic environment dataset and then compare different SLAM algorithms on the produced sequences. By doing that, we show how current research over-relies on well-known benchmarks and fails to generalize. Furthermore, our tests with YOLO and Mask R-CNN provide evidence that our data can improve training performance and generalize to real sequences. Finally, we show GRADE's flexibility by using it for indoor active SLAM, with diverse environment sources, and in a multi-robot scenario. In doing that, we employ different control, asset placement, and simulation techniques. The code, results, implementation details, and generated data are provided as open-source. The main project page is this https URL while the accompanying video can be found at this https URL.



