Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them. To this end, the modular, real-time, and open-source AirSim framework has been a popular community-built system that fulfills some of those needs. However, the framework required additional systems to serve some complex industrial applications, including designing and testing new sensor modalities, Simultaneous Localization And Mapping (SLAM), autonomous navigation algorithms, and transfer learning with machine learning models. In this work, we discuss the modifications and additions to our open-source version of the AirSim simulation framework, including new sensor modalities, vehicle types, and methods to procedurally generate realistic environments with changeable objects. Furthermore, we show the various applications and use cases the framework can serve.
https://arxiv.org/abs/2303.13381
This work presents a novel RGB-D-inertial dynamic SLAM method that can enable accurate localisation when the majority of the camera view is occluded by multiple dynamic objects over a long period of time. Most dynamic SLAM approaches either remove dynamic objects as outliers when they account for a minor proportion of the visual input, or detect dynamic objects using semantic segmentation before camera tracking. Therefore, dynamic objects that cause large occlusions are difficult to detect without prior information. The remaining visual information from the static background is also not enough to support localisation when a large occlusion lasts for a long period. To overcome these problems, our framework presents a robust visual-inertial bundle adjustment that simultaneously tracks the camera, estimates cluster-wise dense segmentation of dynamic objects, and maintains a static sparse map by combining dense and sparse features. The experimental results demonstrate that our method achieves promising localisation and object segmentation performance compared to other state-of-the-art methods in the scenario of long-term large occlusion.
https://arxiv.org/abs/2303.13316
Benchmarking Simultaneous Localization and Mapping (SLAM) algorithms is important to scientists and users of robotic systems alike. However, owing to their many configuration options in hardware and software, SLAM systems feature a vast parameter space that scientists have so far been unable to explore. The proposed SLAM Hive Benchmarking Suite is able to analyze SLAM algorithms in thousands of mapping runs, through its use of container technology and deployment in a cluster. This paper presents the architecture and open-source implementation of SLAM Hive and compares it to existing efforts on SLAM evaluation. Furthermore, we highlight the function of SLAM Hive by exploring some open-source algorithms on public datasets in terms of accuracy. We compare the algorithms against each other and evaluate how parameters affect not only accuracy but also CPU and memory usage. Through this we show that SLAM Hive can become an essential tool for proper comparisons and evaluations of SLAM algorithms and thus drive the scientific development in the research on SLAM.
https://arxiv.org/abs/2303.11854
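To illustrate why the parameter space described in the SLAM Hive abstract grows so quickly, the following minimal sketch enumerates every combination of a few configuration axes. The axis names and values are hypothetical placeholders, not SLAM Hive's actual configuration schema.

```python
from itertools import product

# Hypothetical configuration axes; names and values are illustrative only.
param_space = {
    "algorithm": ["algo_a", "algo_b"],
    "image_rate_hz": [10, 20, 30],
    "resolution_scale": [0.5, 1.0],
    "num_features": [500, 1000, 2000],
}


def enumerate_runs(space):
    """Yield one configuration dict per combination of parameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


runs = list(enumerate_runs(param_space))
print(len(runs))  # 2 * 3 * 2 * 3 = 36 mapping runs
```

Even four small axes already produce 36 runs; each extra axis multiplies the count, which is why distributing runs across containers in a cluster becomes attractive.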
This research paper focuses on the problem of dynamic objects and their impact on effective motion planning and localization. The paper proposes a two-step process to address this challenge, which involves finding the dynamic objects in the scene using a flow-based method and then using a deep video inpainting algorithm to remove them. The study aims to test the validity of this approach by comparing it with baseline results using two state-of-the-art SLAM algorithms, ORB-SLAM2 and LSD, and understanding the impact of dynamic objects and the corresponding trade-offs. The proposed approach does not require any significant modifications to the baseline SLAM algorithms, and therefore, the computational effort required remains unchanged. The paper presents a detailed analysis of the results obtained and concludes that the proposed method is effective in removing dynamic objects from the scene, leading to improved SLAM performance.
https://arxiv.org/abs/2303.10923
In this work, we propose a simultaneous localization and mapping (SLAM) system using a monocular camera and Ultra-wideband (UWB) sensors. Our system, referred to as VRSLAM, is a multi-stage framework that leverages the strengths and compensates for the weaknesses of each sensor. Firstly, we introduce a UWB-aided 7 degree-of-freedom (scale factor, 3D position, and 3D orientation) global alignment module to initialize the visual odometry (VO) system in the world frame defined by the UWB anchors. This module loosely fuses up-to-scale VO and ranging data using either a quadratically constrained quadratic programming (QCQP) or nonlinear least squares (NLS) algorithm, depending on whether a good initial guess is available. Secondly, we provide an accompanying theoretical analysis that includes the derivation and interpretation of the Fisher Information Matrix (FIM) and its determinant. Thirdly, we present UWB-aided bundle adjustment (UBA) and UWB-aided pose graph optimization (UPGO) modules to improve short-term odometry accuracy, reduce long-term drift, and correct any alignment and scale errors. Extensive simulations and experiments show that our solution outperforms UWB/camera-only and previous approaches, can quickly recover from tracking failure without relying on visual relocalization, and can effortlessly obtain a global map even if there are no loop closures.
https://arxiv.org/abs/2303.10903
Simultaneous localization and mapping (SLAM) is a critical technology that enables autonomous robots to be aware of their surrounding environment. With the development of deep learning, SLAM systems can achieve a higher level of perception of the environment, including the semantic and text levels. However, current works are limited in their ability to achieve a natural-language level of perception of the world. To address this limitation, we propose LP-SLAM, the first language-perceptive SLAM system that leverages large language models (LLMs). LP-SLAM has two major features: (a) it can detect text in the scene and determine whether it represents a landmark to be stored during the tracking and mapping phase, and (b) it can understand natural language input from humans and provide guidance based on the generated map. We illustrate three uses of the LLM in the system: text clustering, landmark judgment, and natural language navigation. Our proposed system represents an advancement in the field of LLM-based SLAM and opens up new possibilities for autonomous robots to interact with their environment in a more natural and intuitive way.
https://arxiv.org/abs/2303.10089
Performing accurate localization while keeping the communication bandwidth low is an essential challenge of multi-robot simultaneous localization and mapping (MR-SLAM). In this paper, we tackle this problem by generating a compact yet discriminative feature descriptor with minimum inference time. We propose descriptor distillation, which formulates descriptor generation as a learning problem under the teacher-student framework. To achieve real-time descriptor generation, we design a compact student network and train it by transferring the knowledge from a pre-trained large teacher model. To reduce the descriptor dimensions from the teacher to the student, we propose a novel loss function that enables knowledge transfer between descriptors of two different dimensions. The experimental results demonstrate that our model is 30% lighter than the state-of-the-art model and produces better descriptors in patch matching. Moreover, we build an MR-SLAM system based on the proposed method and show that our descriptor distillation can achieve higher localization performance for MR-SLAM with lower bandwidth.
https://arxiv.org/abs/2303.08420
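The descriptor distillation abstract does not spell out its loss function. As a rough illustration of how knowledge can be transferred between descriptors of different dimensions, the sketch below uses a relational loss that compares normalised pairwise distances within a batch. This is a swapped-in, generic technique chosen for illustration, not the paper's loss; the function names are hypothetical.

```python
import math


def pairwise_distances(descs):
    """Euclidean distance between every pair of descriptors in a batch."""
    return [[math.dist(a, b) for b in descs] for a in descs]


def normalise(matrix):
    """Divide by the mean off-diagonal distance so scale does not matter."""
    n = len(matrix)
    total = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    mean = total / (n * (n - 1))
    if mean == 0:
        return matrix
    return [[v / mean for v in row] for row in matrix]


def relational_loss(teacher, student):
    """Mean squared difference between normalised distance matrices.

    Teacher and student descriptors may have different dimensions, because
    only distances between samples of the same network are compared.
    Requires at least two descriptors per batch.
    """
    dt = normalise(pairwise_distances(teacher))
    ds = normalise(pairwise_distances(student))
    n = len(teacher)
    return sum((dt[i][j] - ds[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


teacher = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 2, 0, 0)]   # 4-D descriptors
student_good = [(0, 0), (3, 0), (0, 6)]                # 2-D, same geometry
student_bad = [(0, 0), (1, 0), (5, 5)]                 # 2-D, distorted
print(relational_loss(teacher, student_good))
print(relational_loss(teacher, student_bad))
```

A student that preserves the teacher's neighbourhood structure scores near zero regardless of its dimension or overall scale, while a distorted embedding is penalised.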
In this paper, we address the problem of using visuo-tactile feedback for 6-DoF localization and 3D reconstruction of unknown in-hand objects. We propose FingerSLAM, a closed-loop factor graph-based pose estimator that combines local tactile sensing at the fingertip and global vision sensing from a wrist-mounted camera. FingerSLAM is constructed with two constituent pose estimators: a multi-pass refined tactile-based pose estimator that captures movements from detailed local textures, and a single-pass vision-based pose estimator that predicts from a global view of the object. We also design a loop closure mechanism that actively matches current vision and tactile images to previously stored key-frames to reduce accumulated error. FingerSLAM incorporates the two sensing modalities of touch and vision, as well as the loop closure mechanism, into a factor graph-based optimization framework. Such a framework produces an optimized pose estimation solution that is more accurate than the standalone estimators. The estimated poses are then used to reconstruct the shape of the unknown object incrementally by stitching the local point clouds recovered from tactile images. We train our system on real-world data collected with 20 objects. We demonstrate reliable visuo-tactile pose estimation and shape reconstruction through quantitative and qualitative real-world evaluations on 6 objects that are unseen during training.
https://arxiv.org/abs/2303.07997
We present NeuSE, a novel Neural SE(3)-Equivariant Embedding for objects, and illustrate how it supports object SLAM for consistent spatial understanding with long-term scene changes. NeuSE is a set of latent object embeddings created from partial object observations. It serves as a compact point cloud surrogate for complete object models, encoding full shape information while transforming SE(3)-equivariantly in tandem with the object in the physical world. With NeuSE, relative frame transforms can be directly derived from inferred latent codes. Our proposed SLAM paradigm, using NeuSE for object shape and pose characterization, can operate independently or in conjunction with typical SLAM systems. It directly infers SE(3) camera pose constraints that are compatible with general SLAM pose graph optimization, while also maintaining a lightweight object-centric map that adapts to real-world changes. Our approach is evaluated on synthetic and real-world sequences featuring changed objects and shows improved localization accuracy and change-aware mapping capability, when working either standalone or jointly with a common SLAM pipeline.
https://arxiv.org/abs/2303.07308
The existence of symmetric objects, whose observations from different viewpoints can be identical, can deteriorate the performance of simultaneous localization and mapping (SLAM). This work proposes a system for robustly optimizing the pose of cameras and objects even in the presence of symmetric objects. We classify objects into three categories depending on their symmetry characteristics, which is efficient and effective in that it allows general objects to be handled and objects in the same category to be associated with the same type of ambiguity. Then we extract only the unambiguous parameters corresponding to each category and use them in data association and joint optimization of the camera and object pose. The proposed approach makes SLAM significantly more robust by removing the ambiguous parameters and utilizing as much useful geometric information as possible. Comparison with baseline algorithms confirms the superior performance of the proposed system in terms of object tracking and pose estimation, even in challenging scenarios where the baseline fails.
https://arxiv.org/abs/2303.07872
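The idea of keeping only unambiguous pose parameters for symmetric objects can be pictured with a single rotation angle. The sketch below assumes three hypothetical category labels ("asymmetric", "discrete", "continuous"); the abstract does not name the paper's three categories, so these labels and the yaw-only treatment are illustrative assumptions.

```python
import math


def yaw_residual(yaw_a, yaw_b, symmetry, order=1):
    """Difference between two yaw observations, ignoring ambiguous parts.

    symmetry: "asymmetric" -> yaw fully observable (period 2*pi)
              "discrete"   -> object looks identical every 2*pi/order
              "continuous" -> yaw about the symmetry axis is unobservable
    """
    if symmetry == "continuous":
        return 0.0  # the parameter carries no information, so drop it
    period = 2 * math.pi / order if symmetry == "discrete" else 2 * math.pi
    diff = (yaw_a - yaw_b) % period      # now in [0, period)
    if diff > period / 2:
        diff -= period                   # wrap to (-period/2, period/2]
    return diff


# A box-like object with 4-fold symmetry seen at yaws 90 degrees apart:
print(yaw_residual(0.1, 0.1 + math.pi / 2, "discrete", order=4))
# The same two observations of an asymmetric object disagree by 90 degrees:
print(yaw_residual(0.1, 0.1 + math.pi / 2, "asymmetric"))
```

With the ambiguity folded out, two observations that differ only by a symmetry-equivalent rotation produce a near-zero residual and no longer pull the optimizer in conflicting directions.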
Global place recognition and 3D relocalization are among the most important components of loop closure detection for 3D LiDAR Simultaneous Localization and Mapping (SLAM). In order to find the accurate global 6-DoF transform by a feature matching approach, various end-to-end architectures have been proposed. However, existing methods do not consider false correspondences between features, so unnecessary features are also involved in global place recognition and relocalization. In this paper, we introduce a robust correspondence estimation method that removes unnecessary features and highlights necessary features simultaneously. To focus on the necessary features and ignore the unnecessary ones, we use the geometric correlation between two scenes represented in the 3D LiDAR point clouds. We introduce a correspondence auxiliary loss that finds key correlations based on the point align algorithm and enables end-to-end training of the proposed networks with robust correspondence estimation. Since the ground, with its many plane patches, acts as an outlier during correspondence estimation, we also propose a preprocessing step to consider negative correspondence by removing dominant plane patches. The evaluation results on the dynamic urban driving dataset show that our proposed method can improve the performance of both global place recognition and relocalization tasks. We show that estimating robust feature correspondences is one of the important factors in place recognition and relocalization.
https://arxiv.org/abs/2303.06308
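The preprocessing step mentioned in the place recognition abstract removes dominant plane patches such as the ground. The abstract does not say how the plane is found; the sketch below uses plain RANSAC plane fitting as a stand-in, with an arbitrary distance threshold and iteration count chosen for illustration.

```python
import math
import random


def plane_from_points(p1, p2, p3):
    """Return (unit normal, offset d) of the plane n.x + d = 0, or None."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None  # the three points are (nearly) collinear
    n = tuple(c / norm for c in n)
    return n, -sum(n[i] * p1[i] for i in range(3))


def remove_dominant_plane(points, threshold=0.05, iters=200, seed=0):
    """Drop the points lying on the plane supported by the most inliers."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [i for i, p in enumerate(points)
                   if abs(n[0] * p[0] + n[1] * p[1] + n[2] * p[2] + d) < threshold]
        if len(inliers) > len(best):
            best = inliers
    drop = set(best)
    return [p for i, p in enumerate(points) if i not in drop]


# Synthetic scene: a flat 10 x 10 ground patch plus a vertical pole.
ground = [(float(x), float(y), 0.0) for x in range(10) for y in range(10)]
pole = [(4.5, 4.5, 0.5 + 0.15 * k) for k in range(10)]
remaining = remove_dominant_plane(ground + pole)
print(len(remaining))  # only the pole points are left
```

On real LiDAR scans the ground is not perfectly flat, so the threshold would have to be tuned to the sensor noise and terrain.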
When a mobile robot lacks high onboard computing or networking capabilities, it can rely on a remote computing architecture for its control and autonomy. This paper introduces a novel collaborative Simulation Twin (ST) strategy for control and autonomy on resource-constrained robots. The practical implementation of such a strategy entails a mobile robot system divided into a cyber (simulated) space and a physical (real) space, separated by a communication channel: the physical robot resides at the site of operation and is guided over a network by a simulated autonomous agent maintained at a remote location. Building on top of the digital twin concept, our collaborative twin is capable of autonomous navigation through an advanced SLAM-based path planning algorithm, while the physical robot is capable of tracking the simulated twin's velocity and communicating feedback generated through interaction with its environment. We put a prioritized path planning application to the test in a collaborative teleoperation system of a physical robot guided by the ST's autonomous navigation. We examine the performance of a physical robot led by autonomous navigation from the collaborative twin and assisted by a predicted force received from the physical robot. The experimental findings indicate the practicality of the proposed simulation-physical twinning approach and show computational and network performance improvements compared to typical remote computing (or offloading) and digital twin approaches.
https://arxiv.org/abs/2303.06172
Recent work has shown impressive localization performance using only images of ground textures taken with a downward-facing monocular camera. This provides a reliable navigation method that is robust to feature-sparse environments and challenging lighting conditions. However, these localization methods require an existing map for comparison. Our work aims to relax the need for a map by introducing a full simultaneous localization and mapping (SLAM) system. By not requiring an existing map, setup times are minimized and the system is more robust to changing environments. This SLAM system uses a combination of several techniques to accomplish this. Image keypoints are identified and projected into the ground plane. These keypoints, visual bags of words, and several threshold parameters are then used to identify overlapping images and revisited areas. The system then uses robust M-estimators to estimate the transform between robot poses with overlapping images and revisited areas. These optimized estimates make up the map used for navigation. We show, through experimental data, that this system performs reliably on many ground textures, but not all.
https://arxiv.org/abs/2303.05946
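The robust M-estimators mentioned in the ground texture SLAM abstract can be illustrated on a reduced problem: estimating a pure 2D translation between matched keypoints when some matches are wrong. The sketch below uses a Huber weight inside iteratively reweighted least squares; the choice of Huber, the `delta` value, and the translation-only model are assumptions made for brevity, not details from the paper.

```python
import math


def huber_weight(residual_norm, delta):
    """Weight 1 inside the inlier band, shrinking as 1/r outside it."""
    return 1.0 if residual_norm <= delta else delta / residual_norm


def mean_translation(src, dst):
    """Ordinary least-squares translation: the mean displacement."""
    n = len(src)
    return (sum(d[0] - s[0] for s, d in zip(src, dst)) / n,
            sum(d[1] - s[1] for s, d in zip(src, dst)) / n)


def robust_translation(src, dst, delta=1.0, iters=20):
    """Huber M-estimate of a 2D translation via reweighted least squares."""
    tx, ty = mean_translation(src, dst)
    for _ in range(iters):
        wsum = wx = wy = 0.0
        for s, d in zip(src, dst):
            dx, dy = d[0] - s[0], d[1] - s[1]
            w = huber_weight(math.hypot(dx - tx, dy - ty), delta)
            wsum += w
            wx += w * dx
            wy += w * dy
        tx, ty = wx / wsum, wy / wsum
    return tx, ty


src = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (2, 2), (3, 1), (4, 4), (5, 2)]
# Eight correct matches shifted by (2, -1), two gross outliers shifted by (50, 50).
dst = [(x + 2, y - 1) for x, y in src[:8]] + [(x + 50, y + 50) for x, y in src[8:]]
print(mean_translation(src, dst))    # pulled far away by the outliers
print(robust_translation(src, dst))  # stays close to (2, -1)
```

The Huber estimate is still nudged slightly toward the outliers; a redescending kernel would suppress them further at the cost of a harder optimization.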
Most current LiDAR simultaneous localization and mapping (SLAM) systems build maps in point clouds, which are sparse when zoomed in, even though they seem dense to human eyes. Dense maps are essential for robotic applications, such as map-based navigation. Due to its low memory cost, the mesh has become an attractive dense model for mapping in recent years. However, existing methods usually produce mesh maps in an offline post-processing step. This two-step pipeline does not allow these methods to use the built mesh maps online or to let localization and meshing benefit each other. To solve this problem, we propose the first CPU-only real-time LiDAR SLAM system that can simultaneously build a mesh map and perform localization against the mesh map. A novel and direct meshing strategy with Gaussian process reconstruction realizes the fast building, registration, and updating of mesh maps. We perform experiments on several public datasets. The results show that our SLAM system can run at around 40 Hz. The localization and meshing accuracy also outperform the state-of-the-art methods, including the TSDF map and Poisson reconstruction. Our code and video demos are available at: this https URL.
https://arxiv.org/abs/2303.05252
Lines are interesting geometrical features commonly seen in indoor and urban environments. A complete benchmark is missing in which one can evaluate lines from a sequential stream of images at every stage: line detection, line association, and pose error. To fill this gap, we present a complete and exhaustive benchmark for visual lines in a SLAM front-end, both for RGB and RGBD, by providing a plethora of complementary metrics. We have also labelled data from well-known SLAM datasets in order to have poses and accurately annotated lines all in one place. In particular, we have evaluated 17 line detection algorithms, 5 line association methods, and the resultant pose error for aligning a pair of frames with several combinations of detector and association method. We have packaged all methods and evaluation metrics and made them publicly available on the web page at this https URL.
https://arxiv.org/abs/2303.05162
In this paper, we address the lack of datasets for, and the issue of reproducibility in, collaborative SLAM pose graph optimizers by providing a novel pose graph generator. Our pose graph generator, kollagen, is based on a random walk in a planar grid world, similar to the popular M3500 dataset for single agent SLAM. It is simple to use and the user can set several parameters, e.g., the number of agents, the number of nodes, loop closure generation probabilities, and standard deviations of the measurement noise. Furthermore, a qualitative execution time analysis of our pose graph generator showcases the speed of the generator as a function of the tunable parameters. In addition to the pose graph generator, our paper provides two example datasets that researchers can use out-of-the-box to evaluate their algorithms. One of the datasets has 8 agents, each with 3500 nodes, and 67645 constraints in the pose graphs, while the other has 5 agents, each with 10000 nodes, and 76134 constraints. In addition, we show that current state-of-the-art pose graph optimizers are able to process our generated datasets and perform pose graph optimization. The data generator can be found at this https URL.
https://arxiv.org/abs/2303.04753
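The random-walk idea behind the kollagen abstract can be sketched for a single agent as follows. This is not kollagen's code or API: the function and parameter names are hypothetical, and the loop-closure rule (a closure is proposed whenever the walk re-enters a previously visited cell) is a simplifying assumption.

```python
import math
import random


def wrap(angle):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def relative_pose(a, b):
    """Pose b expressed in the frame of pose a; poses are (x, y, theta)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(a[2]), math.sin(a[2])
    return (c * dx + s * dy, -s * dx + c * dy, wrap(b[2] - a[2]))


def generate_pose_graph(num_nodes, loop_prob=0.5, sigma_xy=0.05,
                        sigma_theta=0.01, seed=0):
    """Random walk on a planar grid with noisy odometry and loop closures."""
    rng = random.Random(seed)
    steps = [(1, 0), (0, 1), (-1, 0), (0, -1)]  # unit moves per heading
    x, y, h = 0, 0, 0
    poses = [(x, y, 0.0)]          # ground-truth poses
    visited = {(x, y): [0]}        # grid cell -> node indices seen there
    edges = []                     # (kind, from, to, measured relative pose)

    def noisy(rel):
        return (rel[0] + rng.gauss(0, sigma_xy),
                rel[1] + rng.gauss(0, sigma_xy),
                wrap(rel[2] + rng.gauss(0, sigma_theta)))

    for i in range(1, num_nodes):
        h = (h + rng.choice([-1, 0, 1])) % 4    # turn right, keep, or turn left
        x, y = x + steps[h][0], y + steps[h][1]
        poses.append((x, y, h * math.pi / 2))
        edges.append(("odom", i - 1, i,
                      noisy(relative_pose(poses[i - 1], poses[i]))))
        for j in visited.get((x, y), []):       # revisited cell: maybe close a loop
            if rng.random() < loop_prob:
                edges.append(("loop", j, i,
                              noisy(relative_pose(poses[j], poses[i]))))
        visited.setdefault((x, y), []).append(i)
    return poses, edges


poses, edges = generate_pose_graph(500)
print(len(poses), len(edges))
```

A multi-agent version would run one such walk per agent and additionally propose inter-agent closures where different agents' walks cross.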
Hand-eye calibration is the problem of estimating the spatial transformation between a reference frame, usually the base of a robot arm or its gripper, and the reference frame of one or multiple cameras. Generally, this calibration is solved as a non-linear optimization problem; what is rarely done, however, is to exploit the underlying graph structure of the problem itself. In fact, the problem of hand-eye calibration can be seen as an instance of the Simultaneous Localization and Mapping (SLAM) problem. Inspired by this fact, in this work we present a pose-graph approach to the hand-eye calibration problem that extends a recent state-of-the-art solution in two different ways: i) by formulating the solution to eye-on-base setups with one camera; ii) by covering multi-camera robotic setups. The proposed approach has been validated in simulation against standard hand-eye calibration methods. Moreover, a real application is shown. In both scenarios, the proposed approach outperforms all alternative methods. We release with this paper an open-source implementation of our graph-based optimization framework for multi-camera setups.
https://arxiv.org/abs/2303.04747
Simulation engines like Gazebo, Unity and Webots are widely adopted in robotics. However, they lack either full simulation control, ROS integration, realistic physics, or photorealism. Recently, synthetic data generation and realistic rendering have advanced tasks like target tracking and human pose estimation. However, when focusing on vision applications, there is usually a lack of information like sensor measurements (e.g. IMU, LiDAR, joint state), or time continuity. On the other hand, simulations for most robotics applications are obtained in (semi)static environments, with specific sensor settings and low visual fidelity. In this work, we present a solution to these issues with a fully customizable framework for generating realistic animated dynamic environments (GRADE) for robotics research. The data produced can be post-processed, e.g. to add noise, and easily expanded with new information using the tools that we provide. To demonstrate GRADE, we use it to generate an indoor dynamic environment dataset and then compare different SLAM algorithms on the produced sequences. By doing that, we show how current research over-relies on well-known benchmarks and fails to generalize. Furthermore, our tests with YOLO and Mask R-CNN provide evidence that our data can improve training performance and generalize to real sequences. Finally, we show GRADE's flexibility by using it for indoor active SLAM, with diverse environment sources, and in a multi-robot scenario. In doing that, we employ different control, asset placement, and simulation techniques. The code, results, implementation details, and generated data are provided as open-source. The main project page is this https URL while the accompanying video can be found at this https URL.
https://arxiv.org/abs/2303.04466
Searching for objects is a fundamental skill for robots. As such, we expect object search to eventually become an off-the-shelf capability for robots, similar to, e.g., object detection and SLAM. In contrast, however, no system for 3D object search exists that generalizes across real robots and environments. In this paper, building upon a recent theoretical framework that exploited the octree structure for representing belief in 3D, we present GenMOS (Generalized Multi-Object Search), the first general-purpose system for multi-object search (MOS) in a 3D region that is robot-independent and environment-agnostic. GenMOS takes as input point cloud observations of the local region, object detection results, and localization of the robot's view pose, and outputs a 6D viewpoint to move to through online planning. In particular, GenMOS uses point cloud observations in three ways: (1) to simulate occlusion; (2) to inform occupancy and initialize octree belief; and (3) to sample a belief-dependent graph of view positions that avoid obstacles. We evaluate our system both in simulation and on two real robot platforms. Our system enables, for example, a Boston Dynamics Spot robot to find a toy cat hidden underneath a couch in under one minute. We further integrate 3D local search with 2D global search to handle larger areas, demonstrating the resulting system in a 25 m$^2$ lobby area.
https://arxiv.org/abs/2303.03178
Integrating multiple LiDAR sensors can significantly enhance a robot's perception of the environment, enabling it to capture adequate measurements for simultaneous localization and mapping (SLAM). Indeed, solid-state LiDARs can bring in high resolution at a low cost compared with traditional spinning LiDARs in robotic applications. However, their reduced field of view (FoV) limits performance, particularly indoors. In this paper, we propose a tightly-coupled multi-modal multi-LiDAR-inertial SLAM system for surveying and mapping tasks. By taking advantage of both solid-state and spinning LiDARs, and built-in inertial measurement units (IMU), we achieve both robust and low-drift ego-estimation as well as high-resolution maps in diverse challenging indoor environments (e.g., small, featureless rooms). First, we use spatial-temporal calibration modules to align the timestamps and calibrate extrinsic parameters between sensors. Then, we extract two groups of feature points, edge points and plane points, from LiDAR data. Next, with pre-integrated IMU data, an undistortion module is applied to the LiDAR point cloud data. Finally, the undistorted point clouds are merged into one point cloud and processed with a sliding-window-based optimization module. In extensive experiments, our method shows competitive performance with state-of-the-art spinning-LiDAR-only or solid-state-LiDAR-only SLAM systems in diverse environments. More results, code, and dataset can be found at this https URL.
https://arxiv.org/abs/2303.02684
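The undistortion step in the multi-LiDAR abstract compensates for sensor motion during a sweep. A minimal planar sketch is given below, assuming that the pre-integrated IMU data has already been reduced to a sensor pose at the start and at the end of the sweep and that motion in between is well approximated by linear interpolation. Both assumptions, and all names, are illustrative rather than taken from the paper.

```python
import math


def interpolate_pose(start, end, alpha):
    """Linearly interpolate (x, y, theta) poses; alpha runs from 0 to 1."""
    dtheta = math.atan2(math.sin(end[2] - start[2]),
                        math.cos(end[2] - start[2]))
    return (start[0] + alpha * (end[0] - start[0]),
            start[1] + alpha * (end[1] - start[1]),
            start[2] + alpha * dtheta)


def undistort(points, start, end):
    """Re-express each point in the end-of-sweep sensor frame.

    points: list of (x, y, alpha), where (x, y) is measured in the sensor
    frame at capture time and alpha is the relative timestamp in [0, 1].
    """
    ce, se = math.cos(end[2]), math.sin(end[2])
    out = []
    for px, py, alpha in points:
        x, y, th = interpolate_pose(start, end, alpha)
        c, s = math.cos(th), math.sin(th)
        wx, wy = x + c * px - s * py, y + s * px + c * py   # sensor -> world
        dx, dy = wx - end[0], wy - end[1]
        out.append((ce * dx + se * dy, -se * dx + ce * dy))  # world -> end frame
    return out


# The sensor moves 1 m along x during the sweep while observing a static
# landmark at world position (5, 0) three times.
sweep = [(5.0, 0.0, 0.0), (4.5, 0.0, 0.5), (4.0, 0.0, 1.0)]
print(undistort(sweep, start=(0.0, 0.0, 0.0), end=(1.0, 0.0, 0.0)))
```

After undistortion the three raw measurements of the same landmark collapse onto a single point, which is what makes the merged cloud usable for feature matching.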