We propose RTG-SLAM, a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. RTG-SLAM features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and transparent ones fitting residual colors. By rendering depth differently from color, we let a single opaque Gaussian fit a local surface region well without the need for multiple overlapping Gaussians, greatly reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed pixels, those with large color errors, and those with large depth errors. We also categorize all Gaussians into stable and unstable ones, where stable Gaussians are those expected to fit previously observed RGBD images well and the rest are unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and the number of pixels to be rendered are greatly reduced, and the optimization can run in real time. We show real-time reconstructions of a variety of real large scenes. Compared with state-of-the-art NeRF-based RGBD SLAM, our system achieves comparably high-quality reconstruction at around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.
https://arxiv.org/abs/2404.19706
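RTG-SLAM's per-frame rule for adding Gaussians (newly observed pixels, large color errors, large depth errors) can be sketched in NumPy. The array names and thresholds below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def classify_pixels(rendered_rgb, rendered_depth, observed_rgb, observed_depth,
                    covered_mask, color_thresh=0.1, depth_thresh=0.05):
    """Flag pixels that should spawn new Gaussians, following the three cases
    in the abstract: newly observed, large color error, large depth error."""
    color_err = np.abs(rendered_rgb - observed_rgb).mean(axis=-1)
    depth_err = np.abs(rendered_depth - observed_depth)
    newly_observed = ~covered_mask                      # no Gaussian covers this pixel yet
    bad_color = covered_mask & (color_err > color_thresh)
    bad_depth = covered_mask & (depth_err > depth_thresh)
    return newly_observed | bad_color | bad_depth

# Toy 2x2 frame exercising all three cases.
rendered_rgb = np.zeros((2, 2, 3)); observed_rgb = np.zeros((2, 2, 3))
observed_rgb[0, 1] = 0.5                                # large color error
rendered_depth = np.ones((2, 2)); observed_depth = np.ones((2, 2))
observed_depth[1, 0] = 2.0                              # large depth error
covered = np.array([[False, True], [True, True]])       # top-left pixel is newly observed
mask = classify_pixels(rendered_rgb, rendered_depth, observed_rgb, observed_depth, covered)
```

In the real system the flagged pixels would be back-projected with the depth map to place new Gaussians on the surface.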
Pose graph optimization (PGO) is a well-known technique for solving the pose-based simultaneous localization and mapping (SLAM) problem. In this paper, we represent the rotation and translation by a unit quaternion and a three-dimensional vector, and propose a new PGO model based on the von Mises-Fisher distribution. The constraints derived from the unit quaternions are spherical manifolds, and the projection onto the constraints can be calculated by normalization. Then a proximal linearized Riemannian alternating direction method of multipliers (PieADMM) is developed to solve the proposed model, which not only has low memory requirements, but also can update the poses in parallel. Furthermore, we establish the iteration complexity of $O(1/\epsilon^{2})$ of PieADMM for finding an $\epsilon$-stationary solution of our model. The efficiency of our proposed algorithm is demonstrated by numerical experiments on two synthetic and four 3D SLAM benchmark datasets.
https://arxiv.org/abs/2404.18560
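A detail worth making concrete from the PGO abstract above: the unit-quaternion constraint set is a sphere, so the Euclidean projection onto it is simply normalization. A minimal sketch of just that projection step (not the PieADMM algorithm itself):

```python
import numpy as np

def project_to_unit_sphere(q):
    """Euclidean projection of a quaternion onto the unit sphere S^3:
    argmin over unit u of ||u - q|| is q / ||q|| for any nonzero q."""
    n = np.linalg.norm(q)
    if n == 0.0:
        raise ValueError("projection undefined at the origin")
    return q / n

q = np.array([2.0, 0.0, 0.0, 0.0])   # unnormalized quaternion iterate
u = project_to_unit_sphere(q)
```

This closed-form projection is what makes the spherical-manifold constraints cheap to handle inside an ADMM-style splitting scheme.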
This paper proposes a photorealistic real-time dense 3D mapping system that utilizes a learning-based image enhancement method and mesh-based map representation. Due to the characteristics of the underwater environment, where problems such as hazing and low contrast occur, it is hard to apply conventional simultaneous localization and mapping (SLAM) methods. Furthermore, for sensitive tasks like inspecting cracks, photorealistic mapping is very important. However, Autonomous Underwater Vehicles (AUVs) are computationally constrained. In this paper, we utilize a neural network-based image enhancement method to improve pose estimation and mapping quality and apply a sliding window-based mesh expansion method to enable lightweight, fast, and photorealistic mapping. To validate our results, we utilize real-world and indoor synthetic datasets. We performed qualitative validation with the real-world dataset and quantitative validation by modeling images from the indoor synthetic dataset as underwater scenes.
https://arxiv.org/abs/2404.18395
Multi-robot simultaneous localization and mapping (SLAM) enables a robot team to achieve coordinated tasks relying on a common map. However, centralized processing of robot observations is undesirable because it creates a single point of failure and requires pre-existing infrastructure and significant multi-hop communication throughput. This paper formulates multi-robot object SLAM as a variational inference problem over a communication graph. We impose a consensus constraint on the objects maintained by different nodes to ensure agreement on a common map. To solve the problem, we develop a distributed mirror descent algorithm with a regularization term enforcing consensus. Using Gaussian distributions in the algorithm, we derive a distributed multi-state constraint Kalman filter (MSCKF) for multi-robot object SLAM. Experiments on real and simulated data show that our method improves the trajectory and object estimates, compared to individual-robot SLAM, while achieving better scaling to large robot teams, compared to centralized multi-robot SLAM. Code is available at this https URL.
https://arxiv.org/abs/2404.18331
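The consensus constraint in the multi-robot object SLAM paper above can be illustrated with a toy update over a communication graph. This is a deliberate simplification: plain neighbor averaging in place of the paper's mirror-descent update with a consensus regularizer, on a hypothetical three-robot line graph:

```python
import numpy as np

def consensus_step(estimates, adjacency, step=0.5):
    """One consensus update: each node moves its object estimate toward the
    average of its neighbors' estimates, so all nodes drift to agreement
    without any central server ever seeing the full state."""
    estimates = np.asarray(estimates, dtype=float)
    new = estimates.copy()
    for i, row in enumerate(adjacency):
        nbrs = [j for j, a in enumerate(row) if a and j != i]
        if nbrs:
            new[i] = (1 - step) * estimates[i] + step * estimates[nbrs].mean(axis=0)
    return new

# Three robots on a line graph disagree about one landmark coordinate.
est = np.array([[0.0], [1.0], [2.0]])
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
for _ in range(50):
    est = consensus_step(est, adj)
```

Each robot only ever communicates with its graph neighbors, which is what removes the single point of failure the abstract criticizes in centralized processing.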
With the emergence of Neural Radiance Fields (NeRF), neural implicit representations have gained widespread applications across various domains, including simultaneous localization and mapping. However, current neural implicit SLAM faces a challenging trade-off problem between performance and the number of parameters. To address this problem, we propose sparse tri-plane encoding, which efficiently achieves scene reconstruction at resolutions up to 512 using only 2~4% of the commonly used tri-plane parameters (reduced from 100MB to 2~4MB). On this basis, we design S3-SLAM to achieve rapid and high-quality tracking and mapping through sparsifying plane parameters and integrating orthogonal features of tri-plane. Furthermore, we develop hierarchical bundle adjustment to achieve globally consistent geometric structures and reconstruct high-resolution appearance. Experimental results demonstrate that our approach achieves competitive tracking and scene reconstruction with minimal parameters on three datasets. Source code will soon be available.
https://arxiv.org/abs/2404.18284
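The tri-plane encoding that S3-SLAM sparsifies can be sketched as three orthogonal feature planes queried per 3D point. The resolution, feature dimension, and nearest-neighbor lookup below are illustrative assumptions (practical systems use bilinear interpolation and, in S3-SLAM, sparse rather than dense plane parameters):

```python
import numpy as np

def triplane_features(point, plane_xy, plane_xz, plane_yz):
    """Query three orthogonal feature planes at a 3D point in [0, 1)^3 and
    sum the per-plane features into one point feature."""
    res = plane_xy.shape[0]
    i, j, k = (int(c * res) for c in point)      # nearest grid indices per axis
    return plane_xy[i, j] + plane_xz[i, k] + plane_yz[j, k]

res, dim = 8, 4
rng = np.random.default_rng(0)
planes = [rng.standard_normal((res, res, dim)) for _ in range(3)]
feat = triplane_features((0.5, 0.25, 0.75), *planes)
```

The parameter count of three planes grows as O(res^2) instead of O(res^3) for a dense voxel grid, which is the starting point the paper then compresses further by sparsification.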
Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization increase the computing load as the map grows, which affects downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational load caused by maintaining large-scale maps. Firstly, we introduce a method for generating block maps and the corresponding switching strategies, ensuring that the robot can estimate its state in large-scale environments by loading only local map information. Secondly, global localization via Branch-and-Bound Search (BBS) in the 3D map is introduced to provide the initial pose. Finally, a graph-based optimization method is adopted with a dynamic sliding window that determines which factors are marginalized, depending on whether the robot stays within a BM or switches to another one, maintaining the accuracy and efficiency of pose tracking. Comparison experiments are performed on publicly available large-scale datasets. Results show that the proposed method can track the robot pose even when the map scale exceeds 6 kilometers, while efficient and accurate localization is still guaranteed on NCLT and M2DGR.
https://arxiv.org/abs/2404.18192
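The block-map idea above can be sketched as keeping only the block containing the current pose resident and switching when the pose crosses a boundary. The uniform grid layout and block size here are assumptions for illustration, not the paper's actual block generation method:

```python
def block_id(position, block_size=100.0):
    """Map a 2D position (meters) to the id of the block map containing it."""
    return (int(position[0] // block_size), int(position[1] // block_size))

class BlockMapManager:
    """Keep only the current block resident; switch when the pose crosses a
    block boundary. A real system would also preload neighboring blocks and
    marginalize factors tied to the outgoing block, as the abstract notes."""
    def __init__(self, block_size=100.0):
        self.block_size = block_size
        self.current = None
        self.switches = 0

    def update(self, position):
        bid = block_id(position, self.block_size)
        if bid != self.current:
            self.current = bid          # here a real system loads block `bid` from disk
            self.switches += 1
        return self.current

mgr = BlockMapManager(block_size=100.0)
for pos in [(10, 10), (50, 20), (150, 20), (160, 30)]:
    mgr.update(pos)
```

Memory stays bounded by the size of one block regardless of the total map extent, which is the point of the BM design.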
We introduce a high-fidelity neural implicit dense visual Simultaneous Localization and Mapping (SLAM) system, termed DF-SLAM. In our work, we employ dictionary factors for scene representation, encoding the geometry and appearance information of the scene as a combination of basis and coefficient factors. Compared to neural implicit SLAM methods that directly encode scene information as features, our method exhibits superior scene detail reconstruction capabilities and more efficient memory usage, while our model size is insensitive to the size of the scene map, making our method more suitable for large-scale scenes. Additionally, we employ feature integration rendering to accelerate color rendering speed while ensuring color rendering quality, further enhancing the real-time performance of our neural SLAM method. Extensive experiments on synthetic and real-world datasets demonstrate that our method is competitive with existing state-of-the-art neural implicit SLAM methods in terms of real-time performance, localization accuracy, and scene reconstruction quality. Our source code is available at this https URL.
https://arxiv.org/abs/2404.17876
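DF-SLAM's dictionary-factor representation can be sketched as scene features reconstructed from a small shared basis weighted by per-location coefficients. The sizes below are illustrative assumptions:

```python
import numpy as np

# A shared "dictionary" of 16 basis atoms, each a 32-dimensional feature.
rng = np.random.default_rng(1)
basis = rng.standard_normal((16, 32))

def decode_feature(coefficients, basis):
    """Reconstruct a scene feature as a linear combination of basis atoms:
    the basis/coefficient split described in the abstract."""
    return coefficients @ basis

coeffs = rng.standard_normal(16)     # compact per-location coefficient vector
feature = decode_feature(coeffs, basis)
```

Because the basis is shared across the whole scene, only the compact coefficient vectors scale with the map, which is why the abstract describes the model size as insensitive to scene extent.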
Taking over arbitrary tasks like humans do with a mobile service robot in open-world settings requires a holistic scene perception for decision-making and high-level control. This paper presents a human-inspired scene perception model to minimize the gap between human and robotic capabilities. The approach adopts fundamental neuroscience concepts, such as a triplet split of perception into recognition, knowledge representation, and knowledge interpretation. A recognition system separates background from foreground to integrate exchangeable image-based object detectors and SLAM; a multi-layer knowledge base represents scene information in a hierarchical structure and offers interfaces for high-level control; and knowledge interpretation methods deploy spatio-temporal scene analysis and perceptual learning for self-adjustment. A single-setting ablation study is used to evaluate the impact of each component on the overall performance for a fetch-and-carry scenario in two simulated and one real-world environment.
https://arxiv.org/abs/2404.17791
Simultaneous localization and mapping (SLAM), i.e., the reconstruction of the environment represented by a (3D) map and the concurrent pose estimation, has made astonishing progress. Meanwhile, large-scale applications aiming at data collection in complex environments like factory halls or construction sites are becoming feasible. However, in contrast to small-scale scenarios with building interiors separated into single rooms, shop floors or construction areas require measurements at larger distances in potentially textureless areas under difficult illumination. Pose estimation is further aggravated since no GNSS measurements are available, as is usual for such indoor applications. In our work, we realize data collection in a large factory hall by a robot system equipped with four stereo cameras as well as a 3D laser scanner. We apply our state-of-the-art LiDAR and visual SLAM approaches and discuss the respective pros and cons of the different sensor types for trajectory estimation and dense map generation in such an environment. Additionally, dense and accurate depth maps are generated by 3D Gaussian splatting, which we plan to use in the context of our project aiming at automatic construction and site monitoring.
https://arxiv.org/abs/2404.17215
We introduce a new system for Multi-Session SLAM, which tracks camera motion across multiple disjoint videos under a single global reference. Our approach couples the prediction of optical flow with solver layers to estimate camera pose. The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. The full system can connect disjoint sequences, perform visual odometry, and global optimization. Compared to existing approaches, our design is accurate and robust to catastrophic failures. Code is available at this http URL
https://arxiv.org/abs/2404.15263
This project has conducted research on robot path planning based on Visual SLAM. The main work of this project is as follows: (1) Construction of Visual SLAM system. Research has been conducted on the basic architecture of Visual SLAM. A Visual SLAM system is developed based on ORB-SLAM3 system, which can conduct dense point cloud mapping. (2) The map suitable for two-dimensional path planning is obtained through map conversion. This part converts the dense point cloud map obtained by Visual SLAM system into an octomap and then performs projection transformation to the grid map. The map conversion converts the dense point cloud map containing a large amount of redundant map information into an extremely lightweight grid map suitable for path planning. (3) Research on path planning algorithm based on reinforcement learning. This project has conducted experimental comparisons between the Q-learning algorithm, the DQN algorithm, and the SARSA algorithm, and found that DQN is the algorithm with the fastest convergence and best performance in high-dimensional complex environments. This project has conducted experimental verification of the Visual SLAM system in a simulation environment. The experimental results obtained based on open-source dataset and self-made dataset prove the feasibility and effectiveness of the designed Visual SLAM system. At the same time, this project has also conducted comparative experiments on the three reinforcement learning algorithms under the same experimental condition to obtain the optimal algorithm under the experimental condition.
https://arxiv.org/abs/2404.14077
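The map-conversion step in the project above (dense point cloud to a lightweight 2D grid for path planning) can be sketched directly, skipping the intermediate octomap stage. The resolution and height band below are illustrative assumptions:

```python
import numpy as np

def pointcloud_to_grid(points, resolution=0.1, z_min=0.1, z_max=1.5):
    """Project 3D points within a height band onto a 2D occupancy grid.
    Cells containing at least one point are marked occupied (1); points
    outside the band (floor, ceiling) are discarded as planning-irrelevant."""
    pts = points[(points[:, 2] >= z_min) & (points[:, 2] <= z_max)]
    ij = np.floor(pts[:, :2] / resolution).astype(int)
    origin = ij.min(axis=0)
    ij -= origin                              # shift so grid indices start at 0
    grid = np.zeros(ij.max(axis=0) + 1, dtype=np.uint8)
    grid[ij[:, 0], ij[:, 1]] = 1
    return grid

pts = np.array([[0.05, 0.05, 0.5],    # obstacle point -> occupied cell
                [0.25, 0.05, 0.5],    # obstacle point two cells over
                [0.15, 0.15, 3.0]])   # above z_max: ignored (e.g. ceiling)
grid = pointcloud_to_grid(pts)
```

The resulting grid is the kind of lightweight input the project feeds to its reinforcement-learning planners.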
Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation. SLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and harnesses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. The optimization on such SL-ODEs allows SLAM to construct denoising mappings with smaller cumulative approximated errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SLAM achieves an efficient training regimen, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2 to 4-step generation with high performance. Comprehensive evaluations on LAION, MS COCO 2014, and MS COCO 2017 datasets also illustrate that SLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance both on FID and the quality of the generated images.
https://arxiv.org/abs/2404.13903
Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuous void is apparent within the current literature. To bridge this gap, this paper conducts a comprehensive survey of NeRF's applications in the context of AD. Our survey is structured to categorize NeRF's applications in Autonomous Driving (AD), specifically encompassing perception, 3D reconstruction, simultaneous localization and mapping (SLAM), and simulation. We delve into in-depth analysis and summarize the findings for each application category, and conclude by providing insights and discussions on future directions in this field. We hope this paper serves as a comprehensive reference for researchers in this domain. To the best of our knowledge, this is the first survey specifically focused on the applications of NeRF in the Autonomous Driving domain.
https://arxiv.org/abs/2404.13816
So far, planetary surface exploration depends on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots, which is an extension to our previous work, TAIL (Terrain-Aware multI-modaL) dataset. We conducted field experiments on beaches that are considered as planetary surface analog environments for diverse sandy terrains. In TAIL-Plus dataset, we provide more sequences with multiple loops and expand the scene from day to night. Benefit from our sensor suite with modular design, we use both wheeled and quadruped robots for data collection. The sensors include a 3D LiDAR, three downward RGB-D cameras, a pair of global-shutter color cameras that can be used as a forward-looking stereo camera, an RTK-GPS device and an extra IMU. Our datasets are intended to help researchers developing multi-sensor simultaneous localization and mapping (SLAM) algorithms for robots in unstructured, deformable granular terrains. Our datasets and supplementary materials will be available at \url{this https URL}.
https://arxiv.org/abs/2404.13600
We introduce EC-SLAM, a real-time dense RGB-D simultaneous localization and mapping (SLAM) system utilizing Neural Radiance Fields (NeRF). Although recent NeRF-based SLAM systems have demonstrated encouraging outcomes, they have yet to completely leverage NeRF's capability to constrain pose optimization. By employing an effectively constrained global bundle adjustment (BA) strategy, our system makes use of NeRF's implicit loop closure correction capability. This improves the tracking accuracy by reinforcing the constraints on the keyframes that are most pertinent to the optimized current frame. In addition, by implementing a feature-based and uniform sampling strategy that minimizes the number of ineffective constraint points for pose optimization, we mitigate the effects of random sampling in NeRF. EC-SLAM utilizes sparse parametric encodings and the truncated signed distance field (TSDF) to represent the map in order to facilitate efficient fusion, resulting in reduced model parameters and accelerated convergence velocity. A comprehensive evaluation conducted on the Replica, ScanNet, and TUM datasets showcases cutting-edge performance, including enhanced reconstruction accuracy resulting from precise pose estimation, 21 Hz run time, and tracking precision improvements of up to 50%. The source code is available at this https URL.
https://arxiv.org/abs/2404.13346
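The TSDF map representation EC-SLAM builds on can be sketched as the classic weighted-average fusion of truncated signed-distance observations per voxel. This is the standard TSDF update, not the paper's full pipeline; the truncation distance and toy values are illustrative:

```python
import numpy as np

def tsdf_update(tsdf, weight, sdf_obs, trunc=0.1):
    """Fuse one signed-distance observation per voxel into a TSDF via a
    weighted running average, after truncating to [-trunc, trunc]."""
    sdf_obs = np.clip(sdf_obs, -trunc, trunc)
    new_weight = weight + 1.0
    tsdf = (tsdf * weight + sdf_obs) / new_weight
    return tsdf, new_weight

# Three voxels: near the surface, well in front of it, well behind it.
tsdf = np.zeros(3); w = np.zeros(3)
tsdf, w = tsdf_update(tsdf, w, np.array([0.05, 0.5, -0.5]))   # first observation
tsdf, w = tsdf_update(tsdf, w, np.array([0.05, 0.5, -0.5]))   # consistent second one
```

The surface is recovered as the zero crossing of the fused field; truncation is what keeps the representation cheap and the fusion incremental.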
We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alignment and photometric bundle-adjustment. Such processes are difficult to optimize due to the narrow basin of attraction of the optimization loss in image space (local minima) and the lack of initial correspondences. We mitigate these limitations by implementing a Gaussian pyramid filter on top of NeRF, facilitating a coarse-to-fine tracking optimization strategy. Furthermore, NeRF systems encounter challenges in converging to the right geometry with limited input views. While prior approaches use a Signed-Distance Function (SDF)-based NeRF and directly supervise SDF values by approximating ground truth SDF through depth measurements, this often results in suboptimal geometry. In contrast, our method employs a volume density representation and introduces a novel KL regularizer on the ray termination distribution, constraining scene geometry to consist of empty space and opaque surfaces. Our solution implements both local and global bundle-adjustment to produce a robust (coarse-to-fine) and accurate (KL regularizer) SLAM solution. We conduct experiments on multiple datasets (ScanNet, TUM, Replica) showing state-of-the-art results in tracking and in reconstruction accuracy.
https://arxiv.org/abs/2404.11419
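SLAIM's KL regularizer on the ray termination distribution can be illustrated with discrete distributions along a ray: a termination distribution peaked at one surface sample scores a far lower KL against an "empty space plus opaque surface" target than mass smeared along the ray. The target construction and sample counts here are illustrative, not the paper's exact formulation:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) between two discrete distributions (normalized first)."""
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Ray termination probabilities over 5 samples along one ray.
diffuse = np.array([0.2, 0.2, 0.2, 0.2, 0.2])        # mass smeared along the ray
peaked  = np.array([0.01, 0.01, 0.95, 0.02, 0.01])   # empty space + one opaque surface
target  = np.array([1e-3, 1e-3, 1.0, 1e-3, 1e-3])    # idealized: all mass at the surface

kl_peaked = kl_divergence(peaked, target)
kl_diffuse = kl_divergence(diffuse, target)
```

Minimizing such a divergence pushes the learned volume density toward exactly the geometry the paper wants: empty space terminated by opaque surfaces.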
This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-dof ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences, divided into training and testing, are accessible through our website.
https://arxiv.org/abs/2404.11322
Positioning is a prominent field of study, notably focusing on Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) methods. Despite their advancements, these methods often encounter dead-reckoning errors that lead to considerable drift in the estimated platform motion, especially during long traverses. In such cases, the drift error is not negligible and should be rectified. Our proposed approach minimizes the drift error by correcting the estimated motion generated by any SLAM method at each epoch. Our methodology treats positioning measurements rendered by the SLAM solution as random variables formulated jointly in a multivariate distribution. In this setting, correcting the drift becomes equivalent to finding the mode of this multivariate distribution, which jointly maximizes the likelihood of a set of relevant geo-spatial priors about the platform motion and environment. Our method is integrable into any SLAM/VIO method as a correction module. Our experimental results show the effectiveness of our approach, reducing the drift error by 10x in long traverses.
https://arxiv.org/abs/2404.10140
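With Gaussian distributions, the mode-finding the drift-correction paper describes reduces to a familiar closed form: the mode of the product of the SLAM likelihood and a geo-spatial prior is their precision-weighted average. A simplified sketch with isotropic per-axis variances (the paper's formulation is a general multivariate distribution; the values here are illustrative):

```python
import numpy as np

def fuse_gaussian(slam_pos, slam_var, prior_pos, prior_var):
    """Mode of the product of two Gaussians: the precision-weighted average
    of the SLAM estimate and a geo-spatial prior (e.g. a known road position)."""
    w_slam, w_prior = 1.0 / slam_var, 1.0 / prior_var
    return (w_slam * slam_pos + w_prior * prior_pos) / (w_slam + w_prior)

# Drifted, uncertain SLAM estimate vs. a confident map prior.
corrected = fuse_gaussian(slam_pos=np.array([10.5, 4.8]), slam_var=4.0,
                          prior_pos=np.array([10.0, 5.0]), prior_var=1.0)
```

The correction pulls the estimate toward whichever source is more certain, which is the mechanism that keeps drift bounded over long traverses.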
Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as an indicator of robustness and ease of deployment in real-world applications. There exists no dataset plus benchmark combination publicly available, which considers these factors combined. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design for a pre-surveyed point on the ground to be observable from an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position at mm-level accuracy. Results from the challenge show an increase in overall participation, single-session SLAM systems getting increasingly accurate, successfully operating across varying sensor suites, but relatively few participants performing multi-session SLAM.
https://arxiv.org/abs/2404.09765
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. SLAM comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities. Our dataset is available at this https URL.
https://arxiv.org/abs/2404.09622