This project conducted research on robot path planning based on Visual SLAM. The main work is as follows: (1) Construction of a Visual SLAM system. The basic architecture of Visual SLAM was studied, and a Visual SLAM system capable of dense point-cloud mapping was developed on top of ORB-SLAM3. (2) Conversion of the map into a form suitable for two-dimensional path planning. The dense point-cloud map produced by the Visual SLAM system is converted into an octomap and then projected onto a grid map; this conversion turns a point-cloud map containing a large amount of redundant information into an extremely lightweight grid map suitable for path planning. (3) Research on path planning algorithms based on reinforcement learning. Experimental comparisons between the Q-learning, DQN, and SARSA algorithms showed that DQN converges fastest and performs best in high-dimensional, complex environments. The Visual SLAM system was verified experimentally in a simulation environment, and results on both open-source and self-made datasets demonstrate the feasibility and effectiveness of the designed system. The three reinforcement learning algorithms were also compared under identical experimental conditions to identify the optimal algorithm for that setting.
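The map conversion in step (2) can be sketched as a height-band projection from the dense point cloud down to a 2D occupancy grid. The cell resolution and obstacle height band below are illustrative assumptions, not values from the project, and the sketch projects the cloud directly, skipping the intermediate octomap stage:

```python
import numpy as np

def cloud_to_grid(points, resolution=0.1, z_band=(0.1, 1.5)):
    """Project a dense 3D point cloud onto a 2D occupancy grid.

    Points whose height lies inside z_band are treated as obstacles;
    floor and ceiling points are discarded. resolution is the cell
    edge length in metres. (Illustrative stand-in for the project's
    octomap-to-grid-map projection; the thresholds are assumptions.)
    """
    pts = points[(points[:, 2] >= z_band[0]) & (points[:, 2] <= z_band[1])]
    origin = pts[:, :2].min(axis=0)
    ij = np.floor((pts[:, :2] - origin) / resolution).astype(int)
    grid = np.zeros(ij.max(axis=0) + 1, dtype=np.int8)
    grid[ij[:, 0], ij[:, 1]] = 1  # 1 = occupied cell
    return grid, tuple(origin)

# A 1 m wall segment at x = 1.0 m (z = 0.5 m), plus floor points that
# the height band filters out.
cloud = np.array([[1.0, y, 0.5] for y in np.linspace(0.0, 1.0, 50)]
                 + [[x, y, 0.0] for x in (0.2, 0.6) for y in (0.2, 0.8)])
grid, origin = cloud_to_grid(cloud)
```

Cells marked 1 become obstacles for the 2D planner; everything else is free space.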
https://arxiv.org/abs/2404.14077
Diffusion models have significantly advanced the state of the art in image, audio, and video generation tasks. However, their applications in practical scenarios are hindered by slow inference speed. Drawing inspiration from the approximation strategies utilized in consistency models, we propose the Sub-path Linear Approximation Model (SLAM), which accelerates diffusion models while maintaining high-quality image generation. SLAM treats the PF-ODE trajectory as a series of PF-ODE sub-paths divided by sampled points, and harnesses sub-path linear (SL) ODEs to form a progressive and continuous error estimation along each individual PF-ODE sub-path. The optimization on such SL-ODEs allows SLAM to construct denoising mappings with smaller cumulative approximated errors. An efficient distillation method is also developed to facilitate the incorporation of more advanced diffusion models, such as latent diffusion models. Our extensive experimental results demonstrate that SLAM achieves an efficient training regimen, requiring only 6 A100 GPU days to produce a high-quality generative model capable of 2 to 4-step generation with high performance. Comprehensive evaluations on LAION, MS COCO 2014, and MS COCO 2017 datasets also illustrate that SLAM surpasses existing acceleration methods in few-step generation tasks, achieving state-of-the-art performance both on FID and the quality of the generated images.
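The sub-path idea can be illustrated on a toy ODE: splitting a trajectory into more, shorter sub-paths and applying a linear map on each reduces the cumulative approximation error. The ODE dx/dt = -x below is only a stand-in for the PF-ODE, chosen because its exact solution is known:

```python
import math

def sl_approx(x0, t0, t1, n_subpaths):
    """Chain linear (Euler) maps over n_subpaths segments of [t0, t1]
    for the toy ODE dx/dt = -x. Stand-in for SLAM's sub-path linear
    (SL) ODEs: shorter sub-paths give a smaller cumulative error."""
    x = x0
    h = (t1 - t0) / n_subpaths
    for _ in range(n_subpaths):
        x = x + h * (-x)  # one linear map along one sub-path
    return x

exact = math.exp(-1.0)  # true solution x(1) for x(0) = 1
err_one_path = abs(sl_approx(1.0, 0.0, 1.0, 1) - exact)
err_eight_paths = abs(sl_approx(1.0, 0.0, 1.0, 8) - exact)
```

Dividing [0, 1] into eight sub-paths cuts the error by an order of magnitude relative to a single linear step, which is the intuition behind optimizing per-sub-path denoising maps.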
https://arxiv.org/abs/2404.13903
Neural Radiance Field (NeRF) has garnered significant attention from both academia and industry due to its intrinsic advantages, particularly its implicit representation and novel view synthesis capabilities. With the rapid advancements in deep learning, a multitude of methods have emerged to explore the potential applications of NeRF in the domain of Autonomous Driving (AD). However, a conspicuous void is apparent within the current literature. To bridge this gap, this paper conducts a comprehensive survey of NeRF's applications in the context of AD. Our survey is structured to categorize NeRF's applications in AD, specifically encompassing perception, 3D reconstruction, simultaneous localization and mapping (SLAM), and simulation. We delve into in-depth analysis, summarize the findings for each application category, and conclude by providing insights and discussions on future directions in this field. We hope this paper serves as a comprehensive reference for researchers in this domain. To the best of our knowledge, this is the first survey specifically focused on the applications of NeRF in the Autonomous Driving domain.
https://arxiv.org/abs/2404.13816
So far, planetary surface exploration has depended on various mobile robot platforms. The autonomous navigation and decision-making of these mobile robots in complex terrains largely rely on their terrain-aware perception, localization and mapping capabilities. In this paper we release the TAIL-Plus dataset, a new challenging dataset in deformable granular environments for planetary exploration robots, which extends our previous TAIL (Terrain-Aware multI-modaL) dataset. We conducted field experiments on beaches that are considered planetary-surface analog environments for diverse sandy terrains. In TAIL-Plus we provide more sequences with multiple loops and expand the scenes from day to night. Benefiting from the modular design of our sensor suite, we use both wheeled and quadruped robots for data collection. The sensors include a 3D LiDAR, three downward RGB-D cameras, a pair of global-shutter color cameras that can be used as a forward-looking stereo camera, an RTK-GPS device and an extra IMU. Our datasets are intended to help researchers develop multi-sensor simultaneous localization and mapping (SLAM) algorithms for robots in unstructured, deformable granular terrains. Our datasets and supplementary materials will be available at \url{this https URL}.
https://arxiv.org/abs/2404.13600
We introduce EC-SLAM, a real-time dense RGB-D simultaneous localization and mapping (SLAM) system utilizing Neural Radiance Fields (NeRF). Although recent NeRF-based SLAM systems have demonstrated encouraging outcomes, they have yet to fully leverage NeRF's capability to constrain pose optimization. By employing an effectively constrained global bundle adjustment (BA) strategy, our system exploits NeRF's implicit loop closure correction capability, improving tracking accuracy by reinforcing the constraints on the keyframes most pertinent to the optimized current frame. In addition, by implementing a feature-based and uniform sampling strategy that minimizes the number of ineffective constraint points for pose optimization, we mitigate the effects of random sampling in NeRF. EC-SLAM represents the map with sparse parametric encodings and a truncated signed distance field (TSDF) to facilitate efficient fusion, resulting in fewer model parameters and faster convergence. A comprehensive evaluation on the Replica, ScanNet, and TUM datasets shows state-of-the-art performance, including enhanced reconstruction accuracy from precise pose estimation, a 21 Hz runtime, and tracking precision improvements of up to 50\%. The source code is available at this https URL.
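The TSDF map representation mentioned above is maintained by a per-voxel weighted running average; the sketch below shows the classic fusion rule (a generic sketch, not necessarily EC-SLAM's exact scheme), with an illustrative truncation distance:

```python
def tsdf_update(D, W, sdf_obs, trunc=0.1, w_obs=1.0):
    """Fuse one observation into a voxel of a truncated signed
    distance field.

    D, W    : current TSDF value and accumulated weight of the voxel
    sdf_obs : signed distance implied by the new depth measurement
              (positive in front of the surface), truncated to
              [-trunc, trunc]
    Running weighted average used in classic TSDF fusion; trunc and
    w_obs are illustrative values."""
    d = max(-trunc, min(trunc, sdf_obs))
    D_new = (W * D + w_obs * d) / (W + w_obs)
    return D_new, W + w_obs

# A voxel starts unobserved, then twice observes a signed distance
# of -2 cm; large magnitudes are clamped to the truncation band.
D, W = 0.0, 0.0
for _ in range(2):
    D, W = tsdf_update(D, W, sdf_obs=-0.02)
```

The surface is then extracted where the fused field crosses zero, which is what makes TSDF fusion cheap and incremental.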
https://arxiv.org/abs/2404.13346
We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alignment and photometric bundle adjustment. Such processes are hard to optimize due to the narrow basin of attraction of the optimization loss in image space (local minima) and the lack of initial correspondences. We mitigate these limitations by implementing a Gaussian pyramid filter on top of NeRF, facilitating a coarse-to-fine tracking optimization strategy. Furthermore, NeRF systems encounter challenges in converging to the right geometry with limited input views. While prior approaches use a Signed Distance Function (SDF)-based NeRF and directly supervise SDF values by approximating the ground-truth SDF through depth measurements, this often results in suboptimal geometry. In contrast, our method employs a volume density representation and introduces a novel KL regularizer on the ray termination distribution, constraining scene geometry to consist of empty space and opaque surfaces. Our solution implements both local and global bundle adjustment to produce a robust (coarse-to-fine) and accurate (KL regularizer) SLAM solution. We conduct experiments on multiple datasets (ScanNet, TUM, Replica) showing state-of-the-art results in tracking and reconstruction accuracy.
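The ray termination distribution that the KL regularizer acts on is the per-sample termination probability computed from volume densities. The sketch below computes these weights and a toy KL term against a near-delta target; the target construction is an assumption for illustration, not SLAIM's exact loss:

```python
import numpy as np

def termination_weights(sigma, delta):
    """Ray termination distribution from volume densities:
    w_i = T_i * (1 - exp(-sigma_i * delta_i)), with transmittance
    T_i = prod_{j<i} exp(-sigma_j * delta_j)."""
    alpha = 1.0 - np.exp(-sigma * delta)
    T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    return T * alpha

def kl_to_peak(w, peak_idx, eps=1e-8):
    """KL(w || target) against a near-delta target at peak_idx: a toy
    regularizer pushing rays toward 'empty space, then one opaque
    surface'. The target construction is an assumption, not SLAIM's
    exact loss."""
    target = np.full_like(w, eps)
    target[peak_idx] = 1.0
    target = target / target.sum()
    w = w / (w.sum() + eps)
    return float(np.sum(w * np.log((w + eps) / (target + eps))))

delta = np.full(8, 0.1)                                       # sample spacing
fog = termination_weights(np.full(8, 1.0), delta)             # diffuse density
wall = termination_weights(np.r_[np.zeros(4), 50.0, np.zeros(3)], delta)
```

A ray hitting one opaque wall already has a near-delta termination distribution and incurs almost no penalty, while fog-like diffuse density is penalized heavily, which is the geometric prior the regularizer encodes.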
https://arxiv.org/abs/2404.11419
This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collection, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through bundle adjustment. All sequences, divided into training and testing sets, are accessible through our website.
https://arxiv.org/abs/2404.11322
Positioning is a prominent field of study, notably focusing on Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) methods. Despite their advancements, these methods often encounter dead-reckoning errors that lead to considerable drift in the estimated platform motion, especially during long traverses. In such cases, the drift error is not negligible and should be rectified. Our proposed approach minimizes the drift error by correcting the estimated motion generated by any SLAM method at each epoch. Our methodology treats the positioning measurements rendered by the SLAM solution as random variables formulated jointly in a multivariate distribution. In this setting, correcting the drift becomes equivalent to finding the mode of this multivariate distribution, which jointly maximizes the likelihood of a set of relevant geo-spatial priors about the platform motion and environment. Our method is integrable into any SLAM/VIO method as a correction module. Our experimental results show the effectiveness of our approach in reducing the drift error by 10x in long traverses.
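In the simplest one-dimensional Gaussian case, the mode-finding step reduces to a precision-weighted combination of the SLAM estimate and a geo-spatial prior. This scalar sketch only illustrates the MAP idea; the paper operates on a joint multivariate distribution over the whole trajectory, and the numbers below are made up:

```python
def corrected_mode(mu_slam, var_slam, mu_prior, var_prior):
    """Mode of the joint Gaussian over one position coordinate: the
    precision-weighted combination of the SLAM estimate and a
    geo-spatial prior (1-D sketch of the MAP drift correction)."""
    w_s, w_p = 1.0 / var_slam, 1.0 / var_prior
    mu = (w_s * mu_slam + w_p * mu_prior) / (w_s + w_p)
    var = 1.0 / (w_s + w_p)
    return mu, var

# A drifted, uncertain SLAM estimate is pulled toward a confident
# geo-spatial prior (units: metres along one axis).
mu, var = corrected_mode(mu_slam=12.0, var_slam=4.0,
                         mu_prior=10.0, var_prior=1.0)
```

The corrected mode lands much closer to the confident prior, and the fused variance is smaller than either input, which is why applying the correction at each epoch keeps drift bounded.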
https://arxiv.org/abs/2404.10140
Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as indicators of robustness and ease of deployment in real-world applications. No publicly available dataset-plus-benchmark combination considers these factors together. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design that makes a pre-surveyed point on the ground observable from an off-the-shelf LiDAR mounted on a robot, together with an algorithm to estimate its position at mm-level accuracy. Results from the challenge show an increase in overall participation, with single-session SLAM systems getting increasingly accurate and operating successfully across varying sensor suites, but relatively few participants performing multi-session SLAM.
https://arxiv.org/abs/2404.09765
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. SLAM comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors, and various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities. Our dataset is available at this https URL.
https://arxiv.org/abs/2404.09622
Vision-based localization for autonomous driving has been of great interest among researchers. When a pre-built 3D map is not available, the techniques of visual simultaneous localization and mapping (SLAM) are typically adopted. Due to error accumulation, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework to increase the localization accuracy by fusing the vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse (spatial correlation bound check) to fine (visual odometry consistency check) method is designed to select the valid G2S prediction. The selected prediction is then fused with the SLAM measurement by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate the accuracy and robustness in terms of vehicle localization.
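The coarse-to-fine validity check for a G2S prediction can be sketched as a distance bound against the current vSLAM estimate followed by a visual-odometry consistency test. The thresholds and the exact form of both tests below are illustrative assumptions, not the paper's criteria:

```python
import math

def valid_g2s(g2s_xy, vslam_xy, prev_vslam_xy, prev_g2s_xy,
              bound=5.0, vo_tol=1.0):
    """Coarse-to-fine acceptance test for a ground-to-satellite (G2S)
    position prediction.

    coarse: the G2S fix must lie within `bound` metres of the current
            vSLAM estimate (spatial bound check);
    fine:   the G2S displacement since the previous frame must agree
            with the visual-odometry displacement within `vo_tol`
            metres (odometry consistency check).
    Thresholds and test forms are illustrative assumptions."""
    coarse = math.dist(g2s_xy, vslam_xy) <= bound
    fine = abs(math.dist(g2s_xy, prev_g2s_xy)
               - math.dist(vslam_xy, prev_vslam_xy)) <= vo_tol
    return coarse and fine
```

Only predictions passing both tests would be fused with the SLAM measurements in the pose graph; rejected fixes are simply dropped for that frame.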
https://arxiv.org/abs/2404.09169
Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM dataset featuring notable sensor diversity, varied motion patterns, and a wide range of environmental scenarios. Our dataset comprises $27$ sequences, spanning over $2.5$ hours and collected from four distinct platforms: a handheld suite, wheeled and legged robots, and vehicles. These sequences cover diverse settings, including buildings, campuses, and urban areas, with a total length of $38.7km$. Additionally, the dataset includes ground-truth (GT) trajectories and RGB point cloud maps covering approximately $0.3km^2$. To validate the utility of our dataset in advancing SLAM research, we assess several state-of-the-art (SOTA) SLAM algorithms. Furthermore, we demonstrate the dataset's broad applicability beyond traditional SLAM tasks by investigating its potential for monocular depth estimation. The complete dataset, including sensor data, GT, and calibration details, is accessible at this https URL.
https://arxiv.org/abs/2404.08563
Due to budgetary constraints, indoor navigation typically employs 2D LiDAR rather than 3D LiDAR. However, the use of 2D LiDAR in Simultaneous Localization And Mapping (SLAM) frequently encounters challenges related to motion degeneracy, particularly in geometrically similar environments. To address this problem, this paper proposes a robust, accurate, multi-sensor-fused 2D LiDAR SLAM system specifically designed for indoor mobile robots. First, the raw LiDAR data is carefully processed through point and line extraction. Leveraging the distinctive characteristics of indoor environments, line-line constraints are established to effectively complement other sensor data, augmenting the overall robustness and precision of the system. Concurrently, a tightly coupled front-end is created, integrating data from the 2D LiDAR, IMU, and wheel odometry to enable real-time state estimation. Building upon this foundation, a novel loop closure detection algorithm based on global feature point matching is proposed. This algorithm proves highly effective in mitigating front-end accumulated errors and ultimately constructs a globally consistent map. The experimental results indicate that our system fully meets real-time requirements. Compared to Cartographer, our system not only exhibits lower trajectory errors but also demonstrates stronger robustness, particularly under degeneracy.
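The line-extraction step that feeds the line-line constraints can be sketched as a total-least-squares fit over scan points, with the angle between fitted lines providing the constraint (indoors, angles cluster near 0 or 90 degrees). This is a generic sketch, not the paper's extractor:

```python
import numpy as np

def fit_line(points):
    """Total-least-squares line fit to 2D scan points: the dominant
    right singular vector of the centred points is the line direction.
    Generic sketch of the point/line-extraction step."""
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return vt[0], c  # unit direction, centroid

def line_angle(d1, d2):
    """Acute angle (radians) between two line directions; indoors it
    clusters near 0 or pi/2, yielding line-line constraints."""
    cosang = abs(float(np.dot(d1, d2)))
    return float(np.arccos(np.clip(cosang, 0.0, 1.0)))

# Two simulated perpendicular walls seen in a 2D scan
wall_a = np.array([[x, 0.0] for x in np.linspace(0.0, 2.0, 20)])
wall_b = np.array([[0.0, y] for y in np.linspace(0.0, 2.0, 20)])
d_a, _ = fit_line(wall_a)
d_b, _ = fit_line(wall_b)
```

Deviations of the measured angle from the expected 0 or pi/2 can then enter the front-end as residuals, which is what anchors the pose estimate in geometrically degenerate corridors.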
https://arxiv.org/abs/2404.07644
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems.
https://arxiv.org/abs/2404.06926
Robust integration of physical knowledge and data is key to improving computational simulations, such as Earth system models. Data assimilation is crucial for achieving this goal because it provides a systematic framework to calibrate model outputs with observations, which can include remote sensing imagery and ground station measurements, with uncertainty quantification. Conventional methods, including Kalman filters and variational approaches, inherently rely on simplifying linear and Gaussian assumptions and can be computationally expensive. Nevertheless, with the rapid adoption of data-driven methods in many areas of computational science, we see the potential of emulating traditional data assimilation with deep learning, especially generative models. In particular, the diffusion-based probabilistic framework overlaps heavily with data assimilation principles: both allow for conditional generation of samples within a Bayesian inverse framework. These models have shown remarkable success in text-conditioned image generation and image-controlled video synthesis. Likewise, one can frame data assimilation as observation-conditioned state calibration. In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting. Specifically, we assimilate in-situ weather station data and ex-situ satellite imagery to calibrate vertical temperature profiles globally. Through extensive ablation, we demonstrate that SLAMS is robust even in low-resolution, noisy, and sparse data settings. To our knowledge, our work is the first to apply a deep generative framework to multimodal data assimilation using real-world datasets; an important step toward building robust computational simulators, including next-generation Earth system models. Our code is available at: this https URL
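For reference, the conventional baseline the abstract contrasts with SLAMS, the Kalman analysis step, calibrates a forecast with an observation in closed form under exactly the linear-Gaussian assumptions mentioned. A scalar version with a direct observation model (H = 1) and made-up numbers:

```python
def kalman_update(x_prior, P_prior, z, R):
    """Scalar Kalman analysis step with a direct observation model
    (H = 1): calibrate a model forecast x_prior (variance P_prior)
    with an observation z (variance R). This is the conventional
    assimilation baseline; the values below are illustrative."""
    K = P_prior / (P_prior + R)      # Kalman gain
    x_post = x_prior + K * (z - x_prior)
    P_post = (1.0 - K) * P_prior
    return x_post, P_post

# Forecast temperature of 288 K, weather-station reading of 290 K
x, P = kalman_update(x_prior=288.0, P_prior=2.0, z=290.0, R=1.0)
```

The analysis lands between forecast and observation, weighted by their variances; score-based assimilation drops the linear-Gaussian restriction this closed form depends on.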
https://arxiv.org/abs/2404.06665
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface, which could help minimize missed regions and the need for reinspection for precancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel views of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with an 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100x faster rendering and more than 10x shorter training times, making it a practical tool for real-time applications. Hence, it holds promise for clinical translation toward better detection and diagnosis of colorectal cancer.
https://arxiv.org/abs/2404.06128
Beyond the environmental-perception sensors such as cameras and radars that sense the vehicle's external surroundings, an automatic driving system contains another perception sensor that has been quietly at work: the positioning module. This paper explores the application of SLAM (Simultaneous Localization and Mapping) technology in the context of automatic lane change behavior prediction and environment perception for autonomous vehicles. It discusses the limitations of traditional positioning methods, introduces SLAM technology, and compares LiDAR SLAM with visual SLAM. Real-world examples from companies like Tesla, Waymo, and Mobileye showcase the integration of AI-driven technologies, sensor fusion, and SLAM in autonomous driving systems. The paper then delves into the specifics of SLAM algorithms, sensor technologies, and the importance of automatic lane changes for driving safety and efficiency. It highlights Tesla's recent update to its Autopilot system, which incorporates automatic lane change functionality using SLAM technology. The paper concludes by emphasizing the crucial role of SLAM in enabling accurate environment perception, positioning, and decision-making for autonomous vehicles, ultimately enhancing safety and the driving experience.
https://arxiv.org/abs/2404.04492
Enabling robots to understand the world in terms of objects is a critical building block towards higher level autonomy. The success of foundation models in vision has created the ability to segment and identify nearly all objects in the world. However, utilizing such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a system of identifying, localizing, and encoding objects is tightly coupled with probabilistic graphical models for performing open-set semantic simultaneous localization and mapping (SLAM). Results are presented demonstrating that the proposed lightweight object encoding can be used to perform more accurate object-based SLAM than existing open-set methods, closed-set methods, and geometric methods while incurring a lower computational overhead than existing open-set mapping methods.
https://arxiv.org/abs/2404.04377
Imaging radar is an emerging sensor modality for Simultaneous Localization and Mapping (SLAM), especially suitable for vision-obstructed environments. This article investigates the use of 4D imaging radars for SLAM and analyzes the challenges in robust loop closure. Previous work indicates that 4D radars, together with inertial measurements, offer ample information for accurate odometry estimation. However, the narrow field of view, limited resolution, and sparse and noisy measurements render loop closure a significantly more challenging problem. Our work builds on our previous work, TBV SLAM, which was proposed for robust loop closure with 360$^\circ$ spinning radars. This article highlights and addresses challenges inherited from a directional 4D radar, such as sparsity, noise, and a reduced field of view, and discusses why the common definition of a loop closure is unsuitable. By combining multiple quality measures for accurate loop closure detection adapted to 4D radar data, significant results in trajectory estimation are achieved; the absolute trajectory error is as low as 0.46 m over a distance of 1.8 km, with consistent operation over multiple environments.
https://arxiv.org/abs/2404.03940
This paper explores the integration of linguistic inputs within robotic navigation systems, drawing upon the symbol interdependency hypothesis to bridge the divide between symbolic and embodied cognition. It examines previous work incorporating language and semantics into Neural Network (NN) and Simultaneous Localization and Mapping (SLAM) approaches, highlighting how these integrations have advanced the field. By contrasting abstract symbol manipulation with sensory-motor grounding, we propose a unified framework where language functions both as an abstract communicative system and as a grounded representation of perceptual experiences. Our review of cognitive models of distributional semantics and their application to autonomous agents underscores the transformative potential of language-integrated systems.
https://arxiv.org/abs/2404.03049