We present SLAIM - Simultaneous Localization and Implicit Mapping. We propose a novel coarse-to-fine tracking model tailored for Neural Radiance Field SLAM (NeRF-SLAM) to achieve state-of-the-art tracking performance. Notably, existing NeRF-SLAM systems consistently exhibit inferior tracking performance compared to traditional SLAM algorithms. NeRF-SLAM methods solve camera tracking via image alignment and photometric bundle-adjustment. Such optimization is difficult due to the narrow basin of attraction of the loss in image space (local minima) and the lack of initial correspondences. We mitigate these limitations by implementing a Gaussian pyramid filter on top of NeRF, facilitating a coarse-to-fine tracking optimization strategy. Furthermore, NeRF systems encounter challenges in converging to the right geometry with limited input views. While prior approaches use a Signed-Distance Function (SDF)-based NeRF and directly supervise SDF values by approximating ground truth SDF through depth measurements, this often results in suboptimal geometry. In contrast, our method employs a volume density representation and introduces a novel KL regularizer on the ray termination distribution, constraining scene geometry to consist of empty space and opaque surfaces. Our method implements both local and global bundle-adjustment to produce a robust (coarse-to-fine) and accurate (KL regularizer) SLAM solution. We conduct experiments on multiple datasets (ScanNet, TUM, Replica) showing state-of-the-art results in tracking and in reconstruction accuracy.
https://arxiv.org/abs/2404.11419
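The coarse-to-fine tracking idea in the SLAIM abstract, blurring and downsampling so the photometric loss has a wider basin of attraction before refining at full resolution, can be sketched as a 1-D toy (the function names and the grid-search optimizer are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def gaussian_blur1d(x, sigma=1.0):
    # convolve with a small normalized Gaussian kernel
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    return np.convolve(x, k / k.sum(), mode="same")

def downsample(x):
    # blur before decimating to avoid aliasing
    return gaussian_blur1d(x)[::2]

def photometric_error(ref, cur, shift):
    # mean squared photometric error after warping `cur` by `shift`
    pos = np.arange(len(ref)) + shift
    warped = np.interp(pos, np.arange(len(cur)), cur)
    return float(np.mean((ref - warped) ** 2))

def coarse_to_fine_align(ref, cur, levels=3):
    # build Gaussian pyramids, then track the shift from coarse to fine
    refs, curs = [ref], [cur]
    for _ in range(levels - 1):
        refs.append(downsample(refs[-1]))
        curs.append(downsample(curs[-1]))
    shift = 0.0
    for lvl in range(levels - 1, -1, -1):
        if lvl < levels - 1:
            shift *= 2.0  # scale the estimate up to the finer level
        cands = shift + np.linspace(-4.0, 4.0, 81)
        errs = [photometric_error(refs[lvl], curs[lvl], c) for c in cands]
        shift = float(cands[int(np.argmin(errs))])
    return shift
```

The estimate found at a coarse level seeds the search at the next finer level, which is what lets the optimizer escape the local minima a full-resolution alignment would fall into.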
This paper presents a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM, to advance the research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, and urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-dof ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment. All sequences, divided into training and testing sets, are accessible through our website.
https://arxiv.org/abs/2404.11322
Positioning is a prominent field of study, notably focusing on Visual Inertial Odometry (VIO) and Simultaneous Localization and Mapping (SLAM) methods. Despite their advancements, these methods often encounter dead-reckoning errors that lead to considerable drift in estimated platform motion, especially during long traverses. In such cases, the drift error is not negligible and should be rectified. Our proposed approach minimizes the drift error by correcting the estimated motion generated by any SLAM method at each epoch. Our methodology treats positioning measurements rendered by the SLAM solution as random variables formulated jointly in a multivariate distribution. In this setting, correcting the drift becomes equivalent to finding the mode of this multivariate distribution, which jointly maximizes the likelihood of a set of relevant geo-spatial priors about the platform motion and environment. Our method is integrable into any SLAM/VIO method as a correction module. Our experimental results show the effectiveness of our approach, reducing the drift error by 10x over long traverses.
https://arxiv.org/abs/2404.10140
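In the linear-Gaussian case, finding the mode of the joint distribution over platform positions reduces to weighted least squares. A hypothetical 1-D sketch, fusing drifting odometry increments with sparse absolute geo-spatial priors (the weights and formulation are assumptions for illustration, not the paper's model):

```python
import numpy as np

def correct_drift(rel_meas, priors, w_rel=1.0, w_prior=10.0):
    """MAP estimate of 1-D positions x_0..x_n from noisy relative
    measurements and sparse absolute (geo-spatial) priors.
    rel_meas: list of (i, j, d) meaning x_j - x_i ~ d
    priors:   list of (k, p) meaning x_k ~ p
    """
    n = max(max(i, j) for i, j, _ in rel_meas) + 1
    rows, rhs = [], []
    for i, j, d in rel_meas:
        r = np.zeros(n)
        r[j], r[i] = w_rel, -w_rel     # weighted relative constraint
        rows.append(r)
        rhs.append(w_rel * d)
    for k, p in priors:
        r = np.zeros(n)
        r[k] = w_prior                 # weighted absolute prior
        rows.append(r)
        rhs.append(w_prior * p)
    x, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
    return x
```

With strong priors at a few surveyed epochs, the accumulated odometry bias is redistributed along the trajectory rather than compounding at the endpoint.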
Simultaneous Localization and Mapping systems are a key enabler for positioning in both handheld and robotic applications. The Hilti SLAM Challenges organized over the past years have been successful at benchmarking some of the world's best SLAM Systems with high accuracy. However, more capabilities of these systems are yet to be explored, such as platform agnosticism across varying sensor suites and multi-session SLAM. These factors indirectly serve as an indicator of robustness and ease of deployment in real-world applications. No publicly available dataset-and-benchmark combination considers these factors together. The Hilti SLAM Challenge 2023 Dataset and Benchmark addresses this issue. Additionally, we propose a novel fiducial marker design for a pre-surveyed point on the ground to be observable from an off-the-shelf LiDAR mounted on a robot, and an algorithm to estimate its position at mm-level accuracy. Results from the challenge show increased overall participation, with single-session SLAM systems growing increasingly accurate and operating successfully across varying sensor suites, but relatively few participants performing multi-session SLAM.
https://arxiv.org/abs/2404.09765
This study presents a comprehensive multi-sensor dataset designed for 3D mapping in challenging indoor and outdoor environments. The dataset comprises data from infrared cameras, depth cameras, LiDAR, and 4D millimeter-wave radar, facilitating exploration of advanced perception and mapping techniques. Integration of diverse sensor data enhances perceptual capabilities in extreme conditions such as rain, snow, and uneven road surfaces. The dataset also includes interactive robot data at different speeds indoors and outdoors, providing a realistic background environment. SLAM comparisons between similar routes are conducted, analyzing the influence of different complex scenes on various sensors. Various SLAM algorithms are employed to process the dataset, revealing performance differences among algorithms in different scenarios. In summary, this dataset addresses the problem of data scarcity in special environments, fostering the development of perception and mapping algorithms for extreme conditions. Leveraging multi-sensor data including infrared, depth cameras, LiDAR, 4D millimeter-wave radar, and robot interactions, the dataset advances intelligent mapping and perception capabilities. Our dataset is available at this https URL.
https://arxiv.org/abs/2404.09622
Vision-based localization for autonomous driving has been of great interest among researchers. When a pre-built 3D map is not available, the techniques of visual simultaneous localization and mapping (SLAM) are typically adopted. Due to error accumulation, visual SLAM (vSLAM) usually suffers from long-term drift. This paper proposes a framework to increase the localization accuracy by fusing the vSLAM with a deep-learning-based ground-to-satellite (G2S) image registration method. In this framework, a coarse (spatial correlation bound check) to fine (visual odometry consistency check) method is designed to select the valid G2S prediction. The selected prediction is then fused with the SLAM measurement by solving a scaled pose graph problem. To further increase the localization accuracy, we provide an iterative trajectory fusion pipeline. The proposed framework is evaluated on two well-known autonomous driving datasets, and the results demonstrate the accuracy and robustness in terms of vehicle localization.
https://arxiv.org/abs/2404.09169
Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM dataset featuring notable sensor diversity, varied motion patterns, and a wide range of environmental scenarios. Our dataset comprises 27 sequences, spanning over 2.5 hours and collected from four distinct platforms: a handheld suite, wheeled and legged robots, and vehicles. These sequences cover diverse settings, including buildings, campuses, and urban areas, with a total length of 38.7 km. Additionally, the dataset includes ground-truth (GT) trajectories and RGB point cloud maps covering approximately 0.3 km². To validate the utility of our dataset in advancing SLAM research, we assess several state-of-the-art (SOTA) SLAM algorithms. Furthermore, we demonstrate the dataset's broad applicability beyond traditional SLAM tasks by investigating its potential for monocular depth estimation. The complete dataset, including sensor data, GT, and calibration details, is accessible at this https URL.
https://arxiv.org/abs/2404.08563
Due to budgetary constraints, indoor navigation typically employs 2D LiDAR rather than 3D LiDAR. However, the utilization of 2D LiDAR in Simultaneous Localization And Mapping (SLAM) frequently encounters challenges related to motion degeneracy, particularly in geometrically similar environments. To address this problem, this paper proposes a robust, accurate, and multi-sensor-fused 2D LiDAR SLAM system specifically designed for indoor mobile robots. To commence, the original LiDAR data undergoes meticulous processing through point and line extraction. Leveraging the distinctive characteristics of indoor environments, line-line constraints are established to complement other sensor data effectively, thereby augmenting the overall robustness and precision of the system. Concurrently, a tightly-coupled front-end is created, integrating data from the 2D LiDAR, IMU, and wheel odometry, thus enabling real-time state estimation. Building upon this solid foundation, a novel global feature point matching-based loop closure detection algorithm is proposed. This algorithm proves highly effective in mitigating front-end accumulated errors and ultimately constructs a globally consistent map. The experimental results indicate that our system fully meets real-time requirements. When compared to Cartographer, our system not only exhibits lower trajectory errors but also demonstrates stronger robustness, particularly in degenerate scenarios.
https://arxiv.org/abs/2404.07644
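The point-and-line extraction step can be illustrated with a total-least-squares line fit (PCA of the scan points); the angle between two fitted segments then gives the kind of line-line constraint the abstract describes. A toy sketch under those assumptions (not the paper's extraction pipeline):

```python
import numpy as np

def fit_line(points):
    # total-least-squares line fit: centroid + principal direction via SVD
    c = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - c)
    return c, vt[0]  # a point on the line and its unit direction

def line_angle(d1, d2):
    # angle between two line directions, folded into [0, 90] degrees
    cos = abs(float(d1 @ d2))
    return float(np.degrees(np.arccos(np.clip(cos, 0.0, 1.0))))
```

In a typical indoor scan, two walls fitted this way should report an angle near 90 degrees, a constraint that can anchor the pose estimate when point features alone are degenerate.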
We present a real-time LiDAR-Inertial-Camera SLAM system with 3D Gaussian Splatting as the mapping backend. Leveraging robust pose estimates from our LiDAR-Inertial-Camera odometry, Coco-LIC, an incremental photo-realistic mapping system is proposed in this paper. We initialize 3D Gaussians from colorized LiDAR points and optimize them using differentiable rendering powered by 3D Gaussian Splatting. Meticulously designed strategies are employed to incrementally expand the Gaussian map and adaptively control its density, ensuring high-quality mapping with real-time capability. Experiments conducted in diverse scenarios demonstrate the superior performance of our method compared to existing radiance-field-based SLAM systems.
https://arxiv.org/abs/2404.06926
Robust integration of physical knowledge and data is key to improve computational simulations, such as Earth system models. Data assimilation is crucial for achieving this goal because it provides a systematic framework to calibrate model outputs with observations, which can include remote sensing imagery and ground station measurements, with uncertainty quantification. Conventional methods, including Kalman filters and variational approaches, inherently rely on simplifying linear and Gaussian assumptions, and can be computationally expensive. Nevertheless, with the rapid adoption of data-driven methods in many areas of computational sciences, we see the potential of emulating traditional data assimilation with deep learning, especially generative models. In particular, the diffusion-based probabilistic framework has large overlaps with data assimilation principles: both allow for conditional generation of samples within a Bayesian inverse framework. These models have shown remarkable success in text-conditioned image generation and image-controlled video synthesis. Likewise, one can frame data assimilation as observation-conditioned state calibration. In this work, we propose SLAMS: Score-based Latent Assimilation in Multimodal Setting. Specifically, we assimilate in-situ weather station data and ex-situ satellite imagery to calibrate the vertical temperature profiles, globally. Through extensive ablation, we demonstrate that SLAMS is robust even in low-resolution, noisy, and sparse data settings. To our knowledge, our work is the first to apply deep generative framework for multimodal data assimilation using real-world datasets; an important step for building robust computational simulators, including the next-generation Earth system models. Our code is available at: this https URL
https://arxiv.org/abs/2404.06665
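The simplifying linear-Gaussian assumption of conventional assimilation is easiest to see in a scalar Kalman measurement update, where calibrating a model state against one observation has a closed form; a minimal illustration of that classical baseline (not the SLAMS method itself):

```python
def kalman_update(x_prior, p_prior, z, r):
    """Scalar Kalman measurement update.
    Prior state ~ N(x_prior, p_prior); observation z with noise variance r."""
    k = p_prior / (p_prior + r)            # Kalman gain
    x_post = x_prior + k * (z - x_prior)   # calibrated state estimate
    p_post = (1.0 - k) * p_prior           # reduced posterior variance
    return x_post, p_post
```

The diffusion-based framework the abstract describes replaces this closed-form Gaussian blend with conditional sampling, lifting the linearity and Gaussianity restrictions.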
Within colorectal cancer diagnostics, conventional colonoscopy techniques face critical limitations, including a limited field of view and a lack of depth information, which can impede the detection of precancerous lesions. Current methods struggle to provide comprehensive and accurate 3D reconstructions of the colonic surface which can help minimize the missing regions and reinspection for pre-cancerous polyps. Addressing this, we introduce 'Gaussian Pancakes', a method that leverages 3D Gaussian Splatting (3D GS) combined with a Recurrent Neural Network-based Simultaneous Localization and Mapping (RNNSLAM) system. By introducing geometric and depth regularization into the 3D GS framework, our approach ensures more accurate alignment of Gaussians with the colon surface, resulting in smoother 3D reconstructions with novel viewing of detailed textures and structures. Evaluations across three diverse datasets show that Gaussian Pancakes enhances novel view synthesis quality, surpassing current leading methods with a 18% boost in PSNR and a 16% improvement in SSIM. It also delivers over 100X faster rendering and more than 10X shorter training times, making it a practical tool for real-time applications. Hence, this holds promise for achieving clinical translation for better detection and diagnosis of colorectal cancer.
https://arxiv.org/abs/2404.06128
Beyond environment-perception sensors such as cameras and radars, which sense the vehicle's external surroundings, an automatic driving system contains another sensor that has been quietly serving it all along: the positioning module. This paper explores the application of SLAM (Simultaneous Localization and Mapping) technology in the context of automatic lane change behavior prediction and environment perception for autonomous vehicles. It discusses the limitations of traditional positioning methods, introduces SLAM technology, and compares LIDAR SLAM with visual SLAM. Real-world examples from companies like Tesla, Waymo, and Mobileye showcase the integration of AI-driven technologies, sensor fusion, and SLAM in autonomous driving systems. The paper then delves into the specifics of SLAM algorithms, sensor technologies, and the importance of automatic lane changes in driving safety and efficiency. It highlights Tesla's recent update to its Autopilot system, which incorporates automatic lane change functionality using SLAM technology. The paper concludes by emphasizing the crucial role of SLAM in enabling accurate environment perception, positioning, and decision-making for autonomous vehicles, ultimately enhancing safety and driving experience.
https://arxiv.org/abs/2404.04492
Enabling robots to understand the world in terms of objects is a critical building block towards higher level autonomy. The success of foundation models in vision has created the ability to segment and identify nearly all objects in the world. However, utilizing such objects to localize the robot and build an open-set semantic map of the world remains an open research question. In this work, a system of identifying, localizing, and encoding objects is tightly coupled with probabilistic graphical models for performing open-set semantic simultaneous localization and mapping (SLAM). Results are presented demonstrating that the proposed lightweight object encoding can be used to perform more accurate object-based SLAM than existing open-set methods, closed-set methods, and geometric methods while incurring a lower computational overhead than existing open-set mapping methods.
https://arxiv.org/abs/2404.04377
Imaging radar is an emerging sensor modality in the context of Simultaneous Localization and Mapping (SLAM), especially suitable for vision-obstructed environments. This article investigates the use of 4D imaging radars for SLAM and analyzes the challenges in robust loop closure. Previous work indicates that 4D radars, together with inertial measurements, offer ample information for accurate odometry estimation. However, the limited field of view, low resolution, and sparse, noisy measurements render loop closure a significantly more challenging problem. Our work builds on previous work, TBV SLAM, which was proposed for robust loop closure with 360$^\circ$ spinning radars. This article highlights and addresses challenges inherited from a directional 4D radar, such as sparsity, noise, and reduced field of view, and discusses why the common definition of a loop closure is unsuitable. By combining multiple quality measures for accurate loop closure detection adapted to 4D radar data, significant results in trajectory estimation are achieved; the absolute trajectory error is as low as 0.46 m over a distance of 1.8 km, with consistent operation over multiple environments.
https://arxiv.org/abs/2404.03940
This paper explores the integration of linguistic inputs within robotic navigation systems, drawing upon the symbol interdependency hypothesis to bridge the divide between symbolic and embodied cognition. It examines previous work incorporating language and semantics into Neural Network (NN) and Simultaneous Localization and Mapping (SLAM) approaches, highlighting how these integrations have advanced the field. By contrasting abstract symbol manipulation with sensory-motor grounding, we propose a unified framework where language functions both as an abstract communicative system and as a grounded representation of perceptual experiences. Our review of cognitive models of distributional semantics and their application to autonomous agents underscores the transformative potential of language-integrated systems.
https://arxiv.org/abs/2404.03049
Simultaneous localization and mapping is essential for position tracking and scene understanding. 3D Gaussian-based map representations enable photorealistic reconstruction and real-time rendering of scenes using multiple posed cameras. We show for the first time that using 3D Gaussians for map representation with unposed camera images and inertial measurements can enable accurate SLAM. Our method, MM3DGS, addresses the limitations of prior neural radiance field-based representations by enabling faster rendering, scale awareness, and improved trajectory tracking. Our framework enables keyframe-based mapping and tracking utilizing loss functions that incorporate relative pose transformations from pre-integrated inertial measurements, depth estimates, and measures of photometric rendering quality. We also release a multi-modal dataset, UT-MM, collected from a mobile robot equipped with a camera and an inertial measurement unit. Experimental evaluation on several scenes from the dataset shows that MM3DGS achieves a 3x improvement in tracking and a 5% improvement in photometric rendering quality compared to the current 3DGS SLAM state-of-the-art, while allowing real-time rendering of a high-resolution dense 3D map. Project Webpage: this https URL
https://arxiv.org/abs/2404.00923
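The composite tracking loss the MM3DGS abstract describes, combining photometric rendering quality, depth agreement, and consistency with the IMU-preintegrated relative pose, might be sketched as a weighted sum (the term names and weights here are hypothetical, not the paper's):

```python
import numpy as np

def tracking_loss(img_render, img_obs, depth_render, depth_obs,
                  pose_rel_est, pose_rel_imu,
                  w_photo=1.0, w_depth=0.1, w_imu=0.5):
    # photometric term: L1 error between rendered and observed images
    l_photo = float(np.mean(np.abs(img_render - img_obs)))
    # depth term: L1 error against depth estimates
    l_depth = float(np.mean(np.abs(depth_render - depth_obs)))
    # inertial term: residual against the pre-integrated relative pose
    l_imu = float(np.linalg.norm(pose_rel_est - pose_rel_imu))
    return w_photo * l_photo + w_depth * l_depth + w_imu * l_imu
```

In a real system each term would be differentiable with respect to the pose so the tracker can descend it; the sketch only shows how the three signals are balanced.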
In recent years, there have been significant advancements in 3D reconstruction and dense RGB-D SLAM systems. One notable development is the application of Neural Radiance Fields (NeRF) in these systems, which utilizes implicit neural representation to encode 3D scenes. This extension of NeRF to SLAM has shown promising results. However, the depth images obtained from consumer-grade RGB-D sensors are often sparse and noisy, which poses significant challenges for 3D reconstruction and affects the accuracy of the representation of the scene geometry. Moreover, the original hierarchical feature grid with occupancy values is inaccurate for representing scene geometry. Furthermore, the existing methods select random pixels for camera tracking, which leads to inaccurate localization and is not robust in real-world indoor environments. To this end, we present NeSLAM, an advanced framework that achieves accurate and dense depth estimation, robust camera tracking, and realistic synthesis of novel views. First, a depth completion and denoising network is designed to provide dense geometry prior and guide the neural implicit representation optimization. Second, the occupancy scene representation is replaced with Signed Distance Field (SDF) hierarchical scene representation for high-quality reconstruction and view synthesis. Furthermore, we also propose a NeRF-based self-supervised feature tracking algorithm for robust real-time tracking. Experiments on various indoor datasets demonstrate the effectiveness and accuracy of the system in reconstruction, tracking quality, and novel view synthesis.
https://arxiv.org/abs/2403.20034
Multi-camera SLAM systems offer a plethora of advantages, primarily stemming from their capacity to amalgamate information from a broader field of view, thereby resulting in heightened robustness and improved localization accuracy. In this research, we present a significant extension and refinement of the state-of-the-art stereo SLAM system, known as ORB-SLAM2, with the objective of attaining even higher accuracy. To accomplish this objective, we commence by mapping measurements from all cameras onto a virtual camera termed BundledFrame. This virtual camera is meticulously engineered to seamlessly adapt to multi-camera configurations, facilitating the effective fusion of data captured from multiple cameras. Additionally, we harness extrinsic parameters in the bundle adjustment (BA) process to achieve precise trajectory estimation. Furthermore, we conduct an extensive analysis of the role of bundle adjustment (BA) in the context of multi-camera scenarios, delving into its impact on tracking, local mapping, and global optimization. Our experimental evaluation entails comprehensive comparisons between ground truth data and the state-of-the-art SLAM system. To rigorously assess the system's performance, we utilize the EuRoC datasets. The consistent results of our evaluations demonstrate the superior accuracy of our system in comparison to existing approaches.
https://arxiv.org/abs/2403.19886
Visual SLAM remains a difficult problem for many state-of-the-art (SOTA) algorithms when using thermal imagery, or in other low-contrast, visually degraded environments such as underwater or areas dominated by snow and ice. In addition to challenging front-end data association, thermal imagery presents an additional difficulty for long-term relocalization and map reuse. The relative temperatures of objects in thermal imagery change dramatically from day to night. Feature descriptors typically used for relocalization in SLAM are unable to maintain consistency over these diurnal changes. We show that learned feature descriptors can be used within existing Bag of Word based localization schemes to dramatically improve place recognition across large temporal gaps in thermal imagery. In order to demonstrate the effectiveness of our trained vocabulary, we have developed a baseline SLAM system, integrating learned features and matching into a classical SLAM algorithm. Our system demonstrates good local tracking on challenging thermal imagery, and relocalization that overcomes dramatic day to night thermal appearance changes. Our code and datasets are available here: this https URL
https://arxiv.org/abs/2403.19885
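Substituting learned descriptors into a BoW-style relocalization pipeline boils down to nearest-neighbour search in descriptor space; a toy cosine-similarity place lookup (illustrative only, not the paper's trained vocabulary):

```python
import numpy as np

def best_match(db_descriptors, query):
    # cosine similarity between a query descriptor and a database of
    # per-place descriptors; returns the best index and its score
    db = db_descriptors / np.linalg.norm(db_descriptors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = db @ q
    return int(np.argmax(scores)), float(scores.max())
```

The claim in the abstract is that descriptors learned on thermal data keep day and night views of the same place close in this space, where hand-crafted descriptors drift apart.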
Simultaneous localization and mapping (SLAM) is a critical capability in autonomous navigation, but memory and computational limits make long-term application of common SLAM techniques impractical; a robot must be able to determine what information should be retained and what can safely be forgotten. In graph-based SLAM, the number of edges (measurements) in a pose graph determines both the memory requirements of storing a robot's observations and the computational expense of algorithms deployed for performing state estimation using those observations, both of which can grow unbounded during long-term navigation. Motivated by these challenges, we propose a new general purpose approach to sparsify graphs in a manner that maximizes algebraic connectivity, a key spectral property of graphs which has been shown to control the estimation error of pose graph SLAM solutions. Our algorithm, MAC (for maximizing algebraic connectivity), is simple and computationally inexpensive, and admits formal post hoc performance guarantees on the quality of the solution that it provides. In application to the problem of pose-graph SLAM, we show on several benchmark datasets that our approach quickly produces high-quality sparsification results which retain the connectivity of the graph and, in turn, the quality of corresponding SLAM solutions.
https://arxiv.org/abs/2403.19879
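Algebraic connectivity is the second-smallest eigenvalue of the graph Laplacian (the Fiedler value). A brute-force greedy selection that keeps the k edges maximizing it illustrates the objective; MAC itself optimizes this far more efficiently than this naive sketch, which recomputes an eigendecomposition per candidate:

```python
import numpy as np

def laplacian(n, edges):
    # unweighted graph Laplacian L = D - A
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def algebraic_connectivity(n, edges):
    # second-smallest Laplacian eigenvalue (zero iff graph disconnected)
    return float(np.sort(np.linalg.eigvalsh(laplacian(n, edges)))[1])

def greedy_sparsify(n, edges, k):
    # keep k edges, greedily maximizing algebraic connectivity
    chosen, rest = [], list(edges)
    for _ in range(k):
        best = max(rest, key=lambda e: algebraic_connectivity(n, chosen + [e]))
        chosen.append(best)
        rest.remove(best)
    return chosen
```

Because algebraic connectivity controls the estimation error of pose graph SLAM solutions, a sparsified graph chosen this way retains connectivity, and with it, much of the solution quality of the dense graph.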