Simultaneous localization and mapping (SLAM) is a critical capability for autonomous systems. Traditional SLAM approaches, which often rely on visual or LiDAR sensors, face significant challenges in adverse conditions such as low light or featureless environments. To overcome these limitations, we propose a novel Doppler-aided radar-inertial and LiDAR-inertial SLAM framework that leverages the complementary strengths of 4D radar, FMCW LiDAR, and inertial measurement units. Our system integrates Doppler velocity measurements and spatial data into a tightly-coupled front-end and graph optimization back-end to provide enhanced ego velocity estimation, accurate odometry, and robust mapping. We also introduce a Doppler-based scan-matching technique to improve front-end odometry in dynamic environments. In addition, our framework incorporates an innovative online extrinsic calibration mechanism, utilizing Doppler velocity and loop closure to dynamically maintain sensor alignment. Extensive evaluations on both public and proprietary datasets show that our system significantly outperforms state-of-the-art radar-SLAM and LiDAR-SLAM frameworks in terms of accuracy and robustness. To encourage further research, the code of our Doppler-SLAM and our dataset are available at: this https URL.
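A quick way to see why Doppler measurements help: under a static-world assumption, every radar return at unit direction d_i observes radial velocity v_r,i = -d_i · v_ego, so the ego velocity falls out of a linear least-squares fit. The sketch below illustrates this standard idea with simulated data; it is not the paper's pipeline, which additionally feeds Doppler into scan matching and online calibration. All names and values here are illustrative.

```python
import numpy as np

def ego_velocity_from_doppler(points, radial_velocities):
    """Least-squares ego-velocity from 4D-radar Doppler returns.

    For a static world, a detection at unit direction d_i measures
    radial velocity v_r,i = -d_i . v_ego; stacking all detections
    gives the linear system  D v_ego = -v_r.
    """
    directions = points / np.linalg.norm(points, axis=1, keepdims=True)
    v_ego, *_ = np.linalg.lstsq(directions, -radial_velocities, rcond=None)
    return v_ego

# Simulated sensor moving at 2 m/s forward: static points see v_r = -d . v.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 3)) * 10.0
v_true = np.array([2.0, 0.0, 0.0])
d = pts / np.linalg.norm(pts, axis=1, keepdims=True)
v_r = -d @ v_true + rng.normal(scale=0.05, size=200)
print(ego_velocity_from_doppler(pts, v_r))  # ~ [2, 0, 0]
```

In practice a RANSAC loop around this fit rejects returns from moving objects before the final solve.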
https://arxiv.org/abs/2504.11634
Autonomous exploration for mapping unknown large-scale environments is a fundamental challenge in robotics, where time efficiency, stability against map corruption, and economical use of computational resources are crucial. This paper presents a novel approach to indoor exploration that addresses these key issues in existing methods. We introduce a Simultaneous Localization and Mapping (SLAM)-aware region-based exploration strategy that partitions the environment into discrete regions, allowing the robot to incrementally explore and stabilize each region before moving to the next one. This approach significantly reduces redundant exploration and improves overall efficiency. As the robot finishes exploring a region and stabilizes it, we also perform SLAM keyframe marginalization, a technique which reduces problem complexity by eliminating variables while preserving their essential information. To improve robustness and further enhance efficiency, we develop a checkpoint system that enables the robot to resume exploration from the last stable region in case of failures, eliminating the need for complete re-exploration. Our method, tested in real homes, offices, and simulations, outperforms state-of-the-art approaches, with substantial improvements across various real-world environments: significant reductions in keyframe usage (85%), submap usage (50% office, 32% home), pose-graph optimization time (78-80%), and exploration duration (10-15%). This region-based strategy with keyframe marginalization offers an efficient solution for autonomous robotic mapping.
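Keyframe marginalization of the kind described here is conventionally done with a Schur complement on the Gauss-Newton information matrix: the eliminated variables' information is folded into a dense prior on what remains. A minimal sketch under that assumption (the paper's exact formulation may differ):

```python
import numpy as np

def marginalize(H, b, keep, marg):
    """Marginalize variables out of a symmetric Gauss-Newton system H x = b.

    Schur complement: the eliminated block's information is folded into a
    dense prior on the kept variables, so their estimates are unchanged
    while the problem size shrinks.
    """
    Hkk = H[np.ix_(keep, keep)]
    Hkm = H[np.ix_(keep, marg)]
    Hmm = H[np.ix_(marg, marg)]
    Hmm_inv = np.linalg.inv(Hmm)
    H_new = Hkk - Hkm @ Hmm_inv @ Hkm.T   # dense prior on kept variables
    b_new = b[keep] - Hkm @ Hmm_inv @ b[marg]
    return H_new, b_new
```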
https://arxiv.org/abs/2504.10416
This paper describes the approach used by Team UruBots for participation in the 2025 RoboCup Rescue Robot League competition. Our team aims to participate in this RoboCup league for the first time, building on experience gained from previous competitions and research. We present our vehicle and our approach to the task of detecting and locating victims in search and rescue environments. Our approach draws on well-established topics in robotics, such as ROS, SLAM, human-robot interaction, and segmentation and perception. Our proposed approach is open source and available to the RoboCup Rescue community, through which we aim to learn from and contribute to the league.
https://arxiv.org/abs/2504.09778
Geometrically accurate and semantically expressive map representations have proven invaluable to facilitate robust and safe mobile robot navigation and task planning. Nevertheless, real-time, open-vocabulary semantic understanding of large-scale unknown environments is still an open problem. In this paper we present FindAnything, an open-world mapping and exploration framework that incorporates vision-language information into dense volumetric submaps. Thanks to the use of vision-language features, FindAnything bridges the gap between pure geometric and open-vocabulary semantic information for a higher level of understanding, while enabling exploration of any environment without any external source of ground-truth pose information. We represent the environment as a series of volumetric occupancy submaps, resulting in a robust and accurate map representation that deforms upon pose updates when the underlying SLAM system corrects its drift, allowing for a locally consistent representation between submaps. Pixel-wise vision-language features are aggregated from efficient SAM (eSAM)-generated segments, which are in turn integrated into object-centric volumetric submaps, providing a mapping from open-vocabulary queries to 3D geometry that also scales well in terms of memory usage. The open-vocabulary map representation of FindAnything achieves state-of-the-art semantic accuracy in closed-set evaluations on the Replica dataset. This level of scene understanding allows a robot to explore environments based on objects or areas of interest selected via natural language queries. Our system is the first of its kind to be deployed on resource-constrained devices, such as MAVs, leveraging vision-language information for real-world robotic tasks.
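The segment-level aggregation and querying can be pictured with a small sketch: per-pixel vision-language features (e.g., from a CLIP-style encoder; placeholder arrays here) are averaged over segments, and a text embedding then ranks segments by cosine similarity. This illustrates the general pattern only, not FindAnything's exact integration into volumetric submaps.

```python
import numpy as np

def aggregate_segment_features(pixel_feats, segment_ids):
    """Average pixel-wise vision-language features per segment id.

    pixel_feats: (N, C) features, segment_ids: (N,) integer segment labels.
    Returns unit-norm descriptors so cosine similarity is a dot product.
    """
    segments = {}
    for sid in np.unique(segment_ids):
        f = pixel_feats[segment_ids == sid].mean(axis=0)
        segments[sid] = f / np.linalg.norm(f)
    return segments

def query(segments, text_embedding):
    """Rank object segments by cosine similarity to a text query embedding."""
    q = text_embedding / np.linalg.norm(text_embedding)
    return sorted(segments.items(), key=lambda kv: -kv[1] @ q)
```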
https://arxiv.org/abs/2504.08603
LiDAR loop closure detection (LCD) is crucial for consistent Simultaneous Localization and Mapping (SLAM) but faces challenges in robustness and accuracy. Existing methods, including semantic graph approaches, often suffer from coarse geometric representations and lack temporal robustness against noise, dynamics, and viewpoint changes. We introduce PNE-SGAN, a Probabilistic NDT-Enhanced Semantic Graph Attention Network, to overcome these limitations. PNE-SGAN enhances semantic graphs by using Normal Distributions Transform (NDT) covariance matrices as rich, discriminative geometric node features, processed via a Graph Attention Network (GAT). Crucially, it integrates graph similarity scores into a probabilistic temporal filtering framework (modeled as an HMM/Bayes filter), incorporating uncertain odometry for motion modeling and utilizing forward-backward smoothing to effectively handle ambiguities. Evaluations on challenging KITTI sequences (00 and 08) demonstrate state-of-the-art performance, achieving Average Precision of 96.2% and 95.1%, respectively. PNE-SGAN significantly outperforms existing methods, particularly in difficult bidirectional loop scenarios where others falter. By synergizing detailed NDT geometry with principled probabilistic temporal reasoning, PNE-SGAN offers a highly accurate and robust solution for LiDAR LCD, enhancing SLAM reliability in complex, large-scale environments.
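For context on the geometric node features: NDT summarizes each voxel by the mean and covariance of its points, and the covariance eigen-structure discriminates planar, linear, and scattered local geometry. A minimal sketch of computing these statistics (the graph construction PNE-SGAN builds on top of them is not shown):

```python
import numpy as np
from collections import defaultdict

def ndt_voxel_stats(points, voxel_size=1.0):
    """Per-voxel NDT statistics: mean and covariance of the points.

    The covariance eigen-structure distinguishes planar, linear, and
    scattered local geometry, which is what makes it a discriminative
    node feature.
    """
    cells = defaultdict(list)
    for p in points:
        cells[tuple(np.floor(p / voxel_size).astype(int))].append(p)
    stats = {}
    for key, pts in cells.items():
        pts = np.asarray(pts)
        if len(pts) >= 5:  # covariance needs enough support
            stats[key] = (pts.mean(axis=0), np.cov(pts.T))
    return stats
```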
https://arxiv.org/abs/2504.08280
Autonomous Underwater Vehicles (AUVs) play a crucial role in underwater exploration. Vision-based methods offer cost-effective solutions for localization and mapping in the absence of conventional sensors like GPS and LiDAR. However, underwater environments present significant challenges for feature extraction and matching due to image blurring and noise caused by attenuation, scattering, and the interference of "marine snow". In this paper, we aim to improve the robustness of feature extraction and matching in turbid underwater environments using a cross-modal knowledge distillation method that transfers in-air feature extraction models to underwater settings, with synthetic underwater images as the medium. We first propose a novel adaptive GAN-synthesis method that estimates water parameters and the underwater noise distribution to generate environment-specific synthetic underwater images. We then introduce a general knowledge distillation framework compatible with different teacher models. The evaluation of the GAN-based synthesis highlights the significance of the new components, i.e., GAN-synthesized noise and forward scattering, in the proposed model. Additionally, the downstream application of feature extraction and matching (VSLAM) on real underwater sequences validates the effectiveness of the transferred model.
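As background for the synthesis step, a widely used simplified underwater image-formation model combines per-channel attenuation with backscatter; the paper's GAN goes further by learning environment-specific noise and forward scattering, which this sketch omits. The coefficient values below are purely illustrative.

```python
import numpy as np

def synthesize_underwater(img, depth,
                          beta=(0.6, 0.25, 0.1), b_inf=(0.05, 0.25, 0.35)):
    """Simplified underwater image formation: attenuation + backscatter.

    I_c = J_c * exp(-beta_c * z) + B_c * (1 - exp(-beta_c * z)),
    with per-channel attenuation beta_c (red dies fastest) and veiling
    light B_c. Sensor noise and forward scattering, which the paper's
    GAN models explicitly, are omitted here.
    img: (H, W, 3) float image in [0, 1]; depth: (H, W) range in meters.
    """
    beta, b_inf = np.asarray(beta), np.asarray(b_inf)
    t = np.exp(-beta[None, None, :] * depth[:, :, None])  # transmission
    return img * t + b_inf[None, None, :] * (1.0 - t)
```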
https://arxiv.org/abs/2504.08253
SLAM technology plays a crucial role in indoor mapping and localization. A common challenge in indoor environments is the "double-sided mapping issue", where closely positioned walls, doors, and other surfaces are mistakenly identified as a single plane, significantly hindering map accuracy and consistency. To address this issue, this paper introduces a SLAM approach that ensures accurate mapping using normal-vector consistency. We enhance the voxel map structure to store both point cloud data and normal vector information, enabling the system to evaluate consistency during nearest-neighbor searches and map updates. This process distinguishes between the front and back sides of surfaces, preventing incorrect point-to-plane constraints. Moreover, we implement an adaptive-radius KD-tree search method that dynamically adjusts the search radius based on the local density of the point cloud, thereby enhancing the accuracy of normal vector calculations. To further improve real-time performance and storage efficiency, we incorporate a Least Recently Used (LRU) cache strategy, which facilitates efficient incremental updates of the voxel map. The code is released as open source and validated in both simulated environments and real indoor scenarios. Experimental results demonstrate that this approach effectively resolves the "double-sided mapping issue" and significantly improves mapping precision. Additionally, we have developed and open-sourced the first simulation and real-world datasets specifically tailored to the "double-sided mapping issue".
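Two of the ingredients, voxels that store normals for front/back disambiguation and LRU eviction for bounded memory, compose naturally. The sketch below shows one possible arrangement; it omits the paper's adaptive-radius KD-tree search and is not the released implementation.

```python
import numpy as np
from collections import OrderedDict

class NormalVoxelMap:
    """Voxel map storing points with normals, evicted LRU-style.

    A neighbor only counts as the *same* surface if its normal agrees
    with the query normal (positive dot product), which separates the
    two faces of a thin wall (the "double-sided mapping issue").
    """
    def __init__(self, voxel_size=0.5, capacity=100_000):
        self.voxel_size, self.capacity = voxel_size, capacity
        self.voxels = OrderedDict()

    def _key(self, point):
        return tuple(np.floor(point / self.voxel_size).astype(int))

    def insert(self, point, normal):
        key = self._key(point)
        self.voxels.setdefault(key, []).append((point, normal))
        self.voxels.move_to_end(key)          # mark voxel recently used
        if len(self.voxels) > self.capacity:
            self.voxels.popitem(last=False)   # evict least-recently used

    def consistent_neighbors(self, point, normal):
        return [(p, n) for p, n in self.voxels.get(self._key(point), [])
                if np.dot(n, normal) > 0.0]   # same-side surfaces only
```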
https://arxiv.org/abs/2504.08204
Localization of an autonomous mobile robot during planetary exploration is challenging due to the unknown terrain, the difficult lighting conditions, and the lack of any global reference such as satellite navigation systems. We present a novel approach for robot localization based on ultra-wideband (UWB) technology. The robot sets up its own reference coordinate system by distributing UWB anchor nodes in the environment via a rocket-propelled launcher system. This allows the creation of a localization space in which UWB measurements are employed to supplement traditional SLAM-based techniques. The system was developed for our involvement in the ESA-ESRIC challenge 2021 and in AMADEE-24, an analog Mars simulation in Armenia by the Austrian Space Forum (ÖWF).
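Once the anchors are deployed and their positions known, a position fix reduces to classical multilateration: subtracting one range equation from the others cancels the quadratic term and leaves a linear system. A minimal sketch of that standard computation (not the paper's full UWB/SLAM fusion):

```python
import numpy as np

def multilaterate(anchors, ranges):
    """Position from UWB ranges to known anchors (linear least squares).

    Subtracting the range equation of anchor 0 from the others removes
    the quadratic |x|^2 term:
      2 (a_i - a_0) . x = r_0^2 - r_i^2 + |a_i|^2 - |a_0|^2
    """
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

anchors = np.array([[0., 0., 0.], [10., 0., 0.], [0., 10., 0.], [0., 0., 5.]])
p_true = np.array([3., 4., 1.])
ranges = np.linalg.norm(anchors - p_true, axis=1)
print(multilaterate(anchors, ranges))  # ~ [3, 4, 1]
```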
https://arxiv.org/abs/2504.07658
Events offer a novel paradigm for capturing scene dynamics via asynchronous sensing, but their inherent randomness often leads to degraded signal quality. Event signal filtering is thus essential for enhancing fidelity by reducing this internal randomness and ensuring consistent outputs across diverse acquisition conditions. Unlike traditional time series that rely on fixed temporal sampling to capture steady-state behaviors, events encode transient dynamics through polarity and event intervals, making signal modeling significantly more complex. To address this, the theoretical foundation of event generation is revisited through the lens of diffusion processes. The state and process information within events is modeled as continuous probability flux at threshold boundaries of the underlying irradiance diffusion. Building on this insight, a generative, online filtering framework called Event Density Flow Filter (EDFilter) is introduced. EDFilter estimates event correlation by reconstructing the continuous probability flux from discrete events using nonparametric kernel smoothing, and then resamples filtered events from this flux. To optimize fidelity over time, spatial and temporal kernels are employed in a time-varying optimization framework. A fast recursive solver with O(1) complexity is proposed, leveraging state-space models and lookup tables for efficient likelihood computation. Furthermore, a new real-world benchmark, the Rotary Event Dataset (RED), is released, offering microsecond-level ground-truth irradiance for full-reference event filtering evaluation. Extensive experiments validate EDFilter's performance across tasks like event filtering, super-resolution, and direct event-based blob tracking. Significant gains in downstream applications such as SLAM and video reconstruction underscore its robustness and effectiveness.
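The reconstruction step can be illustrated in one dimension: nonparametric kernel smoothing turns discrete event timestamps into a continuous rate estimate, the analogue of the probability flux described above. EDFilter's actual estimator is per-pixel, polarity-aware, and recursive with O(1) updates; this sketch only conveys the core idea.

```python
import numpy as np

def event_rate(event_times, query_times, bandwidth=1e-3):
    """Continuous event-rate estimate from discrete timestamps (events/s).

    Nonparametric Gaussian kernel smoothing: each event contributes a
    small bump, and the sum of bumps approximates the underlying flux
    that generated the events.
    """
    diffs = (query_times[:, None] - event_times[None, :]) / bandwidth
    k = np.exp(-0.5 * diffs**2) / (np.sqrt(2.0 * np.pi) * bandwidth)
    return k.sum(axis=1)
```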
https://arxiv.org/abs/2504.07503
Simultaneous localization and mapping (SLAM) technology now offers photorealistic mapping, thanks to the real-time, high-fidelity rendering of 3D Gaussian splatting (3DGS). However, due to their static scene representation, current 3DGS-based SLAM systems suffer from pose drift and fail to reconstruct accurate maps in dynamic environments. To address this problem, we present D4DGS-SLAM, the first SLAM method based on a 4DGS map representation for dynamic environments. By incorporating the temporal dimension into the scene representation, D4DGS-SLAM enables high-quality reconstruction of dynamic scenes. Utilizing the dynamics-aware InfoModule, we obtain the dynamics, visibility, and reliability of scene points, and filter stable static points for tracking accordingly. When optimizing Gaussian points, we apply different isotropic regularization terms to Gaussians with varying dynamic characteristics. Experimental results on real-world dynamic scene datasets demonstrate that our method outperforms state-of-the-art approaches in both camera pose tracking and map quality.
https://arxiv.org/abs/2504.04844
Accurate and stable feature matching is critical for computer vision tasks, particularly in applications such as Simultaneous Localization and Mapping (SLAM). While recent learning-based feature matching methods have demonstrated promising performance in challenging spatiotemporal scenarios, they still face inherent trade-offs between accuracy and computational efficiency in specific settings. In this paper, we propose a lightweight feature matching network designed to establish sparse, stable, and consistent correspondence between multiple frames. The proposed method eliminates the dependency on manual annotations during training and mitigates feature drift through a hybrid self-supervised paradigm. Extensive experiments validate three key advantages: (1) Our method operates without dependency on external prior knowledge and seamlessly incorporates its hybrid training mechanism into original datasets. (2) Benchmarked against state-of-the-art deep learning-based methods, our approach maintains equivalent computational efficiency at low-resolution scales while achieving a 2-10x improvement in computational efficiency for high-resolution inputs. (3) Comparative evaluations demonstrate that the proposed hybrid self-supervised scheme effectively mitigates feature drift in long-term tracking while maintaining consistent representation across image sequences.
https://arxiv.org/abs/2504.04497
Visual Simultaneous Localization and Mapping (VSLAM) research faces significant challenges due to fragmented toolchains, complex system configurations, and inconsistent evaluation methodologies. To address these issues, we present VSLAM-LAB, a unified framework designed to streamline the development, evaluation, and deployment of VSLAM systems. VSLAM-LAB simplifies the entire workflow by enabling seamless compilation and configuration of VSLAM algorithms, automated dataset downloading and preprocessing, and standardized experiment design, execution, and evaluation--all accessible through a single command-line interface. The framework supports a wide range of VSLAM systems and datasets, offering broad compatibility and extendability while promoting reproducibility through consistent evaluation metrics and analysis tools. By reducing implementation complexity and minimizing configuration overhead, VSLAM-LAB empowers researchers to focus on advancing VSLAM methodologies and accelerates progress toward scalable, real-world solutions. We demonstrate the ease with which user-relevant benchmarks can be created: here, we introduce difficulty-level-based categories, but one could envision environment-specific or condition-specific categories.
https://arxiv.org/abs/2504.04457
This paper addresses the problem of Simultaneous Localization and Mapping (SLAM) for rigid body systems in three-dimensional space. We introduce a new matrix Lie group SE_{3+n}(3), whose elements are composed of the pose, gravity, linear velocity and landmark positions, and propose an almost globally asymptotically stable nonlinear geometric observer that integrates Inertial Measurement Unit (IMU) data with landmark measurements. The proposed observer estimates the pose and map up to a constant position and a constant rotation about the gravity direction. Numerical simulations are provided to validate the performance and effectiveness of the proposed observer, demonstrating its potential for robust SLAM applications.
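The abstract does not spell out the matrix embedding of SE_{3+n}(3). Assuming the standard extended-pose SE_K(3) construction with K = 3 + n vector columns (gravity, velocity, position, and n landmarks), a plausible form is the following sketch; the paper's own definition should be taken as authoritative.

```latex
% Assumed embedding, following the usual SE_K(3) pattern with K = 3 + n.
X =
\begin{pmatrix}
  R & V \\
  0_{(3+n) \times 3} & I_{3+n}
\end{pmatrix},
\qquad
V = \begin{pmatrix} g & v & p & p_1 & \cdots & p_n \end{pmatrix}
\in \mathbb{R}^{3 \times (3+n)},
\qquad R \in SO(3),
\]
\[
X_1 X_2 =
\begin{pmatrix}
  R_1 R_2 & R_1 V_2 + V_1 \\
  0_{(3+n) \times 3} & I_{3+n}
\end{pmatrix}.
```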
https://arxiv.org/abs/2504.04239
We present WildGS-SLAM, a robust and efficient monocular RGB SLAM system designed to handle dynamic environments by leveraging uncertainty-aware geometric mapping. Unlike traditional SLAM systems, which assume static scenes, our approach integrates depth and uncertainty information to enhance tracking, mapping, and rendering performance in the presence of moving objects. We introduce an uncertainty map, predicted by a shallow multi-layer perceptron and DINOv2 features, to guide dynamic object removal during both tracking and mapping. This uncertainty map enhances dense bundle adjustment and Gaussian map optimization, improving reconstruction accuracy. Our system is evaluated on multiple datasets and demonstrates artifact-free view synthesis. Results showcase WildGS-SLAM's superior performance in dynamic environments compared to state-of-the-art methods.
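The uncertainty mechanism follows a familiar pattern: a small network maps per-pixel features to a scale sigma, and residuals are weighted by 1/sigma^2 (with a log-sigma penalty so sigma cannot grow unboundedly). The sketch below uses a toy numpy MLP with hypothetical, caller-supplied weights in place of the trained DINOv2-based predictor.

```python
import numpy as np

def shallow_mlp(feats, W1, b1, W2, b2):
    """Toy stand-in for the shallow uncertainty MLP over per-pixel features.

    feats: (N, C) features; returns per-pixel sigma > 0. Weights here are
    hypothetical; in WildGS-SLAM they are trained alongside the map.
    """
    h = np.maximum(feats @ W1 + b1, 0.0)           # ReLU hidden layer
    return np.log1p(np.exp(h @ W2 + b2)).ravel()   # softplus -> sigma > 0

def weighted_photometric_loss(residuals, sigma):
    """Gaussian NLL-style weighting: down-weight likely-dynamic pixels.

    r^2 / sigma^2 + 2 log(sigma) per pixel; high-sigma (dynamic) pixels
    contribute little to tracking and mapping.
    """
    return np.mean(residuals**2 / sigma**2 + 2.0 * np.log(sigma))
```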
https://arxiv.org/abs/2504.03886
The widespread adoption of learning-based methods for LiDAR processing makes autonomous vehicles vulnerable to adversarial attacks through adversarial point injections (PiJ). This poses serious security challenges for navigation and map generation. Despite its critical nature, no major work exists that studies learning-based attacks on LiDAR-based SLAM. Our work proposes SLACK, an end-to-end deep generative adversarial model that attacks LiDAR scans with several point injections without deteriorating LiDAR quality. To facilitate SLACK, we design a novel yet simple autoencoder that augments contrastive learning with segmentation-based attention for precise reconstructions. SLACK demonstrates superior performance on the task of point injections (PiJ) compared to the best baselines on the KITTI and CARLA-64 datasets while maintaining accurate scan quality. We qualitatively and quantitatively demonstrate PiJ attacks using only a fraction of the LiDAR points; the attacks severely degrade navigation and map quality without deteriorating the LiDAR scan quality itself.
https://arxiv.org/abs/2504.03089
Robot vision has greatly benefited from advancements in multimodal fusion techniques and vision-language models (VLMs). We systematically review the applications of multimodal fusion in key robotic vision tasks, including semantic scene understanding, simultaneous localization and mapping (SLAM), 3D object detection, navigation and localization, and robot manipulation. We compare VLMs based on large language models (LLMs) with traditional multimodal fusion methods, analyzing their advantages, limitations, and synergies. Additionally, we conduct an in-depth analysis of commonly used datasets, evaluating their applicability and challenges in real-world robotic scenarios. Furthermore, we identify critical research challenges such as cross-modal alignment, efficient fusion strategies, real-time deployment, and domain adaptation, and propose future research directions, including self-supervised learning for robust multimodal representations, transformer-based fusion architectures, and scalable multimodal frameworks. Through a comprehensive review, comparative analysis, and forward-looking discussion, we provide a valuable reference for advancing multimodal perception and interaction in robotic vision. A comprehensive list of studies in this survey is available at this https URL.
https://arxiv.org/abs/2504.02477
We present MonoGS++, a novel fast and accurate Simultaneous Localization and Mapping (SLAM) method that leverages 3D Gaussian representations and operates solely on RGB inputs. While previous 3D Gaussian Splatting (GS)-based methods largely depended on depth sensors, our approach reduces the hardware dependency and only requires RGB input, leveraging online visual odometry (VO) to generate sparse point clouds in real-time. To reduce redundancy and enhance the quality of 3D scene reconstruction, we implemented a series of methodological enhancements in 3D Gaussian mapping. Firstly, we introduced dynamic 3D Gaussian insertion to avoid adding redundant Gaussians in previously well-reconstructed areas. Secondly, we introduced a clarity-enhancing Gaussian densification module and planar regularization to handle texture-less areas and flat surfaces better. We achieved precise camera tracking results on both the synthetic Replica and real-world TUM-RGBD datasets, comparable to those of the state of the art. Additionally, our method realized a significant 5.57x improvement in frames per second (fps) over the previous state of the art, MonoGS.
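The dynamic-insertion idea, skipping new Gaussians where the map is already well reconstructed, can be approximated by a simple density gate over existing Gaussian centers. A hedged sketch of that gate only; MonoGS++ presumably also considers rendering quality, not just geometric proximity.

```python
import numpy as np
from scipy.spatial import cKDTree

def select_new_gaussians(candidates, existing_centers, min_dist=0.05):
    """Gate candidate points before spawning new Gaussians.

    A candidate is treated as redundant if an existing Gaussian center
    already lies within min_dist; only the remaining candidates are
    inserted into the map.
    """
    if len(existing_centers) == 0:
        return candidates
    dists, _ = cKDTree(existing_centers).query(candidates, k=1)
    return candidates[dists > min_dist]
```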
https://arxiv.org/abs/2504.02437
Deploying autonomous vehicles in urban areas requires reliable localization. Especially when HD maps are used, a precise and repeatable method is required: both accurate map generation and re-localization against these maps are necessary. Owing to its accurate 3D reconstruction of the surroundings, LiDAR has become a reliable modality for localization. The latest LiDAR odometry estimators are based on iterative closest point (ICP) approaches, namely KISS-ICP and SAGE-ICP. We extend the capabilities of KISS-ICP by incorporating semantic information into the point alignment process using a generalizable approach with minimal parameter tuning. This enhancement allows us to surpass KISS-ICP in terms of absolute trajectory error (ATE), the primary metric for map accuracy. Additionally, we improve the Cartographer mapping framework to handle semantic information. Cartographer facilitates loop closure detection over larger areas, mitigating odometry drift and further enhancing ATE accuracy. By integrating semantic information into the mapping process, we enable the filtering of specific classes, such as parked vehicles, from the resulting map. This filtering improves relocalization quality by addressing temporal changes, such as vehicles being moved.
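One simple way semantics can enter ICP-style point alignment is to gate (or weight) nearest-neighbor correspondences by class agreement, so that, for instance, a pole is never matched against a wall. The sketch below uses a hard gate for clarity; the paper's generalizable, minimally tuned weighting scheme may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

def semantic_correspondences(src_pts, src_lbl, map_pts, map_lbl,
                             max_dist=1.0):
    """Nearest-neighbor ICP association gated by semantic class.

    A correspondence is kept only if the pair is close *and* both points
    carry the same semantic label. Returns matched source/map points
    ready for the alignment step of an ICP iteration.
    """
    dists, idx = cKDTree(map_pts).query(src_pts, k=1)
    ok = (dists < max_dist) & (src_lbl == map_lbl[idx])
    return src_pts[ok], map_pts[idx[ok]]
```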
https://arxiv.org/abs/2504.02086
This paper presents field-tested use cases from Search and Rescue (SAR) missions, highlighting the co-design of mobile robots and communication systems to support Edge-Cloud architectures based on 5G Standalone (SA). The main goal is to contribute to the effective cooperation of multiple robots and first responders. Our field experience includes the development of Hybrid Wireless Sensor Networks (H-WSNs) for risk and victim detection, smartphones integrated into the Robot Operating System (ROS) as Edge devices for mission requests and path planning, real-time Simultaneous Localization and Mapping (SLAM) via Multi-Access Edge Computing (MEC), and implementation of Uncrewed Ground Vehicles (UGVs) for victim evacuation in different navigation modes. These experiments, conducted in collaboration with actual first responders, underscore the need for intelligent network resource management, balancing low-latency and high-bandwidth demands. Network slicing is key to ensuring critical emergency services are performed despite challenging communication conditions. The paper identifies architectural needs, lessons learned, and challenges to be addressed by 6G technologies to enhance emergency response capabilities.
https://arxiv.org/abs/2504.01940
The accuracy of the initial state, including initial velocity, gravity direction, and IMU biases, is critical for the initialization of LiDAR-inertial SLAM systems. Inaccurate initial values can reduce initialization speed or lead to failure. When the system faces urgent tasks, robust and fast initialization is required while the robot is moving, such as during the swift assessment of rescue environments after natural disasters, bomb disposal, and restarting LiDAR-inertial SLAM in rescue missions. However, existing initialization methods usually require the platform to remain stationary, which is ineffective when the robot is in motion. To address this issue, this paper introduces a robust and fast dynamic initialization method for LiDAR-inertial systems (D-LI-Init). This method iteratively aligns LiDAR-based odometry with IMU measurements to achieve system initialization. To enhance the reliability of the LiDAR odometry module, the LiDAR and gyroscope are tightly integrated within the ESIKF framework. The gyroscope compensates for rotational distortion in the point cloud. Translational distortion compensation occurs during the iterative update phase, resulting in the output of LiDAR-gyroscope odometry. The proposed method can initialize the system whether the robot is moving or stationary. Experiments on public datasets and in real-world environments demonstrate that the D-LI-Init algorithm can effectively serve various platforms, including vehicles, handheld devices, and UAVs. D-LI-Init completes dynamic initialization regardless of specific motion patterns. To benefit the research community, we have open-sourced our code and test datasets on GitHub.
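On the gyroscope compensation mentioned above: if each point carries a timestamp and the angular rate over the sweep is roughly constant, every point can be rotated into the scan-end frame via Rodrigues' formula. A minimal sketch under those assumptions (D-LI-Init's ESIKF integration is more involved):

```python
import numpy as np

def deskew_rotation(points, stamps, t_end, omega):
    """Remove rotational motion distortion from one LiDAR sweep.

    Assumes a constant body angular rate omega (rad/s) and per-point
    timestamps; each point is rotated into the scan-end sensor frame,
    p_end = Exp(-(t_end - t_i) * omega) @ p_i, via Rodrigues' formula.
    """
    out = np.empty_like(points)
    for i, (p, t_i) in enumerate(zip(points, stamps)):
        phi = -(t_end - t_i) * omega        # axis-angle correction
        angle = np.linalg.norm(phi)
        if angle < 1e-9:                    # negligible rotation
            out[i] = p
            continue
        k = phi / angle                     # unit rotation axis
        K = np.array([[0.0, -k[2], k[1]],
                      [k[2], 0.0, -k[0]],
                      [-k[1], k[0], 0.0]])
        R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
        out[i] = R @ p
    return out
```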
https://arxiv.org/abs/2504.01451