Inverse Reinforcement Learning (IRL) presents a powerful paradigm for learning complex robotic tasks from human demonstrations. However, most approaches assume that expert demonstrations are available, which is often not the case in practice. Those that do allow for suboptimality in the demonstrations are not designed for long-horizon goals or adversarial tasks. Many desirable robot capabilities fall into one or both of these categories, highlighting a critical shortcoming in the ability of IRL to produce field-ready robotic agents. We introduce Sample-efficient Preference-based inverse reinforcement learning for Long-horizon Adversarial tasks from Suboptimal Hierarchical demonstrations (SPLASH), which extends the state-of-the-art in learning from suboptimal demonstrations to long-horizon and adversarial settings. We empirically validate SPLASH on a maritime capture-the-flag task in simulation and demonstrate real-world applicability with sim-to-real translation experiments on autonomous unmanned surface vehicles. We show that our proposed methods allow SPLASH to significantly outperform the state-of-the-art in reward learning from suboptimal demonstrations.
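The abstract does not detail SPLASH's learning objective, but the line of work it extends (preference-based reward learning from ranked suboptimal demonstrations, e.g., T-REX-style methods) typically fits a reward network with a Bradley-Terry loss over trajectory pairs. Below is a minimal PyTorch sketch under that assumption; the network architecture and data shapes are illustrative, not SPLASH's actual design:

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Illustrative state-based reward model (architecture assumed)."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def traj_return(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> scalar predicted return
        return self.net(traj).sum()

def preference_loss(reward_net, traj_a, traj_b, a_preferred: bool):
    """Bradley-Terry loss: P(a > b) = exp(R_a) / (exp(R_a) + exp(R_b))."""
    returns = torch.stack([reward_net.traj_return(traj_a),
                           reward_net.traj_return(traj_b)])
    label = torch.tensor(0 if a_preferred else 1)
    return nn.functional.cross_entropy(returns.unsqueeze(0), label.unsqueeze(0))

# Usage on random stand-in trajectories:
net = RewardNet(obs_dim=8)
loss = preference_loss(net, torch.randn(50, 8), torch.randn(50, 8), a_preferred=True)
loss.backward()
```

The reward model learned this way can then be handed to any downstream RL algorithm for policy optimization.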
https://arxiv.org/abs/2507.08707
Learning whole-body control for locomotion and arm motions in a single policy is challenging, as the two tasks have conflicting goals. For instance, efficient locomotion typically favors a horizontal base orientation, while end-effector tracking may benefit from tilting the base to extend reachability. Additionally, current Reinforcement Learning (RL) approaches that use a pose-based task specification cannot directly control the end-effector velocity, making smooth trajectory execution very challenging. To address these limitations, we propose an RL-based framework that allows for dynamic, velocity-aware whole-body end-effector control. Our method introduces a multi-critic actor architecture that decouples the reward signals for locomotion and manipulation, simplifying reward tuning and allowing the policy to resolve task conflicts more effectively. Furthermore, we design a twist-based end-effector task formulation that can track both discrete poses and motion trajectories. We validate our approach through a set of simulation and hardware experiments using a quadruped robot equipped with a robotic arm. The resulting controller can simultaneously walk and move its end-effector, and it shows emergent whole-body behaviors in which the base assists the arm in extending the workspace, despite the absence of any explicit formulation for such behavior.
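The abstract leaves the multi-critic architecture unspecified; one plausible minimal realization, sketched here in PyTorch, keeps a shared actor but maintains one critic per reward group and mixes their separately normalized advantages in the policy update. All layer sizes and mixing weights are assumptions:

```python
import torch
import torch.nn as nn

class MultiCriticActor(nn.Module):
    """Shared policy with one critic per task group (structure assumed)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                   nn.Linear(hidden, act_dim))
        self.critic_loco = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                         nn.Linear(hidden, 1))
        self.critic_manip = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                          nn.Linear(hidden, 1))

    def values(self, obs):
        return self.critic_loco(obs), self.critic_manip(obs)

def combined_advantage(adv_loco, adv_manip, w_loco=1.0, w_manip=1.0):
    # Normalizing each advantage stream separately before mixing keeps one
    # task's reward scale from dominating the policy gradient -- one way to
    # get the "simpler reward tuning" the abstract describes.
    norm = lambda a: (a - a.mean()) / (a.std() + 1e-8)
    return w_loco * norm(adv_loco) + w_manip * norm(adv_manip)
```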
https://arxiv.org/abs/2507.08656
When inverse kinematics (IK) is adopted to control robotic arms in manipulation tasks, there is often a discrepancy between the end effector (EE) position of the robot model in the simulator and the physical EE position in reality. In most robotic scenarios involving sim-to-real transfer, joint positions are available in both simulation and reality, but the EE position is only available in simulation. We developed a novel method to overcome this difficulty based on haptic feedback calibration, using a touchscreen in front of the robot that provides information on the EE position in the real environment. During the calibration procedure, the robot touches specific points on the screen, and the resulting data are stored. In the next stage, we fit transformation functions to these data, based on linear transformations and neural networks, that can output all missing variables from any partial input (simulated/real joint/EE positions). Our results demonstrate that a fully nonlinear neural network model performs best, significantly reducing positioning errors.
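As a rough illustration of the calibration stage, the sketch below fits a small neural network mapping joint positions to the real EE position recorded at the touchscreen contacts. The 6-DOF arm, layer sizes, and the specific input/output pairing are assumptions; the paper's full transformation function handles arbitrary partial inputs:

```python
import torch
import torch.nn as nn

# Hedged sketch: regress the real EE position recorded on the touchscreen
# from joint positions, so the sim-only EE estimate can be corrected.
# Dimensions and architecture are assumptions, not the paper's exact model.
model = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),   # 6 joint angles (assumed arm DOF)
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),              # real-world EE position (x, y, z)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(joints: torch.Tensor, ee_real: torch.Tensor) -> float:
    """One supervised step on calibration pairs collected at screen touches."""
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(joints), ee_real)
    loss.backward()
    opt.step()
    return loss.item()
```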
https://arxiv.org/abs/2507.08572
LiDAR-based 3D mapping suffers from cumulative drift causing global misalignment, particularly in GNSS-constrained environments. To address this, we propose a unified framework that fuses LiDAR, GNSS, and IMU data for high-resolution city-scale mapping. The method performs velocity-based temporal alignment using Dynamic Time Warping and refines GNSS and IMU signals via extended Kalman filtering. Local maps are built using Normal Distributions Transform-based registration and pose graph optimization with loop closure detection, while global consistency is enforced using GNSS-constrained anchors followed by fine registration of overlapping segments. We also introduce a large-scale multimodal dataset captured in Perth, Western Australia, to facilitate future research in this direction. Our dataset comprises 144,000 frames acquired with a 128-channel Ouster LiDAR, synchronized RTK-GNSS trajectories, and MEMS-IMU measurements across 21 urban loops. To assess geometric consistency, we evaluated our method using alignment metrics based on road centerlines and intersections to capture both global and local accuracy. Our method reduces the average global alignment error from 3.32 m to 1.24 m, achieving a 61.4% improvement. The constructed high-fidelity map supports a wide range of applications, including smart city planning, geospatial data integration, infrastructure monitoring, and GPS-free navigation. Our method and dataset together establish a new benchmark for evaluating 3D city mapping in GNSS-constrained environments. The dataset and code will be released publicly.
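The velocity-based temporal alignment step can be pictured with a plain Dynamic Time Warping pass over the two speed profiles. Below is a minimal NumPy sketch; the absolute-difference cost and the unconstrained warping window are assumptions, not necessarily the paper's exact configuration:

```python
import numpy as np

def dtw_align(v_lidar: np.ndarray, v_gnss: np.ndarray):
    """Plain O(NM) DTW over 1-D speed profiles; returns cost and warp path.

    A hedged sketch of velocity-based temporal alignment -- the paper's
    exact cost function and constraints are not given in the abstract.
    """
    n, m = len(v_lidar), len(v_gnss)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(v_lidar[i - 1] - v_gnss[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack the minimal-cost path to obtain frame correspondences.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```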
https://arxiv.org/abs/2507.08420
Considerable advancements have been achieved in SLAM methods tailored for structured environments, yet their robustness under challenging corner cases remains a critical limitation. Although multi-sensor fusion approaches integrating diverse sensors have shown promising performance improvements, the research community faces two key barriers. On one hand, the lack of standardized and configurable benchmarks that systematically evaluate SLAM algorithms under diverse degradation scenarios hinders comprehensive performance assessment. On the other hand, existing SLAM frameworks primarily focus on fusing a limited set of sensor types, without effectively addressing adaptive sensor selection strategies for varying environmental conditions. To bridge these gaps, we make three key contributions. First, we introduce the M3DGR dataset: a sensor-rich benchmark with systematically induced degradation patterns, including visual challenge, LiDAR degeneracy, wheel slippage, and GNSS denial. Second, we conduct a comprehensive evaluation of forty SLAM systems on M3DGR, providing critical insights into their robustness and limitations under challenging real-world conditions. Third, we develop a resilient modular multi-sensor fusion framework named Ground-Fusion++, which demonstrates robust performance by coupling GNSS, RGB-D, LiDAR, IMU (Inertial Measurement Unit), and wheel odometry. Code and datasets are publicly available.
https://arxiv.org/abs/2507.08364
Accurate extrinsic calibration between multiple LiDAR sensors and a GNSS-aided inertial navigation system (GINS) is essential for achieving reliable sensor fusion in intelligent mining environments. Such calibration enables vehicle-road collaboration by aligning perception data from vehicle-mounted sensors to a unified global reference frame. However, existing methods often depend on artificial targets, overlapping fields of view, or precise trajectory estimation, which are assumptions that may not hold in practice. Moreover, the planar motion of mining vehicles leads to observability issues that degrade calibration performance. This paper presents a targetless extrinsic calibration method that aligns multiple onboard LiDAR sensors to the GINS coordinate system without requiring overlapping sensor views or external targets. The proposed approach introduces an observation model based on the known installation height of the GINS unit to constrain unobservable calibration parameters under planar motion. A joint optimization framework is developed to refine both the extrinsic parameters and GINS trajectory by integrating multiple constraints derived from geometric correspondences and motion consistency. The proposed method is applicable to heterogeneous LiDAR configurations, including both mechanical and solid-state sensors. Extensive experiments on simulated and real-world datasets demonstrate the accuracy, robustness, and practical applicability of the approach under diverse sensor setups.
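The key idea of anchoring the weakly observable parameters with the known GINS installation height can be sketched as one residual term in a nonlinear least-squares problem. A hedged SciPy illustration follows; the ground-plane formulation, parameterization, and synthetic data are assumptions, not the paper's exact observation model:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, ground_pts_lidar, h_gins):
    """Hedged sketch: the z-translation of a LiDAR->GINS extrinsic is weakly
    observable under planar motion, so anchor it with the known GINS
    installation height above the ground plane (h_gins, assumed measured).
    params = [rx, ry, rz, tx, ty, tz] (rotation vector + translation)."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    pts = ground_pts_lidar @ R.T + t       # ground points in the GINS frame
    # Ground points should lie at z = -h_gins below the GINS origin.
    return pts[:, 2] + h_gins

# Usage: in a full system this residual would be stacked with the
# geometric-correspondence and motion-consistency terms from the paper.
pts = np.random.randn(100, 3) * [5, 5, 0.02] + [0, 0, -1.8]
sol = least_squares(residuals, x0=np.zeros(6), args=(pts, 1.8))
```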
https://arxiv.org/abs/2507.08349
Humanoid robots show significant potential in daily tasks. However, reinforcement learning-based motion policies often suffer from robustness degradation due to the sim-to-real dynamics gap, thereby limiting the agility of real robots. In this work, we propose a novel robust adversarial training paradigm designed to enhance the robustness of humanoid motion policies in the real world. The paradigm introduces a learnable adversarial attack network that precisely identifies vulnerabilities in motion policies and applies targeted perturbations, forcing the motion policy to enhance its robustness against perturbations through dynamic adversarial training. We conduct experiments on the Unitree G1 humanoid robot for both perceptive locomotion and whole-body control tasks. The results demonstrate that our proposed method significantly enhances the robot's motion robustness in real-world environments, enabling successful traversal of challenging terrains and highly agile whole-body trajectory tracking.
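The abstract gives the paradigm but not its parameterization; a minimal PyTorch sketch of one plausible form follows, where a small attack network produces bounded observation perturbations and the adversary and policy are updated in alternation. The L-infinity bound, network shape, and loss hookup are assumptions:

```python
import torch
import torch.nn as nn

class AttackNet(nn.Module):
    """Learned adversary: maps the robot's state to a bounded perturbation.
    A hedged sketch -- the paper's attack parameterization is not specified
    in the abstract; here it perturbs observations within an L-inf ball."""
    def __init__(self, obs_dim: int, eps: float = 0.05):
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, obs_dim), nn.Tanh())

    def forward(self, obs):
        return obs + self.eps * self.net(obs)

def adversarial_round(policy, attacker, obs, policy_loss_fn):
    # Adversary ascends the policy's loss; the policy then descends it on
    # the perturbed observations (alternating, dynamic adversarial training).
    perturbed = attacker(obs)
    attack_loss = -policy_loss_fn(policy, perturbed)   # maximize policy loss
    policy_loss = policy_loss_fn(policy, perturbed.detach())
    return attack_loss, policy_loss
```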
https://arxiv.org/abs/2507.08303
Large language models (LLMs) have shown promise in robotic procedural planning, yet their human-centric reasoning often omits the low-level, grounded details needed for robotic execution. Vision-language models (VLMs) offer a path toward more perceptually grounded plans, but current methods either rely on expensive, large-scale models or are constrained to narrow simulation settings. We introduce SelfReVision, a lightweight and scalable self-improvement framework for vision-language procedural planning. SelfReVision enables small VLMs to iteratively critique, revise, and verify their own plans, without external supervision or teacher models, drawing inspiration from chain-of-thought prompting and self-instruct paradigms. Through this self-distillation loop, models generate higher-quality, execution-ready plans that can be used both at inference and for continued fine-tuning. Using models ranging from 3B to 72B parameters, our results show that SelfReVision not only boosts performance over weak base VLMs but also outperforms models 100X the size, yielding improved control in downstream embodied tasks.
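SelfReVision's exact prompts are not given in the abstract; the sketch below shows the general critique-revise-verify shape of such a self-improvement loop, with `generate` standing in for any small VLM's text interface (a hypothetical callable) and illustrative prompt wording:

```python
def self_revision(generate, task: str, max_rounds: int = 3) -> str:
    """Hedged sketch of a critique-revise-verify loop. `generate` stands in
    for a small VLM's text interface (hypothetical, not the paper's API);
    the prompts are illustrative, not SelfReVision's actual ones."""
    plan = generate(f"Write a step-by-step plan for: {task}")
    for _ in range(max_rounds):
        critique = generate(
            f"Task: {task}\nPlan:\n{plan}\n"
            "List concrete flaws (missing steps, ungrounded actions).")
        revised = generate(
            f"Task: {task}\nPlan:\n{plan}\nCritique:\n{critique}\n"
            "Rewrite the plan, fixing every flaw.")
        verdict = generate(
            f"Is plan B strictly better than plan A for '{task}'?\n"
            f"A:\n{plan}\nB:\n{revised}\nAnswer YES or NO.")
        if verdict.strip().upper().startswith("YES"):
            plan = revised            # keep the verified improvement
        else:
            break                     # no further verified gain; stop
    return plan
```

Accepted plans can be executed directly or collected as training pairs for continued fine-tuning, matching the dual use described above.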
https://arxiv.org/abs/2507.08224
Obstacle avoidance is crucial for mobile robots' navigation in both known and unknown environments. This research designs, trains, and tests two custom Convolutional Neural Networks (CNNs), using color and depth images from a depth camera as inputs. Both networks adopt sensor fusion to produce a single output: the mobile robot's angular velocity, which serves as the robot's steering command. A new visual navigation dataset was collected in diverse environments with varying lighting conditions and dynamic obstacles. During data collection, a communication link was established over Wi-Fi between a remote server and the robot, using Robot Operating System (ROS) topics. Velocity commands were transmitted from the server to the robot, enabling synchronized recording of the visual data and the corresponding steering commands. Evaluation metrics such as mean squared error, variance score, and feed-forward (inference) time provided a clear comparison between the two networks and indicated which one is better suited to the application.
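The two network designs are not described in the abstract; as a rough illustration of RGB-depth sensor fusion for steering regression, here is a minimal late-fusion CNN in PyTorch. All layer sizes, the fusion point, and the input resolution are assumptions:

```python
import torch
import torch.nn as nn

class FusionSteeringNet(nn.Module):
    """Hedged sketch of an RGB+depth fusion CNN regressing angular velocity.
    Layer sizes and the fusion point are assumptions; the paper's two
    architectures are not specified in the abstract."""
    def __init__(self):
        super().__init__()
        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 5, stride=2), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.rgb = branch(3)
        self.depth = branch(1)
        self.head = nn.Sequential(nn.Linear(2 * 32 * 16, 64), nn.ReLU(),
                                  nn.Linear(64, 1))   # angular velocity

    def forward(self, rgb, depth):
        # Late fusion: concatenate per-modality features, then regress.
        feats = torch.cat([self.rgb(rgb), self.depth(depth)], dim=1)
        return self.head(feats)

net = FusionSteeringNet()
omega = net(torch.randn(1, 3, 120, 160), torch.randn(1, 1, 120, 160))
```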
https://arxiv.org/abs/2507.08112
In crowded environments, individuals must navigate around other occupants to reach their destinations. Understanding and controlling traffic flows in these spaces is relevant to coordinating robot swarms and designing infrastructure for dense populations. Here, we combine simulations, theory, and robotic experiments to study how noisy motion can disrupt traffic jams and enable flow as agents travel to individual goals. Above a critical noise level, large jams do not persist. From this observation, we analytically approximate the goal attainment rate as a function of the noise level, then solve for the optimal agent density and noise level that maximize the swarm's goal attainment rate. We perform robotic experiments to corroborate our simulated and theoretical results. Finally, we compare simple, local navigation approaches with a sophisticated but computationally costly central planner. A simple reactive scheme performs well up to moderate densities and is far more computationally efficient than a planner, suggesting lessons for real-world problems.
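A toy version of the mechanism is easy to reproduce: agents head toward individual goals with rotational noise added to their headings, and moves that would overlap another agent are rejected. The NumPy sketch below is illustrative only; the paper's dynamics, noise model, and exclusion rule may differ:

```python
import numpy as np

def step(pos, goals, noise, speed=0.05, radius=0.1):
    """Hedged toy version of the setup: agents move toward individual goals
    with rotational noise on their headings; parameters are illustrative,
    not the paper's. Above some noise level, head-on deadlocks break up."""
    rng = np.random.default_rng()
    head = goals - pos
    ang = np.arctan2(head[:, 1], head[:, 0]) + rng.normal(0, noise, len(pos))
    prop = pos + speed * np.c_[np.cos(ang), np.sin(ang)]
    # Reject moves that would overlap another agent (crude exclusion).
    d = np.linalg.norm(prop[:, None] - prop[None, :], axis=2)
    np.fill_diagonal(d, np.inf)
    ok = d.min(axis=1) > 2 * radius
    pos[ok] = prop[ok]
    return pos

pos = step(np.random.rand(20, 2), np.random.rand(20, 2), noise=0.6)
```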
https://arxiv.org/abs/2507.08100
Robots can better interact with humans and unstructured environments through touch sensing. However, most commercial robots are not equipped with tactile skins, making it challenging to achieve even basic touch-sensing functions, such as contact localization. We present UniTac, a data-driven whole-body touch-sensing approach that uses only proprioceptive joint sensors and does not require the installation of additional sensors. Our approach enables a robot equipped solely with joint sensors to localize contacts. Our goal is to democratize touch sensing and provide an off-the-shelf tool for HRI researchers to provide their robots with touch-sensing capabilities. We validate our approach on two platforms: the Franka robot arm and the Spot quadruped. On Franka, we can localize contact to within 8.0 centimeters, and on Spot, we can localize to within 7.2 centimeters at around 2,000 Hz on an RTX 3090 GPU without adding any additional sensors to the robot. Project website: this https URL.
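UniTac's model is not specified in the abstract; one minimal data-driven form consistent with the description is a regressor from a short window of proprioceptive joint signals to a contact location, sketched below in PyTorch. The joint count, window length, and residual-torque input are assumptions:

```python
import torch
import torch.nn as nn

# Hedged sketch: UniTac is data-driven over proprioceptive joint signals;
# one plausible minimal form is a regressor from a history of joint-torque
# residuals to a contact point on the robot surface. Dimensions assumed.
N_JOINTS, WINDOW = 7, 20   # e.g., a 7-DOF arm, 20-sample history

model = nn.Sequential(
    nn.Flatten(),                      # (B, WINDOW, N_JOINTS) -> (B, WINDOW*N_JOINTS)
    nn.Linear(WINDOW * N_JOINTS, 128), nn.ReLU(),
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 3),                 # predicted contact point (x, y, z)
)

torques = torch.randn(1, WINDOW, N_JOINTS)   # measured-minus-expected torques
contact_xyz = model(torques)
```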
https://arxiv.org/abs/2507.07980
The safety validation of automatic emergency braking systems (AEBS) requires accurately distinguishing between false positive (FP) and true positive (TP) system activations. While simulations allow straightforward differentiation by comparing scenarios with and without interventions, analyzing activations from open-loop resimulations, such as those from field operational testing (FOT), is more complex. This complexity arises from scenario parameter uncertainty and the influence of driver interventions in the recorded data. Human labeling is frequently used to address these challenges, relying on subjective assessments of intervention necessity or situational criticality, potentially introducing biases and limitations. This work proposes a rule-based classification approach leveraging the Prediction Divergence Principle (PDP) to address those issues. Applied to a simplified AEBS, the proposed method reveals key strengths, limitations, and system requirements for effective implementation. The findings suggest that combining this approach with human labeling may enhance the transparency and consistency of classification, thereby improving the overall validation process. While the classification rule set derived in this work adopts a conservative approach, the paper outlines future directions for refinement and broader applicability. Finally, this work highlights the potential of such methods to complement existing practices, paving the way for more reliable and reproducible AEBS validation frameworks.
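A hedged reading of the Prediction Divergence Principle is that the scenario is forward-predicted with and without the AEBS intervention, and the divergence between the predicted outcomes feeds a rule-based TP/FP decision. The sketch below illustrates that shape only; the metric, criticality measure, and thresholds are assumptions, not the paper's rule set:

```python
import numpy as np

def prediction_divergence(traj_with_brake, traj_without_brake):
    """Max pointwise gap between the two predicted ego trajectories
    (arrays of shape (T, 2)). Metric choice is an assumption."""
    gap = np.linalg.norm(traj_with_brake - traj_without_brake, axis=1)
    return gap.max()

def classify(divergence, min_ttc_without, thr_div=2.0, thr_ttc=1.0):
    # Illustrative rule: TP if the non-intervention prediction becomes
    # critical (low time-to-collision) AND the intervention visibly
    # changes the predicted outcome. Thresholds are placeholders.
    return "TP" if (min_ttc_without < thr_ttc and divergence > thr_div) else "FP"
```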
https://arxiv.org/abs/2507.07872
As robotic systems increasingly integrate into daily life, from smart home assistants to the new wave of industrial automation systems (Industry 4.0), there is a growing need to bridge the gap between complex robotic systems and everyday users. The Robot Operating System (ROS) is a flexible framework often utilised in writing robot software, providing tools and libraries for building complex robotic systems. However, ROS's distributed architecture and technical messaging system create barriers to understanding robot status and diagnosing errors. This gap can lead to extended maintenance downtimes, as users with limited ROS knowledge may struggle to quickly diagnose and resolve system issues. Moreover, this deficit in expertise often delays proactive maintenance and troubleshooting, further increasing the frequency and duration of system interruptions. ROS Help Desk provides intuitive error explanations and debugging support, dynamically customized to users of varying expertise levels. It features user-centric debugging tools that simplify error diagnosis, implements proactive error detection capabilities to reduce downtime, and integrates multimodal data processing for comprehensive system state understanding across multi-sensor data (e.g., lidar, RGB). Qualitative and quantitative testing with artificially induced errors demonstrates the system's ability to proactively and accurately diagnose problems, ultimately reducing maintenance time and fostering more effective human-robot collaboration.
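As a small taste of what proactive error detection on top of ROS can look like, the sketch below is a rospy (ROS 1) watchdog node that flags watched topics that go silent, one of the failure symptoms such a tool could surface to non-expert users. The topic names and timeouts are assumptions, and this is not ROS Help Desk's actual implementation:

```python
#!/usr/bin/env python
import time
import rospy

# Hypothetical topic list: map each watched topic to its allowed silence (s).
WATCHED = {"/scan": 1.0, "/camera/color/image_raw": 0.5}
last_seen = {t: time.time() for t in WATCHED}

def make_cb(topic):
    def cb(_msg):
        last_seen[topic] = time.time()
    return cb

rospy.init_node("topic_watchdog")
for topic in WATCHED:
    # rospy.AnyMsg lets us subscribe without knowing the message type.
    rospy.Subscriber(topic, rospy.AnyMsg, make_cb(topic))

rate = rospy.Rate(2)
while not rospy.is_shutdown():
    for topic, limit in WATCHED.items():
        if time.time() - last_seen[topic] > limit:
            rospy.logwarn("%s silent for >%.1fs -- check its publisher", topic, limit)
    rate.sleep()
```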
https://arxiv.org/abs/2507.07846
Autonomous agents, particularly in the field of robotics, rely on sensory information to perceive and navigate their environment. However, these sensory inputs are often imperfect, leading to distortions in the agent's internal representation of the world. This paper investigates the nature of these perceptual distortions and how they influence autonomous representation learning using a minimal robotic system. We utilize a simulated two-wheeled robot equipped with distance sensors and a compass, operating within a simple square environment. Through analysis of the robot's sensor data during random exploration, we demonstrate how a distorted perceptual space emerges. Despite these distortions, we identify emergent structures within the perceptual space that correlate with the physical environment, revealing how the robot autonomously learns a structured representation for navigation without explicit spatial information. This work contributes to the understanding of embodied cognition, minimal agency, and the role of perception in self-generated navigation strategies in artificial life.
https://arxiv.org/abs/2507.07845
Carrying an unknown dynamic load is an important practical application for quadruped robots. The problem is non-trivial, posing three major challenges in quadruped locomotion control. First, how to model or represent the dynamics of the load in a generic manner. Second, how to make the robot capture those dynamics without any external sensing. Third, how to enable the robot to interact with the load, handling their mutual effects and stabilizing the load. In this work, we propose a general load modeling approach called load characteristics modeling to capture the dynamics of the load. We integrate this modeling technique with recent advances in Reinforcement Learning (RL) based locomotion control to enable the robot to infer the dynamics of load movement and interact with the load indirectly to stabilize it, and we realize sim-to-real deployment to verify its effectiveness in real scenarios. We conduct extensive comparative simulation experiments to validate the effectiveness and superiority of our proposed method. Results show that our method outperforms other methods in sudden load resistance, load stabilization, and locomotion with heavy loads on rough terrain. Project Page: this https URL.
https://arxiv.org/abs/2507.07825
Mandibular Angle Split Osteotomy (MASO) is a significant procedure in oral and maxillofacial surgery. Despite advances in technique and instrumentation, its success still relies heavily on the surgeon's experience. In this work, a human-robot collaborative system is proposed to perform MASO according to a preoperative plan and under the guidance of a surgeon. A task decomposition methodology is used to divide the collaborative surgical procedure into three subtasks: (1) positional control and (2) orientation control, both led by the robot for precise alignment; and (3) force control, managed by the surgeon to ensure safety. Additionally, to achieve patient tracking without the need for a skull clamp, an optical tracking system (OTS) is utilized. Movement of the patient's mandible is measured with an optical tracker mounted on a dental occlusal splint. A registration method and a robot-OTS calibration method are introduced to achieve reliable navigation within our framework. Drilling experiments conducted on a realistic phantom model demonstrated an average error of 1.85 mm between the planned and actual drilling points.
https://arxiv.org/abs/2507.07794
Robust Visual SLAM (vSLAM) is essential for autonomous systems operating in real-world environments, where challenges such as dynamic objects, low texture, and, critically, varying illumination conditions often degrade performance. Existing feature-based SLAM systems rely on fixed front-end parameters, making them vulnerable to sudden lighting changes and unstable feature tracking. To address these challenges, we propose "IRAF-SLAM", an Illumination-Robust and Adaptive Feature-Culling front end designed to enhance vSLAM resilience in complex and challenging environments. Our approach introduces: (1) an image enhancement scheme that preprocesses and adjusts image quality under varying lighting conditions; (2) an adaptive feature extraction mechanism that dynamically adjusts detection sensitivity based on image entropy, pixel intensity, and gradient analysis; and (3) a feature culling strategy that filters out unreliable feature points using density distribution analysis and a lighting impact factor. Comprehensive evaluations on the TUM-VI and European Robotics Challenge (EuRoC) datasets demonstrate that IRAF-SLAM significantly reduces tracking failures and achieves superior trajectory accuracy compared to state-of-the-art vSLAM methods under adverse illumination conditions. These results highlight the effectiveness of adaptive front-end strategies in improving vSLAM robustness without incurring significant computational overhead. The implementation of IRAF-SLAM is publicly available at https://thanhnguyencanh. this http URL.
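Contribution (2) can be illustrated with a small OpenCV sketch: compute the frame's intensity entropy and map it to the FAST detection threshold, so that low-texture or poorly lit frames still yield enough features. The entropy-to-threshold mapping and constants are assumptions, not IRAF-SLAM's actual tuning:

```python
import cv2
import numpy as np

def adaptive_orb(gray: np.ndarray) -> list:
    """Hedged sketch of entropy-driven detection sensitivity: low-entropy
    (dim or washed-out) frames get a lower FAST threshold so enough
    features survive. The entropy->threshold map is an assumption."""
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256]).ravel()
    p = hist / max(hist.sum(), 1.0)
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))     # in [0, 8] bits
    fast_thr = int(np.interp(entropy, [3.0, 7.0], [5, 25]))
    orb = cv2.ORB_create(nfeatures=1500, fastThreshold=fast_thr)
    return orb.detect(gray, None)
```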
https://arxiv.org/abs/2507.07752
Despite their recent introduction to human society, Large Language Models (LLMs) have significantly affected the way we tackle mental challenges in our everyday lives. From optimizing our linguistic communication to assisting us in making important decisions, LLMs such as ChatGPT are notably reducing our cognitive load by gradually taking on an increasing share of our mental activities. In the context of Learning by Demonstration (LbD), classifying and segmenting complex motions into primitive actions, such as pushing, pulling, and twisting, is considered a key step towards encoding a task. In this work, we investigate the capabilities of LLMs to undertake this task, considering a finite set of predefined primitive actions found in fruit-picking operations. By utilizing LLMs instead of simple supervised learning or analytic methods, we aim to make the method easily applicable and deployable in a real-life scenario. Three different fine-tuning approaches are investigated and compared on datasets captured kinesthetically with a UR10e robot during a fruit-picking scenario.
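One way to picture the setup is to frame each kinesthetically recorded motion segment as a short text classification prompt for the fine-tuned LLM. The sketch below is purely illustrative; the primitive-action label set, feature summary, and prompt format are assumptions, not the paper's:

```python
PRIMITIVES = ["push", "pull", "twist", "grasp", "release"]  # illustrative set

def build_prompt(segment_summary: str) -> str:
    """Hedged sketch of framing primitive-action classification as a text
    task for a fine-tuned LLM. The feature summary and label set are
    assumptions; the paper's prompt/fine-tuning format is not given."""
    return (
        "You classify robot end-effector motion segments from a fruit-picking\n"
        "demonstration into exactly one primitive action.\n"
        f"Allowed labels: {', '.join(PRIMITIVES)}\n"
        f"Segment (summarized kinesthetic signals): {segment_summary}\n"
        "Label:"
    )

prompt = build_prompt(
    "mean EE velocity 0.12 m/s along -z, wrist torque rising, gripper closed")
```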
https://arxiv.org/abs/2507.07745
Robot swarms offer the potential to serve a variety of distributed sensing applications. An interesting real-world application that stands to benefit significantly from deployment of swarms is structural monitoring, where traditional sensor networks face challenges in structural coverage due to their static nature. This paper investigates the deployment of a swarm of miniaturized vibration sensing robots to inspect and localize structural damages on a surface section within a high-fidelity simulation environment. In particular, we consider a 1 m x 1 m x 3 mm steel surface section and utilize finite element analysis using Abaqus to obtain realistic structural vibration data. The resulting vibration data is imported into the physics-based robotic simulator Webots, where we simulate the dynamics of our surface inspecting robot swarm. We employ (i) Gaussian process estimators to guide the robots' exploration as they collect vibration samples across the surface and (ii) operational modal analysis to detect structural damages by estimating and comparing existing and intact structural vibration patterns. We analyze the influence of exploration radii on estimation uncertainty and assess the effectiveness of our method across 10 randomized scenarios, where the number, locations, surface area, and depth of structural damages vary. Our simulation studies validate the efficacy of our miniaturized robot swarm for vibration-based structural inspection.
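Point (i) can be sketched with an off-the-shelf Gaussian process: fit it to vibration features measured at visited surface points, then steer each robot toward the most uncertain point within its exploration radius. The kernel, feature choice, and sampling rule below are assumptions:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hedged sketch of uncertainty-driven sampling on the 1 m x 1 m plate.
# Kernel, length scale, and the scalar vibration feature are assumptions.
X = np.random.rand(30, 2)              # visited (x, y) sample points
y = np.random.rand(30)                 # e.g., a scalar vibration feature
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X, y)

def next_sample(robot_xy, radius=0.15, n_cand=500):
    """Pick the candidate within the exploration radius with max GP std."""
    ang = np.random.uniform(0, 2 * np.pi, n_cand)
    r = radius * np.sqrt(np.random.uniform(0, 1, n_cand))
    cand = np.clip(robot_xy + np.c_[r * np.cos(ang), r * np.sin(ang)], 0, 1)
    _, std = gp.predict(cand, return_std=True)
    return cand[np.argmax(std)]

target = next_sample(np.array([0.5, 0.5]))
```

Shrinking or growing `radius` trades off local coverage against travel cost, which is one way to read the paper's study of exploration radii versus estimation uncertainty.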
https://arxiv.org/abs/2507.07724
The integration of high-level assistance algorithms in surgical robotics training curricula may be beneficial in establishing a more comprehensive and robust skillset for aspiring surgeons, improving their clinical performance as a consequence. This work presents the development and validation of a haptic-enhanced Virtual Reality simulator for surgical robotics training, featuring 8 surgical tasks that the trainee can interact with thanks to the embedded physics engine. This virtual simulated environment is augmented by the introduction of high-level haptic interfaces for robotic assistance that aim at redirecting the motion of the trainee's hands and wrists toward targets or away from obstacles, and providing a quantitative performance score after the execution of each training task. The experimental study shows that the introduction of enhanced robotic assistance into a surgical robotics training curriculum improves performance during the training process and, crucially, promotes the transfer of the acquired skills to an unassisted surgical scenario, like the clinical one.
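The redirection behavior described here is in the spirit of classic virtual-fixture haptics: an attractive spring toward the target plus short-range repulsion from obstacles. A minimal NumPy sketch under those assumptions follows (gains and force law are illustrative, not the simulator's actual rendering):

```python
import numpy as np

def guidance_force(hand_pos, target_pos, obstacles, k_att=30.0, k_rep=0.5, d0=0.05):
    """Hedged sketch of haptic guidance: a spring pulls the trainee's hand
    toward the target, and short-range repulsion pushes it away from
    obstacles. Gains and the force law are assumptions."""
    f = k_att * (target_pos - hand_pos)            # attractive spring
    for obs in obstacles:
        d_vec = hand_pos - obs
        d = np.linalg.norm(d_vec)
        if 1e-6 < d < d0:                          # only act inside range d0
            f += k_rep * (1.0 / d - 1.0 / d0) / d**2 * (d_vec / d)
    return f

f = guidance_force(np.zeros(3), np.array([0.1, 0.0, 0.05]),
                   [np.array([0.05, 0.0, 0.02])])
```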
https://arxiv.org/abs/2507.07718