Object 6D pose estimation is a critical challenge in robotics, particularly for manipulation tasks. While prior research combining visual and tactile (visuotactile) information has shown promise, these approaches often struggle with generalization due to the limited availability of visuotactile data. In this paper, we introduce ViTa-Zero, a zero-shot visuotactile pose estimation framework. Our key innovation lies in leveraging a visual model as its backbone and performing feasibility checking and test-time optimization based on physical constraints derived from tactile and proprioceptive observations. Specifically, we model the gripper-object interaction as a spring-mass system, where tactile sensors induce attractive forces, and proprioception generates repulsive forces. We validate our framework through experiments on a real-world robot setup, demonstrating its effectiveness across representative visual backbones and manipulation scenarios, including grasping, object picking, and bimanual handover. Compared to the visual models, our approach overcomes some drastic failure modes while tracking the in-hand object pose. In our experiments, our approach shows an average increase of 55% in AUC of ADD-S and 60% in ADD, along with an 80% lower position error compared to FoundationPose.
https://arxiv.org/abs/2504.13179
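The spring-mass idea in the ViTa-Zero abstract above can be illustrated with a toy example. This is a minimal sketch under strong assumptions, not the authors' implementation: the object is a sphere, only its position is refined, and every name here (`signed_distance`, `net_force`, `refine_position`, `k_attract`, `k_repel`) is hypothetical. Tactile contact points that lie away from the object surface pull the object toward them, and gripper points that penetrate the object push it away.

```python
import math


def signed_distance(point, center, radius):
    """Signed distance from a point to the sphere surface (negative inside)."""
    return math.dist(point, center) - radius


def _accumulate(force, center, point, magnitude):
    """Add a force of the given signed magnitude along the center->point direction."""
    dist = math.dist(point, center)
    if dist == 0.0:
        return  # direction undefined; skip this point
    for i in range(3):
        force[i] += magnitude * (point[i] - center[i]) / dist


def net_force(center, radius, tactile_contacts, gripper_points,
              k_attract=1.0, k_repel=1.0):
    """Net spring force on the object (illustrative model, not the paper's)."""
    force = [0.0, 0.0, 0.0]
    for p in tactile_contacts:
        d = signed_distance(p, center, radius)
        if d > 0.0:  # contact is sensed, but the estimated surface is not there
            _accumulate(force, center, p, k_attract * d)  # pull toward contact
    for q in gripper_points:
        d = signed_distance(q, center, radius)
        if d < 0.0:  # gripper point lies inside the estimated object
            _accumulate(force, center, q, k_repel * d)  # d < 0 pushes away
    return force


def refine_position(center, radius, tactile_contacts, gripper_points,
                    step=0.5, iters=200):
    """Move the position estimate along the net force for a fixed number of steps."""
    c = list(center)
    for _ in range(iters):
        f = net_force(c, radius, tactile_contacts, gripper_points)
        c = [ci + step * fi for ci, fi in zip(c, f)]
    return c
```

In this toy setting, the estimate settles where sensed contacts lie on the surface and no gripper point is inside the object, which is the feasibility condition the abstract describes in words.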
Visuomotor policies learned from teleoperated demonstrations face challenges such as lengthy data collection, high costs, and limited data diversity. Existing approaches address these issues by augmenting image observations in RGB space or employing Real-to-Sim-to-Real pipelines based on physical simulators. However, the former is constrained to 2D data augmentation, while the latter suffers from imprecise physical simulation caused by inaccurate geometric reconstruction. This paper introduces RoboSplat, a novel method that generates diverse, visually realistic demonstrations by directly manipulating 3D Gaussians. Specifically, we reconstruct the scene through 3D Gaussian Splatting (3DGS), directly edit the reconstructed scene, and augment data across six types of generalization with five techniques: 3D Gaussian replacement for varying object types, scene appearance, and robot embodiments; equivariant transformations for different object poses; visual attribute editing for various lighting conditions; novel view synthesis for new camera perspectives; and 3D content generation for diverse object types. Comprehensive real-world experiments demonstrate that RoboSplat significantly enhances the generalization of visuomotor policies under diverse disturbances. Notably, while policies trained on hundreds of real-world demonstrations with additional 2D data augmentation achieve an average success rate of 57.2%, RoboSplat attains 87.8% in one-shot settings across six types of generalization in the real world.
https://arxiv.org/abs/2504.13175
We introduce a semidefinite relaxation for optimal control of linear systems with time scaling. These problems are inherently nonconvex, since the system dynamics involves bilinear products between the discretization time step and the system state and controls. The proposed relaxation is closely related to the standard second-order semidefinite relaxation for quadratic constraints, but we carefully select a subset of the possible bilinear terms and apply a change of variables to achieve empirically tight relaxations while keeping the computational load light. We further extend our method to handle piecewise-affine (PWA) systems by formulating the PWA optimal-control problem as a shortest-path problem in a graph of convex sets (GCS). In this GCS, different paths represent different mode sequences for the PWA system, and the convex sets model the relaxed dynamics within each mode. By combining a tight convex relaxation of the GCS problem with our semidefinite relaxation with time scaling, we can solve PWA optimal-control problems through a single semidefinite program.
https://arxiv.org/abs/2504.13170
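To see where the nonconvexity in the time-scaling abstract above comes from, consider a minimal sketch that assumes a forward-Euler discretization. The discretization scheme and the symbols $A$, $B$, and $h_k$ are assumptions for illustration; the abstract does not specify them.

```latex
\dot{x}(t) = A x(t) + B u(t)
\quad\Longrightarrow\quad
x_{k+1} = x_k + h_k \left( A x_k + B u_k \right)
        = x_k + A \,(h_k x_k) + B \,(h_k u_k).
```

When the step $h_k$ is a decision variable, the products $h_k x_k$ and $h_k u_k$ are bilinear in the unknowns, so the dynamics constraint is nonconvex even though $A$ and $B$ are fixed. These products are the kind of bilinear term that a semidefinite relaxation has to account for.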
A robot navigating an outdoor environment with no prior knowledge of the space must rely on its local sensing to perceive its surroundings and plan. This can come in the form of a local metric map or local policy with some fixed horizon. Beyond that, there is a fog of unknown space marked with some fixed cost. A limited planning horizon can often result in myopic decisions leading the robot off course or worse, into very difficult terrain. Ideally, we would like the robot to have full knowledge of a space that can be orders of magnitude larger than a local cost map. In practice, this is intractable because sensing information is sparse and the computation is often expensive. In this work, we make a key observation that long-range navigation only necessitates identifying good frontier directions for planning instead of full map knowledge. To this end, we propose Long Range Navigator (LRN), which learns an intermediate affordance representation mapping high-dimensional camera images to 'affordable' frontiers for planning, and then optimizes for maximum alignment with the desired goal. Notably, LRN is trained entirely on unlabeled ego-centric videos, making it easy to scale and adapt to new platforms. Through extensive off-road experiments on Spot and a Big Vehicle, we find that augmenting existing navigation stacks with LRN reduces human interventions at test time and leads to faster decision making, indicating the relevance of LRN.
https://arxiv.org/abs/2504.13149
Many soft robots struggle to produce dynamic motions with fast, large displacements. We develop a parallel 6-degree-of-freedom (DoF) Stewart-Gough mechanism using Handed Shearing Auxetic (HSA) actuators. By using soft actuators, we are able to use one third as many mechatronic components as a rigid Stewart platform, while retaining a working payload of 2 kg and an open-loop bandwidth greater than 16 Hz. We show that the platform is capable of both precise tracing and dynamic disturbance rejection when controlling a ball and sliding puck using a Proportional Integral Derivative (PID) controller. We develop a machine-learning-based kinematics model and demonstrate a functional workspace of roughly 10 cm in each translation direction and 28 degrees in each orientation. This 6-DoF device has many of the characteristics associated with rigid components (power, speed, and total workspace) while capturing the advantages of soft mechanisms.
https://arxiv.org/abs/2504.13127
Deep learning-based trajectory prediction models have demonstrated promising capabilities in capturing complex interactions. However, their out-of-distribution generalization remains a significant challenge, particularly due to unbalanced data and insufficient data volume and diversity to ensure robustness and calibration. To address this, we propose SHIFT (Spectral Heteroscedastic Informed Forecasting for Trajectories), a novel framework that uniquely combines well-calibrated uncertainty modeling with informative priors derived through automated rule extraction. SHIFT reformulates trajectory prediction as a classification task and employs heteroscedastic spectral-normalized Gaussian processes to effectively disentangle epistemic and aleatoric uncertainties. We learn informative priors from training labels, which are automatically generated from natural language driving rules, such as stop rules and drivability constraints, using a retrieval-augmented generation framework powered by a large language model. Extensive evaluations over the nuScenes dataset, including challenging low-data and cross-location scenarios, demonstrate that SHIFT outperforms state-of-the-art methods, achieving substantial gains in uncertainty calibration and displacement metrics. In particular, our model excels in complex scenarios, such as intersections, where uncertainty is inherently higher.
https://arxiv.org/abs/2504.13111
Modeling and control of nonlinear dynamics are critical in robotics, especially in scenarios with unpredictable external influences and complex dynamics. Traditional cascaded modular control pipelines often yield suboptimal performance due to conservative assumptions and tedious parameter tuning. Pure data-driven approaches promise robust performance but suffer from low sample efficiency, sim-to-real gaps, and reliance on extensive datasets. Hybrid methods combining learning-based and traditional model-based control in an end-to-end manner offer a promising alternative. This work presents a self-supervised learning framework combining a learning-based inertial odometry (IO) module and differentiable model predictive control (d-MPC) for Unmanned Aerial Vehicle (UAV) attitude control. The IO module denoises raw IMU measurements and predicts UAV attitudes, which the MPC then uses to optimize control actions in a bi-level optimization (BLO) setup: the inner MPC optimizes control actions, and the upper level minimizes the discrepancy between real-world and predicted performance. The framework is thus end-to-end and can be trained in a self-supervised manner. This approach combines the strength of learning-based perception with interpretable model-based control. Results show that the framework remains effective even under strong wind and that it simultaneously improves both MPC parameter learning and IMU prediction performance.
https://arxiv.org/abs/2504.13088
This paper presents a new task-space Non-singular Terminal Super-Twisting Sliding Mode (NT-STSM) controller with adaptive gains for robust trajectory tracking of a 7-DOF robotic manipulator. The proposed approach addresses the challenges of chattering, unknown disturbances, and rotational motion tracking, making it suited for high-DOF manipulators in dexterous manipulation tasks. A rigorous boundedness proof is provided, offering gain selection guidelines for practical implementation. Simulations and hardware experiments with external disturbances demonstrate the proposed controller's robust, accurate tracking with reduced control effort under unknown disturbances compared to other NT-STSM and conventional controllers. The results demonstrated that the proposed NT-STSM controller mitigates chattering and instability in complex motions, making it a viable solution for dexterous robotic manipulations and various industrial applications.
https://arxiv.org/abs/2504.13056
This paper presents the Krysalis Hand, a five-finger robotic end-effector that combines a lightweight design, high payload capacity, and a high number of degrees of freedom (DoF) to enable dexterous manipulation in both industrial and research settings. This design integrates the actuators within the hand while maintaining an anthropomorphic form. Each finger joint features a self-locking mechanism that allows the hand to sustain large external forces without active motor engagement. This approach shifts the payload limitation from the motor strength to the mechanical strength of the hand, allowing the use of smaller, more cost-effective motors. With 18 DoF and weighing only 790 grams, the Krysalis Hand delivers an active squeezing force of 10 N per finger and supports a passive payload capacity exceeding 10 lbs. These characteristics make the Krysalis Hand one of the lightest, strongest, and most dexterous robotic end-effectors of its kind. Experimental evaluations validate its ability to perform intricate manipulation tasks and handle heavy payloads, underscoring its potential for industrial applications as well as academic research. All code related to the Krysalis Hand, including control and teleoperation, is available on the project GitHub repository.
https://arxiv.org/abs/2504.12967
Tactile sensing is crucial for achieving human-level robotic capabilities in manipulation tasks. Vision-based tactile sensors (VBTSs) have emerged as a promising solution, offering high spatial resolution and cost-effectiveness by sensing contact through camera-captured deformation patterns of elastic gel pads. However, these sensors' complex physical characteristics and visual signal processing requirements present unique challenges for robotic applications. The lack of efficient and accurate simulation tools for VBTSs has significantly limited the scale and scope of tactile robotics research. Here we present Taccel, a high-performance simulation platform that integrates Incremental Potential Contact (IPC) and Affine Body Dynamics (ABD) to model robots, tactile sensors, and objects with both accuracy and unprecedented speed, achieving an 18-fold acceleration over real-time across thousands of parallel environments. Unlike previous simulators that operate at sub-real-time speeds with limited parallelization, Taccel provides precise physics simulation and realistic tactile signals while supporting flexible robot-sensor configurations through user-friendly APIs. Through extensive validation in object recognition, robotic grasping, and articulated object manipulation, we demonstrate precise simulation and successful sim-to-real transfer. These capabilities position Taccel as a powerful tool for scaling up tactile robotics research and development. By enabling large-scale simulation and experimentation with tactile sensing, Taccel accelerates the development of more capable robotic systems, potentially transforming how robots interact with and understand their physical environment.
https://arxiv.org/abs/2504.12908
Achieving versatile and explosive motion with robustness against dynamic uncertainties is a challenging task. Introducing parallel compliance in quadrupedal design is deemed to enhance locomotion performance, which, however, makes the control task even harder. This work aims to address this challenge by proposing a general template model and establishing an efficient motion planning and control pipeline. To start, we propose a reduced-order template model, the dual-legged actuated spring-loaded inverted pendulum with trunk rotation, which explicitly models parallel compliance by decoupling spring effects from active motor actuation. With this template model, versatile acrobatic motions, such as pronking, froggy jumping, and hop-turn, are generated by a dual-layer trajectory optimization that takes a singularity-free body rotation representation into account. Integrated with a linear singularity-free tracking controller, enhanced quadrupedal locomotion is achieved. Comparisons with the existing template model reveal the improved accuracy and generalization of our model. Hardware experiments with a rigid quadruped and a newly designed compliant quadruped demonstrate that (i) the template model enables generating versatile dynamic motion; (ii) parallel elasticity enhances explosive motion, with the maximal pronking distance, hop-turn yaw angle, and froggy jumping distance increasing by at least 25%, 15%, and 25%, respectively; and (iii) parallel elasticity improves the robustness against dynamic uncertainties, including modelling errors and external disturbances, with the allowable support surface height variation increasing by 100% for robust froggy jumping.
https://arxiv.org/abs/2504.12854
End-to-end autonomous driving aims to produce planning trajectories from raw sensors directly. Currently, most approaches integrate perception, prediction, and planning modules into a fully differentiable network, promising great scalability. However, these methods typically rely on deterministic modeling of online maps in the perception module for guiding or constraining vehicle planning, which may incorporate erroneous perception information and further compromise planning safety. To address this issue, we delve into the importance of online map uncertainty for enhancing autonomous driving safety and propose a novel paradigm named UncAD. Specifically, UncAD first estimates the uncertainty of the online map in the perception module. It then leverages the uncertainty to guide motion prediction and planning modules to produce multi-modal trajectories. Finally, to achieve safer autonomous driving, UncAD proposes an uncertainty-collision-aware planning selection strategy that uses the online map uncertainty to evaluate and select the best trajectory. In this study, we incorporate UncAD into various state-of-the-art (SOTA) end-to-end methods. Experiments on the nuScenes dataset show that integrating UncAD, with only a 1.9% increase in parameters, can reduce collision rates by up to 26% and drivable area conflict rate by up to 42%. Code, pre-trained models, and demo videos are publicly available.
https://arxiv.org/abs/2504.12826
Autonomous driving is a complex undertaking. A common approach is to break down the driving task into individual subtasks through modularization. These sub-modules are usually developed and published separately. However, if these individually developed algorithms have to be combined again to form a full-stack autonomous driving software, this poses particular challenges. Drawing upon our practical experience in developing the software of TUM Autonomous Motorsport, we have identified and derived these challenges in developing an autonomous driving software stack within a scientific environment. We do not focus on the specific challenges of individual algorithms but on the general difficulties that arise when deploying research algorithms on real-world test vehicles. To overcome these challenges, we introduce strategies that have been effective in our development approach. We additionally provide open-source implementations that enable these concepts on GitHub. As a result, this paper's contributions will simplify future full-stack autonomous driving projects, which are essential for a thorough evaluation of the individual algorithms.
https://arxiv.org/abs/2504.12813
In this work, we present a novel approach to bias the driving style of an artificial race driver (ARD) for online time-optimal trajectory planning. Our method leverages a nonlinear model predictive control (MPC) framework that combines time minimization with exit speed maximization at the end of the planning horizon. We introduce a new MPC terminal cost formulation based on the trajectory planned in the previous MPC step, enabling ARD to adapt its driving style from early to late apex maneuvers in real-time. Our approach is computationally efficient, allowing for low replan times and long planning horizons. We validate our method through simulations, comparing the results against offline minimum-lap-time (MLT) optimal control and online minimum-time MPC solutions. The results demonstrate that our new terminal cost enables ARD to bias its driving style, and achieve online lap times close to the MLT solution and faster than the minimum-time MPC solution. Our approach paves the way for a better understanding of the reasons behind human drivers' choice of early or late apex maneuvers.
https://arxiv.org/abs/2504.12744
B* is a novel optimization framework that addresses a critical challenge in fixed-base manipulator robotics: optimal base placement. Current methods rely on pre-computed kinematics databases generated through sampling to search for solutions. However, they face an inherent trade-off between solution optimality and computational efficiency when determining sampling resolution. To address these limitations, B* unifies multiple objectives without database dependence. The framework employs a two-layer hierarchical approach. The outer layer systematically manages terminal constraints through progressive tightening, particularly for base mobility, enabling feasible initialization and broad solution exploration. The inner layer addresses non-convexities in each outer-layer subproblem through sequential local linearization, converting the original problem into tractable sequential linear programming (SLP). Testing across multiple robot platforms demonstrates B*'s effectiveness. The framework achieves solution optimality five orders of magnitude better than sampling-based approaches while maintaining perfect success rates and reduced computational overhead. Operating directly in configuration space, B* enables simultaneous path planning with customizable optimization criteria. B* serves as a crucial initialization tool that bridges the gap between theoretical motion planning and practical deployment, where feasible trajectory existence is fundamental.
https://arxiv.org/abs/2504.12719
The development of artificial intelligence towards real-time interaction with the environment is a key aspect of embodied intelligence and robotics. Inverse dynamics is a fundamental robotics problem, which maps from joint space to torque space of robotic systems. Traditional methods for solving it rely on direct physical modeling of robots, which is difficult or even impossible due to nonlinearity and external disturbance. Recently, data-based model-learning algorithms have been adopted to address this issue. However, they often require manual parameter tuning and high computational costs. Neuromorphic computing is inherently suitable for processing spatiotemporal features in robot motion control at extremely low cost. However, current research is still in its infancy: existing works control only low-degree-of-freedom systems and lack performance quantification and comparison. In this paper, we propose a neuromorphic control framework to control 7-degree-of-freedom robotic manipulators. We use a Spiking Neural Network to exploit the spatiotemporal continuity of the motion data, improving control accuracy and eliminating manual parameter tuning. We validate the algorithm on two robotic platforms, where it reduces torque prediction error by at least 60% and successfully performs a target position tracking task. This work advances embodied neuromorphic control one step from proof of concept toward applications in complex real-world tasks.
https://arxiv.org/abs/2504.12702
This paper proposes a genetic algorithm-based kinodynamic planning algorithm (GAKD) for car-like vehicles navigating uneven terrains modeled as triangular meshes. The algorithm's distinct feature is trajectory optimization over a fixed-length receding horizon using a genetic algorithm with heuristic-based mutation, ensuring the vehicle's controls remain within its valid operational range. By addressing challenges posed by uneven terrain meshes, such as changing face normals, GAKD offers a practical solution for path planning in complex environments. Comparative evaluations against Model Predictive Path Integral (MPPI) and log-MPPI methods show that GAKD achieves up to 20 percent improvement in traversability cost while maintaining comparable path length. These results demonstrate GAKD's potential in improving vehicle navigation on challenging terrains.
https://arxiv.org/abs/2504.12678
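The GAKD abstract above describes a genetic algorithm whose mutation keeps controls inside their valid range. A minimal sketch of that general idea follows. It swaps in plain Gaussian mutation with clamping and truncation selection in place of the paper's heuristic-based mutation, which the abstract does not detail. The names `clamp`, `mutate`, and `evolve`, and the cost function in the usage note, are hypothetical.

```python
import random


def clamp(value, low, high):
    """Restrict a control value to its valid operational range."""
    return max(low, min(high, value))


def mutate(controls, bounds, sigma, rng):
    """Gaussian mutation followed by clamping, so every control stays in range."""
    low, high = bounds
    return [clamp(u + rng.gauss(0.0, sigma), low, high) for u in controls]


def evolve(cost, horizon, bounds, pop_size=30, generations=40, sigma=0.1, seed=0):
    """Optimize a fixed-length control sequence with a simple elitist GA."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(horizon)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[: pop_size // 2]  # keep the better half unchanged
        children = [mutate(rng.choice(elite), bounds, sigma, rng)
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=cost)
```

For example, `evolve(lambda u: sum((x - 0.3) ** 2 for x in u), horizon=5, bounds=(-0.5, 0.5))` returns a five-step control sequence whose entries all lie in `[-0.5, 0.5]`. In a receding-horizon planner, only the first control would be applied before the sequence is re-optimized.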
This paper presents a novel autonomous drone-based smoke plume tracking system capable of navigating and tracking plumes in highly unsteady atmospheric conditions. The system integrates advanced hardware and software with a comprehensive simulation environment to ensure robust performance in controlled and real-world settings. The quadrotor, equipped with a high-resolution imaging system and an advanced onboard computing unit, performs precise maneuvers while accurately detecting and tracking dynamic smoke plumes under fluctuating conditions. Our software implements a two-phase flight operation: descending into the smoke plume upon detection, then continuously monitoring the smoke movement during in-plume tracking. A Proportional-Integral-Derivative (PID) controller and a Proximal Policy Optimization-based Deep Reinforcement Learning (DRL) controller enable adaptation to plume dynamics. Unreal Engine simulation evaluates performance under various smoke-wind scenarios, from steady flow to complex, unsteady fluctuations, showing that while the PID controller performs adequately in simpler scenarios, the DRL-based controller excels in more challenging environments. Field tests corroborate these findings. This system opens new possibilities for drone-based monitoring in areas like wildfire management and air quality assessment. The successful integration of DRL for real-time decision-making advances autonomous drone control for dynamic environments.
https://arxiv.org/abs/2504.12664
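The plume-tracking abstract above uses a PID controller as its baseline. For readers unfamiliar with it, here is a minimal textbook discrete-time PID sketch. It is not the authors' controller: the class name, gains, and the way the error is defined (for example, the plume's offset from the image center) are assumptions for illustration.

```python
class PID:
    """Textbook discrete PID: u = kp*e + ki*integral(e) + kd*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None  # no derivative term on the first sample

    def update(self, error, dt):
        """Return the control output for the current error; dt must be > 0."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```

A fixed-gain law like this reacts only to the current error, its running sum, and its rate of change. That is consistent with the abstract's finding that PID is adequate in steady flow, while a learned policy does better under unsteady fluctuations.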
Robotic manipulation faces critical challenges in understanding spatial affordances (the "where" and "how" of object interactions), which are essential for complex manipulation tasks like wiping a board or stacking objects. Existing methods, including modular-based and end-to-end approaches, often lack robust spatial reasoning capabilities. Unlike recent point-based and flow-based affordance methods that focus on dense spatial representations or trajectory modeling, we propose A0, a hierarchical affordance-aware diffusion model that decomposes manipulation tasks into high-level spatial affordance understanding and low-level action execution. A0 leverages the Embodiment-Agnostic Affordance Representation, which captures object-centric spatial affordances by predicting contact points and post-contact trajectories. A0 is pre-trained on 1 million contact points and fine-tuned on annotated trajectories, enabling generalization across platforms. Key components include Position Offset Attention for motion-aware feature extraction and a Spatial Information Aggregation Layer for precise coordinate mapping. The model's output is executed by the action execution module. Experiments on multiple robotic systems (Franka, Kinova, Realman, and Dobot) demonstrate A0's superior performance in complex tasks, showcasing its efficiency, flexibility, and real-world applicability.
https://arxiv.org/abs/2504.12636
Safe and efficient path planning in parking scenarios presents a significant challenge due to cluttered environments filled with static and dynamic obstacles. To address this, we propose a novel and computationally efficient planning strategy that seamlessly integrates the predictions of dynamic obstacles into the planning process, ensuring the generation of collision-free paths. Our approach builds upon the conventional Hybrid A* algorithm by introducing a time-indexed variant that explicitly accounts for the predictions of dynamic obstacles during node exploration in the graph, thus enabling dynamic obstacle avoidance. We integrate the time-indexed Hybrid A* algorithm within an online planning framework to compute local paths at each planning step, guided by an adaptively chosen intermediate goal. The proposed method is validated in diverse parking scenarios, including perpendicular, angled, and parallel parking. Through simulations, we showcase our approach's potential to greatly improve efficiency and safety compared to the state-of-the-art spline-based planning method for parking situations.
https://arxiv.org/abs/2504.12616
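The time-indexing idea in the parking abstract above can be shown on a much simpler planner. The sketch below is a plain grid A* over (cell, time) states, not Hybrid A*: it has no vehicle kinematics or continuous headings. The function name `time_indexed_astar` and the `dynamic_obstacles` format (a dict from time step to the set of cells predicted to be occupied) are assumptions for illustration.

```python
import heapq


def time_indexed_astar(grid, start, goal, dynamic_obstacles, max_time=50):
    """A* over (cell, time) states; returns a list of cells, one per time step.

    grid: list of equal-length strings, '#' marks a static obstacle.
    dynamic_obstacles: dict mapping time step -> set of occupied (row, col).
    """
    rows, cols = len(grid), len(grid[0])

    def blocked(cell, t):
        r, c = cell
        if not (0 <= r < rows and 0 <= c < cols) or grid[r][c] == '#':
            return True
        return cell in dynamic_obstacles.get(t, set())

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    parent = {(start, 0): None}
    while open_heap:
        _, t, cell = heapq.heappop(open_heap)
        if cell == goal:
            path, node = [], (cell, t)
            while node is not None:
                path.append(node[0])
                node = parent[node]
            return path[::-1]
        if t >= max_time:
            continue
        # (0, 0) is a wait action, which lets the robot yield to an obstacle.
        for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            state = (nxt, t + 1)
            if state in parent or blocked(nxt, t + 1):
                continue
            parent[state] = (cell, t)
            heapq.heappush(open_heap, (t + 1 + heuristic(nxt), t + 1, nxt))
    return None
```

Because each state carries its time step, a cell can be free at one time and occupied at another, so the planner can choose to wait or detour. This sketch checks only cell occupancy at each step; it does not check for swap collisions along an edge, which a real planner would also need to handle.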