Robots can influence people to accomplish their tasks more efficiently: autonomous cars can inch forward at an intersection to pass through, and tabletop manipulators can go for an object on the table first. However, a robot's ability to influence can also compromise the safety of nearby people if naively executed. In this work, we pose and solve a novel robust reach-avoid dynamic game which enables robots to be maximally influential, but only when a safety backup control exists. On the human side, we model the human's behavior as goal-driven but conditioned on the robot's plan, enabling us to capture influence. On the robot side, we solve the dynamic game in the joint physical and belief space, enabling the robot to reason about how its uncertainty in human behavior will evolve over time. We instantiate our method, called SLIDE (Safely Leveraging Influence in Dynamic Environments), in a high-dimensional (39-D) simulated human-robot collaborative manipulation task solved via offline game-theoretic reinforcement learning. We compare our approach to a robust baseline that treats the human as a worst-case adversary, a safety controller that does not explicitly reason about influence, and an energy-function-based safety shield. We find that SLIDE consistently enables the robot to leverage the influence it has on the human when it is safe to do so, ultimately allowing the robot to be less conservative while still ensuring a high safety rate during task execution.
https://arxiv.org/abs/2409.12153
Re-identification (ReID) is a critical challenge in computer vision, predominantly studied in the context of pedestrians and vehicles. However, robust object-instance ReID, which has significant implications for tasks such as autonomous exploration, long-term perception, and scene understanding, remains underexplored. In this work, we address this gap by proposing a novel dual-path object-instance re-identification transformer architecture that integrates multimodal RGB and depth information. By leveraging depth data, we demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions. Additionally, we develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints. We validate our methods using two custom-built RGB-D datasets, as well as multiple sequences from the open-source TUM RGB-D datasets. Our approach demonstrates significant improvements in both object instance ReID (mAP of 75.18) and localization accuracy (success rate of 83% on TUM-RGBD), highlighting the essential role of object ReID in advancing robotic perception. Our models, frameworks, and datasets have been made publicly available.
https://arxiv.org/abs/2409.12002
To address the intricate challenges of decentralized cooperative scheduling and motion planning in Autonomous Mobility-on-Demand (AMoD) systems, this paper introduces LMMCoDrive, a novel cooperative driving framework that leverages a Large Multimodal Model (LMM) to enhance traffic efficiency in dynamic urban environments. This framework seamlessly integrates scheduling and motion planning processes to ensure the effective operation of Cooperative Autonomous Vehicles (CAVs). The spatial relationship between CAVs and passenger requests is abstracted into a Bird's-Eye View (BEV) to fully exploit the potential of the LMM. In addition, trajectories are carefully refined for each CAV while ensuring collision avoidance through safety constraints. A decentralized optimization strategy, facilitated by the Alternating Direction Method of Multipliers (ADMM) within the LMM framework, is proposed to drive the graph evolution of CAVs. Simulation results demonstrate the pivotal role and significant impact of the LMM in optimizing CAV scheduling and enhancing the decentralized cooperative optimization process for each vehicle. This marks a substantial stride towards practical, efficient, and safe AMoD systems poised to revolutionize urban transportation. The code is available at this https URL.
https://arxiv.org/abs/2409.11981
End-to-end models are emerging as the mainstream in autonomous driving perception. However, the inability to meticulously deconstruct their internal mechanisms reduces development efficacy and impedes the establishment of trust. Pioneering on this issue, we present the Independent Functional Module Evaluation for Bird's-Eye-View Perception Models (BEV-IFME), a novel framework that juxtaposes a module's feature maps against the ground truth within a unified semantic representation space to quantify their similarity, thereby assessing the training maturity of individual functional modules. The core of the framework lies in the process of feature-map encoding and representation alignment, facilitated by our proposed two-stage Alignment AutoEncoder, which ensures the preservation of salient information and the consistency of feature structure. The metric for evaluating the training maturity of functional modules, the Similarity Score, demonstrates a robust positive correlation with BEV metrics, with an average correlation coefficient of 0.9387, attesting to the framework's reliability for assessment purposes.
https://arxiv.org/abs/2409.11969
The problem of safety for robotic systems has been extensively studied. However, little attention has been given to security issues for three-dimensional systems, such as quadrotors. Malicious adversaries can compromise robot sensors and communication networks, causing incidents, achieving illegal objectives, or even injuring people. This study first designs an intelligent control system for autonomous quadrotors. Then, it investigates the problems of optimal false data injection attack scheduling and countermeasure design for unmanned aerial vehicles. Using a state-of-the-art deep learning-based approach, an optimal false data injection attack scheme is proposed to deteriorate a quadrotor's tracking performance with limited attack energy. Subsequently, an optimal tracking control strategy is learned to mitigate attacks and recover the quadrotor's tracking performance. We base our work on Agilicious, a state-of-the-art quadrotor recently deployed for autonomous settings. This paper is the first in the United Kingdom to deploy this quadrotor and implement reinforcement learning on its platform. Therefore, to promote easy reproducibility with minimal engineering overhead, we further provide (1) a comprehensive breakdown of this quadrotor, including software stacks and hardware alternatives; (2) a detailed reinforcement-learning framework to train autonomous controllers on Agilicious agents; and (3) a new open-source environment that builds upon PyFlyt for future reinforcement learning research on Agilicious platforms. Both simulated and real-world experiments are conducted in Section 5.2 to show the effectiveness of the proposed frameworks.
https://arxiv.org/abs/2409.11897
In our previous research, we provided a reasoning system (called LeSAC) based on argumentation theory to provide legal support to designers during the design process. Building on this, this paper explores how to provide designers with effective explanations for their legally relevant design decisions. We extend the previous system for providing explanations by specifying norms and the key legal or ethical principles for justifying actions in normative contexts. Considering that first-order logic has strong expressive power, in the current paper we adopt a first-order deontic logic system with deontic operators and preferences. We illustrate the advantages and necessity of introducing deontic logic and designing explanations under LeSAC by modelling two cases in the context of autonomous driving. In particular, this paper also discusses the requirements of the updated LeSAC to guarantee rationality, and proves that a well-defined LeSAC can satisfy the rationality postulate for rule-based argumentation frameworks. This ensures the system's ability to provide coherent, legally valid explanations for complex design decisions.
https://arxiv.org/abs/2409.11780
Recently, AI systems have made remarkable progress in various tasks. Deep Reinforcement Learning (DRL) is an effective tool for agents to learn policies in low-level state spaces to solve highly complex tasks. Researchers have introduced Intrinsic Motivation (IM) into the RL mechanism, which simulates the agent's curiosity, encouraging agents to explore interesting areas of the environment. This new feature has proved vital in enabling agents to learn policies without being given specific goals. However, even though DRL intelligence emerges through a sub-symbolic model, there is still a need for some form of abstraction to understand the knowledge collected by the agent. To this end, the classical planning formalism has been used in recent research to explicitly represent the knowledge an autonomous agent acquires and to effectively reach extrinsic goals. Although classical planning usually presents limited expressive capabilities, PPDDL has proved useful for reviewing the knowledge gathered by an autonomous system, making causal correlations explicit, and it can be exploited to find a plan to reach any state the agent faces during its experience. This work presents a new architecture implementing an open-ended learning system able to synthesize its experience from scratch into a PPDDL representation and update it over time. Without a predefined set of goals and tasks, the system integrates intrinsic motivations to explore the environment in a self-directed way, exploiting the high-level knowledge acquired during its experience. The system explores the environment and iteratively: (a) discovers options, (b) explores the environment using options, (c) abstracts the knowledge collected, and (d) plans. This paper proposes an alternative approach to implementing open-ended learning architectures that exploits low-level and high-level representations to extend its knowledge in a virtuous loop.
https://arxiv.org/abs/2409.11756
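The four-phase loop described in the abstract above can be written down as a skeleton. Every helper below is an illustrative stub standing in for the paper's actual modules (option discovery driven by intrinsic motivation, option-guided exploration, PPDDL-style abstraction, and symbolic planning); only the loop structure is taken from the source.

```python
def discover_options(experience):
    # Stub: intrinsic motivation would propose new options from novelty here.
    return [f"option_{len(experience)}"]

def explore(env, options):
    # Stub: execute each option in the environment and record transitions.
    return [(option, env) for option in options]

def abstract_to_symbolic(experience, model):
    # Stub: compress collected experience into a PPDDL-like symbolic model.
    model = dict(model)
    for option, _ in experience:
        model[option] = {"effect": "explored"}
    return model

def plan(model):
    # Stub: a symbolic planner would search the PPDDL model here.
    return sorted(model)

def open_ended_learning_loop(env, iterations=3):
    """The loop from the abstract: (a) discover options, (b) explore the
    environment using them, (c) abstract the collected knowledge into a
    symbolic representation, (d) plan over that representation."""
    options, experience, model = [], [], {}
    for _ in range(iterations):
        options += discover_options(experience)          # (a)
        experience += explore(env, options)              # (b)
        model = abstract_to_symbolic(experience, model)  # (c)
        plan(model)                                      # (d)
    return model
```

The point of the sketch is the virtuous loop itself: each pass enlarges both the low-level experience and the high-level symbolic model that guides the next pass.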
Multi-camera perception methods in Bird's-Eye View (BEV) have gained wide application in autonomous driving. However, due to the differences between roadside and vehicle-side scenarios, a multi-camera BEV solution for roadside deployment is currently lacking. This paper systematically analyzes the key challenges in multi-camera BEV perception for roadside scenarios compared to vehicle-side ones. These challenges include diversity in camera poses, uncertainty in camera numbers, sparsity in perception regions, and ambiguity in orientation angles. In response, we introduce RopeBEV, the first dense multi-camera BEV approach for roadside perception. RopeBEV introduces BEV augmentation to address the training-balance issues caused by diverse camera poses. By incorporating CamMask and ROIMask (Region of Interest Mask), it supports variable camera numbers and sparse perception, respectively. Finally, camera rotation embedding is utilized to resolve orientation ambiguity. Our method ranks 1st on the real-world highway dataset RoScenes and demonstrates its practical value on a private urban dataset that covers more than 50 intersections and 600 cameras.
https://arxiv.org/abs/2409.11706
We introduce RMP-YOLO, a unified framework designed to provide robust motion predictions even with incomplete input data. Our key insight stems from the observation that complete and reliable historical trajectory data plays a pivotal role in ensuring accurate motion prediction. Therefore, we propose a new paradigm that prioritizes the reconstruction of intact historical trajectories before feeding them into the prediction modules. Our approach introduces a novel scene tokenization module to enhance the extraction and fusion of spatial and temporal features. Following this, our proposed recovery module reconstructs agents' incomplete historical trajectories by leveraging local map topology and interactions with nearby agents. The reconstructed, clean historical data is then integrated into the downstream prediction modules. Our framework is able to effectively handle missing data of varying lengths and remains robust against observation noise, while maintaining high prediction accuracy. Furthermore, our recovery module is compatible with existing prediction models, ensuring seamless integration. Extensive experiments validate the effectiveness of our approach, and deployment in real-world autonomous vehicles confirms its practical utility. In the 2024 Waymo Motion Prediction Competition, our method, RMP-YOLO, achieves state-of-the-art performance, securing third place.
https://arxiv.org/abs/2409.11696
Autonomous driving technology has witnessed rapid advancements, with foundation models improving interactivity and user experiences. However, current autonomous vehicles (AVs) face significant limitations in delivering command-based driving styles. Most existing methods either rely on predefined driving styles that require expert input or use data-driven techniques like Inverse Reinforcement Learning to extract styles from driving data. These approaches, though effective in some cases, face challenges: difficulty obtaining specific driving data for style matching (e.g., in Robotaxis), inability to align driving style metrics with user preferences, and limitations to pre-existing styles, restricting customization and generalization to new commands. This paper introduces Words2Wheels, a framework that automatically generates customized driving policies based on natural language user commands. Words2Wheels employs a Style-Customized Reward Function to generate a Style-Customized Driving Policy without relying on prior driving data. By leveraging large language models and a Driving Style Database, the framework efficiently retrieves, adapts, and generalizes driving styles. A Statistical Evaluation module ensures alignment with user preferences. Experimental results demonstrate that Words2Wheels outperforms existing methods in accuracy, generalization, and adaptability, offering a novel solution for customized AV driving behavior. Code and demo available at this https URL.
https://arxiv.org/abs/2409.11694
The intricate nature of real-world driving environments, characterized by dynamic and diverse interactions among multiple vehicles and their possible future states, presents considerable challenges in accurately predicting the motion states of vehicles and handling the uncertainty inherent in the predictions. Addressing these challenges requires comprehensive modeling and reasoning to capture the implicit relations among vehicles and the corresponding diverse behaviors. This research introduces an integrated framework for autonomous vehicle (AV) motion prediction to address these complexities, utilizing a novel Relational Hypergraph Interaction-informed Neural mOtion generator (RHINO). RHINO leverages hypergraph-based relational reasoning by integrating a multi-scale hypergraph neural network to model group-wise interactions among multiple vehicles and their multi-modal driving behaviors, thereby enhancing motion prediction accuracy and reliability. Experimental validation using real-world datasets demonstrates the superior performance of this framework in improving predictive accuracy and fostering socially aware automated driving in dynamic traffic scenarios.
https://arxiv.org/abs/2409.11676
Safety is a critical concern for urban flights of autonomous Unmanned Aerial Vehicles. In populated environments, risk should be accounted for to produce an effective and safe path, known as risk-aware path planning. Risk-aware path planning can be modeled as a Constrained Shortest Path (CSP) problem, aiming to identify the shortest possible route that adheres to specified safety thresholds. CSP is NP-hard and poses significant computational challenges. Although many traditional methods can solve it accurately, all of them are very slow. Our method introduces an additional safety dimension to traditional A* (called ASD A*), enabling A* to handle the CSP. Furthermore, we develop a custom learning-based heuristic using transformer-based neural networks, which significantly reduces the computational load and improves the performance of the ASD A* algorithm. The proposed method is validated on both random and realistic simulation scenarios.
https://arxiv.org/abs/2409.11634
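The core idea of ASD A* in the abstract above, augmenting the search state with a safety dimension so that plain A* can enforce the CSP's risk threshold, can be sketched minimally as follows. The graph encoding, the risk discretization step, and the pruning rule are illustrative assumptions, not the paper's exact formulation:

```python
import heapq

def asd_astar(graph, start, goal, risk_budget, heuristic, risk_step=0.1):
    """A* over states augmented with a safety dimension: each state is
    (node, accumulated risk), and expansions exceeding the risk budget
    are pruned, which turns A* into a constrained-shortest-path solver.
    graph: {node: [(successor, edge_length, edge_risk), ...]}."""
    frontier = [(heuristic(start), 0.0, start, 0.0, [start])]
    seen = set()
    while frontier:
        f, g, node, risk, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        key = (node, round(risk / risk_step))  # discretized safety dimension
        if key in seen:
            continue
        seen.add(key)
        for succ, length, edge_risk in graph.get(node, []):
            new_risk = risk + edge_risk
            if new_risk > risk_budget:  # safety constraint: prune unsafe edges
                continue
            heapq.heappush(frontier, (g + length + heuristic(succ),
                                      g + length, succ, new_risk, path + [succ]))
    return None
```

In the paper's setting, the learned transformer-based heuristic would take the place of the `heuristic` argument to cut down the number of expansions.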
As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. The system is particularly tested on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution to integrate LLMs with robotic platforms, advancing the state-of-the-art in autonomous robotic task execution. For videos and prompt details, please see our project website: this https URL
https://arxiv.org/abs/2409.11580
In recent years, Light Detection and Ranging (LiDAR) technology, a critical sensor in robotics and autonomous systems, has seen significant advancements. These improvements include enhanced resolution of point clouds and the capability to provide 360° low-resolution images. These images encode various data such as depth, reflectivity, and near-infrared light within the pixels. However, an excessive density of points and conventional point cloud sampling can be counterproductive, particularly in applications such as LiDAR odometry, where misleading points and degraded geometry information may induce drift errors. Currently, extensive research efforts are being directed towards leveraging LiDAR-generated images to improve situational awareness. This paper presents a comprehensive review of current deep learning (DL) techniques, including colorization and super-resolution, which are traditionally utilized in conventional computer vision tasks. These techniques are applied to LiDAR-generated images and are analyzed qualitatively. Based on this analysis, we have developed a novel approach that selectively integrates the most suited colorization and super-resolution methods with LiDAR imagery to sample reliable points from the LiDAR point cloud. This approach aims to not only improve the accuracy of point cloud registration but also avoid mismatching caused by lacking geometry information, thereby augmenting the utility and precision of LiDAR systems in practical applications. In our evaluation, the proposed approach demonstrates superior performance compared to our previous work, achieving lower translation and rotation errors with a reduced number of points.
https://arxiv.org/abs/2409.11532
Vision-only end-to-end autonomous driving is not only more cost-effective than LiDAR-vision fusion but also more reliable than traditional methods. To achieve an economical and robust purely visual autonomous driving system, we propose RenderWorld, a vision-only end-to-end autonomous driving framework that generates 3D occupancy labels using a self-supervised Gaussian-based Img2Occ module, encodes the labels with AM-VAE, and uses a world model for forecasting and planning. RenderWorld employs Gaussian Splatting to represent 3D scenes and render 2D images, which greatly improves segmentation accuracy and reduces GPU memory consumption compared with NeRF-based methods. By applying AM-VAE to encode air and non-air separately, RenderWorld achieves a more fine-grained scene-element representation, leading to state-of-the-art performance in both 4D occupancy forecasting and motion planning from the autoregressive world model.
https://arxiv.org/abs/2409.11356
Mobile robots should be capable of planning cost-efficient paths for autonomous navigation. Typically, the terrain and robot properties are subject to variations. For instance, properties of the terrain such as friction may vary across different locations. Also, properties of the robot may change such as payloads or wear and tear, e.g., causing changing actuator gains or joint friction. Autonomous navigation approaches should thus be able to adapt to such variations. In this article, we propose a novel approach for learning a probabilistic, terrain- and robot-aware forward dynamics model (TRADYN) which can adapt to such variations and demonstrate its use for navigation. Our learning approach extends recent advances in meta-learning forward dynamics models based on Neural Processes for mobile robot navigation. We evaluate our method in simulation for 2D navigation of a robot with uni-cycle dynamics with varying properties on terrain with spatially varying friction coefficients. In our experiments, we demonstrate that TRADYN has lower prediction error over long time horizons than model ablations which do not adapt to robot or terrain variations. We also evaluate our model for navigation planning in a model-predictive control framework and under various sources of noise. We demonstrate that our approach yields improved performance in planning control-efficient paths by taking robot and terrain properties into account.
https://arxiv.org/abs/2409.11452
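Using a forward dynamics model for navigation planning, as in the abstract above, typically follows a sample-rollout-evaluate pattern inside model-predictive control. The sketch below uses a fixed unicycle model where TRADYN would use its learned, terrain- and robot-conditioned model; the horizon, sample count, and action bounds are illustrative assumptions:

```python
import math
import random

def unicycle_step(state, action, dt=0.1):
    """Stand-in forward model with uni-cycle dynamics.
    state: (x, y, heading); action: (linear velocity v, angular velocity w)."""
    x, y, th = state
    v, w = action
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + w * dt)

def mpc_random_shooting(state, goal, horizon=10, samples=100, seed=0):
    """Random-shooting MPC: sample action sequences, roll each one forward
    through the dynamics model, and return the first action of the sequence
    whose predicted endpoint lands closest to the goal."""
    rng = random.Random(seed)
    best_cost, best_first_action = float("inf"), None
    for _ in range(samples):
        seq = [(rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0))
               for _ in range(horizon)]
        s = state
        for a in seq:
            s = unicycle_step(s, a)  # roll the model forward
        cost = (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2
        if cost < best_cost:
            best_cost, best_first_action = cost, seq[0]
    return best_first_action
```

Swapping `unicycle_step` for a model whose predictions depend on terrain and robot properties is what lets the planner prefer control-efficient paths, e.g. routing around high-friction patches.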
Autonomous navigation in ice-covered waters poses significant challenges due to the frequent lack of viable collision-free trajectories. When complete obstacle avoidance is infeasible, it becomes imperative for the navigation strategy to minimize collisions. Additionally, the dynamic nature of ice, which moves in response to ship maneuvers, complicates the path planning process. To address these challenges, we propose a novel deep learning model to estimate the coarse dynamics of ice movements triggered by ship actions through occupancy estimation. To ensure real-time applicability, we propose a novel approach that caches intermediate prediction results and seamlessly integrates the predictive model into a graph search planner. We evaluate the proposed planner both in simulation and in a physical testbed against existing approaches and show that our planner significantly reduces collisions with ice when compared to the state-of-the-art. Codes and demos of this work are available at this https URL.
https://arxiv.org/abs/2409.11326
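The caching idea above, storing intermediate predictions so repeated expansions in the graph search do not re-run the expensive occupancy model, can be mimicked with plain memoization. The predictor below is a dummy stand-in for the paper's learned model, and the hashable state encoding is an assumption:

```python
from functools import lru_cache

@lru_cache(maxsize=4096)
def predict_ice_occupancy(state_key, action):
    """Dummy stand-in for the expensive learned occupancy predictor.
    Memoization means each (state, action) pair pays the inference cost
    only once, however often the planner re-expands that pair."""
    return hash((state_key, action)) % 100 / 100.0  # fake occupancy score

def expand(state_key, actions):
    """Score candidate ship actions during graph search by predicted
    ice occupancy (lower = fewer expected collisions with ice)."""
    return {a: predict_ice_occupancy(state_key, a) for a in actions}
```

In a real planner the cache key would encode the ice-field state compactly enough that distinct search branches reaching the same configuration share predictions, which is what makes the model cheap enough for real-time use.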
Autonomous precision navigation for landing on the Moon relies on vision sensors. Computer vision algorithms are designed, trained, and tested using synthetic simulations. High-quality terrain models have been produced by Moon orbiters developed by several nations, with resolutions ranging from tens or hundreds of meters globally down to a few meters locally. The SurRender software is a powerful simulator able to exploit the full potential of these datasets in raytracing. New interfaces include tools to fuse multi-resolution DEMs and to generate procedural textures. A global model of the Moon at 20 m resolution, representing several terabytes of data, was integrated, which SurRender can render continuously and in real time. This simulator will be a precious asset for the development of future missions.
https://arxiv.org/abs/2409.11450
We present an innovative framework for traffic dynamics analysis using High-Order Evolving Graphs, designed to improve spatio-temporal representations in autonomous driving contexts. Our approach constructs temporal bidirectional bipartite graphs that effectively model the complex interactions within traffic scenes in real-time. By integrating Graph Neural Networks (GNNs) with high-order multi-aggregation strategies, we significantly enhance the modeling of traffic scene dynamics, providing a more accurate and detailed analysis of these interactions. Additionally, we incorporate inductive learning techniques inspired by the GraphSAGE framework, enabling our model to adapt to new and unseen traffic scenarios without the need for retraining, thus ensuring robust generalization. Through extensive experiments on the ROAD and ROAD Waymo datasets, we establish a comprehensive baseline for further developments, demonstrating the potential of our method in accurately capturing traffic behavior. Our results emphasize the value of high-order statistical moments and feature-gated attention mechanisms in improving traffic behavior analysis, laying the groundwork for advancing autonomous driving technologies. Our source code is available at: this https URL\_Order\_Graphs
https://arxiv.org/abs/2409.11206
A key challenge in autonomous driving is that Autonomous Vehicles (AVs) must contend with multiple, often conflicting, planning requirements. These requirements naturally form in a hierarchy -- e.g., avoiding a collision is more important than maintaining lane. While the exact structure of this hierarchy remains unknown, to progress towards ensuring that AVs satisfy pre-determined behavior specifications, it is crucial to develop approaches that systematically account for it. Motivated by lexicographic behavior specification in AVs, this work addresses a lexicographic multi-objective motion planning problem, where each objective is incomparably more important than the next -- consider that avoiding a collision is incomparably more important than a lane change violation. This work ties together two elements. Firstly, a multi-objective candidate function that asymptotically represents lexicographic orders is introduced. Unlike existing multi-objective cost function formulations, this approach assures that returned solutions asymptotically align with the lexicographic behavior specification. Secondly, inspired by continuation methods, we propose two algorithms that asymptotically approach minimum rank decisions -- i.e., decisions that satisfy the highest number of important rules possible. Through a couple of practical examples, we showcase that the proposed candidate function asymptotically represents the lexicographic hierarchy, and that both proposed algorithms return minimum rank decisions, even when other approaches do not.
https://arxiv.org/abs/2409.11199
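The lexicographic preference in the abstract above is simple to state concretely, and the "asymptotic representation" idea can be illustrated with a weighted sum whose weights decay geometrically. This is a generic sketch under the assumption of bounded objective values, not the paper's actual candidate function:

```python
def lex_better(costs_a, costs_b):
    """True if A is lexicographically preferred to B: objectives are ordered
    most-important-first, and an improvement at rank k outweighs any
    degradation at ranks below it. Python tuples compare lexicographically."""
    return tuple(costs_a) < tuple(costs_b)

def scalarize(costs, eps):
    """Single-objective surrogate whose ordering approaches the lexicographic
    order as eps -> 0 (for bounded objective values): each rank's weight
    dominates the sum of all lower-ranked contributions."""
    return sum(c * eps ** i for i, c in enumerate(costs))

# Objective vector (collisions, lane violations, discomfort): avoiding a
# collision is incomparably more important than a lane-change violation.
swerve = (0, 1, 5.0)  # no collision, one lane violation, uncomfortable
stay = (1, 0, 0.0)    # collision, otherwise perfect
```

With a small enough `eps`, minimizing the scalarized cost returns the same decision as the lexicographic comparison, which is the sense in which a candidate function can "asymptotically represent" the hierarchy.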