We present Residual Descent Differential Dynamic Game (RD3G), a Newton-based solver for constrained multi-agent game-control problems. The proposed solver seeks a local Nash equilibrium for problems where agents are coupled through their rewards and state constraints. We compare the proposed method against competing state-of-the-art techniques and showcase the computational benefits of the RD3G algorithm on several example problems.
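As a minimal editorial sketch (not the authors' algorithm), the residual-descent idea can be illustrated on an unconstrained two-player quadratic game: stack each player's own-gradient optimality condition into a residual and drive it to zero with Newton steps. The function names and finite-difference Jacobian below are assumptions; RD3G itself additionally handles dynamics and state constraints.

```python
import numpy as np

def nash_residual_descent(grads, x0, tol=1e-8, iters=50, eps=1e-6):
    """Newton descent on the stacked first-order residual R(x) = 0,
    whose roots are candidate local Nash equilibria (sketch only)."""
    x = x0.astype(float)
    residual = lambda y: np.concatenate([g(y) for g in grads])
    for _ in range(iters):
        r = residual(x)
        if np.linalg.norm(r) < tol:
            break
        # Finite-difference Jacobian; a real solver would use analytic derivatives.
        J = np.stack([(residual(x + eps * e) - r) / eps
                      for e in np.eye(len(x))], axis=1)
        x -= np.linalg.solve(J, r)
    return x

# Two-player quadratic game: player i controls x[i] and minimizes J_i.
g1 = lambda x: np.array([2 * x[0] + x[1]])   # dJ1/dx0
g2 = lambda x: np.array([x[0] + 2 * x[1]])   # dJ2/dx1
print(nash_residual_descent([g1, g2], np.array([1.0, -1.0])))  # -> ~[0, 0]
```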
https://arxiv.org/abs/2409.12152
Large Language Models' (LLM) reasoning can be improved using test-time aggregation strategies, i.e., generating multiple samples and voting among generated samples. While these improve performance, they often reach a saturation point. Refinement offers an alternative by using LLM-generated feedback to improve solution quality. However, refinement introduces three key challenges: (1) Excessive refinement: uniformly refining all instances can over-correct and reduce overall performance. (2) Inability to localize and address errors: LLMs have a limited ability to self-correct and struggle to identify and correct their own mistakes. (3) Insufficient refinement: deciding how many iterations of refinement are needed is non-trivial, and stopping too soon could leave errors unaddressed. To tackle these issues, we propose MAgICoRe, which avoids excessive refinement by categorizing problem difficulty as easy or hard, solving easy problems with coarse-grained aggregation and hard ones with fine-grained and iterative multi-agent refinement. To improve error localization, we incorporate external step-wise reward model (RM) scores. Moreover, to ensure effective refinement, we employ a multi-agent loop with three agents: the Solver, the Reviewer (which generates targeted feedback based on step-wise RM scores), and the Refiner (which incorporates feedback). To ensure sufficient refinement, we re-evaluate updated solutions, iteratively initiating further rounds of refinement. We evaluate MAgICoRe on Llama-3-8B and GPT-3.5 and show its effectiveness across five math datasets. Even one iteration of MAgICoRe beats Self-Consistency by 3.4%, Best-of-k by 3.2%, and Self-Refine by 4.0% while using less than half the samples. Unlike iterative refinement with baselines, MAgICoRe continues to improve with more iterations. Finally, our ablations highlight the importance of MAgICoRe's RMs and multi-agent communication.
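As a concrete (and heavily simplified) reading of that control flow, the sketch below routes by an RM-based difficulty proxy and loops the Reviewer and Refiner until the reward model is satisfied. The stubs, threshold, and difficulty proxy are assumptions, not the paper's exact criteria:

```python
from collections import Counter

# Stubs stand in for LLM and reward-model calls so the control flow runs.
def solver(problem):           return f"solution({problem})"
def rm_score(problem, sol):    return 0.5      # step-wise RM score in [0, 1]
def reviewer(problem, sol, s): return f"feedback({sol}, score={s})"
def refiner(problem, sol, fb): return sol + " [refined]"

def magicore(problem, k=8, easy_threshold=0.75, max_rounds=3):
    """Route by difficulty: coarse aggregation for easy problems,
    RM-guided Reviewer/Refiner iterations for hard ones."""
    samples = [solver(problem) for _ in range(k)]
    scores = [rm_score(problem, s) for s in samples]
    if max(scores) >= easy_threshold:                  # treated as "easy"
        return Counter(samples).most_common(1)[0][0]   # coarse-grained vote
    best = samples[scores.index(max(scores))]          # treated as "hard"
    for _ in range(max_rounds):
        feedback = reviewer(problem, best, rm_score(problem, best))
        best = refiner(problem, best, feedback)
        if rm_score(problem, best) >= easy_threshold:  # re-evaluate; stop
            break                                      # once sufficient
    return best

print(magicore("2 + 2 = ?"))
```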
https://arxiv.org/abs/2409.12147
Tables, figures, and listings (TFLs) are essential tools for summarizing clinical trial data. Creating TFLs for reporting activities is a routine and often time-consuming task during the execution of clinical trials. This study explored the use of large language models (LLMs) to automate the generation of TFLs through prompt engineering and few-shot transfer learning. Using public clinical trial data in ADaM format, our results demonstrated that LLMs can efficiently generate TFLs from prompt instructions, showcasing their potential in this domain. Furthermore, we developed a conversational agent named Clinical Trial TFL Generation Agent: an app that matches user queries to predefined prompts, which in turn produce customized programs to generate specific predefined TFLs.
https://arxiv.org/abs/2409.12046
Object manipulation capabilities are essential skills that set apart embodied agents engaging with the world, especially in the realm of robotics. The ability to predict outcomes of interactions with objects is paramount in this setting. While model-based control methods have started to be employed for tackling manipulation tasks, they have faced challenges in accurately manipulating objects. Analyzing this limitation, we trace the underperformance to the way current world models represent crucial positional information, especially the target's goal specification in object-positioning tasks. We introduce a general approach that empowers world model-based agents to effectively solve object-positioning tasks. We propose two variants of this approach for generative world models: position-conditioned (PCP) and latent-conditioned (LCP) policy learning. In particular, LCP employs object-centric latent representations that explicitly capture object positional information for goal specification. This naturally leads to the emergence of multimodal capabilities, enabling the specification of goals through spatial coordinates or a visual goal. Our methods are rigorously evaluated across several manipulation environments, showing favorable performance compared to current model-based control approaches.
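At the policy interface, the two variants might differ roughly as sketched below; the network shapes and the coordinate goal encoder are assumptions for illustration, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PositionConditionedPolicy(nn.Module):
    """PCP sketch: the actor consumes raw target coordinates directly."""
    def __init__(self, state_dim=32, goal_dim=3, act_dim=7):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + goal_dim, 128),
                                 nn.ReLU(), nn.Linear(128, act_dim))
    def forward(self, state, goal_xyz):
        return self.net(torch.cat([state, goal_xyz], dim=-1))

class LatentConditionedPolicy(nn.Module):
    """LCP sketch: goals enter through an object-centric latent, so either
    coordinates or an encoded goal image can specify the same goal."""
    def __init__(self, state_dim=32, latent_dim=16, act_dim=7):
        super().__init__()
        self.coord_enc = nn.Linear(3, latent_dim)  # one of several encoders
        self.net = nn.Sequential(nn.Linear(state_dim + latent_dim, 128),
                                 nn.ReLU(), nn.Linear(128, act_dim))
    def forward(self, state, z_goal):              # z_goal: latent goal
        return self.net(torch.cat([state, z_goal], dim=-1))

policy = LatentConditionedPolicy()
s = torch.randn(1, 32)
a_from_xyz = policy(s, policy.coord_enc(torch.randn(1, 3)))  # spatial goal
a_from_img = policy(s, torch.randn(1, 16))  # latent from a visual-goal encoder
```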
https://arxiv.org/abs/2409.12005
Offline multi-agent reinforcement learning (MARL) is an exciting direction of research that uses static datasets to find optimal control policies for multi-agent systems. Though the field is by definition data-driven, efforts have thus far neglected data in their drive to achieve state-of-the-art results. We first substantiate this claim by surveying the literature, showing how the majority of works generate their own datasets without consistent methodology and provide sparse information about the characteristics of these datasets. We then show why neglecting the nature of the data is problematic, through salient examples of how tightly algorithmic performance is coupled to the dataset used, necessitating a common foundation for experiments in the field. In response, we take a big step towards improving data usage and data awareness in offline MARL, with three key contributions: (1) a clear guideline for generating novel datasets; (2) a standardisation of over 80 existing datasets, hosted in a publicly available repository, using a consistent storage format and easy-to-use API; and (3) a suite of analysis tools that allow us to understand these datasets better, aiding further development.
https://arxiv.org/abs/2409.12001
The problem of safety for robotic systems has been extensively studied. However, little attention has been given to security issues for three-dimensional systems, such as quadrotors. Malicious adversaries can compromise robot sensors and communication networks, causing incidents, achieving illegal objectives, or even injuring people. This study first designs an intelligent control system for autonomous quadrotors. Then, it investigates the problems of optimal false data injection attack scheduling and countermeasure design for unmanned aerial vehicles. Using a state-of-the-art deep learning-based approach, an optimal false data injection attack scheme is proposed to deteriorate a quadrotor's tracking performance with limited attack energy. Subsequently, an optimal tracking control strategy is learned to mitigate attacks and recover the quadrotor's tracking performance. We base our work on Agilicious, a state-of-the-art quadrotor recently deployed for autonomous settings. This work is the first in the United Kingdom to deploy this quadrotor and implement reinforcement learning on its platform. Therefore, to promote easy reproducibility with minimal engineering overhead, we further provide (1) a comprehensive breakdown of this quadrotor, including software stacks and hardware alternatives; (2) a detailed reinforcement-learning framework to train autonomous controllers on Agilicious agents; and (3) a new open-source environment that builds upon PyFlyt for future reinforcement learning research on Agilicious platforms. Both simulated and real-world experiments, presented in Section 5.2, show the effectiveness of the proposed frameworks.
https://arxiv.org/abs/2409.11897
Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even unconverged learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%. Code: this http URL
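A sketch of the action-propagation mechanism described above, with stubs standing in for the learned priority-assignment and agent policies (all names and stub behavior are assumptions):

```python
import random

def priority_policy(obs):                  # stub for the learned auxiliary
    return random.random()                 # priority-assignment policy

def act(agent, obs, committed):            # stub for the agent's main policy
    return f"{agent}: action given {len(committed)} earlier actions"

def action_propagation(agents, obs):
    """Higher-priority agents act first and communicate their actions, so
    later agents decide against a more stationary picture of the joint
    behavior (XP-MARL-style sketch)."""
    order = sorted(agents, key=lambda a: priority_policy(obs[a]), reverse=True)
    committed = {}
    for agent in order:
        committed[agent] = act(agent, obs[agent], dict(committed))
    return committed

obs = {f"cav_{i}": None for i in range(4)}
print(action_propagation(list(obs), obs))
```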
https://arxiv.org/abs/2409.11852
This paper explores the potential application of Deep Reinforcement Learning (DRL) in the furniture industry. To offer a broad product portfolio, most furniture manufacturers are organized as a job shop, which ultimately results in the Job Shop Scheduling Problem (JSSP). The JSSP is addressed with a focus on extending traditional models to better represent the complexities of real-world production environments. Existing approaches frequently fail to consider critical factors such as machine setup times or varying batch sizes. A concept for a model is proposed that provides a higher level of information detail to enhance scheduling accuracy and efficiency. The concept introduces the integration of DRL for production planning, particularly suited to batch production industries such as the furniture industry. The model extends traditional approaches to JSSPs by including job volumes, buffer management, transportation times, and machine setup times. This enables more precise forecasting and analysis of production flows and processes, accommodating the variability and complexity inherent in real-world manufacturing processes. The RL agent learns to optimize scheduling decisions. It operates within a discrete action space, making decisions based on detailed observations. A reward function guides the agent's decision-making process, thereby promoting efficient scheduling and meeting production deadlines. Two integration strategies for implementing the RL agent are discussed: episodic planning, which is suitable for low-automation environments, and continuous planning, which is ideal for highly automated plants. While episodic planning can be employed as a standalone solution, the continuous planning approach necessitates the integration of the agent with ERP and Manufacturing Execution Systems. This integration enables real-time adjustments to production schedules based on dynamic changes.
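A gym-style skeleton of such an environment might look as follows; the observation, reward, and single-stage routing are illustrative simplifications of the kind of model described, not the paper's concrete formulation:

```python
import numpy as np

class FurnitureJobShopEnv:
    """Sketch: the state carries machine clocks and queue size, while job
    volumes, setup times, and deadlines enter the transition and reward."""
    def __init__(self, n_jobs=6, n_machines=3, seed=0):
        rng = np.random.default_rng(seed)
        self.proc = rng.integers(1, 9, (n_jobs, n_machines))  # processing times
        self.setup = rng.integers(0, 4, n_machines)           # setup times
        self.volume = rng.integers(1, 5, n_jobs)              # batch sizes
        self.deadline = self.proc.sum(axis=1) * 2
        self.reset()

    def reset(self):
        self.queue = list(range(len(self.proc)))
        self.clock = np.zeros(len(self.setup))    # per-machine time
        return self._obs()

    def _obs(self):
        return np.concatenate([self.clock, [len(self.queue)]])

    def step(self, action):                       # action: index into queue
        job = self.queue.pop(action)
        m = int(self.clock.argmin())              # earliest-free machine
        self.clock[m] += self.setup[m] + self.proc[job, m] * self.volume[job]
        reward = -max(0.0, self.clock[m] - self.deadline[job])  # lateness
        return self._obs(), reward, not self.queue, {}

env = FurnitureJobShopEnv()
obs, done = env.reset(), False
while not done:                                   # greedy placeholder agent
    obs, reward, done, _ = env.step(0)
```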
https://arxiv.org/abs/2409.11820
Recently, AI systems have made remarkable progress in various tasks. Deep Reinforcement Learning (DRL) is an effective tool for agents to learn policies in low-level state spaces to solve highly complex tasks. Researchers have introduced Intrinsic Motivation (IM) into the RL mechanism, which simulates the agent's curiosity, encouraging agents to explore interesting areas of the environment. This new feature has proved vital in enabling agents to learn policies without being given specific goals. However, even though DRL intelligence emerges through a sub-symbolic model, there is still a need for a sort of abstraction to understand the knowledge collected by the agent. To this end, the classical planning formalism has been used in recent research to explicitly represent the knowledge an autonomous agent acquires and to effectively reach extrinsic goals. Although classical planning usually presents limited expressive capabilities, PPDDL has demonstrated usefulness in reviewing the knowledge gathered by an autonomous system, making causal correlations explicit, and can be exploited to find a plan to reach any state the agent faces during its experience. This work presents a new architecture implementing an open-ended learning system able to synthesize its experience from scratch into a PPDDL representation and update it over time. Without a predefined set of goals and tasks, the system integrates intrinsic motivations to explore the environment in a self-directed way, exploiting the high-level knowledge acquired during its experience. The system explores the environment and iteratively (a) discovers options, (b) explores the environment using options, (c) abstracts the knowledge collected, and (d) plans. This paper proposes an alternative approach to implementing open-ended learning architectures that exploits low-level and high-level representations to extend its knowledge in a virtuous loop.
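The four-step loop could be sketched as follows; the stubs for option discovery, exploration, and abstraction are placeholders for the paper's intrinsic-motivation machinery, and the dictionary model is a crude stand-in for a PPDDL representation:

```python
def discover_options(experience):            # stub: intrinsic motivation
    return [f"opt_{len(experience)}"]        # would propose novel options

def explore(env, options):                   # stub: option execution
    return [(o, env(o)) for o in options]

def abstract_to_ppddl(model, experience):
    """Record option -> outcome pairs as a crude stand-in for PPDDL
    probabilistic effects."""
    for opt, outcome in experience:
        model.setdefault(opt, set()).add(outcome)
    return model

def plan(model, goal):
    return [opt for opt, effects in model.items() if goal in effects]

def open_ended_loop(env, iterations=3):
    """The loop from the abstract: (a) discover options, (b) explore with
    them, (c) abstract experience into a symbolic model, (d) plan over it."""
    model, experience = {}, []
    for _ in range(iterations):
        options = discover_options(experience)            # (a)
        experience += explore(env, options)               # (b)
        model = abstract_to_ppddl(model, experience)      # (c)
    return plan(model, goal="outcome_of_opt_0")           # (d)

print(open_ended_loop(lambda option: f"outcome_of_{option}"))  # -> ['opt_0']
```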
https://arxiv.org/abs/2409.11756
Human-in-the-loop reinforcement learning integrates human expertise to accelerate agent learning and provide critical guidance and feedback in complex fields. However, many existing approaches focus on single-agent tasks and require continuous human involvement during the training process, significantly increasing the human workload and limiting scalability. In this paper, we propose HARP (Human-Assisted Regrouping with Permutation Invariant Critic), a multi-agent reinforcement learning framework designed for group-oriented tasks. HARP integrates automatic agent regrouping with strategic human assistance during deployment, enabling non-experts to offer effective guidance with minimal intervention. During training, agents dynamically adjust their groupings to optimize collaborative task completion. When deployed, they actively seek human assistance and utilize the Permutation Invariant Group Critic to evaluate and refine human-proposed groupings, allowing non-expert users to contribute valuable suggestions. In multiple collaboration scenarios, our approach is able to leverage limited guidance from non-experts and enhance performance. The project can be found at this https URL.
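A permutation-invariant group critic is commonly built with the Deep Sets construction sketched below; this illustrates the general technique, under the assumption that mean pooling is used, and is not necessarily HARP's exact architecture:

```python
import torch
import torch.nn as nn

class PermutationInvariantGroupCritic(nn.Module):
    """Per-agent features are encoded independently and mean-pooled, so the
    value assigned to a grouping does not depend on agent ordering."""
    def __init__(self, obs_dim=16, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, group_obs):             # (n_agents, obs_dim)
        return self.rho(self.phi(group_obs).mean(dim=0))

critic = PermutationInvariantGroupCritic()
obs = torch.randn(5, 16)
# Shuffling the agents leaves the group's value unchanged.
assert torch.allclose(critic(obs), critic(obs[torch.randperm(5)]), atol=1e-5)
```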
https://arxiv.org/abs/2409.11741
Large language model (LLM) role-playing has gained widespread attention, where authentic character knowledge is crucial for constructing realistic LLM role-playing agents. However, existing works usually overlook the exploration of LLMs' ability to detect characters' known knowledge errors (KKE) and unknown knowledge errors (UKE) while playing roles, which leads to low-quality automatic construction of trainable character corpora. In this paper, we propose a probing dataset to evaluate LLMs' ability to detect KKE and UKE. The results indicate that even the latest LLMs struggle to effectively detect these two types of errors, especially when it comes to familiar knowledge. We experiment with various reasoning strategies and propose an agent-based reasoning method, Self-Recollection and Self-Doubt (S2RD), to further explore the potential for improving error detection capabilities. Experiments show that our method effectively improves the LLMs' ability to detect erroneous character knowledge, but it remains an issue that requires ongoing attention.
https://arxiv.org/abs/2409.11726
Human cognition can leverage fundamental conceptual knowledge, like geometric and kinematic ones, to appropriately perceive, comprehend and interact with novel objects. Motivated by this finding, we aim to endow machine intelligence with an analogous capability through performing at the conceptual level, in order to understand and then interact with articulated objects, especially for those in novel categories, which is challenging due to the intricate geometric structures and diverse joint types of articulated objects. To achieve this goal, we propose Analytic Ontology Template (AOT), a parameterized and differentiable program description of generalized conceptual ontologies. A baseline approach called AOTNet driven by AOTs is designed accordingly to equip intelligent agents with these generalized concepts, and then empower the agents to effectively discover the conceptual knowledge on the structure and affordance of articulated objects. The AOT-driven approach yields benefits in three key perspectives: i) enabling concept-level understanding of articulated objects without relying on any real training data, ii) providing analytic structure information, and iii) introducing rich affordance information indicating proper ways of interaction. We conduct exhaustive experiments and the results demonstrate the superiority of our approach in understanding and then interacting with articulated objects.
https://arxiv.org/abs/2409.11702
We introduce RMP-YOLO, a unified framework designed to provide robust motion predictions even with incomplete input data. Our key insight stems from the observation that complete and reliable historical trajectory data plays a pivotal role in ensuring accurate motion prediction. Therefore, we propose a new paradigm that prioritizes the reconstruction of intact historical trajectories before feeding them into the prediction modules. Our approach introduces a novel scene tokenization module to enhance the extraction and fusion of spatial and temporal features. Following this, our proposed recovery module reconstructs agents' incomplete historical trajectories by leveraging local map topology and interactions with nearby agents. The reconstructed, clean historical data is then integrated into the downstream prediction modules. Our framework is able to effectively handle missing data of varying lengths and remains robust against observation noise, while maintaining high prediction accuracy. Furthermore, our recovery module is compatible with existing prediction models, ensuring seamless integration. Extensive experiments validate the effectiveness of our approach, and deployment in real-world autonomous vehicles confirms its practical utility. In the 2024 Waymo Motion Prediction Competition, our method, RMP-YOLO, achieves state-of-the-art performance, securing third place.
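As a toy illustration of the recover-then-predict paradigm, the sketch below fills missing waypoints before the history reaches a predictor; linear interpolation stands in for the paper's learned recovery module, which also exploits map topology and neighboring agents:

```python
import numpy as np

def recover_history(traj, mask):
    """Fill missing (x, y) waypoints of one agent's history by linear
    interpolation over the observed timestamps (sketch only)."""
    t = np.arange(len(traj))
    filled = traj.copy()
    for dim in range(traj.shape[1]):
        filled[~mask, dim] = np.interp(t[~mask], t[mask], traj[mask, dim])
    return filled

history = np.array([[0.0, 0.0], [1.0, 0.5], [np.nan, np.nan], [3.0, 1.5]])
mask = ~np.isnan(history[:, 0])          # which timestamps were observed
clean = recover_history(history, mask)   # feed this to the prediction module
```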
https://arxiv.org/abs/2409.11696
Goal recognition (GR) involves inferring an agent's unobserved goal from a sequence of observations. This is a critical problem in AI with diverse applications. Traditionally, GR has been addressed using 'inference to the best explanation' or abduction, where hypotheses about the agent's goals are generated as the most plausible explanations for observed behavior. Alternatively, some approaches enhance interpretability by ensuring that an agent's behavior aligns with an observer's expectations or by making the reasoning behind decisions more transparent. In this work, we tackle a different challenge: explaining the GR process in a way that is comprehensible to humans. We introduce and evaluate an explainable model for goal recognition (GR) agents, grounded in the theoretical framework and cognitive processes underlying human behavior explanation. Drawing on insights from two human-agent studies, we propose a conceptual framework for human-centered explanations of GR. Using this framework, we develop the eXplainable Goal Recognition (XGR) model, which generates explanations for both why and why not questions. We evaluate the model computationally across eight GR benchmarks and through three user studies. The first study assesses the efficiency of generating human-like explanations within the Sokoban game domain, the second examines perceived explainability in the same domain, and the third evaluates the model's effectiveness in aiding decision-making in illegal fishing detection. Results demonstrate that the XGR model significantly enhances user understanding, trust, and decision-making compared to baseline models, underscoring its potential to improve human-agent collaboration.
https://arxiv.org/abs/2409.11675
Histopathology analysis is the gold standard for medical diagnosis. Accurate classification of whole slide images (WSIs) and localization of regions of interest (ROIs) can assist pathologists in diagnosis. The gigapixel resolution of WSIs and the absence of fine-grained annotations make direct classification and analysis challenging. In weakly supervised learning, multiple instance learning (MIL) presents a promising approach for WSI classification. The prevailing strategy is to use attention mechanisms to measure instance importance for classification. However, attention mechanisms fail to capture inter-instance information, and self-attention causes quadratic computational complexity. To address these challenges, we propose AMD-MIL, an agent aggregator with a mask denoise mechanism. The agent token acts as an intermediate variable between the query and key for computing instance importance. Mask and denoising matrices, mapped from agent-aggregated values, dynamically mask low-contribution representations and eliminate noise. AMD-MIL achieves better attention allocation by adjusting feature representations, capturing micro-metastases in cancer, and improving interpretability. Extensive experiments on CAMELYON-16, CAMELYON-17, TCGA-KIDNEY, and TCGA-LUNG show AMD-MIL's superiority over state-of-the-art methods.
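In the spirit of agent-token attention, a rough sketch: a small set of agent tokens mediates between queries and keys, cutting the O(n^2) self-attention cost to O(n x n_agents). The quantile-based mask here is a deliberate simplification of AMD-MIL's learned mask and denoising matrices:

```python
import torch
import torch.nn.functional as F

def agent_attention(q, k, v, agent_tokens, mask_ratio=0.1):
    """Agent tokens aggregate the bag, instances attend to the agents, and
    a score-derived mask suppresses low-contribution instances (sketch)."""
    d = q.shape[-1]
    # Agents aggregate the bag: (n_agents, n) @ (n, d) -> (n_agents, d)
    agent_v = F.softmax(agent_tokens @ k.T / d**0.5, dim=-1) @ v
    # Instances attend to agents: (n, n_agents) @ (n_agents, d) -> (n, d)
    attn = F.softmax(q @ agent_tokens.T / d**0.5, dim=-1)
    out = attn @ agent_v
    # Simplified mask step: zero out the lowest-scoring instances.
    score = attn.max(dim=-1).values
    cutoff = torch.quantile(score, mask_ratio)
    return out * (score >= cutoff).unsqueeze(-1).float()

n, d, n_agents = 1000, 64, 8             # 1000 instance features from one WSI
q = k = v = torch.randn(n, d)
out = agent_attention(q, k, v, torch.randn(n_agents, d))
print(out.shape)                          # torch.Size([1000, 64])
```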
https://arxiv.org/abs/2409.11664
Bounded rational agents often make decisions by evaluating a finite selection of choices, typically derived from a reference point termed the 'default policy,' based on previous experience. However, the inherent rigidity of a static default policy presents significant challenges for agents operating in unknown environments that are not covered by the agent's prior knowledge. In this work, we introduce a context-generative default policy that leverages the region observed by the robot to predict the unobserved part of the environment, thereby enabling the robot to adaptively adjust its default policy based on both the actual observed map and the imagined unobserved map. Furthermore, the adaptive nature of the bounded rationality framework enables the robot to manage unreliable or incorrect imaginations by selectively sampling a few trajectories in the vicinity of the default policy. Our approach utilizes a diffusion model for map prediction and sampling-based planning with B-spline trajectory optimization to generate the default policy. Extensive evaluations reveal that the context-generative policy outperforms baseline methods in identifying and avoiding unseen obstacles. Additionally, real-world experiments conducted with Crazyflie drones demonstrate the adaptability of our proposed method, even when acting in environments outside the domain of the training distribution.
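A rough sketch of the bounded-rationality sampling step, under heavy assumptions: perturbed control points stand in for the B-spline parameterization, and a random grid stands in for the diffusion model's fused observed-plus-imagined map:

```python
import numpy as np

def sample_near_default(default_ctrl_pts, n_samples=8, sigma=0.2, rng=None):
    """Draw only a few candidate trajectories in the vicinity of the
    default policy, rather than searching the full trajectory space."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(0, sigma, (n_samples,) + default_ctrl_pts.shape)
    return [default_ctrl_pts] + [default_ctrl_pts + n for n in noise]

def trajectory_cost(ctrl_pts, occupancy):
    """Cost on the fused map: observed and 'imagined' cells are treated
    uniformly as an occupancy grid with values in [0, 1]."""
    cells = np.clip(ctrl_pts.round().astype(int), 0, occupancy.shape[0] - 1)
    return occupancy[cells[:, 0], cells[:, 1]].sum()

occupancy = np.random.default_rng(1).random((50, 50))  # observed + imagined
default = np.stack([np.linspace(0, 49, 6), np.linspace(0, 49, 6)], axis=1)
best = min(sample_near_default(default),
           key=lambda c: trajectory_cost(c, occupancy))
```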
https://arxiv.org/abs/2409.11604
As robotic systems become increasingly integrated into complex real-world environments, there is a growing need for approaches that enable robots to understand and act upon natural language instructions without relying on extensive pre-programmed knowledge of their surroundings. This paper presents PLATO, an innovative system that addresses this challenge by leveraging specialized large language model agents to process natural language inputs, understand the environment, predict tool affordances, and generate executable actions for robotic systems. Unlike traditional systems that depend on hard-coded environmental information, PLATO employs a modular architecture of specialized agents to operate without any initial knowledge of the environment. These agents identify objects and their locations within the scene, generate a comprehensive high-level plan, translate this plan into a series of low-level actions, and verify the completion of each step. The system is particularly tested on challenging tool-use tasks, which involve handling diverse objects and require long-horizon planning. PLATO's design allows it to adapt to dynamic and unstructured settings, significantly enhancing its flexibility and robustness. By evaluating the system across various complex scenarios, we demonstrate its capability to tackle a diverse range of tasks and offer a novel solution to integrate LLMs with robotic platforms, advancing the state-of-the-art in autonomous robotic task execution. For videos and prompt details, please see our project website: this https URL
https://arxiv.org/abs/2409.11580
A team of multiple robots seamlessly and safely working in human-filled public environments requires adaptive task allocation and socially-aware navigation that account for dynamic human behavior. Current approaches struggle with highly dynamic pedestrian movement and the need for flexible task allocation. We propose Hyper-SAMARL, a hypergraph-based system for multi-robot task allocation and socially-aware navigation, leveraging multi-agent reinforcement learning (MARL). Hyper-SAMARL models the environmental dynamics between robots, humans, and points of interest (POIs) using a hypergraph, enabling adaptive task assignment and socially-compliant navigation through a hypergraph diffusion mechanism. Our framework, trained with MARL, effectively captures interactions between robots and humans, adapting tasks based on real-time changes in human activity. Experimental results demonstrate that Hyper-SAMARL outperforms baseline models in terms of social navigation, task completion efficiency, and adaptability in various simulated scenarios.
https://arxiv.org/abs/2409.11561
Multi-agent strategies have emerged as a promising approach to enhance the reasoning abilities of Large Language Models (LLMs) by assigning specialized roles in the problem-solving process. Concurrently, Tree of Thoughts (ToT) methods have shown potential in improving reasoning for complex question-answering tasks by exploring diverse reasoning paths. A critical limitation in multi-agent reasoning is the 'Reasoner' agent's shallow exploration of reasoning paths. While ToT strategies could help mitigate this problem, they may generate flawed reasoning branches, which could harm the trustworthiness of the final answer. To leverage the strengths of both multi-agent reasoning and ToT strategies, we introduce a novel approach combining ToT-based Reasoner agents with a Thought Validator agent. Multiple Reasoner agents operate in parallel, employing ToT to explore diverse reasoning paths. The Thought Validator then scrutinizes these paths, considering a Reasoner's conclusion only if its reasoning is valid. This method enables a more robust voting strategy by discarding faulty reasoning paths, enhancing the system's ability to tackle tasks requiring systematic and trustworthy reasoning. Our method demonstrates superior performance compared to existing techniques when evaluated on the GSM8K dataset, outperforming the standard ToT strategy by an average of 5.6% across four LLMs.
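The validated-voting control flow might look like the following sketch, where the stubs stand in for LLM calls (their names and behavior are assumptions made only so the flow runs end-to-end):

```python
from collections import Counter

# Stubs: in the paper both roles are LLMs.
def tot_reasoner(question, seed):
    return {"answer": "42", "reasoning": f"path #{seed}"}

def thought_validator(question, reasoning):
    return True       # would return False for flawed reasoning branches

def validated_vote(question, n_reasoners=5):
    """Parallel ToT Reasoners propose answers, the Thought Validator
    filters out faulty reasoning paths, and only validated conclusions
    are counted in the final vote."""
    candidates = [tot_reasoner(question, s) for s in range(n_reasoners)]
    valid = [c["answer"] for c in candidates
             if thought_validator(question, c["reasoning"])]
    if not valid:
        return None   # no trustworthy path survived validation
    return Counter(valid).most_common(1)[0][0]

print(validated_vote("GSM8K problem text"))
```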
https://arxiv.org/abs/2409.11527
This paper addresses a distributed leader-follower formation control problem for a group of agents, each using a body-fixed camera with a limited field of view (FOV) for state estimation. The main challenge arises from the need to coordinate the agents' movements with their cameras' FOV to maintain visibility of the leader for accurate and reliable state estimation. To address this challenge, we propose a novel perception-aware distributed leader-follower safe control scheme that incorporates FOV limits as state constraints. A Control Barrier Function (CBF) based quadratic program is employed to ensure the forward invariance of a safety set defined by these constraints. Furthermore, new neural-network-based and double-bounding-box-based estimators, combined with temporal filters, are developed to estimate system states directly from real-time image data, providing consistent performance across various environments. Comparison results in the Gazebo simulator demonstrate the effectiveness and robustness of the proposed framework in two distinct environments.
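For the CBF step, a single-constraint sketch: the safety filter solves min_u ||u - u_nom||^2 subject to dh/dx (f + g u) >= -alpha h, and with one affine constraint the quadratic program reduces to a closed-form projection. The scalar FOV toy below is an assumption for illustration; the paper's scheme handles the full formation dynamics and estimator outputs:

```python
import numpy as np

def cbf_qp_filter(u_nom, grad_h, h, f, g, alpha=1.0):
    """Closed-form CBF-QP for a single affine constraint:
    keep h >= 0 forward-invariant while staying close to u_nom."""
    a = grad_h @ g                     # constraint row acting on u
    b = -alpha * h - grad_h @ f        # constraint offset
    slack = a @ u_nom - b
    if slack >= 0:                     # nominal input already safe
        return u_nom
    return u_nom - (slack / (a @ a)) * a   # project onto the half-space

# Toy example: keep a scalar bearing x inside the FOV, h(x) > 0 inside.
x, fov_half = 0.5, 0.6
h = fov_half**2 - x**2
grad_h = np.array([-2 * x])
u_safe = cbf_qp_filter(np.array([3.0]), grad_h, h,
                       f=np.array([0.0]), g=np.eye(1))
print(u_safe)   # the aggressive nominal input is clipped to stay safe
```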
https://arxiv.org/abs/2409.11394