In this study, we explore the application of Large Language Models (LLMs) to "Jubensha" (Chinese murder mystery role-playing games), a novel area in AI-driven gaming. We introduce the first Chinese dataset built specifically for Jubensha, including character scripts and game rules, to foster AI agent development in this complex narrative environment. Our work also presents a unique multi-agent interaction framework using LLMs, allowing AI agents to autonomously engage in the game and enhancing the dynamics of Jubensha gameplay. To evaluate these AI agents, we developed specialized methods targeting their mastery of case information and reasoning skills. Furthermore, we incorporated the latest advancements in in-context learning to improve the agents' performance in critical aspects such as information gathering, murderer detection, and logical reasoning. The experimental results validate the effectiveness of our proposed methods. This work aims to offer a fresh perspective on understanding LLM capabilities and to provide researchers in the field with a new benchmark for evaluating LLM-based agents.
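The multi-agent turn loop described above can be pictured with a short sketch. This is a hypothetical illustration, not the paper's framework: `llm_complete`, the prompt wording, and the `CharacterAgent` structure are all assumptions standing in for any chat-completion client.

```python
# Hypothetical sketch of an LLM-driven Jubensha turn loop; llm_complete is a
# stand-in for any chat-completion client, not the paper's actual interface.
from dataclasses import dataclass, field

def llm_complete(prompt: str) -> str:
    """Placeholder: swap in a real chat-completion API call here."""
    return "(in-character reply)"

@dataclass
class CharacterAgent:
    name: str
    script: str                               # the character's private script
    memory: list = field(default_factory=list)

    def speak(self, public_log: str) -> str:
        prompt = (
            f"You are {self.name} in a murder mystery game.\n"
            f"Your private script:\n{self.script}\n"
            f"Public discussion so far:\n{public_log}\n"
            "Reply in character, revealing only what benefits you."
        )
        reply = llm_complete(prompt)
        self.memory.append(reply)             # per-agent history for later rounds
        return reply

def play_round(agents: list, public_log: str = "") -> str:
    """One discussion round: every agent speaks once, in turn order."""
    for agent in agents:
        public_log += f"\n{agent.name}: {agent.speak(public_log)}"
    return public_log
```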
https://arxiv.org/abs/2312.00746
3D object detection in Bird's-Eye-View (BEV) space has recently emerged as a prevalent approach in the field of autonomous driving. Despite the demonstrated improvements in accuracy and velocity estimation compared to perspective view methods, the deployment of BEV-based techniques in real-world autonomous vehicles remains challenging. This is primarily due to their reliance on vision-transformer (ViT) based architectures, which introduce quadratic complexity with respect to the input resolution. To address this issue, we propose an efficient BEV-based 3D detection framework called BEVENet, which leverages a convolutional-only architectural design to circumvent the limitations of ViT models while maintaining the effectiveness of BEV-based methods. Our experiments show that BEVENet is 3$\times$ faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge, achieving a mean average precision (mAP) of 0.456 and a nuScenes detection score (NDS) of 0.555 on the NuScenes validation dataset, with an inference speed of 47.6 frames per second. To the best of our knowledge, this study stands as the first to achieve such significant efficiency improvements for BEV-based methods, highlighting their enhanced feasibility for real-world autonomous driving applications.
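To make the complexity argument concrete, here is a back-of-envelope sketch (illustrative numbers only, not BEVENet's actual layer counts) of how self-attention and convolution FLOPs scale when the feature-map resolution doubles:

```python
# Why ViT attention scales quadratically with input resolution while
# convolution scales linearly (illustrative counts, not BEVENet's layers).
def attention_flops(h, w, dim):
    n = h * w                        # number of tokens
    return 2 * n * n * dim           # QK^T plus attention-weighted V

def conv_flops(h, w, cin, cout, k=3):
    return h * w * cin * cout * k * k

for scale in (1, 2):
    h, w = 32 * scale, 88 * scale    # a hypothetical BEV feature grid, then 2x
    print(f"{h}x{w}: attn={attention_flops(h, w, 256):.2e}, "
          f"conv={conv_flops(h, w, 256, 256):.2e}")
# Doubling resolution multiplies attention cost by ~16 (4x tokens, squared)
# but convolution cost only by 4 -- the motivation for a conv-only design.
```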
https://arxiv.org/abs/2312.00633
This study delves into the flight behaviors of budgerigars (Melopsittacus undulatus) to gain insights into their flight trajectories and movements. Using 3D reconstruction from stereo video camera recordings, we closely examine the velocity and acceleration patterns during three flight motions: takeoff, flying, and landing. The findings not only contribute to our understanding of bird behavior but also hold significant implications for the advancement of algorithms in Unmanned Aerial Vehicles (UAVs). The research aims to bridge the gap between biological principles observed in birds and the application of these insights to developing more efficient and autonomous UAVs. In the context of the increasing use of drones, this study focuses on the biologically inspired principles drawn from bird behaviors, particularly during takeoff, flight, and landing, to enhance UAV capabilities. The dataset created for this research sheds light on budgerigars' takeoff, flying, and landing techniques, emphasizing their ability to control speed across different situations and surfaces. The study underscores the potential of incorporating these principles into UAV algorithms, addressing challenges related to short-range navigation, takeoff, flying, and landing.
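For readers who want to reproduce this kind of kinematic analysis, a minimal sketch follows; the frame rate and the synthetic track are assumptions, not the paper's data, but the finite-difference recipe for velocity and acceleration from a reconstructed 3D trajectory is standard:

```python
# Estimating velocity and acceleration from a 3D trajectory sampled at a
# fixed camera frame rate (assumed data layout; synthetic stand-in track).
import numpy as np

fps = 120.0                                   # assumed stereo-camera frame rate
rng = np.random.default_rng(0)
positions = np.cumsum(rng.normal(0, 0.01, (100, 3)), axis=0)  # (x, y, z) in metres

velocity = np.gradient(positions, 1.0 / fps, axis=0)       # m/s, central differences
acceleration = np.gradient(velocity, 1.0 / fps, axis=0)    # m/s^2

speed = np.linalg.norm(velocity, axis=1)
print(f"peak speed {speed.max():.2f} m/s during this segment")
```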
https://arxiv.org/abs/2312.00597
The quest for fully autonomous vehicles (AVs) capable of navigating complex real-world scenarios with human-like understanding and responsiveness continues. In this paper, we introduce Dolphins, a novel vision-language model architected to embody human-like abilities as a conversational driving assistant. Dolphins is adept at processing multimodal inputs comprising video (or image) data, text instructions, and historical control signals to generate informed outputs corresponding to the provided instructions. Building upon the open-source pretrained vision-language model OpenFlamingo, we first enhance Dolphins's reasoning capabilities through an innovative Grounded Chain of Thought (GCoT) process. We then tailor Dolphins to the driving domain by constructing driving-specific instruction data and conducting instruction tuning. Using the BDD-X dataset, we design and consolidate four distinct AV tasks into Dolphins to foster a holistic understanding of intricate driving scenarios. As a result, the distinctive features of Dolphins fall into two dimensions: (1) the ability to provide a comprehensive understanding of complex and long-tailed open-world driving scenarios and to solve a spectrum of AV tasks, and (2) the emergence of human-like capabilities, including gradient-free instant adaptation via in-context learning and error recovery via reflection.
https://arxiv.org/abs/2312.00438
Stereo matching, a pivotal technique in computer vision, plays a crucial role in robotics, autonomous navigation, and augmented reality. Despite the development of numerous impressive methods in recent years, replicating their results and determining the most suitable architecture for practical application remains challenging. Addressing this gap, our paper introduces a comprehensive benchmark focusing on practical applicability rather than solely on performance enhancement. Specifically, we develop a flexible and efficient stereo matching codebase called OpenStereo. OpenStereo includes training and inference code for more than 12 network models, making it, to our knowledge, the most complete stereo matching toolbox available. Based on OpenStereo, we conducted experiments on the SceneFlow dataset and achieved or surpassed the performance metrics reported in the original papers. Additionally, we conduct an in-depth revisitation of recent developments in stereo matching through ablative experiments. These investigations inspired the creation of StereoBase, a simple yet strong baseline model. Our extensive comparative analyses of StereoBase against numerous contemporary stereo matching methods on the SceneFlow dataset demonstrate its remarkably strong performance. The source code is available at this https URL.
https://arxiv.org/abs/2312.00343
In this work, we present a novel framework for camera relocation in autonomous vehicles, leveraging deep neural networks (DNNs). While the existing literature offers various DNN-based camera relocation methods, their deployment is hindered by high computational demands during inference. In contrast, our approach addresses this challenge through edge-cloud collaboration. Specifically, we strategically offload certain modules of the neural network to the server and evaluate the inference time of data frames under different network segmentation schemes to guide our offloading decisions. Our findings highlight the vital role of server-side offloading in DNN-based camera relocation for autonomous vehicles, and we also discuss the results of data fusion. Finally, we validate the effectiveness of our proposed framework through experimental evaluation.
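A sketch of the profiling step described above, not the paper's code: the stand-in model, split points, and fp32 upload estimate are assumptions, but it shows how one would time the on-vehicle portion for each candidate segmentation scheme:

```python
# Split inference: run the first k layers on the vehicle, ship the
# intermediate tensor to the server (toy model, illustrative only).
import time
import torch
import torch.nn as nn

model = nn.Sequential(                 # stand-in for a relocation DNN
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 7),                  # pose: xyz + quaternion
)

def profile_split(model, frame, k):
    """Time the on-vehicle part (layers [0, k)) for one candidate split point."""
    head = nn.Sequential(*list(model.children())[:k])
    t0 = time.perf_counter()
    with torch.no_grad():
        intermediate = head(frame)
    return time.perf_counter() - t0, intermediate.numel() * 4  # secs, fp32 bytes

frame = torch.randn(1, 3, 224, 224)
for k in range(1, 8):
    t, nbytes = profile_split(model, frame, k)
    print(f"split after layer {k}: edge {t * 1e3:.1f} ms, upload {nbytes / 1e6:.2f} MB")
```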
https://arxiv.org/abs/2312.00316
In this paper, we present a bilevel optimal motion planning (BOMP) model for autonomous parking. The BOMP model treats motion planning as an optimal control problem, in which the upper level is designed for vehicle nonlinear dynamics and the lower level is for geometric collision-free constraints. The significant feature of the BOMP model is that the lower level is a linear programming problem that serves as a constraint for the upper-level problem. That is, the optimal control problem contains an embedded optimization problem as a constraint. Traditional optimal control methods cannot solve the BOMP problem directly. Therefore, the modified approximate Karush-Kuhn-Tucker theory is applied to generate a general nonlinear optimal control problem, and the pseudospectral optimal control method solves the converted problem. In particular, the lower level is the $J_2$-function that acts as a distance function between convex polyhedron objects. Polyhedrons can approximate vehicles with higher precision than spheres or ellipsoids. In addition, the modified $J_2$-function (MJ) and the active-points-based modified $J_2$-function (APMJ) are proposed to reduce the number of variables and the time complexity. As a result, an iterative two-stage BOMP algorithm for autonomous parking that accounts for dynamical feasibility and the collision-free property is proposed. The MJ function is used in the initial stage to find an initial collision-free approximate optimal trajectory and the active points; the APMJ function in the final stage then finds the optimal trajectory. Simulation results and experiments on a TurtleBot3 validate the BOMP model and demonstrate that the computation speed increases by almost two orders of magnitude compared with the area-criterion-based collision avoidance method.
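The bilevel structure can be written schematically as below. The notation is illustrative rather than the paper's exact model: the 1-norm distance is shown only because it is LP-representable after the standard reformulation, and the paper's $J_2$-function may be defined differently.

```latex
% Schematic bilevel form (illustrative notation, not the paper's exact model):
% upper level: optimal control of the vehicle; lower level: an LP distance.
\begin{align*}
\min_{u(\cdot)} \quad & \int_{t_0}^{t_f} L\big(x(t), u(t)\big)\,dt \\
\text{s.t.} \quad & \dot{x}(t) = f\big(x(t), u(t)\big)
  \quad \text{(vehicle nonlinear dynamics)} \\
& J_2\big(x(t)\big) \ge d_{\min}
  \quad \text{(collision-free margin)} \\
\text{with} \quad & J_2(x) = \min_{p \in P(x),\; q \in Q} \lVert p - q \rVert_1
  \quad \text{(lower-level LP between polyhedra } P(x), Q\text{)}
\end{align*}
```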
https://arxiv.org/abs/2312.00314
Robots operating in an open world will encounter novel objects with unknown physical properties, such as mass, friction, or size. These robots will need to sense these properties through interaction prior to performing downstream tasks with the objects. We propose a method that autonomously learns tactile exploration policies by developing a generative world model that is leveraged to 1) estimate the object's physical parameters using a differentiable Bayesian filtering algorithm and 2) develop an exploration policy using an information-gathering model predictive controller. We evaluate our method on three simulated tasks where the goal is to estimate a desired object property (mass, height or toppling height) through physical interaction. We find that our method is able to discover policies that efficiently gather information about the desired property in an intuitive manner. Finally, we validate our method on a real robot system for the height estimation task, where our method is able to successfully learn and execute an information-gathering policy from scratch.
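The filtering step can be illustrated with a scalar example. This is a plain (non-differentiable) Bayesian update under an assumed observation model $F = m \cdot a$, not the paper's implementation:

```python
# Belief over an unknown mass, updated from noisy force readings of a
# known-acceleration push (toy Bayesian filter, illustrative parameters).
import numpy as np

masses = np.linspace(0.1, 2.0, 200)           # candidate masses, kg
belief = np.full_like(masses, 1.0 / len(masses))

def update(belief, f_obs, accel, sigma=0.2):
    """Bayes rule with Gaussian observation model p(F | m) ~ N(m * a, sigma^2)."""
    likelihood = np.exp(-0.5 * ((f_obs - masses * accel) / sigma) ** 2)
    posterior = belief * likelihood
    return posterior / posterior.sum()

true_mass, accel = 0.7, 2.0
rng = np.random.default_rng(0)
for _ in range(20):                            # 20 simulated exploratory pushes
    f_obs = true_mass * accel + rng.normal(0.0, 0.2)
    belief = update(belief, f_obs, accel)
print(f"MAP mass estimate: {masses[belief.argmax()]:.2f} kg")
```

An information-gathering controller, as in the paper, would pick the next action to maximally sharpen this belief rather than pushing blindly.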
https://arxiv.org/abs/2312.00215
Artificial intelligence (AI) technology has become increasingly prevalent and is transforming our everyday life. One important application of AI technology is the development of autonomous vehicles (AVs). However, the reliability of an AV needs to be carefully demonstrated via an assurance test so that the product can be used with confidence in the field. To plan an assurance test, one needs to determine how many AVs need to be tested for how many miles, and the standard for passing the test. Existing research has made great efforts to develop reliability demonstration tests in other fields of application for product development and assessment. However, statistical methods have not been utilized in AV test planning. This paper aims to fill this gap by developing statistical methods for planning AV reliability assurance tests based on recurrent events data. We explore the relationship between multiple criteria of interest in the context of planning AV reliability assurance tests. Specifically, we develop two test planning strategies based on homogeneous and non-homogeneous Poisson processes, balancing multiple objectives with the Pareto front approach. We also offer recommendations for practical use. Disengagement events data from the California Department of Motor Vehicles AV testing program are used to illustrate the proposed assurance test planning methods.
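A hedged sketch of the homogeneous-Poisson-process case: with disengagements arriving at rate $\lambda$ per thousand miles, a plan (mileage $m$, allowed events $c$) passes with probability $P(N \le c)$ for $N \sim \mathrm{Poisson}(\lambda m)$. All rates below are illustrative, not fitted to the California data:

```python
# Pass probability of a (mileage, allowed-events) assurance test plan under a
# homogeneous Poisson process; rates are illustrative placeholders.
from math import exp

def pass_probability(lam, miles_k, c):
    """P(N <= c) for N ~ Poisson(lam * miles_k), summed term by term."""
    mu = lam * miles_k
    term, total = exp(-mu), exp(-mu)
    for k in range(1, c + 1):
        term *= mu / k
        total += term
    return total

# Competing criteria: a good system (low rate) should pass, a bad one should not.
for miles_k in (50, 100, 200):
    print(f"{miles_k}k miles, c=5: good system passes "
          f"{pass_probability(0.02, miles_k, 5):.3f}, "
          f"bad system passes {pass_probability(0.08, miles_k, 5):.3f}")
```

Scanning such plans and keeping the non-dominated ones is exactly the kind of trade-off the Pareto front approach formalizes.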
https://arxiv.org/abs/2312.00186
Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely-used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of $85$%, $82$% and $96$% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by $31$%, $21$%, and $13$% in F1-score performance, respectively. We conclude with a discussion on the scope, implications and limitations of our study. We provide our complete replication package in a GitHub repository.
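The single-run classification idea can be sketched as follows. The features and labels are synthetic stand-ins, not the paper's dataset, and the model choice is an assumption:

```python
# Predict "flaky" from one execution's telemetry instead of rerunning the test
# (synthetic stand-in data; features and model are illustrative assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# hypothetical per-run features: min distance to obstacle, speed variance,
# simulation wall-clock jitter
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0.3, 1.0, n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(f"F1 on held-out runs: {f1_score(y_te, clf.predict(X_te)):.2f}")
```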
https://arxiv.org/abs/2311.18768
Recently, the remarkable progress of large language models (LLMs) has ignited the spark of task automation, which decomposes complex tasks described by user instructions into sub-tasks and invokes external tools to execute them, playing a central role in autonomous agents. However, a systematic and standardized benchmark to foster the development of LLMs in task automation is still lacking. To this end, we introduce TaskBench to evaluate the capability of LLMs in task automation. Specifically, task automation can be formulated into three critical stages: task decomposition, tool invocation, and parameter prediction to fulfill user intent. This complexity makes data collection and evaluation more challenging than for common NLP tasks. To generate high-quality evaluation datasets, we introduce the concept of a Tool Graph to represent the decomposed tasks in user intent, and adopt a back-instruct method to simulate user instructions and annotations. Furthermore, we propose TaskEval to evaluate the capability of LLMs from different aspects, including task decomposition, tool invocation, and parameter prediction. Experimental results demonstrate that TaskBench effectively reflects the capability of LLMs in task automation. Benefiting from the mixture of automated data construction and human verification, TaskBench achieves high consistency with human evaluation, and can thus serve as a comprehensive and faithful benchmark for LLM-based autonomous agents.
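One way to picture the Tool Graph is as an adjacency structure over tools whose outputs can feed other tools; the schema below is assumed for illustration and is not TaskBench's actual format:

```python
# Illustrative Tool Graph: nodes are tools, an edge t1 -> t2 means t2 can
# consume t1's output (assumed schema, not TaskBench's actual data format).
tool_graph = {
    "search_web":     ["summarize_text"],
    "summarize_text": ["translate_text", "text_to_speech"],
    "translate_text": ["text_to_speech"],
    "text_to_speech": [],
}

def linearize(graph, start):
    """Depth-first walk producing one valid tool-invocation order."""
    order, seen = [], set()
    def visit(tool):
        if tool in seen:
            return
        seen.add(tool)
        order.append(tool)
        for nxt in graph[tool]:
            visit(nxt)
    visit(start)
    return order

# A sampled chain like this is what a back-instruct step would turn into a
# natural-language user request plus annotations.
print(linearize(tool_graph, "search_web"))
```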
https://arxiv.org/abs/2311.18760
Assisted and autonomous driving are rapidly gaining momentum, and will soon become a reality. Among their key enablers, artificial intelligence and machine learning are expected to play a prominent role, also thanks to the massive amount of data that smart vehicles will collect from their onboard sensors. In this domain, federated learning is one of the most effective and promising techniques for training global machine learning models, while preserving data privacy at the vehicles and optimizing communications resource usage. In this work, we propose VREM-FL, a computation-scheduling co-design for vehicular federated learning that leverages mobility of vehicles in conjunction with estimated 5G radio environment maps. VREM-FL jointly optimizes the global model learned at the server while wisely allocating communication resources. This is achieved by orchestrating local computations at the vehicles in conjunction with the transmission of their local model updates in an adaptive and predictive fashion, by exploiting radio channel maps. The proposed algorithm can be tuned to trade model training time for radio resource usage. Experimental results demonstrate the efficacy of utilizing radio maps. VREM-FL outperforms literature benchmarks for both a linear regression model (learning time reduced by 28%) and a deep neural network for a semantic image segmentation task (doubling the number of model updates within the same time window).
https://arxiv.org/abs/2311.18741
Precisely predicting the future trajectories of surrounding traffic participants is a crucial but challenging problem in autonomous driving, owing to the complex interactions between traffic agents, map context, and traffic rules. Vector-based approaches have recently been shown to achieve among the best performance on trajectory prediction benchmarks. These methods model simple interactions between traffic agents but neither distinguish between relation types nor capture attributes such as the agents' distance along the road. Furthermore, they represent lanes only by sequences of vectors along center lines and ignore context information such as lane dividers and other road elements. We present a novel approach to vector-based trajectory prediction that addresses these shortcomings by leveraging three crucial sources of information. First, we model interactions between traffic agents with a semantic scene graph that accounts for the nature and important features of their relations. Second, we extract agent-centric, image-based map features to model the local map context. Finally, we generate anchor paths to restrict the multi-modal prediction to permitted trajectories only. Each of these three enhancements shows advantages over the baseline model HoliGraph.
https://arxiv.org/abs/2311.18553
In recent years, learning-based approaches have demonstrated significant promise in addressing intricate navigation tasks. Traditional methods for training deep neural network navigation policies rely on meticulously designed reward functions or extensive teleoperation datasets as navigation demonstrations. However, the former is often confined to simulated environments, and the latter demands substantial human labor, making it a time-consuming process. Our vision is for robots to autonomously learn navigation skills and adapt their behaviors to environmental changes without any human intervention. In this work, we discuss the self-supervised navigation problem and present Dynamic Graph Memory (DGMem), which facilitates training only with on-board observations. With the help of DGMem, agents can actively explore their surroundings, autonomously acquiring a comprehensive navigation policy in a data-efficient manner without external feedback. Our method is evaluated in photorealistic 3D indoor scenes, and empirical studies demonstrate the effectiveness of DGMem.
https://arxiv.org/abs/2311.18473
Heterogeneous robots equipped with multi-modal sensors (e.g., UAVs, wheeled and legged terrestrial robots) provide rich, complementary functions that may help human operators accomplish complex tasks in unknown environments. However, seamlessly integrating heterogeneous agents and making them interact and collaborate still raises challenging issues. In this paper, we define a ROS 2 based software architecture that allows building embodied heterogeneous multi-agent systems (HMAS) in a generic way. We showcase its effectiveness through a scenario integrating aerial drones, quadruped robots, and human operators (see this https URL). In addition, agent spatial awareness in unknown outdoor environments is a critical step toward realizing autonomous individual movements, interactions, and collaborations. Through intensive experimental measurements, RTK-GPS is shown to be a suitable solution for achieving the required localization accuracy.
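As one concrete flavor of such an architecture, a minimal rclpy node is sketched below; the topic name, message type, and 10 Hz rate are assumptions for illustration, not the paper's interface:

```python
# Minimal ROS 2 building block: each agent (drone, quadruped, operator UI)
# publishes its RTK-GPS position on a shared topic. Topic and message are
# assumed, not taken from the paper.
import rclpy
from rclpy.node import Node
from geometry_msgs.msg import PointStamped

class AgentPosePublisher(Node):
    def __init__(self, agent_name: str):
        super().__init__(f"{agent_name}_pose_publisher")
        self.pub = self.create_publisher(PointStamped, "/hmas/agent_poses", 10)
        self.timer = self.create_timer(0.1, self.tick)    # 10 Hz

    def tick(self):
        msg = PointStamped()
        msg.header.stamp = self.get_clock().now().to_msg()
        msg.header.frame_id = self.get_name()
        # msg.point.x/y/z would come from the RTK-GPS driver; zeros by default
        self.pub.publish(msg)

def main():
    rclpy.init()
    rclpy.spin(AgentPosePublisher("quadruped_1"))

if __name__ == "__main__":
    main()
```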
https://arxiv.org/abs/2311.18394
Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches such as soft actor-critic (SAC) require collecting extensive amounts of training data and are thus impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods, so far unexplored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling and a model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that, in the case of trajectory control, the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable. We therefore propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.
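The ingredient that distinguishes REDQ from SAC, the in-target minimization over a random critic subset, can be sketched in a few lines. Sizes and networks are toy placeholders, and the entropy term, done mask, and target networks are omitted for brevity:

```python
# REDQ-style target: keep N critics, bootstrap with the min over a random
# subset of M of them (toy shapes; simplifications noted above).
import torch
import torch.nn as nn

N, M, state_dim, action_dim = 10, 2, 8, 2
critics = [nn.Linear(state_dim + action_dim, 1) for _ in range(N)]  # toy Q-nets

def redq_target(s_next, a_next, reward, gamma=0.99):
    idx = torch.randperm(N)[:M]                       # random subset of critics
    sa = torch.cat([s_next, a_next], dim=-1)
    qs = torch.stack([critics[i](sa) for i in idx])   # (M, batch, 1)
    q_min = qs.min(dim=0).values                      # pessimistic estimate
    return reward + gamma * q_min

batch = 4
target = redq_target(torch.randn(batch, state_dim),
                     torch.randn(batch, action_dim),
                     torch.randn(batch, 1))
print(target.shape)  # torch.Size([4, 1])
```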
https://arxiv.org/abs/2311.18393
Adept traffic models are critical to both planning and closed-loop simulation for autonomous vehicles (AV), and key design objectives include accuracy, diverse multimodal behaviors, interpretability, and downstream compatibility. Recently, with the advent of large language models (LLMs), an additional desirable feature for traffic models is LLM compatibility. We present Categorical Traffic Transformer (CTT), a traffic model that outputs both continuous trajectory predictions and tokenized categorical predictions (lane modes, homotopies, etc.). The most outstanding feature of CTT is its fully interpretable latent space, which enables direct supervision of the latent variable from the ground truth during training and avoids mode collapse completely. As a result, CTT can generate diverse behaviors conditioned on different latent modes with semantic meanings while beating SOTA on prediction accuracy. In addition, CTT's ability to input and output tokens enables integration with LLMs for common-sense reasoning and zero-shot generalization.
https://arxiv.org/abs/2311.18307
Trajectory prediction is of significant importance in computer vision. Accurate pedestrian trajectory prediction benefits autonomous vehicles and robots in planning their motion. Pedestrians' trajectories are greatly influenced by their intentions. Prior studies, while introducing various deep learning methods, pay attention only to the spatial and temporal information of trajectories, overlooking explicit intention information. In this study, we introduce a novel model, termed the \textbf{S-T CRF}: \textbf{S}patial-\textbf{T}emporal \textbf{C}onditional \textbf{R}andom \textbf{F}ield, which judiciously incorporates intention information in addition to the spatial and temporal information of a trajectory. The model uses a Conditional Random Field (CRF) to generate a representation of future intentions, greatly improving the prediction of subsequent trajectories when combined with the spatial-temporal representation. Furthermore, the study innovatively devises a space CRF loss and a time CRF loss, meticulously designed to enhance interaction constraints and temporal dynamics, respectively. Extensive experimental evaluations on the ETH/UCY and SDD datasets demonstrate that the proposed method surpasses existing baseline approaches.
https://arxiv.org/abs/2311.18198
Modern astronomical experiments are designed to achieve multiple scientific goals, from studies of galaxy evolution to cosmic acceleration. These goals require data of many different classes of night-sky objects, each of which has a particular set of observational needs. These observational needs are typically in strong competition with one another. This poses a challenging multi-objective optimization problem that remains unsolved. The effectiveness of Reinforcement Learning (RL) as a valuable paradigm for training autonomous systems has been well-demonstrated, and it may provide the basis for self-driving telescopes capable of optimizing the scheduling for astronomy campaigns. Simulated datasets containing examples of interactions between a telescope and a discrete set of sky locations on the celestial sphere can be used to train an RL model to sequentially gather data from these several locations to maximize a cumulative reward as a measure of the quality of the data gathered. We use simulated data to test and compare multiple implementations of a Deep Q-Network (DQN) for the task of optimizing the schedule of observations from the Stone Edge Observatory (SEO). We combine multiple improvements on the DQN and adjustments to the dataset, showing that DQNs can achieve an average reward of $87\% \pm 6\%$ of the maximum achievable reward in each state on the test set. This is the first comparison of offline RL algorithms for a particular astronomical challenge and the first open-source framework for performing such a comparison and assessment task.
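As a toy stand-in for the DQN scheduler (tabular Q-learning rather than a network, with an invented reward trading per-field data quality against slew time), the underlying scheduling MDP might look like:

```python
# Toy scheduling MDP: states are sky fields, actions are "slew to field j";
# tabular Q-learning stands in for the DQN, and the reward is invented.
import numpy as np

n_fields = 12
rng = np.random.default_rng(1)
quality = rng.uniform(0.2, 1.0, n_fields)      # stand-in per-field data quality
Q = np.zeros((n_fields, n_fields))             # Q[current_field, next_field]
alpha, gamma, eps = 0.1, 0.95, 0.1

for episode in range(500):
    s = int(rng.integers(n_fields))
    for step in range(30):
        a = int(rng.integers(n_fields)) if rng.random() < eps else int(Q[s].argmax())
        slew_cost = 0.01 * abs(s - a)          # farther slews waste exposure time
        r = quality[a] - slew_cost
        Q[s, a] += alpha * (r + gamma * Q[a].max() - Q[s, a])
        s = a

s, schedule = 0, [0]                           # greedy rollout of the learned policy
for _ in range(5):
    s = int(Q[s].argmax())
    schedule.append(s)
print("greedy pointing sequence:", schedule)
```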
https://arxiv.org/abs/2311.18094
Shape-morphing capabilities are crucial for enabling multifunctionality in both biological and artificial systems. Various strategies for shape morphing have been proposed for applications in metamaterials and robotics. However, few of these approaches have achieved the ability to seamlessly transform into a multitude of volumetric shapes post-fabrication using a relatively simple actuation and control mechanism. Taking inspiration from thick origami and hierarchies in nature, we present a new hierarchical construction method based on polyhedrons to create an extensive library of compact origami metastructures. We show that a single hierarchical origami structure can autonomously adapt to over $10^3$ versatile architectural configurations, achieved with fewer than 3 actuation degrees of freedom and simple transition kinematics. We uncover the fundamental principles governing these shape transformations through theoretical models. Furthermore, we demonstrate the wide-ranging potential applications of these transformable hierarchical structures, including their use as untethered, autonomous robotic transformers capable of various gait-shifting and multidirectional locomotion, as well as rapidly self-deployable and self-reconfigurable architecture, exemplifying scalability up to the meter scale. Lastly, we introduce the concept of multitask reconfigurable and deployable space robots and habitats, showcasing the adaptability and versatility of these metastructures.
https://arxiv.org/abs/2311.18055