Collaborative perception, which greatly enhances the sensing capability of connected and autonomous vehicles (CAVs) by incorporating data from external resources, also brings forth potential security risks. CAVs' driving decisions rely on remote untrusted data, making them susceptible to attacks carried out by malicious participants in the collaborative perception system. However, security analysis and countermeasures for such threats are absent. To understand the impact of the vulnerability, we break new ground by proposing various real-time data fabrication attacks in which the attacker delivers crafted malicious data to victims in order to perturb their perception results, leading to hard brakes or increased collision risks. Our attacks demonstrate a high success rate of over 86% in high-fidelity simulated scenarios and are realizable in real-world experiments. To mitigate the vulnerability, we present a systematic anomaly detection approach that enables benign vehicles to jointly reveal malicious fabrication. It detects 91.5% of attacks with a false positive rate of 3% in simulated scenarios and significantly mitigates attack impacts in real-world scenarios.
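As a rough illustration of the kind of cross-checking such a defense can build on (not the paper's actual detection pipeline), the sketch below flags a peer's shared detection when it lies in space the ego vehicle itself has observed to be free; the function name, occupancy-grid representation, and parameters are assumptions.
```python
import numpy as np

def flag_suspicious_detections(peer_boxes, ego_free_mask, grid_origin, cell_size):
    """peer_boxes: (N, 2) array of claimed object centers (x, y) in world coords.
    ego_free_mask: 2D boolean grid, True where the ego's own LiDAR observed free
    space. Returns a boolean array marking claims inconsistent with ego evidence."""
    suspicious = np.zeros(len(peer_boxes), dtype=bool)
    for i, (x, y) in enumerate(peer_boxes):
        col = int((x - grid_origin[0]) / cell_size)
        row = int((y - grid_origin[1]) / cell_size)
        inside = 0 <= row < ego_free_mask.shape[0] and 0 <= col < ego_free_mask.shape[1]
        # A claimed object sitting in ego-verified free space is inconsistent.
        if inside and ego_free_mask[row, col]:
            suspicious[i] = True
    return suspicious

# Example: a fabricated "ghost vehicle" claimed at (5.0, 2.0) inside observed free space.
free = np.ones((20, 20), dtype=bool)
print(flag_suspicious_detections(np.array([[5.0, 2.0]]), free, (0.0, 0.0), 0.5))
```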
https://arxiv.org/abs/2309.12955
In recent years, Artificial Neural Networks (ANNs) have become a standard in robotic control. However, a significant drawback of large-scale ANNs is their increased power consumption. This becomes a critical concern when designing autonomous aerial vehicles, given the stringent constraints on power and weight. Blimps in particular, known for their extended endurance, call for power-efficient control methods. Spiking neural networks (SNNs) can provide a solution, facilitating energy-efficient and asynchronous event-driven processing. In this paper, we evolve SNNs for accurate altitude control of a non-neutrally buoyant indoor blimp, relying solely on onboard sensing and processing power. The blimp's altitude tracking performance significantly improved compared to prior research, showing reduced oscillations and a minimal steady-state error. The parameters of the SNNs were optimized via an evolutionary algorithm, using a Proportional-Integral-Derivative (PID) controller as the target signal. We developed two complementary SNN controllers while examining various hidden layer structures. The first controller responds swiftly to control errors, mitigating overshooting and oscillations, while the second minimizes steady-state errors caused by non-neutral-buoyancy-induced drift. Despite the blimp's drivetrain limitations, our SNN controllers ensured stable altitude control, employing only 160 spiking neurons.
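A minimal sketch of the training idea, assuming a two-neuron current-based LIF pair and a simple evolution strategy; the actual controllers use 160 spiking neurons and different structures, so the genome, neuron model, and hyperparameters below are purely illustrative.
```python
import numpy as np
rng = np.random.default_rng(0)

def snn_output(weights, errors, dt=0.02, tau=0.1, v_th=1.0):
    """Two-neuron antagonistic LIF pair driven by the altitude error; the
    decoded output is the difference of low-pass-filtered spike rates."""
    w_in, w_out = weights
    v = np.zeros(2); trace = np.zeros(2); out = []
    for e in errors:
        v += dt / tau * (-v + w_in * np.array([e, -e]))   # leaky integration
        spikes = (v > v_th).astype(float); v[spikes > 0] = 0.0
        trace += dt / tau * (-trace + spikes / dt)        # filtered spike rate
        out.append(w_out * (trace[0] - trace[1]))
    return np.array(out)

def pid_target(errors, dt=0.02, kp=2.0, ki=0.5, kd=0.3):
    return kp * errors + ki * np.cumsum(errors) * dt + kd * np.gradient(errors, dt)

errors = np.sin(np.linspace(0, 6 * np.pi, 600))           # surrogate error trace
target = pid_target(errors)
pop = rng.uniform(0.1, 5.0, size=(32, 2))                 # (w_in, w_out) genomes
for gen in range(30):                                     # simple elitist evolution
    loss = [np.mean((snn_output(w, errors) - target) ** 2) for w in pop]
    elite = pop[np.argsort(loss)[:8]]
    pop = np.concatenate([elite, elite.repeat(3, 0) + rng.normal(0, 0.2, (24, 2))])
print("best MSE of last generation:", min(loss))
```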
https://arxiv.org/abs/2309.12937
This paper addresses the problem of safety-critical control of autonomous robots, considering the ubiquitous uncertainties arising from unmodeled dynamics and noisy sensors. To take into account these uncertainties, probabilistic state estimators are often deployed to obtain a belief over possible states. Namely, Particle Filters (PFs) can handle arbitrary non-Gaussian distributions in the robot's state. In this work, we define the belief state and belief dynamics for continuous-discrete PFs and construct safe sets in the underlying belief space. We design a controller that provably keeps the robot's belief state within this safe set. As a result, we ensure that the risk of the unknown robot's state violating a safety specification, such as avoiding a dangerous area, is bounded. We provide an open-source implementation as a ROS2 package and evaluate the solution in simulations and hardware experiments involving high-dimensional belief spaces.
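The following sketch conveys the belief-space idea in its simplest form, assuming single-integrator dynamics and a rejection-style controller rather than the paper's provably safe construction; the danger zone, noise model, and risk bound are made-up parameters.
```python
import numpy as np
rng = np.random.default_rng(1)

def violation_risk(particles, u, dt, danger_center, danger_radius, noise_std):
    """Fraction of particles predicted to enter a circular danger zone."""
    pred = particles + u * dt + rng.normal(0, noise_std, particles.shape)
    dist = np.linalg.norm(pred - danger_center, axis=1)
    return np.mean(dist < danger_radius)

def safe_control(particles, u_nominal, dt=0.1, risk_bound=0.05,
                 danger_center=np.array([2.0, 0.0]), danger_radius=0.5,
                 noise_std=0.05, n_samples=50):
    """Return the admissible control closest to u_nominal (rejection-style)."""
    candidates = [u_nominal] + [u_nominal + rng.normal(0, 0.5, 2) for _ in range(n_samples)]
    feasible = [u for u in candidates
                if violation_risk(particles, u, dt, danger_center,
                                  danger_radius, noise_std) <= risk_bound]
    if not feasible:
        return np.zeros(2)                      # fall back to stopping
    return min(feasible, key=lambda u: np.linalg.norm(u - u_nominal))

belief = rng.normal([1.5, 0.0], 0.1, size=(500, 2))   # particle cloud near the danger zone
print(safe_control(belief, u_nominal=np.array([1.0, 0.0])))
```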
https://arxiv.org/abs/2309.12857
We address the challenge of enhancing navigation autonomy for planetary space rovers using reinforcement learning (RL). The ambition of future space missions necessitates advanced autonomous navigation capabilities for rovers to meet mission objectives. RL's potential in robotic autonomy is evident, but its reliance on simulations poses a challenge. Transferring policies to real-world scenarios often encounters the "reality gap", disrupting the transition from virtual to physical environments. The reality gap is exacerbated in the context of mapless navigation on Mars and Moon-like terrains, where unpredictable terrains and environmental factors play a significant role. Effective navigation requires a method attuned to these complexities and real-world data noise. We introduce a novel two-stage RL approach using offline noisy data. Our approach employs a teacher-student policy learning paradigm, inspired by the "learning by cheating" method. The teacher policy is trained in simulation. Subsequently, the student policy is trained on noisy data, aiming to mimic the teacher's behaviors while being more robust to real-world uncertainties. Our policies are transferred to a custom-designed rover for real-world testing. Comparative analyses between the teacher and student policies reveal that our approach offers improved behavioral performance, heightened noise resilience, and more effective sim-to-real transfer.
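A toy sketch of the teacher-student (learning-by-cheating) step, assuming a hand-coded "privileged" teacher and a linear student fit by ridge regression on noise-corrupted observations; the paper's policies are neural networks trained with RL, so everything below is illustrative.
```python
import numpy as np
rng = np.random.default_rng(2)

def teacher_policy(obs):
    """Stand-in privileged teacher: steer based on clean observations."""
    return np.clip(-1.5 * obs[:, :1] + 0.5 * obs[:, 1:2], -1.0, 1.0)

obs_clean = rng.uniform(-1, 1, size=(5000, 2))               # offline dataset
actions = teacher_policy(obs_clean)                          # teacher labels
obs_noisy = obs_clean + rng.normal(0, 0.1, obs_clean.shape)  # injected sensor noise

# Student: linear policy fit on the noisy inputs (closed-form ridge regression).
X = np.hstack([obs_noisy, np.ones((len(obs_noisy), 1))])
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(3), X.T @ actions)

test = rng.uniform(-1, 1, size=(100, 2))
student_act = np.hstack([test + rng.normal(0, 0.1, test.shape), np.ones((100, 1))]) @ W
print("imitation MSE:", float(np.mean((student_act - teacher_policy(test)) ** 2)))
```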
https://arxiv.org/abs/2309.12807
Trajectory tracking control of autonomous trolley collection robots (ATCRs) is a challenging task due to complex environments, severe noise, and external disturbances. This work investigates a control scheme for ATCRs subject to severe environmental interference. A kinematics-model-based adaptive sliding mode disturbance observer with fast convergence is first proposed to estimate the lumped disturbances. On this basis, a robust controller with prescribed performance is proposed using a backstepping technique, which improves the transient performance and guarantees fast convergence. Simulation results are provided to illustrate the effectiveness of the proposed control scheme.
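To make the observer idea concrete in the simplest possible setting, here is a basic first-order linear disturbance observer for kinematics of the form x_dot = u + d, assuming the rate x_dot can be measured; this is a stand-in, not the paper's adaptive sliding-mode observer, and the prescribed-performance backstepping controller is not reproduced.
```python
import numpy as np

dt, L = 0.01, 20.0                       # step size and observer gain (assumed)
x, d_hat = np.zeros(2), np.zeros(2)      # state and disturbance estimate
d_true = np.array([0.3, -0.2])           # constant lumped disturbance

for k in range(500):
    u = np.array([0.5, 0.1]) - d_hat     # feedforward-cancel the current estimate
    x_dot = u + d_true                   # plant with lumped disturbance
    x = x + x_dot * dt
    # Observer: drive d_hat toward the residual between measured and modeled rate.
    d_hat = d_hat + L * dt * (x_dot - u - d_hat)

print("estimated disturbance:", d_hat, "true:", d_true)
```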
https://arxiv.org/abs/2309.12660
With the long-term goal of reducing the image processing time on an autonomous mobile robot in mind, we explore in this paper the use of log-polar-like image data with gaze control. The gaze control is not performed on the Cartesian image but on the log-polar-like image data. For this, we start from the classic deep reinforcement learning approach for Atari games. We extend an A3C deep RL approach with an LSTM network and learn both the policy for playing three Atari games and a policy for gaze control. While the Atari games already use low-resolution images of 80 by 80 pixels, we are able to further reduce the number of image pixels by a factor of 5 without losing any gaming performance.
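A small sketch of log-polar-like sampling around a gaze point; the ring/wedge counts and the sampling pattern are assumptions, and the A3C/LSTM agent that selects the gaze is not shown.
```python
import numpy as np

def log_polar_sample(image, gaze_rc, n_rings=16, n_wedges=32, r_max=None):
    """Return an (n_rings, n_wedges) retina: ring radii grow exponentially, so
    resolution is high near the gaze point and coarse in the periphery."""
    h, w = image.shape[:2]
    r_max = r_max or min(h, w) / 2
    radii = np.exp(np.linspace(0, np.log(r_max), n_rings))        # log-spaced rings
    angles = np.linspace(0, 2 * np.pi, n_wedges, endpoint=False)
    rr = gaze_rc[0] + radii[:, None] * np.sin(angles)[None, :]
    cc = gaze_rc[1] + radii[:, None] * np.cos(angles)[None, :]
    rr = np.clip(np.round(rr).astype(int), 0, h - 1)
    cc = np.clip(np.round(cc).astype(int), 0, w - 1)
    return image[rr, cc]

frame = np.random.rand(80, 80)                  # Atari-sized grayscale frame
retina = log_polar_sample(frame, gaze_rc=(40, 40))
print(frame.size, "->", retina.size, "samples")  # 6400 -> 512
```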
https://arxiv.org/abs/2309.12634
Language models (LMs) are no longer restricted to the ML community, and instruction-tuned LMs have led to a rise in autonomous AI agents. As the accessibility of LMs grows, it is imperative that our understanding of their capabilities, intended usage, and development cycle also improves. Model cards are a popular practice for documenting detailed information about an ML model. To automate model card generation, we introduce a dataset of 500 question-answer pairs for 25 ML models that cover crucial aspects of each model, such as its training configuration, datasets, biases, architecture details, and training resources. We employ annotators to extract the answers from the original papers. Further, we explore the capabilities of LMs in generating model cards by answering these questions. Our initial experiments with ChatGPT-3.5, LLaMa, and Galactica showcase a significant gap in these LMs' ability both to understand research papers and to generate factually accurate textual responses. We posit that our dataset can be used to train models to automate the generation of model cards from paper text and reduce human effort in the model card curation process. The complete dataset is available at this https URL.
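A hypothetical record layout for such a question-answer dataset; the field names and example values are assumptions, not the released schema.
```python
from dataclasses import dataclass

@dataclass
class ModelCardQA:
    model_name: str        # one of the 25 covered ML models
    aspect: str            # e.g. "training configuration", "datasets", "biases"
    question: str
    answer: str            # extracted by annotators from the original paper
    evidence: str          # supporting span from the paper text

example = ModelCardQA(
    model_name="ExampleNet",        # hypothetical model, for illustration only
    aspect="training resources",
    question="What hardware was used to train the model?",
    answer="Eight A100 GPUs for three days.",
    evidence="Training was performed on eight A100 GPUs over three days.")
print(example.aspect, "->", example.answer)
```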
https://arxiv.org/abs/2309.12616
Optimization-based safety filters, such as control barrier function (CBF) based quadratic programs (QPs), have demonstrated success in controlling autonomous systems to achieve complex goals. These CBF-QPs can be shown to be continuous, but are generally not smooth, let alone continuously differentiable. In this paper, we present a general characterization of smooth safety filters -- smooth controllers that guarantee safety in a minimally invasive fashion -- based on the Implicit Function Theorem. This characterization leads to families of smooth universal formulas for safety-critical controllers that quantify the conservatism of the resulting safety filter, the utility of which is demonstrated through illustrative examples.
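As a hedged illustration of the idea, the sketch below contrasts the closed-form CBF-QP solution for a single constraint (a ReLU on the barrier margin) with a softplus-smoothed variant; the exact smooth universal formulas and their conservatism quantification are given in the paper, and the sigma parameterization here is only indicative.
```python
import numpy as np

def relu_filter(psi, Lgh, u_nom):
    """Standard CBF-QP solution: u = u_nom + max(0, -psi) * Lgh / ||Lgh||^2."""
    return u_nom + max(0.0, -psi) * Lgh / np.dot(Lgh, Lgh)

def softplus_filter(psi, Lgh, u_nom, sigma=0.1):
    """Smooth variant: softplus(-psi) >= max(0, -psi), so the barrier condition
    still holds, at the cost of some conservatism that shrinks as sigma -> 0."""
    lam = sigma * np.log1p(np.exp(-psi / sigma))
    return u_nom + lam * Lgh / np.dot(Lgh, Lgh)

# psi = Lfh + Lgh @ u_nom + alpha(h): margin of the barrier condition at u_nom.
u_nom, Lgh = np.array([1.0, 0.0]), np.array([0.5, 1.0])
for psi in (-0.3, 0.0, 0.3):
    print(psi, relu_filter(psi, Lgh, u_nom), softplus_filter(psi, Lgh, u_nom))
```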
https://arxiv.org/abs/2309.12614
Recent transportation research suggests that autonomous vehicles (AVs) have the potential to improve traffic flow efficiency as they are able to maintain smaller car-following distances. Nevertheless, being a unique class of ground robots, AVs are susceptible to robotic errors, particularly in their perception module, leading to uncertainties in their movements and an increased risk of collisions. Consequently, conservative operational strategies, such as larger headway and slower speeds, are implemented to prioritize safety over traffic capacity in real-world operations. To reconcile the inconsistency, this paper proposes an analytical model framework that delineates the endogenous reciprocity between traffic safety and efficiency that arises from robotic uncertainty in AVs. Car-following scenarios are extensively examined, with uncertain headway as the key parameter for bridging the single-lane capacity and the collision probability. A Markov chain is then introduced to describe the dynamics of the lane capacity, and the resulting expected collision-inclusive capacity is adopted as the ultimate performance measure for fully autonomous traffic. With the help of this analytical model, it is possible to support the settings of critical parameters in AV operations and incorporate optimization techniques to assist traffic management strategies for autonomous traffic.
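A toy version of the capacity computation, assuming three lane states and made-up transition probabilities and per-state throughputs; only the mechanics (a stationary distribution weighting per-state capacity) mirror the modeling idea.
```python
import numpy as np

# States: 0 = free flow, 1 = disturbed flow, 2 = blocked after a collision.
P = np.array([[0.95, 0.04, 0.01],
              [0.30, 0.65, 0.05],
              [0.50, 0.00, 0.50]])           # row-stochastic transition matrix
capacity = np.array([2200.0, 1400.0, 0.0])   # veh/h in each state (assumed)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

print("stationary distribution:", np.round(pi, 3))
print("expected collision-inclusive capacity:", round(float(pi @ capacity), 1), "veh/h")
```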
https://arxiv.org/abs/2309.12611
Autonomous mobile robots need to perceive the environments with their onboard sensors (e.g., LiDARs and RGB cameras) and then make appropriate navigation decisions. In order to navigate human-inhabited public spaces, such a navigation task becomes more than just obstacle avoidance: it also requires considering surrounding humans and their intentions and adjusting the navigation behavior in response to the underlying social norms, i.e., being socially compliant. Machine learning methods are shown to be effective in capturing those complex and subtle social interactions in a data-driven manner, without explicitly hand-crafting simplified models or cost functions. Considering multiple available sensor modalities and the efficiency of learning methods, this paper presents a comprehensive study on learning social robot navigation with multimodal perception using a large-scale real-world dataset. The study investigates social robot navigation decision making on both the global and local planning levels and contrasts unimodal and multimodal learning against a set of classical navigation approaches in different social scenarios, while also analyzing the training and generalizability performance from the learning perspective. We also conduct a human study on how learning with multimodal perception affects the perceived social compliance. The results show that multimodal learning has a clear advantage over unimodal learning in both the dataset and human studies. We open-source our code for the community's future use to study multimodal perception for learning social robot navigation.
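A minimal PyTorch sketch of a multimodal (LiDAR scan + RGB image) navigation policy; the encoder sizes and the velocity-command output are assumptions, not the networks evaluated in the study.
```python
import torch
import torch.nn as nn

class MultimodalNavPolicy(nn.Module):
    def __init__(self, scan_dim=720, action_dim=2):
        super().__init__()
        self.scan_enc = nn.Sequential(nn.Linear(scan_dim, 256), nn.ReLU(),
                                      nn.Linear(256, 128), nn.ReLU())
        self.img_enc = nn.Sequential(nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
                                     nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(nn.Linear(128 + 32, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim))   # e.g. (v, omega)

    def forward(self, scan, image):
        # Fuse the two modalities by concatenating their encodings.
        return self.head(torch.cat([self.scan_enc(scan), self.img_enc(image)], dim=1))

policy = MultimodalNavPolicy()
action = policy(torch.randn(4, 720), torch.randn(4, 3, 64, 64))
print(action.shape)   # torch.Size([4, 2])
```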
https://arxiv.org/abs/2309.12568
We propose a risk-aware crash mitigation system (RCMS), to augment any existing motion planner (MP), that enables an autonomous vehicle to perform evasive maneuvers in high-risk situations and minimize the severity of collision if a crash is inevitable. In order to facilitate a smooth transition between RCMS and MP, we develop a novel activation mechanism that combines instantaneous as well as predictive collision risk evaluation strategies in a unified hysteresis-band approach. For trajectory planning, we deploy a modular receding horizon optimization-based approach that minimizes a smooth situational risk profile, while adhering to the physical road limits as well as vehicular actuator limits. We demonstrate the performance of our approach in a simulation environment.
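A small sketch of the hysteresis-band activation mechanism; the thresholds and the max-based fusion of instantaneous and predictive risk are illustrative assumptions.
```python
class HysteresisActivation:
    def __init__(self, on_threshold=0.7, off_threshold=0.4):
        assert off_threshold < on_threshold
        self.on_t, self.off_t = on_threshold, off_threshold
        self.active = False                     # False -> nominal MP, True -> RCMS

    def update(self, instantaneous_risk, predictive_risk):
        risk = max(instantaneous_risk, predictive_risk)   # conservative fusion
        if not self.active and risk >= self.on_t:
            self.active = True                  # hand control to the mitigation system
        elif self.active and risk <= self.off_t:
            self.active = False                 # return control to the nominal planner
        return self.active

switch = HysteresisActivation()
for r_inst, r_pred in [(0.2, 0.3), (0.5, 0.8), (0.6, 0.5), (0.3, 0.35)]:
    print(r_inst, r_pred, "->", "RCMS" if switch.update(r_inst, r_pred) else "MP")
```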
https://arxiv.org/abs/2309.12531
Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism, but the rarity of safety-critical events makes large-scale collection of driving scenarios expensive. In this paper, we present DJINN, a diffusion-based method for generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state-of-the-art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions, including goal-based sampling, behavior-class sampling, and scenario editing.
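A very schematic sketch of conditional trajectory diffusion: the trained noise predictor is replaced by a dummy function, and conditioning on observed agent states is imposed by clamping them at every denoising step (a common inpainting-style trick; DJINN's actual conditioning mechanism may differ, and the schedule below is arbitrary).
```python
import numpy as np
rng = np.random.default_rng(5)

T = 50
betas = np.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)

def eps_model(x, t):
    return np.zeros_like(x)       # placeholder for the trained denoiser

def sample(shape, observed, obs_mask):
    """observed/obs_mask: known agent states (e.g. past waypoints) and where they apply."""
    x = rng.normal(size=shape)
    for t in reversed(range(T)):
        eps = eps_model(x, t)
        mean = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.normal(size=shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
        x[obs_mask] = observed[obs_mask]      # clamp the conditioned entries
    return x

# 4 agents, 20 timesteps, (x, y); condition on agent 0's first 5 waypoints.
obs = np.zeros((4, 20, 2)); mask = np.zeros((4, 20, 2), dtype=bool)
obs[0, :5] = [[1.0, 0.0]] * 5; mask[0, :5] = True
traj = sample((4, 20, 2), obs, mask)
print(traj.shape, traj[0, 0])
```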
https://arxiv.org/abs/2309.12508
Discovering potential failures of an autonomous system is important prior to deployment. Falsification-based methods are often used to assess the safety of such systems, but the cost of running many accurate simulations can be high. The validation can be accelerated by identifying critical failure scenarios for the system under test and by reducing the simulation runtime. We propose a Bayesian approach that integrates meta-learning strategies with a multi-armed bandit framework. Our method involves learning distributions over scenario parameters that are prone to triggering failures in the system under test, as well as a distribution over fidelity settings that enable fast and accurate simulations. In the spirit of meta-learning, we also assess whether the learned fidelity settings distribution facilitates faster learning of the scenario parameter distributions for new scenarios. We showcase our methodology using a cutting-edge 3D driving simulator, incorporating 16 fidelity settings for an autonomous vehicle stack that includes camera and lidar sensors. We evaluate various scenarios based on an autonomous vehicle pre-crash typology. As a result, our approach achieves a significant speedup, up to 18 times faster compared to traditional methods that solely rely on a high-fidelity simulator.
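A simplified sketch of the bandit ingredient: Thompson sampling over a discrete set of fidelity settings, trading each setting's (unknown) chance of revealing a failure against its simulation cost. The failure probabilities, costs, and reward definition are placeholders, not the paper's setup.
```python
import numpy as np
rng = np.random.default_rng(3)

true_fail_prob = np.array([0.05, 0.15, 0.20])     # unknown to the learner
cost = np.array([1.0, 4.0, 18.0])                 # relative simulation cost
alpha, beta = np.ones(3), np.ones(3)              # Beta posterior per fidelity setting

total_cost, failures = 0.0, 0
for t in range(300):
    theta = rng.beta(alpha, beta)                 # posterior samples
    arm = int(np.argmax(theta / cost))            # expected failures per unit cost
    found = rng.random() < true_fail_prob[arm]    # run one (simulated) scenario
    alpha[arm] += found
    beta[arm] += 1 - found
    total_cost += cost[arm]
    failures += found
print("posterior means:", np.round(alpha / (alpha + beta), 2))
print("failures per unit cost:", round(failures / total_cost, 4))
```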
https://arxiv.org/abs/2309.12474
Panoramic videos contain richer spatial information and have attracted tremendous attention due to the exceptional viewing experience they offer in fields such as autonomous driving and virtual reality. However, existing datasets for video segmentation only focus on conventional planar images. To address this challenge, in this paper we present a panoramic video dataset, PanoVOS. The dataset provides 150 videos with high video resolutions and diverse motions. To quantify the domain gap between 2D planar videos and panoramic videos, we evaluate 15 off-the-shelf video object segmentation (VOS) models on PanoVOS. Through error analysis, we found that all of them fail to tackle the pixel-level content discontinuities of panoramic videos. Thus, we present a Panoramic Space Consistency Transformer (PSCFormer), which can effectively utilize the semantic boundary information of the previous frame for pixel-level matching with the current frame. Extensive experiments demonstrate that, compared with previous SOTA models, our PSCFormer network exhibits a great advantage in terms of segmentation results under the panoramic setting. Our dataset poses new challenges in panoramic VOS, and we hope that PanoVOS can advance the development of panoramic segmentation and tracking.
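One source of the pixel-level discontinuity is the 360° seam of equirectangular frames; the generic wrap-padding sketch below keeps content crossing the left/right border contiguous and is shown only to make the problem concrete, not as part of PSCFormer.
```python
import numpy as np

def wrap_pad_horizontal(frame, pad):
    """Pad an (H, W, C) equirectangular frame by `pad` columns on each side,
    wrapping around the 360-degree seam instead of inserting zeros."""
    left = frame[:, -pad:]      # columns just left of the seam
    right = frame[:, :pad]      # columns just right of the seam
    return np.concatenate([left, frame, right], axis=1)

frame = np.arange(2 * 8 * 1).reshape(2, 8, 1)
padded = wrap_pad_horizontal(frame, pad=2)
print(padded[..., 0])           # each row reads: cols 6,7 | 0..7 | 0,1
```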
https://arxiv.org/abs/2309.12303
Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left- vs. right-hand traffic. In contrast, existing models for autonomous driving have thus far only been deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.
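A rough PyTorch sketch of geo-location-based channel attention; the region-embedding scheme and layer sizes are assumptions, and AnyD's actual architecture and contrastive imitation objective are described in the paper.
```python
import torch
import torch.nn as nn

class GeoChannelAttention(nn.Module):
    def __init__(self, num_regions, channels, embed_dim=32):
        super().__init__()
        self.region_embed = nn.Embedding(num_regions, embed_dim)
        self.gate = nn.Sequential(
            nn.Linear(embed_dim, channels), nn.Sigmoid())  # per-channel weights

    def forward(self, features, region_id):
        # features: (B, C, H, W) perception features; region_id: (B,) int64
        w = self.gate(self.region_embed(region_id))        # (B, C) in (0, 1)
        return features * w[:, :, None, None]              # reweight channels per region

attn = GeoChannelAttention(num_regions=8, channels=64)
feats = torch.randn(2, 64, 16, 16)
out = attn(feats, torch.tensor([0, 3]))
print(out.shape)   # torch.Size([2, 64, 16, 16])
```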
https://arxiv.org/abs/2309.12295
Despite large advances in recent years, real-time capable motion planning for autonomous road vehicles remains a huge challenge. In this work, we present a decision module that is based on set-based reachability analysis: First, we identify all possible driving corridors by computing the reachable set for the longitudinal position of the vehicle along the lanelets of the road network, where lane changes are modeled as discrete events. Next, we select the best driving corridor based on a cost function that penalizes lane changes and deviations from a desired velocity profile. Finally, we generate a reference trajectory inside the selected driving corridor, which can be used to guide or warm-start low-level trajectory planners. For the numerical evaluation, we combine our decision module with a motion-primitive-based and an optimization-based planner and evaluate the performance on 2000 challenging CommonRoad traffic scenarios as well as in the realistic CARLA simulator. The results demonstrate that our decision module is real-time capable and yields significant speed-ups compared to executing a motion planner standalone without a decision module.
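A toy version of the corridor-selection step, assuming each candidate corridor is summarized by its number of lane changes and an achievable velocity profile; the cost weights and corridor encoding are illustrative, and the reachability computation itself is not shown.
```python
def corridor_cost(corridor, desired_velocity, w_lane_change=5.0, w_velocity=1.0):
    """Penalize lane changes and deviations from the desired velocity profile."""
    velocity_penalty = sum(abs(v - desired_velocity) for v in corridor["velocities"])
    return w_lane_change * corridor["lane_changes"] + w_velocity * velocity_penalty

corridors = [
    {"id": "keep-lane",    "lane_changes": 0, "velocities": [10.0, 9.0, 8.0, 8.0]},
    {"id": "overtake",     "lane_changes": 2, "velocities": [13.0, 13.5, 14.0, 14.0]},
    {"id": "single-shift", "lane_changes": 1, "velocities": [12.0, 13.0, 13.5, 14.0]},
]
best = min(corridors, key=lambda c: corridor_cost(c, desired_velocity=14.0))
print("selected driving corridor:", best["id"])
```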
https://arxiv.org/abs/2309.12289
The integration of Large Language Models (LLMs) into robotics has revolutionized human-robot interactions and autonomous task planning. However, these systems are often unable to self-correct during the task execution, which hinders their adaptability in dynamic real-world environments. To address this issue, we present a Hierarchical Closed-loop Robotic Intelligent Self-correction Planner (HiCRISP), an innovative framework that enables robots to correct errors within individual steps during the task execution. HiCRISP actively monitors and adapts the task execution process, addressing both high-level planning and low-level action errors. Extensive benchmark experiments, encompassing virtual and real-world scenarios, showcase HiCRISP's exceptional performance, positioning it as a promising solution for robotic task planning with LLMs.
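A schematic sketch of the hierarchical closed-loop retry pattern, with toy stand-ins for execution, monitoring, and correction; HiCRISP's LLM-based planner interfaces are not reproduced.
```python
import random
random.seed(0)

def execute(step):                # pretend actuation: fails 30% of the time
    return random.random() > 0.3

def check(step, ok):              # low-level monitor: did the step succeed?
    return ok

def correct(step):                # low-level correction, e.g. re-grasp or re-prompt
    return step + " (corrected)"

def run_plan(plan, max_retries=2):
    for step in plan:
        for attempt in range(max_retries + 1):
            ok = execute(step)
            if check(step, ok):
                print("done:", step)
                break
            step = correct(step)  # self-correct within the step and retry
        else:
            print("escalating to high-level replanning at:", step)
            return False          # a real system would invoke the high-level planner here
    return True

run_plan(["pick cup", "place cup on shelf", "close cabinet"])
```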
https://arxiv.org/abs/2309.12089
To integrate action recognition methods into autonomous robotic systems, it is crucial to consider adverse situations involving target occlusions. Such a scenario, despite its practical relevance, is rarely addressed in existing self-supervised skeleton-based action recognition methods. To empower robots with the capacity to address occlusion, we propose a simple and effective method. We first pre-train using occluded skeleton sequences, then use k-means clustering (KMeans) on sequence embeddings to group semantically similar samples. Next, we employ K-nearest-neighbor (KNN) to fill in missing skeleton data based on the closest sample neighbors. Imputing incomplete skeleton sequences to create relatively complete sequences as input provides significant benefits to existing skeleton-based self-supervised models. Meanwhile, building on the state-of-the-art Partial Spatio-Temporal Learning (PSTL), we introduce an Occluded Partial Spatio-Temporal Learning (OPSTL) framework. This enhancement utilizes Adaptive Spatial Masking (ASM) for better use of high-quality, intact skeletons. The effectiveness of our imputation methods is verified on the challenging occluded versions of the NTURGB+D 60 and NTURGB+D 120. The source code will be made publicly available at this https URL.
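A simplified scikit-learn sketch of the imputation idea; the embedding source, data shapes, and the within-cluster KNN detail are assumptions made for clarity.
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(4)
embeddings = rng.normal(size=(200, 16))          # pre-trained sequence embeddings
skeletons = rng.normal(size=(200, 25, 3))        # per-sequence mean joint positions
mask = rng.random((200, 25)) > 0.2               # False = occluded joint

# 1) Group semantically similar sequences via KMeans on the embeddings.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(embeddings)

# 2) Within each cluster, fill missing joints from the nearest embedding neighbors.
imputed = skeletons.copy()
for c in np.unique(labels):
    idx = np.where(labels == c)[0]
    nn = NearestNeighbors(n_neighbors=min(4, len(idx))).fit(embeddings[idx])
    _, nbrs = nn.kneighbors(embeddings[idx])
    for row, neighbor_rows in zip(idx, idx[nbrs]):
        donors = neighbor_rows[neighbor_rows != row]      # exclude the sample itself
        missing = ~mask[row]
        if missing.any() and len(donors):
            imputed[row, missing] = skeletons[donors][:, missing].mean(axis=0)

print("occluded joints filled:", int((~mask).sum()))
```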
https://arxiv.org/abs/2309.12029
Autonomous wheel loading involves selecting actions that maximize the total performance over many repetitions. The actions should be well adapted to the current state of the pile and its future states. Selecting the best actions is difficult since the pile states are consequences of previous actions and thus are highly uncertain. To aid the selection of actions, this paper investigates data-driven models to predict the loaded mass, time, work, and resulting pile state of a loading action given the initial pile state. Deep neural networks were trained on data from over 10,000 simulations to an accuracy of 91-97%, with the pile state represented either by a heightmap or by its slope and curvature. The net outcome of sequential loading actions is predicted by repeating the model inference at five milliseconds per loading. As errors accumulate during the inferences, long-horizon predictions need to be combined with a physics-based model.
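A toy illustration of chaining one-step outcome predictions, with a hand-written linear stand-in for the trained networks and made-up pile-state features; it only shows how each predicted pile state feeds the next inference and why errors can accumulate over long horizons.
```python
import numpy as np

def predict_outcome(pile_state, action):
    """Stand-in one-step model: returns (loaded_mass, time, work, next_pile_state).
    pile_state = (slope, curvature); action = dig depth (all toy quantities)."""
    slope, curvature = pile_state
    mass = 800.0 * action * (1.0 + 0.5 * slope)          # kg, toy relation
    time = 12.0 + 4.0 * action                           # s
    work = 0.9 * mass                                     # kJ
    next_state = (max(slope - 0.05 * action, 0.0), curvature * 0.98)
    return mass, time, work, next_state

state = (0.6, 0.02)                                       # initial slope, curvature
totals = np.zeros(3)
for k in range(5):                                        # five sequential loadings
    mass, t, work, state = predict_outcome(state, action=0.8)
    totals += (mass, t, work)
    print(f"load {k}: mass={mass:.0f} kg, pile slope now {state[0]:.2f}")
print("predicted totals (mass, time, work):", np.round(totals, 1))
```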
https://arxiv.org/abs/2309.12016
Perceiving and mapping the surroundings are essential for enabling autonomous navigation in any robotic platform. The algorithm class that enables accurate mapping while correcting the odometry errors present in most robotics systems is Simultaneous Localization and Mapping (SLAM). Today, fully onboard mapping is only achievable on robotic platforms that can host high-wattage processors, mainly due to the significant computational load and memory demands required for executing SLAM algorithms. For this reason, pocket-size hardware-constrained robots offload the execution of SLAM to external infrastructures. To address the challenge of enabling SLAM algorithms on resource-constrained processors, this paper proposes NanoSLAM, a lightweight and optimized end-to-end SLAM approach specifically designed to operate on centimeter-size robots at a power budget of only 87.9 mW. We demonstrate the mapping capabilities in real-world scenarios and deploy NanoSLAM on a nano-drone weighing 44 g and equipped with a novel commercial RISC-V low-power parallel processor called GAP9. The algorithm is designed to leverage the parallel capabilities of the RISC-V processing cores and enables mapping of a general environment with an accuracy of 4.5 cm and an end-to-end execution time of less than 250 ms.
https://arxiv.org/abs/2309.12008