Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them. To this end, the modular, real-time, and open-source AirSim framework has been a popular community-built system that fulfills some of those needs. However, the framework required additional systems to serve some complex industrial applications, including designing and testing new sensor modalities, Simultaneous Localization And Mapping (SLAM), autonomous navigation algorithms, and transfer learning with machine learning models. In this work, we discuss the modifications and additions to our open-source version of the AirSim simulation framework, including new sensor modalities, vehicle types, and methods to procedurally generate realistic environments with changeable objects. Furthermore, we show the various applications and use cases the framework can serve.
https://arxiv.org/abs/2303.13381
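As a concrete illustration of how such a simulation framework is typically driven, the sketch below uses the standard AirSim Python client to pull one camera frame and one lidar sweep from a running simulation. It assumes the open-source fork keeps the upstream RPC API, and the vehicle, camera, and lidar names are placeholders that must match your settings.json.

```python
import numpy as np
import airsim

# Connect to a running AirSim instance; use airsim.CarClient() for ground vehicles.
client = airsim.MultirotorClient()
client.confirmConnection()
client.enableApiControl(True, vehicle_name="Drone1")

# One RGB frame and one depth frame from the front camera.
responses = client.simGetImages(
    [
        airsim.ImageRequest("front_center", airsim.ImageType.Scene, False, False),
        airsim.ImageRequest("front_center", airsim.ImageType.DepthPerspective, True, False),
    ],
    vehicle_name="Drone1",
)
img = np.frombuffer(responses[0].image_data_uint8, dtype=np.uint8)
img = img.reshape(responses[0].height, responses[0].width, -1)  # 3 or 4 channels depending on build

# One lidar sweep; point_cloud is a flat [x0, y0, z0, x1, ...] list.
lidar = client.getLidarData(lidar_name="LidarSensor1", vehicle_name="Drone1")
points = np.array(lidar.point_cloud, dtype=np.float32).reshape(-1, 3)
print(f"image {img.shape}, lidar points {points.shape[0]}")
```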
Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying them are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal of collecting evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. We also used field notes to document challenges that occurred during development and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature. Most applied practices improved our system, albeit to varying extents, and we were able to overcome all major challenges. The qualitative results provide detailed accounts of 10 AI engineering practices, as well as the challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.
https://arxiv.org/abs/2303.13216
Pedestrian occlusion is challenging for autonomous vehicles (AVs) at midblock locations on multilane roadways because an AV cannot detect crossing pedestrians that are fully occluded by downstream vehicles in adjacent lanes. This paper tests the capability of vehicle-to-vehicle (V2V) communication between an AV and its downstream vehicles to share information about midblock pedestrian crossings. The researchers developed a V2V-based collision-avoidance decision strategy and compared it to a base scenario (i.e., a decision strategy without V2V). Simulation results showed that in the base scenario, the near-zero time-to-collision (TTC) left the AV no time to take appropriate action, resulting in hard braking followed by collisions. The V2V-based collision-avoidance decision strategy, in contrast, allowed a proportional braking approach that increased the TTC and let the pedestrian cross safely. In conclusion, the V2V-based collision-avoidance decision strategy offers greater safety benefits for an AV interacting with fully occluded pedestrians at midblock locations on multilane roadways.
https://arxiv.org/abs/2303.13032
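To make the decision logic concrete, here is a minimal sketch of a proportional-braking rule driven by a V2V pedestrian message. The message format, TTC threshold, and deceleration limit are illustrative assumptions, not the paper's exact strategy.

```python
from dataclasses import dataclass

@dataclass
class V2VPedestrianMsg:
    distance_ahead_m: float   # longitudinal gap between the AV and the crossing point
    # (a real V2V safety message would carry far more: position, heading, timestamps, ...)

def time_to_collision(gap_m: float, av_speed_mps: float) -> float:
    return float("inf") if av_speed_mps <= 0.0 else gap_m / av_speed_mps

def proportional_brake(av_speed_mps: float, msg: V2VPedestrianMsg,
                       ttc_safe_s: float = 4.0, max_decel_mps2: float = 6.0) -> float:
    """Return a deceleration command in m/s^2 (0 = no braking)."""
    ttc = time_to_collision(msg.distance_ahead_m, av_speed_mps)
    if ttc >= ttc_safe_s:
        return 0.0
    # Scale braking with the TTC deficit; saturate at the comfort/actuator limit.
    return min(max_decel_mps2, max_decel_mps2 * (ttc_safe_s - ttc) / ttc_safe_s)

if __name__ == "__main__":
    speed = 12.0  # m/s (~43 km/h)
    for gap in (60.0, 40.0, 20.0, 10.0):
        a = proportional_brake(speed, V2VPedestrianMsg(distance_ahead_m=gap))
        print(f"gap {gap:5.1f} m  TTC {gap / speed:4.1f} s  decel {a:4.1f} m/s^2")
```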
Robust real-time perception of the 3D world is essential for autonomous vehicles. We introduce an end-to-end surround-camera perception system for self-driving. Our perception system is a novel multi-task, multi-camera network that takes a variable set of time-synced camera images as input and produces a rich collection of 3D signals such as the sizes, orientations, and locations of obstacles, parking spaces, free space, etc. Our perception network is modular and end-to-end: 1) the outputs can be consumed directly by downstream modules without any post-processing such as clustering and fusion, improving the speed of model deployment and in-car testing; 2) the whole network is trained in a single stage, improving the speed of model improvement and iteration. The network is carefully designed to achieve high accuracy while running at 53 fps on the NVIDIA Orin SoC (system-on-a-chip). The network is robust to sensor mounting variations (within some tolerances) and can be quickly customized for different vehicle types via efficient model fine-tuning, thanks to its ability to take calibration parameters as additional inputs during training and testing. Most importantly, our network has been successfully deployed and is being tested on real roads.
https://arxiv.org/abs/2303.12976
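The abstract does not disclose the architecture, so the following PyTorch skeleton is only a sketch of the pattern it describes: a shared per-camera encoder, calibration parameters fed as an additional input, and several task heads trained jointly in one stage. All layer sizes and output dimensions are assumptions.

```python
import torch
import torch.nn as nn

class SurroundPerceptionNet(nn.Module):
    def __init__(self, feat_dim: int = 128, calib_dim: int = 12):
        super().__init__()
        self.encoder = nn.Sequential(          # shared across all cameras
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.calib_mlp = nn.Sequential(nn.Linear(calib_dim, feat_dim), nn.ReLU())
        self.obstacle_head = nn.Linear(feat_dim, 7)    # e.g. size/orientation/location
        self.parking_head = nn.Linear(feat_dim, 4)     # e.g. parking-space corners
        self.freespace_head = nn.Linear(feat_dim, 16)  # e.g. radial free-space bins

    def forward(self, images: torch.Tensor, calib: torch.Tensor):
        # images: (num_cams, 3, H, W), calib: (num_cams, calib_dim)
        feats = self.encoder(images) + self.calib_mlp(calib)
        fused = feats.mean(dim=0, keepdim=True)        # naive cross-camera fusion
        return (self.obstacle_head(fused), self.parking_head(fused),
                self.freespace_head(fused))

net = SurroundPerceptionNet()
outputs = net(torch.randn(6, 3, 128, 256), torch.randn(6, 12))
print([o.shape for o in outputs])
```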
Crash data from autonomous vehicles (AVs) or vehicles equipped with advanced driver assistance systems (ADAS) are key to understanding the nature of crashes and to improving automation systems. However, most existing crash data sources are either limited in sample size or suffer from missing or unverified data. To contribute to the AV safety research community, we introduce AVOID: an open AV crash dataset. Three types of vehicles are considered: Advanced Driving System (ADS) vehicles, Advanced Driver Assistance System (ADAS) vehicles, and low-speed autonomous shuttles. The crash data are collected from the National Highway Traffic Safety Administration (NHTSA), the California Department of Motor Vehicles (CA DMV), and incident news worldwide, and are manually verified and summarized in a ready-to-use format. In addition, land use, weather, and geometry information are provided. The dataset is expected to accelerate research on AV crash analysis and potential risk identification by providing the research community with rich samples, diverse data sources, a clear data structure, and high data quality.
https://arxiv.org/abs/2303.12889
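As a hypothetical usage sketch (the released file layout and column names are not specified in the abstract and are placeholders here), a verified tabular crash dataset of this kind can be sliced by vehicle category, weather, and land use in a few lines of pandas:

```python
import pandas as pd

crashes = pd.read_csv("avoid_crashes.csv")          # placeholder file name

ads_rain = crashes[
    (crashes["vehicle_category"] == "ADS")          # ADS / ADAS / low-speed shuttle
    & (crashes["weather"] == "rain")
]
print(ads_rain.groupby("land_use")["crash_id"].count())
```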
Autonomous swarms of robots can bring robustness, scalability, and adaptability to safety-critical tasks such as search and rescue, but their application is still very limited. Using semi-autonomous swarms with human control can bring robot swarms to real-world applications. Human operators can define goals for the swarm, monitor its performance, and interfere with, or overrule, its decisions and behaviour. We present the "Human And Robot Interactive Swarm" simulator (HARIS), which allows multi-user interaction with a robot swarm and facilitates qualitative and quantitative user studies by simulating robot swarms completing tasks, from package delivery to search and rescue, under varying levels of human control. In this demonstration, we showcase the simulator by using it to study, as an example, the performance gain offered by keeping a "human in the loop" over a fully autonomous system. This is illustrated in the context of search and rescue, with an autonomous allocation of resources to those in need.
https://arxiv.org/abs/2303.12390
Reliable localization is crucial for autonomous robots to navigate efficiently and safely. Some navigation methods can plan paths with high localizability (which describes the capability of acquiring reliable localization). By following these paths, the robot can access sensor streams that enable the localization algorithms to produce more accurate location estimates. However, most of these methods require prior knowledge and struggle to adapt to unseen scenarios or dynamic changes. To overcome these limitations, we propose a novel approach for localizability-enhanced navigation via deep reinforcement learning in dynamic human environments. Our proposed planner automatically extracts geometric features from 2D laser data that are helpful for localization. The planner learns to assign different importance to the geometric features and encourages the robot to navigate through areas that are helpful for laser localization. To facilitate the learning of the planner, we introduce two techniques: (1) an augmented state representation that accounts for dynamic changes and the confidence of the localization results, which provides more information and allows the robot to make better decisions, and (2) a reward metric that is capable of offering both sparse and dense feedback on behaviors that affect localization accuracy. Our method exhibits significant improvements in lost rate and arrival rate when tested in previously unseen environments.
https://arxiv.org/abs/2303.12354
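A rough sketch of the reward idea follows: sparse terms for reaching the goal or losing localization, combined with dense terms for localization-confidence improvement and progress. The coefficients and the confidence signal (e.g. derived from a particle-filter pose covariance) are assumptions, not the paper's reward.

```python
def localizability_reward(reached_goal: bool, lost_localization: bool,
                          conf_prev: float, conf_now: float,
                          progress_m: float) -> float:
    reward = 0.0
    if reached_goal:                          # sparse success bonus
        reward += 100.0
    if lost_localization:                     # sparse failure penalty
        reward -= 100.0
    reward += 5.0 * (conf_now - conf_prev)    # dense: localization confidence trend
    reward += 1.0 * progress_m                # dense: progress toward the goal
    return reward

print(localizability_reward(False, False, conf_prev=0.6, conf_now=0.8, progress_m=0.4))
```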
With the dramatic progress of artificial intelligence algorithms in recent times, it is hoped that algorithms will soon supplant human decision-makers in various fields, such as contract design. We analyze the possible consequences by experimentally studying the behavior of algorithms powered by Artificial Intelligence (multi-agent Q-learning) in a workhorse dual-contract model for dual-principal-agent problems. We find that the AI algorithms autonomously learn to design incentive-compatible contracts without external guidance or communication among themselves. We emphasize that the principals, powered by distinct AI algorithms, can exhibit mixed-sum behavior such as collusion and competition. We find that the more intelligent principals tend to become cooperative, while the less intelligent principals endogenize myopia and tend to become competitive. Under the optimal contract, the lower contract incentive to the agent is sustained by collusive strategies between the principals. This finding is robust to principal heterogeneity, changes in the number of players involved in the contract, and various forms of uncertainty.
https://arxiv.org/abs/2303.12350
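The following toy sketch conveys the flavour of the experimental setup rather than the paper's model: two principals run independent tabular Q-learning over a discrete grid of incentive shares, and a stylised agent accepts the more attractive contract. The payoff structure, effort model, and learning constants are illustrative assumptions.

```python
import random

SHARES = [0.1 * k for k in range(1, 10)]      # incentive share offered to the agent
ALPHA, EPS, OUTPUT = 0.1, 0.1, 10.0           # one-shot interaction, so no discounting

q = [{s: 0.0 for s in SHARES} for _ in range(2)]   # one Q-table per principal

def choose(table):
    if random.random() < EPS:
        return random.choice(SHARES)
    return max(table, key=table.get)

for _ in range(20000):
    offers = [choose(q[0]), choose(q[1])]
    winner = 0 if offers[0] >= offers[1] else 1     # agent accepts the higher share
    effort = offers[winner]                         # stylised incentive-compatible effort
    for p in range(2):
        profit = (1 - offers[p]) * OUTPUT * effort if p == winner else 0.0
        q[p][offers[p]] += ALPHA * (profit - q[p][offers[p]])

print("learned offers:", [round(max(t, key=t.get), 1) for t in q])
```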
Music-driven choreography is a challenging problem with a wide variety of industrial applications. Recently, many methods have been proposed to synthesize dance motions from music for a single dancer. However, generating dance motion for a group remains an open problem. In this paper, we present AIOZ-GDANCE, a new large-scale dataset for music-driven group dance generation. Unlike existing datasets that only support single dance, our new dataset contains group dance videos, hence supporting the study of group choreography. We propose a semi-autonomous labeling method with humans in the loop to obtain the 3D ground truth for our dataset. The proposed dataset consists of 16.7 hours of paired music and 3D motion from in-the-wild videos, covering 7 dance styles and 16 music genres. We show that naively applying single dance generation technique to creating group dance motion may lead to unsatisfactory results, such as inconsistent movements and collisions between dancers. Based on our new dataset, we propose a new method that takes an input music sequence and a set of 3D positions of dancers to efficiently produce multiple group-coherent choreographies. We propose new evaluation metrics for measuring group dance quality and perform intensive experiments to demonstrate the effectiveness of our method.
https://arxiv.org/abs/2303.12337
The deployment of Autonomous Vehicles (AVs) poses considerable challenges and unique opportunities for the design and management of future urban road infrastructure. In light of this disruptive transformation, the Right-Of-Way (ROW) composition of road space has the potential to be renewed. Design approaches and intelligent control models have been proposed to address this problem, but we lack an operational framework that can dynamically generate ROW plans for AVs and pedestrians in response to real-time demand. Based on microscopic traffic simulation, this study explores Reinforcement Learning (RL) methods for evolving ROW compositions. We implement a centralised paradigm and a distributive learning paradigm to separately perform dynamic control on several road network configurations. Experimental results indicate that the algorithms have the potential to improve traffic flow efficiency and allocate more space for pedestrians. Furthermore, the distributive learning algorithm outperforms its centralised counterpart in terms of computational cost (49.55%), benchmark rewards (25.35%), best cumulative rewards (24.58%), optimal actions (13.49%), and rate of convergence. This novel road management technique could contribute to flow-adaptive and active-mobility-friendly streets in the AV era.
https://arxiv.org/abs/2303.12289
Accurate and robust trajectory prediction of neighboring agents is critical for autonomous vehicles traversing complex scenes. Most methods proposed in recent years are deep learning-based due to their strength in encoding complex interactions. However, implausible predictions are often generated, since these methods rely heavily on past observations and cannot effectively capture transient and contingency interactions from sparse samples. In this paper, we propose a hierarchical hybrid framework of deep learning (DL) and reinforcement learning (RL) for multi-agent trajectory prediction, to cope with the challenge of predicting motions shaped by multi-scale interactions. In the DL stage, the traffic scene is divided into multiple intermediate-scale heterogeneous graphs, based on which Transformer-style GNNs encode heterogeneous interactions at the intermediate and global levels. In the RL stage, we divide the traffic scene into local sub-scenes using the key future points predicted in the DL stage. To emulate the motion planning procedure and thereby produce trajectory predictions, a Transformer-based Proximal Policy Optimization (PPO) method incorporating a vehicle kinematics model is devised to plan motions under the dominant influence of microscopic interactions. A multi-objective reward is designed to balance agent-centric accuracy and scene-wise compatibility. Experimental results show that our proposal matches the state of the art on the Argoverse forecasting benchmark. The visualized results also reveal that the hierarchical learning framework captures the multi-scale interactions and improves the feasibility and compliance of the predicted trajectories.
https://arxiv.org/abs/2303.12274
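As an illustration of the balance described above (not the paper's exact reward), a multi-objective reward can combine an agent-centric accuracy term with a scene-wise compatibility penalty; the weights and collision radius below are assumptions.

```python
import numpy as np

def multi_objective_reward(pred_xy: np.ndarray, gt_xy: np.ndarray,
                           others_xy: np.ndarray,
                           w_acc: float = 1.0, w_scene: float = 2.0,
                           collision_radius: float = 2.0) -> float:
    accuracy = -np.linalg.norm(pred_xy - gt_xy)          # agent-centric accuracy term
    if others_xy.size:
        dists = np.linalg.norm(others_xy - pred_xy, axis=1)
        collisions = int(np.sum(dists < collision_radius))
    else:
        collisions = 0
    return w_acc * accuracy - w_scene * collisions       # scene-wise compatibility penalty

print(multi_objective_reward(np.array([1.0, 0.5]), np.array([1.2, 0.4]),
                             np.array([[0.5, 0.5], [10.0, 3.0]])))
```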
Robotic assistance for experimental manipulation in the life sciences is expected to enable precise manipulation of valuable samples, regardless of the skill of the scientist. Experimental specimens in the life sciences are subject to individual variability and deformation, and therefore require autonomous robotic control. As an example, we are studying the installation of a cranial window in a mouse. This operation requires removing part of the skull, which is approximately 300 µm thick, by cutting it into a circular shape 8 mm in diameter, but the shape of the mouse skull varies depending on the strain of mouse, its sex, and its age in weeks. The thickness of the skull is not uniform, with some areas being thin and others thicker. It is also difficult to ensure that the skull is held in the same position for every operation. It is not realistically possible to measure all these features and pre-program a robotic trajectory for each individual mouse. This paper therefore proposes an autonomous robotic drilling method. The proposed method consists of drilling trajectory planning and image-based task-completion-level recognition. The trajectory planning adjusts the z-position of the drill according to the task completion level at each discrete point, and forms the 3D drilling path via constrained cubic spline interpolation while avoiding overshoot. The task-completion-level recognition uses a DSSD-inspired deep learning model to estimate the task completion level at each discrete point. Since an egg has characteristics similar to a mouse skull in terms of shape, thickness, and mechanical properties, removing the egg shell without damaging the membrane underneath was chosen as the simulation task. The proposed method was evaluated using a 6-DOF robotic arm holding a drill and achieved a success rate of 80% over 20 trials.
https://arxiv.org/abs/2303.12265
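A sketch of the trajectory-planning idea: per-point drill depths around the 8 mm circular cut are interpolated into a smooth closed 3D path. The paper uses constrained cubic spline interpolation to avoid overshoot; a shape-preserving PCHIP interpolant stands in for it here, and all depths are made-up numbers.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

n_points, radius_mm = 8, 4.0
theta = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
# Per-point drill depth (mm), as would be adjusted from the task-completion-level estimate.
z_depth = np.array([0.30, 0.28, 0.32, 0.35, 0.31, 0.29, 0.27, 0.30])

# Close the loop so the interpolant wraps around the circular cut.
theta_c = np.append(theta, 2.0 * np.pi)
z_c = np.append(z_depth, z_depth[0])
z_of_theta = PchipInterpolator(theta_c, z_c)   # shape-preserving, avoids overshoot

t = np.linspace(0.0, 2.0 * np.pi, 200)
path = np.column_stack((radius_mm * np.cos(t), radius_mm * np.sin(t), -z_of_theta(t)))
print(path.shape, path[:2])
```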
At the core of bodily self-consciousness is the perception of ownership of one's body. Recent efforts to gain a deeper understanding of the mechanisms behind the brain's encoding of the self-body have led to various attempts to develop a unified theoretical framework to explain related behavioral and neurophysiological phenomena. A central question to be explained is how body illusions such as the rubber hand illusion actually occur. Despite conceptual descriptions of the mechanisms of bodily self-consciousness and the possibly relevant brain areas, existing theoretical models still lack an explanation of the computational mechanisms by which the brain encodes the perception of one's body and of how our subjectively perceived body illusions can be generated by neural networks. Here we integrate the biological findings on bodily self-consciousness to propose a brain-inspired bodily self-perception model, by which perceptions of the bodily self can be autonomously constructed without any supervision signals. We successfully validated our computational model with six rubber hand illusion experiments on platforms including an iCub humanoid robot and simulated environments. The experimental results show that our model not only replicates well the behavioral and neural data of monkeys in biological experiments, but also, owing to its advantages in biological interpretability, reasonably explains the causes and results of the rubber hand illusion at the neuronal level, thus contributing to revealing the computational and neural mechanisms underlying the occurrence of the rubber hand illusion.
https://arxiv.org/abs/2303.12259
How spiking neuronal networks encode memories across their different time and spatial scales constitutes a fundamental topic in neuroscience and neuro-inspired engineering. Much attention has been paid to large networks and long-term memory, for example in models of associative memory. Smaller circuit motifs may play an important complementary role on shorter time scales, where broader network effects may be of less relevance. Yet, compact computational models of spiking neural networks that exhibit short-term volatile memory and actively hold information until their energy source is switched off seem not fully understood. Here we propose that small spiking neural circuit motifs may act as volatile memory components. A minimal motif consists of only two interconnected neurons, one self-connected excitatory neuron and one inhibitory neuron, and realizes a single-bit volatile memory. An excitatory, delayed self-connection promotes a bistable circuit in which a self-sustained periodic orbit generating spike trains co-exists with the quiescent state of no neuron spiking. Transient external inputs may straightforwardly induce switching between those states. Moreover, the inhibitory neuron may act as an autonomous turn-off switch. It integrates incoming excitatory pulses until a threshold is reached, after which the inhibitory neuron emits a spike that inhibits further spikes in the excitatory neuron, terminating the memory. Our results show how external bits of information (excitatory signals) can be actively held in memory for a pre-defined amount of time. We show that such memory operations are robust against parameter variations and exemplify how sequences of multidimensional input signals may control the dynamics of a many-bit memory circuit in a desired way.
https://arxiv.org/abs/2303.12225
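A minimal caricature of the two-neuron motif can be simulated in a few lines. The parameters below are illustrative, not the paper's: an excitatory neuron with a delayed self-connection keeps spiking after a transient "write" pulse, while the inhibitory neuron integrates those spikes and, on reaching its threshold, fires and terminates the activity.

```python
import numpy as np

T, delay = 300, 5                      # time steps, self-connection delay
leak, w_self, w_ei, w_ie = 0.9, 1.2, 0.15, 5.0
thr_exc, thr_inh = 1.0, 1.0

v_exc, v_inh = 0.0, 0.0
exc_spikes = np.zeros(T, dtype=bool)
external = np.zeros(T)
external[20] = 1.5                     # transient "write" pulse sets the bit

for t in range(T):
    delayed = 1.0 if t >= delay and exc_spikes[t - delay] else 0.0
    inh_spike = v_inh >= thr_inh       # autonomous turn-off switch fires
    if inh_spike:
        v_inh = 0.0
    v_exc = leak * v_exc + external[t] + w_self * delayed - (w_ie if inh_spike else 0.0)
    if v_exc >= thr_exc:
        exc_spikes[t] = True
        v_exc = 0.0
        v_inh += w_ei                  # inhibitory neuron integrates excitatory spikes

last = T - 1 - exc_spikes[::-1].argmax()
print(f"bit held from step {exc_spikes.argmax()} to step {last}, "
      f"{exc_spikes.sum()} spikes in total")
```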
Intelligent intersection managers can improve safety by detecting dangerous drivers or failure modes in autonomous vehicles, warning oncoming vehicles as they approach an intersection. In this work, we present FailureNet, a recurrent neural network trained end-to-end on trajectories of both nominal and reckless drivers in a scaled miniature city. FailureNet observes the poses of vehicles as they approach an intersection and detects whether a failure is present in the autonomy stack, warning cross-traffic of potentially dangerous drivers. FailureNet can accurately identify control failures, upstream perception errors, and speeding drivers, distinguishing them from nominal driving. The network is trained and deployed with autonomous vehicles in the MiniCity. Compared to speed or frequency-based predictors, FailureNet's recurrent neural network structure provides improved predictive power, yielding upwards of 84% accuracy when deployed on hardware.
https://arxiv.org/abs/2303.12224
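The general recipe, a recurrent network over approach-pose sequences producing a failure probability, can be sketched as below; the pose encoding and layer sizes are assumptions, not FailureNet's published architecture.

```python
import torch
import torch.nn as nn

class FailureClassifier(nn.Module):
    def __init__(self, pose_dim: int = 4, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(pose_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        # poses: (batch, timesteps, pose_dim), e.g. (x, y, heading, speed) per step
        _, h_n = self.rnn(poses)
        return torch.sigmoid(self.head(h_n[-1]))   # probability of a faulty/reckless driver

model = FailureClassifier()
print(model(torch.randn(8, 30, 4)).shape)          # -> torch.Size([8, 1])
```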
Autonomous driving requires a comprehensive understanding of the surrounding environment for reliable trajectory planning. Previous works rely on dense rasterized scene representations (e.g., agent occupancy and semantic maps) to perform planning, which is computationally intensive and misses instance-level structure information. In this paper, we propose VAD, an end-to-end vectorized paradigm for autonomous driving, which models the driving scene as a fully vectorized representation. The proposed vectorized paradigm has two significant advantages. On one hand, VAD exploits the vectorized agent motion and map elements as explicit instance-level planning constraints, which effectively improves planning safety. On the other hand, VAD runs much faster than previous end-to-end planning methods by getting rid of computation-intensive rasterized representation and hand-designed post-processing steps. VAD achieves state-of-the-art end-to-end planning performance on the nuScenes dataset, outperforming the previous best method by a large margin (reducing the average collision rate by 48.4%). Besides, VAD greatly improves the inference speed (up to 9.3x), which is critical for the real-world deployment of an autonomous driving system. Code and models will be released to facilitate future research.
https://arxiv.org/abs/2303.12077
Motion forecasting is a key module in an autonomous driving system. Due to the heterogeneous nature of multi-sourced input, multimodality in agent behavior, and the low latency required by onboard deployment, this task is notoriously challenging. To cope with these difficulties, this paper proposes a novel agent-centric model with anchor-informed proposals for efficient multimodal motion prediction. We design a modality-agnostic strategy to concisely encode the complex input in a unified manner. We generate diverse proposals, fused with anchors bearing goal-oriented scene context, to induce multimodal prediction that covers a wide range of future trajectories. Our network architecture is highly uniform and succinct, leading to an efficient model amenable to real-world driving deployment. Experiments reveal that our agent-centric network compares favorably with state-of-the-art methods in prediction accuracy, while achieving scene-centric-level inference latency.
https://arxiv.org/abs/2303.12071
3D single object tracking in LiDAR point clouds (LiDAR SOT) plays a crucial role in autonomous driving. Current approaches all follow the Siamese paradigm based on appearance matching. However, LiDAR point clouds are usually textureless and incomplete, which hinders effective appearance matching. Besides, previous methods greatly overlook the critical motion clues among targets. In this work, beyond 3D Siamese tracking, we introduce a motion-centric paradigm to handle LiDAR SOT from a new perspective. Following this paradigm, we propose a matching-free two-stage tracker, M^2-Track. In the first stage, M^2-Track localizes the target within successive frames via motion transformation. It then refines the target box through motion-assisted shape completion in the second stage. Due to its motion-centric nature, our method shows impressive generalizability with limited training labels and provides good differentiability for end-to-end cycle training. This inspires us to explore semi-supervised LiDAR SOT by incorporating a pseudo-label-based motion augmentation and a self-supervised loss term. Under the fully supervised setting, extensive experiments confirm that M^2-Track significantly outperforms the previous state of the art on three large-scale datasets while running at 57 FPS (~8%, ~17%, and ~22% precision gains on KITTI, NuScenes, and the Waymo Open Dataset, respectively). Under the semi-supervised setting, our method performs on par with or even surpasses its fully supervised counterpart while using fewer than half of the labels from KITTI. Further analysis verifies each component's effectiveness and shows the motion-centric paradigm's promising potential for auto-labeling and unsupervised domain adaptation.
https://arxiv.org/abs/2303.12535
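The first-stage idea, localizing by motion rather than appearance, amounts to carrying the previous target box forward by a predicted inter-frame rigid motion. In the real method that motion would come from a learned network over the two point clouds; the sketch below only shows how such a predicted motion (dx, dy, dyaw) could be applied to the box.

```python
import math
from dataclasses import dataclass

@dataclass
class Box:
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float

def apply_motion(prev: Box, dx: float, dy: float, dyaw: float) -> Box:
    """Compose the predicted target motion with the previous box pose."""
    cos_y, sin_y = math.cos(prev.yaw), math.sin(prev.yaw)
    return Box(
        x=prev.x + cos_y * dx - sin_y * dy,   # motion expressed in the box frame
        y=prev.y + sin_y * dx + cos_y * dy,
        z=prev.z, l=prev.l, w=prev.w, h=prev.h,
        yaw=prev.yaw + dyaw,
    )

print(apply_motion(Box(10.0, 2.0, -1.0, 4.5, 1.9, 1.6, math.pi / 6), 1.2, 0.1, 0.02))
```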
Planning for multi-robot teams in complex environments is a challenging problem, especially when these teams must coordinate to accomplish a common objective. In general, optimal solutions to these planning problems are computationally intractable, since the decision space grows exponentially with the number of robots. In this paper, we present a novel approach for multi-robot planning on topological graphs using mixed-integer programming. Central to our approach is the notion of a dynamic topological graph, where edge weights vary dynamically based on the locations of the robots in the graph. We construct this graph using the critical features of the planning problem and the relationships between robots; we then leverage mixed-integer programming to minimize a shared cost that depends on the paths of all robots through the graph. To improve computational tractability, we formulated an objective function with a fully convex relaxation and designed our decision space around eliminating the exponential dependence on the number of robots. We test our approach on a multi-robot reconnaissance scenario, where robots must coordinate to minimize detectability and maximize safety while gathering information. We demonstrate that our approach is able to scale to a series of representative scenarios and is capable of computing optimal coordinated strategic behaviors for autonomous multi-robot teams in seconds.
https://arxiv.org/abs/2303.11966
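A toy mixed-integer program in the spirit of the coordination idea (not the paper's formulation) is sketched below with PuLP: each robot picks one route, and the cost of a risky route drops when another robot simultaneously holds an overwatch position, i.e. an edge weight that depends on where the other robots are. Requires `pip install pulp`.

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, value

routes = ["risky", "safe", "overwatch"]
base_cost = {"risky": 10, "safe": 6, "overwatch": 4}
discount = 7                                   # risky route becomes cheap if covered

prob = LpProblem("two_robot_recon", LpMinimize)
pick = {(r, k): LpVariable(f"pick_{r}_{k}", cat=LpBinary)
        for r in ("A", "B") for k in routes}
both = LpVariable("A_risky_and_B_overwatch", cat=LpBinary)

prob += lpSum(base_cost[k] * pick[(r, k)] for r in ("A", "B") for k in routes) \
        - discount * both                      # shared cost over all robots' paths
for r in ("A", "B"):                           # each robot takes exactly one route
    prob += lpSum(pick[(r, k)] for k in routes) == 1
prob += both <= pick[("A", "risky")]           # both = 1 only if A risky AND B overwatch
prob += both <= pick[("B", "overwatch")]
prob += both >= pick[("A", "risky")] + pick[("B", "overwatch")] - 1

prob.solve()
print({r: next(k for k in routes if value(pick[(r, k)]) > 0.5) for r in ("A", "B")},
      "cost:", value(prob.objective))
```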
Soft robotics technology can aid in achieving the United Nations Sustainable Development Goals (SDGs) and the Paris Climate Agreement through the development of autonomous, environmentally responsible machines powered by renewable energy. By utilizing soft robotics, we can mitigate the detrimental effects of climate change on human society and the natural world through fostering adaptation, restoration, and remediation. Moreover, the implementation of soft robotics can lead to groundbreaking discoveries in material science, biology, control systems, energy efficiency, and sustainable manufacturing processes. However, to achieve these goals, we need further improvements in understanding the biological principles underlying embodied and physical intelligence, environment-friendly materials, and energy-saving strategies to design and manufacture self-piloting and field-ready soft robots. This paper provides insights into how soft robotics can address the pressing issue of environmental sustainability. Sustainable manufacturing of soft robots at a large scale, exploring the potential of biodegradable and bioinspired materials, and integrating onboard renewable energy sources to promote autonomy and intelligence are some of the urgent challenges of this field that we discuss in this paper. Specifically, we present field-ready soft robots that address targeted productive applications in urban farming, healthcare, land and ocean preservation, disaster remediation, and clean and affordable energy, thus supporting some of the SDGs. By embracing soft robotics as a solution, we can concretely support economic growth and sustainable industry, drive solutions for environmental protection and clean energy, and improve overall health and well-being.
https://arxiv.org/abs/2303.11931