Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci Research Kit (dVRK) and Smart Tissue Autonomous Robot (STAR), which represent common subtasks in surgical training. ORBIT-Surgical leverages GPU parallelization to train reinforcement learning and imitation learning algorithms, facilitating the study of robot learning to augment human surgical skills. ORBIT-Surgical also facilitates realistic synthetic data generation for active perception tasks. We demonstrate ORBIT-Surgical sim-to-real transfer of learned policies onto a physical dVRK robot.
https://arxiv.org/abs/2404.16027
The livestock industry faces several challenges, including labor-intensive management, the threat of predators and environmental sustainability concerns. Therefore, this paper explores the integration of quadruped robots in extensive livestock farming as a novel application of field robotics. The SELF-AIR project, an acronym for Supporting Extensive Livestock Farming with the use of Autonomous Intelligent Robots, exemplifies this innovative approach. Through advanced sensors, artificial intelligence, and autonomous navigation systems, these robots exhibit remarkable capabilities in navigating diverse terrains, monitoring large herds, and aiding in various farming tasks. This work provides insight into the SELF-AIR project, presenting the lessons learned.
https://arxiv.org/abs/2404.16008
When a robot executes a task, it is necessary to model the relationship among its body, target objects, tools, and environment, and to control its body to realize the target state. However, it is difficult to model them using classical methods if the relationship is complex. In addition, when the relationship changes with time, it is necessary to deal with the temporal changes of the model. In this study, we have developed Deep Predictive Model with Parametric Bias (DPMPB) as a more human-like adaptive intelligence to deal with these modeling difficulties and temporal model changes. We categorize and summarize the theory of DPMPB and various task experiments on the actual robots, and discuss the effectiveness of DPMPB.
https://arxiv.org/abs/2404.15726
Cooperative Adaptive Cruise Control (CACC) represents a quintessential control strategy for orchestrating vehicular platoon movement within Connected and Automated Vehicle (CAV) systems, significantly enhancing traffic efficiency and reducing energy consumption. In recent years, data-driven methods such as reinforcement learning (RL) have been employed to address this task due to their significant advantages in terms of efficiency and flexibility. However, the delay issue, which often arises in real-world CACC systems, is rarely taken into account by current RL-based approaches. To tackle this problem, we propose a Delay-Aware Multi-Agent Reinforcement Learning (DAMARL) framework aimed at achieving safe and stable control for CACC. We model the entire decision-making process using a Multi-Agent Delay-Aware Markov Decision Process (MADA-MDP) and develop a centralized training with decentralized execution (CTDE) MARL framework for distributed control of CACC platoons. An attention mechanism-integrated policy network is introduced to enhance the performance of CAV communication and decision-making. Additionally, a velocity optimization model-based action filter is incorporated to further ensure the stability of the platoon. Experimental results across various delay conditions and platoon sizes demonstrate that our approach consistently outperforms baseline methods in terms of platoon safety, stability, and overall performance.
https://arxiv.org/abs/2404.15696
Affordances, a concept rooted in ecological psychology and pioneered by James J. Gibson, have emerged as a fundamental framework for understanding the dynamic relationship between individuals and their environments. Expanding beyond traditional perceptual and cognitive paradigms, affordances represent the inherent effect and action possibilities that objects offer to the agents within a given context. As a theoretical lens, affordances bridge the gap between effect and action, providing a nuanced understanding of the connections between agents' actions on entities and the effect of these actions. In this study, we propose a model that unifies object, action, and effect into a single latent representation in a common latent space shared between all affordances, which we call the affordance space. Using this affordance space, our system is able to generate effect trajectories when action and object are given, and to generate action trajectories when effect trajectories and objects are given. In the experiments, we showed that our model does not learn the behavior of each object; instead, it learns the affordance relations shared by the objects, which we call equivalences. In addition to simulated experiments, we showed that our model can be used for direct imitation in real-world cases. We also propose affordances as a base for Cross Embodiment transfer to link the actions of different robots. Finally, we introduce selective loss as a solution that allows valid outputs to be generated for nondeterministic model inputs.
https://arxiv.org/abs/2404.15648
Tactile sensing has become a popular sensing modality for robot manipulators, due to the promise of providing robots with the ability to measure the rich contact information that gets transmitted through their sense of touch. Among the diverse range of information accessible from tactile sensors, torques transmitted from the grasped object to the fingers through extrinsic environmental contact may be particularly important for tasks such as object insertion. However, tactile torque estimation has received relatively little attention when compared to other sensing modalities, such as force, texture, or slip identification. In this work, we introduce the notion of the Tactile Dipole Moment, which we use to estimate tilt torques from gel-based visuotactile sensors. This method does not rely on deep learning or on sensor-specific mechanical or optical modeling, and instead takes inspiration from electromechanics to analyze the vector field produced from 2D marker displacements. Despite the simplicity of our technique, we demonstrate its ability to provide accurate torque readings over two different tactile sensors and three object geometries, and highlight its practicality for the task of USB stick insertion with a compliant robot arm. These results suggest that simple analytical calculations based on dipole moments can sufficiently extract physical quantities from visuotactile sensors.
https://arxiv.org/abs/2404.15626
This paper proposes a decentralized trajectory planning framework for the collision avoidance problem of multiple micro aerial vehicles (MAVs) in environments with static and dynamic obstacles. The framework utilizes spatiotemporal occupancy grid maps (SOGM), which forecast the occupancy status of neighboring space in the near future, as the environment representation. Based on this representation, we extend the kinodynamic A* and the corridor-constrained trajectory optimization algorithms to efficiently tackle static and dynamic obstacles with arbitrary shapes. Collision avoidance between communicating robots is integrated by sharing planned trajectories and projecting them onto the SOGM. The simulation results show that our method achieves competitive performance against state-of-the-art methods in dynamic environments with different numbers and shapes of obstacles. Finally, the proposed method is validated in real experiments.
https://arxiv.org/abs/2404.15602
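The idea in the entry above of sharing planned trajectories and projecting them onto a spatiotemporal occupancy grid can be illustrated with a minimal sketch. It is not the paper's implementation: the array layout `(T, H, W)`, the function name `project_trajectory`, and the parameters `resolution`, `dt`, and `radius_cells` are all assumptions made for illustration.

```python
import numpy as np

def project_trajectory(sogm, trajectory, resolution, dt, radius_cells=0):
    """Mark the cells that another robot's planned trajectory will occupy.

    sogm:       boolean array of shape (T, H, W); sogm[k, i, j] is True when
                cell (i, j) is predicted to be occupied at time step k.
    trajectory: iterable of (t, x, y) with t in seconds and x, y in metres.
    resolution: cell size in metres; dt: duration of one time step in seconds.
    """
    T, H, W = sogm.shape
    for t, x, y in trajectory:
        k = int(round(t / dt))          # time index of this waypoint
        if not 0 <= k < T:
            continue                    # outside the prediction horizon
        i, j = int(y // resolution), int(x // resolution)
        # Inflate the waypoint by radius_cells and clip to the grid bounds
        i0, i1 = max(i - radius_cells, 0), min(i + radius_cells + 1, H)
        j0, j1 = max(j - radius_cells, 0), min(j + radius_cells + 1, W)
        if i0 < i1 and j0 < j1:
            sogm[k, i0:i1, j0:j1] = True
    return sogm

# Hypothetical example: 5 time steps of 0.2 s on a 10 x 10 grid of 0.5 m cells
grid = np.zeros((5, 10, 10), dtype=bool)
neighbour_plan = [(0.0, 1.0, 1.0), (0.2, 1.5, 1.0), (0.4, 2.0, 1.0)]
project_trajectory(grid, neighbour_plan, resolution=0.5, dt=0.2)
```

A planner can then treat the projected cells exactly like predicted obstacle cells, which is why a single grid representation can cover both dynamic obstacles and communicating robots.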
Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that offers probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) for assessing the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
https://arxiv.org/abs/2404.15557
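To make the role of adaptive conformal prediction in the entry above more concrete, the following sketch uses the widely known adaptive conformal inference update, in which the miscoverage level is adjusted online as alpha_{t+1} = alpha_t + gamma * (alpha - err_t). It is an assumption that this matches the variant used in the paper; the function name, the `warmup` parameter, and the synthetic scores are hypothetical.

```python
import numpy as np

def adaptive_conformal_radii(scores, alpha=0.1, gamma=0.05, warmup=20):
    """Online prediction-region radii with an adaptive miscoverage level.

    scores[t] is the prediction error observed at step t, for example the
    distance between a predicted and an actual pedestrian position.
    Returns the radius used at each step and whether it was exceeded.
    """
    scores = np.asarray(scores, dtype=float)
    alpha_t = alpha
    radii, errs = [], []
    for t in range(warmup, len(scores)):
        past = scores[:t]
        if alpha_t <= 0.0:
            radius = np.inf               # cover everything
        elif alpha_t >= 1.0:
            radius = 0.0                  # cover nothing
        else:
            radius = float(np.quantile(past, 1.0 - alpha_t))
        err = 1.0 if scores[t] > radius else 0.0
        # Grow the region after a miss, shrink it after a hit
        alpha_t += gamma * (alpha - err)
        radii.append(radius)
        errs.append(err)
    return np.array(radii), np.array(errs)

# Hypothetical prediction errors drawn from a half-normal distribution
rng = np.random.default_rng(0)
radii, errs = adaptive_conformal_radii(np.abs(rng.normal(size=2000)))
```

In a safety shield, a radius like this could inflate each predicted agent position into a region that the planner must avoid; the long-run fraction of misses in `errs` stays close to `alpha`.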
Mapping traversal costs in an environment and planning paths based on this map are important for autonomous navigation. We present a neurorobotic navigation system that utilizes a Spiking Neural Network Wavefront Planner and E-prop learning to concurrently map and plan paths in a large and complex environment. We incorporate a novel method for mapping which, when combined with the Spiking Wavefront Planner, allows for adaptive planning by selectively considering any combination of costs. The system is tested on a mobile robot platform in an outdoor environment with obstacles and varying terrain. Results indicate that the system is capable of discerning features in the environment using three measures of cost: (1) energy expenditure by the wheels, (2) time spent in the presence of obstacles, and (3) terrain slope. In just twelve hours of online training, E-prop learns and incorporates traversal costs into the path planning maps by updating the delays in the Spiking Wavefront Planner. On simulated paths, the Spiking Wavefront Planner plans significantly shorter and lower cost paths than A* and RRT*. The Spiking Wavefront Planner is compatible with neuromorphic hardware and could be used for applications requiring low size, weight, and power.
https://arxiv.org/abs/2404.15524
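The way per-cell delays shape a wavefront plan in the entry above can be sketched with a conventional, non-spiking stand-in: a wave spreads from the start, each cell adds its own delay, and the path is read back from the cell that first triggered each neighbour. This is only an illustration under those assumptions; it uses a priority queue instead of spiking neurons, does not include E-prop, and the grid values are made up.

```python
import heapq

def wavefront_plan(delays, start, goal):
    """Plan on a 4-connected grid where delays[r][c] > 0 is the time a wave
    needs to enter cell (r, c). Returns (path, arrival_time) or (None, None)."""
    rows, cols = len(delays), len(delays[0])
    arrival = {start: 0}      # earliest time the wave reaches each cell
    parent = {}               # cell that first triggered each cell
    frontier = [(0, start)]
    while frontier:
        t, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if t > arrival[cell]:
            continue          # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nt = t + delays[nr][nc]
                if nt < arrival.get((nr, nc), float("inf")):
                    arrival[(nr, nc)] = nt
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (nt, (nr, nc)))
    if goal not in arrival:
        return None, None
    path = [goal]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1], arrival[goal]

# Hypothetical cost map: the centre cell is expensive (e.g. a steep slope)
delays = [
    [1, 1, 1],
    [1, 9, 1],
    [1, 1, 1],
]
path, cost = wavefront_plan(delays, start=(1, 0), goal=(1, 2))
```

Here the wave reaches the goal sooner by going around the expensive centre cell, so raising a cell's delay is enough to steer plans away from it without changing the planner itself.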
In this work, we aim to improve transparency and efficacy in human-robot collaboration by developing machine teaching algorithms suitable for groups with varied learning capabilities. While previous approaches focused on tailored approaches for teaching individuals, our method teaches teams with various compositions of diverse learners using team belief representations to address personalization challenges within groups. We investigate various group teaching strategies, such as focusing on individual beliefs or the group's collective beliefs, and assess their impact on learning robot policies for different team compositions. Our findings reveal that team belief strategies yield less variation in learning duration and better accommodate diverse teams compared to individual belief strategies, suggesting their suitability in mixed-proficiency settings with limited resources. Conversely, individual belief strategies provide a more uniform knowledge level, particularly effective for homogeneously inexperienced groups. Our study indicates that the teaching strategy's efficacy is significantly influenced by team composition and learner proficiency, highlighting the importance of real-time assessment of learner proficiency and adapting teaching approaches based on learner proficiency for optimal teaching outcomes.
https://arxiv.org/abs/2404.15472
This work investigates the potential of Reinforcement Learning (RL) to tackle robot motion planning challenges in the dynamic RoboCup Small Size League (SSL). Using a heuristic control approach, we evaluate RL's effectiveness in obstacle-free and single-obstacle path-planning environments. Ablation studies reveal significant performance improvements. Our method achieved a 60% time gain in obstacle-free environments compared to baseline algorithms. Additionally, our findings demonstrated dynamic obstacle avoidance capabilities, adeptly navigating around moving blocks. These findings highlight the potential of RL to enhance robot motion planning in the challenging and unpredictable SSL environment.
https://arxiv.org/abs/2404.15410
The majority of multi-agent path finding (MAPF) methods compute collision-free space-time paths which require agents to be at a specific location at a specific discretized timestep. However, executing these space-time paths directly on robotic systems is infeasible due to real-time execution differences (e.g. delays) which can lead to collisions. To combat this, current methods translate the space-time paths into a temporal plan graph (TPG) that only requires that agents observe the order in which they navigate through locations where their paths cross. However, planning space-time paths and then post-processing them into a TPG does not reduce the required agent-to-agent coordination, which is fixed once the space-time paths are computed. To that end, we propose a novel algorithm Space-Order CBS that can directly plan a TPG and explicitly minimize coordination. Our main theoretical insight is our novel perspective on viewing a TPG as a set of space-visitation order paths where agents visit locations in relative orders (e.g. 1st vs 2nd) as opposed to specific timesteps. We redefine unique conflicts and constraints for adapting CBS for space-order planning. We experimentally validate how Space-Order CBS can return TPGs which significantly reduce coordination, thus subsequently reducing the amount of agent-agent communication and leading to more robustness to delays during execution.
https://arxiv.org/abs/2404.15137
Tactile and textile skin technologies have become increasingly important for enhancing human-robot interaction and allowing robots to adapt to different environments. Despite notable advancements, there are ongoing challenges in skin signal processing, particularly in achieving both accuracy and speed in dynamic touch sensing. This paper introduces a new framework that poses the touch sensing problem as an estimation problem of resistive sensory arrays. Utilizing a Regularized Least Squares objective function, which estimates the resistance distribution of the skin, we enhance the touch sensing accuracy and mitigate ghosting effects, where false or misleading touches may be registered. Furthermore, our study presents a streamlined skin design that simplifies manufacturing processes without sacrificing performance. Experimental outcomes substantiate the effectiveness of our method, showing a 26.9% improvement in multi-touch force-sensing accuracy for the tactile skin.
https://arxiv.org/abs/2404.15131
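The Regularized Least Squares objective mentioned in the entry above has a standard closed-form solution when the regularizer is a squared L2 norm. The sketch below shows that generic form only; the measurement matrix `A`, the readings `b`, and the weight `lam` are hypothetical, and the paper's actual sensor model and regularizer may differ.

```python
import numpy as np

def regularized_least_squares(A, b, lam):
    """Solve min_x ||A x - b||^2 + lam * ||x||^2 in closed form."""
    n = A.shape[1]
    # Normal equations with a Tikhonov (ridge) term: (A^T A + lam I) x = A^T b
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Hypothetical toy problem: 6 noisy readings, 4 unknown cell values
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
x_true = np.array([1.0, 0.0, 2.0, 0.0])
b = A @ x_true + 0.01 * rng.normal(size=6)
x_hat = regularized_least_squares(A, b, lam=1e-3)
```

The regularization term keeps the system well conditioned when readings are noisy or nearly redundant, which is one plausible reason such an objective helps suppress spurious (ghost) activations.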
Replicating the remarkable athleticism seen in animals has long been a challenge in robotics control. Although Reinforcement Learning (RL) has demonstrated significant progress in dynamic legged locomotion control, the substantial sim-to-real gap often hinders the real-world demonstration of truly dynamic movements. We propose a new framework to mitigate this gap through frequency-domain analysis-based impedance matching between simulated and real robots. Our framework offers a structured guideline for parameter selection and the range for dynamics randomization in simulation, thus facilitating a safe sim-to-real transfer. The policy learned using our framework enabled jumps across distances of 55 cm and heights of 38 cm. The results are, to the best of our knowledge, among the highest and longest running jumps demonstrated by an RL-based control policy on a real quadruped robot. Note that the achieved jumping height is approximately 85% of that obtained from a state-of-the-art trajectory optimization method, which can be seen as the physical limit for the given robot hardware. In addition, our control policy accomplished stable walking at speeds up to 2 m/s in the forward and backward directions, and 1 m/s in the sideways direction.
https://arxiv.org/abs/2404.15096
Teleoperation is a popular solution to remotely support highly automated vehicles through a human remote operator whenever a disengagement of the automated driving system is present. The remote operator wirelessly connects to the vehicle and solves the disengagement through support or substitution of automated driving functions and therefore enables the vehicle to resume automation. There are different approaches to support automated driving functions on various levels, commonly known as teleoperation concepts. A variety of teleoperation concepts is described in the literature, yet there has been no comprehensive and structured comparison of these concepts, and it is not clear what subset of teleoperation concepts is suitable to enable safe and efficient remote support of highly automated vehicles in a broad spectrum of disengagements. The following work establishes a basis for comparing teleoperation concepts through a literature overview on automated vehicle disengagements and on already conducted studies on the comparison of teleoperation concepts and metrics used to evaluate teleoperation performance. An evaluation of the teleoperation concepts is carried out in an expert workshop, comparing different teleoperation concepts using a selection of automated vehicle disengagement scenarios and metrics. Based on the workshop results, a set of teleoperation concepts is derived that can be used to address a wide variety of automated vehicle disengagements in a safe and efficient way.
https://arxiv.org/abs/2404.15030
We propose a novel pipeline for unknown object grasping in shared robotic autonomy scenarios. State-of-the-art methods for fully autonomous scenarios are typically learning-based approaches optimised for a specific end-effector, that generate grasp poses directly from sensor input. In the domain of assistive robotics, we seek instead to utilise the user's cognitive abilities for enhanced satisfaction, grasping performance, and alignment with their high level task-specific goals. Given a pair of stereo images, we perform unknown object instance segmentation and generate a 3D reconstruction of the object of interest. In shared control, the user then guides the robot end-effector across a virtual hemisphere centered around the object to their desired approach direction. A physics-based grasp planner finds the most stable local grasp on the reconstruction, and finally the user is guided by shared control to this grasp. In experiments on the DLR EDAN platform, we report a grasp success rate of 87% for 10 unknown objects, and demonstrate the method's capability to grasp objects in structured clutter and from shelves.
https://arxiv.org/abs/2404.15001
The emergence of Large Vision Models (LVMs) follows in the footsteps of the recent prosperity of Large Language Models (LLMs). However, there is a noticeable gap in structured research applying LVMs to Human-Robot Interaction (HRI), despite extensive evidence supporting the efficacy of vision models in enhancing interactions between humans and robots. Recognizing the vast and anticipated potential, we introduce an initial design space that incorporates domain-specific LVMs, chosen for their superior performance over normal models. We delve into three primary dimensions: HRI contexts, vision-based tasks, and specific domains. The empirical validation was conducted with 15 experts across six evaluation metrics, showcasing the primary efficacy in relevant decision-making scenarios. We explore the process of ideation and potential application scenarios, envisioning this design space as a foundational guideline for future HRI system design, emphasizing accurate domain alignment and model selection.
https://arxiv.org/abs/2404.14965
A common prerequisite for evaluating a visual(-inertial) odometry (VO/VIO) algorithm is to align the timestamps and the reference frame of its estimated trajectory with a reference ground-truth derived from a system of superior precision, such as a motion capture system. The trajectory-based alignment, typically modeled as a classic hand-eye calibration, significantly influences the accuracy of evaluation metrics. However, traditional calibration methods are susceptible to the quality of the input poses. Few studies have taken this into account when evaluating VO/VIO trajectories that usually suffer from noise and drift. To fill this gap, we propose a novel spatiotemporal hand-eye calibration algorithm that fully leverages multiple constraints from screw theory for enhanced accuracy and robustness. Experimental results show that our algorithm has better performance and is less noise-prone than state-of-the-art methods.
https://arxiv.org/abs/2404.14894
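One classical constraint from screw theory that hand-eye calibration can exploit, relevant to the entry above, is that in A X = X B the motions A and B share the same rotation angle and the same translation along the rotation axis. The sketch below only verifies this invariance numerically on made-up poses; it is not the paper's algorithm and ignores the temporal offset. All helper names are hypothetical.

```python
import numpy as np

def rotation(axis, angle):
    """Rotation matrix from an axis and an angle (Rodrigues' formula)."""
    a = np.asarray(axis, dtype=float)
    a = a / np.linalg.norm(a)
    K = np.array([[0.0, -a[2], a[1]],
                  [a[2], 0.0, -a[0]],
                  [-a[1], a[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def pose(R, t):
    """Assemble a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def screw_parameters(T):
    """Rotation angle and translation along the screw axis of a rigid motion.
    Valid for rotation angles strictly between 0 and pi."""
    R, t = T[:3, :3], T[:3, 3]
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(angle))
    return float(angle), float(axis @ t)

# Made-up sensor motion B and hand-eye transform X; A follows from A X = X B
B = pose(rotation([0, 0, 1], 0.7), [0.1, -0.2, 0.3])
X = pose(rotation([1, 2, 3], 0.4), [0.5, 0.0, -0.1])
A = X @ B @ np.linalg.inv(X)

angle_a, d_a = screw_parameters(A)
angle_b, d_b = screw_parameters(B)
```

Because both quantities are independent of X, motion pairs whose screw parameters disagree can be flagged as noisy or misaligned before X is estimated.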
Dynamic obstacle avoidance is a popular research topic for autonomous systems, such as micro aerial vehicles and service robots. Accurately evaluating the performance of dynamic obstacle avoidance methods necessitates the establishment of a metric to quantify the environment's difficulty, a crucial aspect that remains unexplored. In this paper, we propose four metrics to measure the difficulty of dynamic environments. These metrics aim to comprehensively capture the influence of obstacles' number, size, velocity, and other factors on the difficulty. We compare the proposed metrics with existing static environment difficulty metrics and validate them through over 1.5 million trials in a customized simulator. This simulator excludes the effects of perception and control errors and supports different motion and gaze planners for obstacle avoidance. The results indicate that the survivability metric outperforms the other metrics and exhibits a monotonic relationship with the success rate, with a Spearman's Rank Correlation Coefficient (SRCC) of over 0.9. Specifically, for every planner, lower survivability leads to a higher success rate. This metric not only facilitates fair and comprehensive benchmarking but also provides insights for refining collision avoidance methods, thereby furthering the evolution of autonomous systems in dynamic environments.
https://arxiv.org/abs/2404.14848
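The Spearman's Rank Correlation Coefficient cited in the entry above measures how monotonic the relationship between two quantities is: it is the Pearson correlation of their ranks. The sketch below computes it from scratch; the metric and success-rate values are invented for illustration and are not taken from the paper.

```python
import numpy as np

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Invented values: a difficulty metric and the success rate of one planner
metric = [0.1, 0.4, 0.5, 0.9]
success_rate = [0.95, 0.80, 0.60, 0.20]
rho = spearman(metric, success_rate)
```

Since only ranks matter, a coefficient with magnitude near 1 indicates a monotonic relationship even when the relationship is far from linear, which is the property a difficulty metric needs for ordering environments.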
This research addresses the challenge of estimating bathymetry from imaging sonars where the state-of-the-art works have primarily relied on either supervised learning with ground-truth labels or surface rendering based on the Lambertian assumption. In this letter, we propose a novel, self-supervised framework based on volume rendering for reconstructing bathymetry using forward-looking sonar (FLS) data collected during standard surveys. We represent the seafloor as a neural heightmap encapsulated with a parametric multi-resolution hash encoding scheme and model the sonar measurements with a differentiable renderer using sonar volumetric rendering employed with hierarchical sampling techniques. Additionally, we model the horizontal and vertical beam patterns and estimate them jointly with the bathymetry. We evaluate the proposed method quantitatively on simulation and field data collected by remotely operated vehicles (ROVs) during low-altitude surveys. Results show that the proposed method outperforms the current state-of-the-art approaches that use imaging sonars for seabed mapping. We also demonstrate that the proposed approach can potentially be used to increase the resolution of a low-resolution prior map with FLS data from low-altitude surveys.
https://arxiv.org/abs/2404.14819