Thanks to the recent explosive development of data-driven learning methodologies, reinforcement learning (RL) has emerged as a promising solution to the legged locomotion problem in robotics. In this manuscript, we propose a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, relying only on proprioceptive measurements in real-world deployment. Different from the conventional teacher-student architecture, which trains the teacher policy via RL and transfers its knowledge to the student policy through supervised learning, our proposed architecture trains the teacher and student policy networks concurrently under the reinforcement learning paradigm. To achieve this, we develop a new training scheme based on the conventional proximal policy optimization (PPO) method that accommodates the interaction between the teacher and student policy networks. The effectiveness of the proposed architecture and the new training scheme is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and a point-foot bipedal robot, showcasing robust locomotion over challenging terrains and improved performance compared to two-stage training methods.
https://arxiv.org/abs/2405.10830
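The training scheme above builds on PPO. As a reference point, here is a minimal per-sample sketch of PPO's clipped surrogate objective; the concurrent teacher-student coupling is the paper's contribution and is not reproduced, and the function name and scalar simplification are ours:

```python
import math

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, eps=0.2):
    """Clipped PPO surrogate for a single (state, action) sample.

    Returns the negative surrogate objective, i.e. a loss to minimize.
    """
    ratio = math.exp(log_prob_new - log_prob_old)  # importance ratio pi_new/pi_old
    unclipped = ratio * advantage
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage
    # Taking the min makes the update pessimistic: large ratio changes
    # cannot increase the objective beyond the clipped value.
    return -min(unclipped, clipped)
```

For example, with an unchanged policy (`log_prob_new == log_prob_old`) and advantage 1.0 the loss is exactly -1.0, while a large positive log-ratio is capped at the clip boundary.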
Place recognition is a fundamental task for robotic applications, allowing robots to perform loop closure detection within simultaneous localization and mapping (SLAM) and to relocalize on prior maps. Current range image-based networks use single-column convolution to maintain feature invariance to shifts in image columns caused by LiDAR viewpoint changes. However, this raises issues such as "restricted receptive fields" and "excessive focus on local regions", degrading network performance. To address these issues, we propose a lightweight circular convolutional Transformer network, denoted CCTNet, which boosts performance by capturing structural information in point clouds and facilitating cross-dimensional interaction of spatial and channel information. First, a Circular Convolution Module (CCM) is introduced, expanding the network's receptive field while maintaining feature consistency across varying LiDAR perspectives. Then, a Range Transformer Module (RTM) is proposed, which enhances place recognition accuracy in scenarios with movable objects by combining channel and spatial attention mechanisms. Furthermore, we propose an overlap-based loss function, transforming the place recognition task from binary loop-closure classification into a regression problem linked to the overlap between LiDAR frames. In extensive experiments on the KITTI and Ford Campus datasets, CCTNet surpasses comparable methods, achieving Recall@1 of 0.924 and 0.965 and Recall@1% of 0.990 and 0.993 on the respective test sets, showcasing superior performance. Results on a self-collected dataset further demonstrate the method's potential for practical deployment in complex scenarios with movable objects, showing improved generalization across datasets.
https://arxiv.org/abs/2405.10793
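The CCM's key property, as described, is feature consistency under column shifts caused by LiDAR viewpoint change. A minimal pure-Python sketch of a 1-D circular convolution shows how wrap-around padding yields this shift equivariance; this is illustrative only, since CCTNet's actual module operates on multi-channel 2-D feature maps:

```python
def circular_conv1d(row, kernel):
    """1-D convolution with circular (wrap-around) padding, so that a
    cyclic shift of the input produces the same cyclic shift of the
    output. Kernel length must be odd; output length equals input length."""
    n, k = len(row), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j in range(k):
            # Index modulo n wraps around the "azimuth" dimension.
            acc += kernel[j] * row[(i + j - half) % n]
        out.append(acc)
    return out
```

Rotating the input row by one position rotates the output by one position, which is the invariance-to-viewpoint-shift property the abstract refers to.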
In this paper, we propose an optimization-based SLAM approach that simultaneously optimizes the robot trajectory and the occupancy map using 2D laser scan (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which differs significantly from existing occupancy mapping strategies, where the robot poses must be obtained before the map can be estimated. In our formulation, the map is represented as a continuous occupancy map in which each 2D point in the environment has a corresponding evidence value. The Occupancy-SLAM problem is formulated as an optimization problem whose variables include all the robot poses and the occupancy values at selected discrete grid cell nodes. We propose a variant of the Gauss-Newton method to solve this newly formulated problem, obtaining the optimized occupancy map and robot trajectory together with their uncertainties. Our algorithm is an offline approach, since it is based on batch optimization and the number of variables involved is large. Evaluations using simulations and publicly available practical 2D laser datasets demonstrate that the proposed approach estimates maps and robot trajectories more accurately than state-of-the-art techniques when a relatively accurate initial guess is provided. A video showing the convergence process of the proposed Occupancy-SLAM and a comparison of results with Cartographer can be found at \url{this https URL}.
https://arxiv.org/abs/2405.10743
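Occupancy-SLAM solves a nonlinear least-squares problem with a Gauss-Newton variant. As an illustration of the underlying iteration only (not the paper's joint pose/map formulation), here is a scalar Gauss-Newton solver for the toy model y = exp(a·x):

```python
import math

def gauss_newton_1d(xs, ys, a0=0.0, iters=20):
    """Gauss-Newton for the scalar model y = exp(a*x), minimizing the
    sum of squared residuals r_i(a) = exp(a*x_i) - y_i."""
    a = a0
    for _ in range(iters):
        r = [math.exp(a * x) - y for x, y in zip(xs, ys)]   # residuals
        J = [x * math.exp(a * x) for x in xs]               # dr_i/da
        JtJ = sum(j * j for j in J)
        Jtr = sum(j * ri for j, ri in zip(J, r))
        if JtJ == 0.0:
            break
        a -= Jtr / JtJ  # normal-equation step: (J^T J) delta = -J^T r
    return a
```

In the paper the same normal-equation structure is built over all poses and grid-node occupancy values at once, which is why the batch problem is large and solved offline.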
Safe navigation in unknown environments stands as a significant challenge in the field of robotics. Control Barrier Functions (CBFs) are a powerful mathematical tool for guaranteeing safety requirements. However, a common assumption in many works is that the CBF is already known and that obstacles have predefined shapes. In this letter, we present a novel method called Occupancy Grid Map-based Control Barrier Function (OGM-CBF), which defines the control barrier function based on occupancy grid maps. This enables generalization to unknown environments while generating online local or global maps of the environment using onboard perception sensors such as LiDAR or cameras. With this method, the system guarantees safety via a single, continuously differentiable CBF per time step, which can be represented as one constraint in the CBF-QP optimization formulation, regardless of the number of obstacles with unknown shapes in the environment. This enables practical real-time implementation of CBFs in both unknown and known environments. The efficacy of OGM-CBF is demonstrated in the safe control of an autonomous car in the CARLA simulator and of a real-world industrial mobile robot.
https://arxiv.org/abs/2405.10703
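The per-time-step safety filter reduces to a QP with a single affine constraint, which admits a closed-form solution. A hedged sketch follows; the OGM-based construction of the constraint terms is the paper's contribution and is abstracted into the inputs `b` (standing in for Lg h(x)) and `a` (standing in for Lf h(x) + alpha·h(x)):

```python
def cbf_qp_filter(u_des, b, a):
    """Minimal safety filter: minimize ||u - u_des||^2 subject to the
    single affine CBF constraint  b.u + a >= 0.  With one constraint,
    the QP solution is a closed-form projection."""
    slack = sum(bi * ui for bi, ui in zip(b, u_des)) + a
    if slack >= 0.0:
        return list(u_des)      # nominal command already satisfies the CBF
    bb = sum(bi * bi for bi in b)
    lam = -slack / bb           # project onto the constraint boundary
    return [ui + lam * bi for ui, bi in zip(u_des, b)]
```

When the nominal command is safe it passes through unchanged; otherwise it is minimally modified onto the constraint boundary, which is exactly the behavior a CBF-QP filter provides.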
The escalating volumes of textile waste globally necessitate innovative waste management solutions to mitigate the environmental impact and promote sustainability in the fashion industry. This paper addresses the inefficiencies of traditional textile sorting methods by introducing an autonomous textile analysis pipeline. Utilising robotics, spectral imaging, and AI-driven classification, our system enhances the accuracy, efficiency, and scalability of textile sorting processes, contributing to a more sustainable and circular approach to waste management. The integration of a Digital Twin system further allows critical evaluation of technical and economic feasibility, providing valuable insights into the sorting system's accuracy and reliability. The proposed framework, inspired by Industry 4.0 principles, comprises five interconnected layers facilitating seamless data exchange and coordination within the system. Preliminary results highlight the potential of our holistic approach to mitigate environmental impact and foster a positive shift towards recycling in the textile industry.
https://arxiv.org/abs/2405.10696
Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alternative. However, integrating DRL into these robotic systems faces significant challenges, including the requirement for large amounts of training data and the inevitable sim-to-real gap when deployed to real-world robots. This paper proposes an efficient reinforcement learning control framework with sim-to-real transfer to address these challenges. Bootstrap and augmentation enhancements are designed to improve the data efficiency of baseline DRL algorithms, while a sim-to-real transfer technique, namely randomization of muscle dynamics, is adopted to bridge the gap between simulation and real-world deployment. Extensive experiments and ablation studies are conducted utilizing two string-type artificial muscle-driven robotic systems including a two degree-of-freedom robotic eye and a parallel robotic wrist, the results of which demonstrate the effectiveness of the proposed learning control strategy.
https://arxiv.org/abs/2405.10576
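The sim-to-real technique named above is randomization of muscle dynamics. A minimal sketch of per-episode parameter randomization follows; the parameter names and the uniform scaling scheme are illustrative assumptions, not the paper's exact ranges:

```python
import random

def randomize_muscle_params(nominal, scale=0.2, rng=random):
    """Domain-randomization sketch: perturb each nominal muscle-dynamics
    parameter by an independent uniform factor in [1-scale, 1+scale],
    resampled once per training episode."""
    return {k: v * rng.uniform(1.0 - scale, 1.0 + scale)
            for k, v in nominal.items()}
```

A policy trained across such perturbed dynamics must succeed for a whole family of simulated muscles, which is what narrows the sim-to-real gap at deployment.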
The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
https://arxiv.org/abs/2405.10441
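For the PSO-based tuning step, a plain particle swarm optimizer over a box-constrained search space can be sketched as follows; the hyperparameters are common defaults rather than the report's values, and the controller tracking cost is abstracted into a callable:

```python
import random

def pso_minimize(cost, dim, bounds, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Plain particle swarm optimization over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best position
    pbest_val = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm-wide best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            v = cost(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val
```

In the report's setting, `cost` would run a simulated trajectory-tracking episode and return the tracking error for a candidate set of controller gains.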
We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, comparing it with various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) are tremendously maneuverable when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have been shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
https://arxiv.org/abs/2405.10391
Learning in simulation and transferring the learned policy to the real world has the potential to enable generalist robots. The key challenge of this approach is to address simulation-to-reality (sim-to-real) gaps. Previous methods often require domain-specific knowledge a priori. We argue that a straightforward way to obtain such knowledge is by asking humans to observe and assist robot policy execution in the real world. The robots can then learn from humans to close various sim-to-real gaps. We propose TRANSIC, a data-driven approach to enable successful sim-to-real transfer based on a human-in-the-loop framework. TRANSIC allows humans to augment simulation policies to overcome various unmodeled sim-to-real gaps holistically through intervention and online correction. Residual policies can be learned from human corrections and integrated with simulation policies for autonomous execution. We show that our approach can achieve successful sim-to-real transfer in complex and contact-rich manipulation tasks such as furniture assembly. Through synergistic integration of policies learned in simulation and from humans, TRANSIC is effective as a holistic approach to addressing various, often coexisting sim-to-real gaps. It displays attractive properties such as scaling with human effort. Videos and code are available at this https URL
https://arxiv.org/abs/2405.10315
Point cloud segmentation (PCS) plays an essential role in robot perception and navigation tasks. To efficiently understand large-scale outdoor point clouds, their range image representation is commonly adopted. This image-like representation is compact and structured, making range image-based PCS models practical. However, undesirable missing values in the range images damage the shapes and patterns of objects, making it difficult for models to learn coherent and complete geometric information from the objects. Consequently, PCS models achieve only inferior performance. Delving into this issue, we find that unreasonable projection approaches and deskewing scans are the main causes of unwanted missing values in the range images. Moreover, almost all previous works fail to consider filling in these unexpected missing values in the PCS task. To alleviate this problem, we first propose a new projection method, scan unfolding++ (SU++), to avoid massive missing values in the generated range images. Then, we introduce a simple yet effective approach, range-dependent $K$-nearest neighbor interpolation ($K$NNI), to further fill in missing values. Finally, we introduce the Filling Missing Values Network (FMVNet) and Fast FMVNet. Extensive experimental results on the SemanticKITTI, SemanticPOSS, and nuScenes datasets demonstrate that, by employing the proposed SU++ and $K$NNI, existing range image-based PCS models consistently achieve better performance than the baseline models. Furthermore, both FMVNet and Fast FMVNet achieve state-of-the-art performance in terms of the speed-accuracy trade-off. The proposed methods can be applied to other range image-based tasks and practical applications.
https://arxiv.org/abs/2405.10175
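The $K$NNI idea, filling a missing range-image pixel from nearby valid pixels, can be sketched as follows. This simplified variant selects neighbors by image distance only; the paper's range-dependent selection is not reproduced:

```python
def knn_fill(range_image, k=3, window=2, missing=0.0):
    """Fill missing range-image pixels with the mean of the k nearest
    valid neighbours (by image distance) inside a (2*window+1)^2 patch."""
    h, w = len(range_image), len(range_image[0])
    out = [row[:] for row in range_image]
    for i in range(h):
        for j in range(w):
            if range_image[i][j] != missing:
                continue
            cands = []
            for di in range(-window, window + 1):
                for dj in range(-window, window + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w and range_image[ni][nj] != missing:
                        cands.append((di * di + dj * dj, range_image[ni][nj]))
            if cands:
                cands.sort()  # nearest pixels (smallest squared distance) first
                nearest = [v for _, v in cands[:k]]
                out[i][j] = sum(nearest) / len(nearest)
    return out
```

In the paper's formulation, the set of candidate neighbors additionally depends on the range values themselves, so interpolation does not bleed across depth discontinuities.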
Active reconstruction techniques enable robots to autonomously collect scene data for full coverage, relieving users of the tedious and time-consuming data capturing process. However, because they are designed on unsuitable scene representations, existing methods produce unrealistic reconstruction results or cannot evaluate quality online. Thanks to recent advancements in explicit radiance field technology, online active high-fidelity reconstruction has become achievable. In this paper, we propose GS-Planner, a planning framework for active high-fidelity reconstruction using 3D Gaussian Splatting. By improving 3DGS to recognize unobserved regions, we evaluate the reconstruction quality and completeness of the 3DGS map online to guide the robot. We then design a sampling-based active reconstruction strategy to explore the unobserved areas and improve the geometric and textural quality of the reconstruction. To establish a complete robotic active reconstruction system, we choose a quadrotor as the robotic platform for its high agility. We then devise a 3DGS-based safety constraint to generate executable trajectories for quadrotor navigation in the 3DGS map. To validate the effectiveness of our method, we conduct extensive experiments and ablation studies in highly realistic simulation scenes.
https://arxiv.org/abs/2405.10142
Aerial forest-traversing robots capable of monitoring both biotic and abiotic data call for a range of capabilities, including multi-functionality, robustness, and adaptability. These robots have to weather turbulent winds and various obstacles, such as forest flora and wildlife, amplifying the complexity of operating in such uncertain environments. The key to successful data collection is the flexibility to intermittently move from tree to tree in order to perch at vantage locations for extended periods. Perching not only reduces the disturbance caused by multi-rotor systems during data collection, but also allows the system to rest and recharge for longer outdoor missions. Current systems add perching modules that increase the aerial robot's weight and reduce the drone's overall endurance. Thus, the key questions currently studied in our work are: "How do we develop a single robot capable of metamorphosing its body for multi-modal flight and dynamic perching?", "How do we detect and land on perchable objects robustly and dynamically?", and "What spatial-temporal data is important for us to collect?"
https://arxiv.org/abs/2405.10043
The main challenge in learning image-conditioned robotic policies is acquiring a visual representation conducive to low-level control. Due to the high dimensionality of the image space, learning a good visual representation requires a considerable amount of visual data. However, when learning in the real world, data is expensive. Sim2Real is a promising paradigm for overcoming data scarcity in the real-world target domain by using a simulator to collect large amounts of cheap data closely related to the target task. However, it is difficult to transfer an image-conditioned policy from sim to real when the domains are very visually dissimilar. To bridge the sim2real visual gap, we propose using natural language descriptions of images as a unifying signal across domains that captures the underlying task-relevant semantics. Our key insight is that if two image observations from different domains are labeled with similar language, the policy should predict similar action distributions for both images. We demonstrate that training the image encoder to predict the language description or the distance between descriptions of a sim or real image serves as a useful, data-efficient pretraining step that helps learn a domain-invariant image representation. We can then use this image encoder as the backbone of an IL policy trained simultaneously on a large amount of simulated and a handful of real demonstrations. Our approach outperforms widely used prior sim2real methods and strong vision-language pretraining baselines like CLIP and R3M by 25 to 40%.
https://arxiv.org/abs/2405.10020
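The key insight, matching the distance between two image embeddings to the distance between their language descriptions, can be sketched as a per-pair loss. This is a simplified scalar version; the actual encoder training and the way description distances are computed are abstracted away:

```python
def language_alignment_loss(img_emb_a, img_emb_b, lang_dist):
    """Sketch: encourage the image encoder to embed two observations
    (e.g. one sim, one real) at a Euclidean distance matching the
    distance between their natural-language descriptions."""
    d = sum((a - b) ** 2 for a, b in zip(img_emb_a, img_emb_b)) ** 0.5
    return (d - lang_dist) ** 2
```

If two images from different domains carry identical descriptions (`lang_dist == 0`), this loss pushes their embeddings together, which is the domain-invariance property the abstract describes.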
This paper presents the definition of a teleoperated robotic system for non-destructive corrosion inspection of Steel Cylinder Concrete Pipes (SCCP) from the inside. A general description of the in-pipe environment and a state of the art of in-pipe navigation solutions are presented, with a focus on the characteristics of the SCCP case of interest (pipe dimensions, curves, slopes, humidity, payload, etc.). Then, two specific steel corrosion measurement techniques are described. In order to operate them, several possible architectures for the inspection system (a mobile platform combined with a robotic inspection manipulator) are presented, depending on whether the mobile platform is self-centred and on the robotic manipulator type, namely a basic cylindrical manipulator, a self-centred one, or a force-controlled 6 degrees-of-freedom (DoF) robotic arm. A suitable mechanical architecture is then selected according to SCCP inspection needs, including relevant interfaces between the robot, the corrosion measurement Non-Destructive Testing (NDT) device, and the pipe. Finally, possible future adaptations of the chosen solution are discussed.
https://arxiv.org/abs/2405.09925
This paper addresses the problem of object-goal navigation in autonomous inspections in real-world environments. Object-goal navigation is crucial to enable effective inspections in various settings, often requiring the robot to identify the target object within a large search space. Current object inspection methods fall short of human efficiency because they typically cannot bootstrap prior and common sense knowledge as humans do. In this paper, we introduce a framework that enables robots to use semantic knowledge from prior spatial configurations of the environment and semantic common sense knowledge. We propose SEEK (Semantic Reasoning for Object Inspection Tasks) that combines semantic prior knowledge with the robot's observations to search for and navigate toward target objects more efficiently. SEEK maintains two representations: a Dynamic Scene Graph (DSG) and a Relational Semantic Network (RSN). The RSN is a compact and practical model that estimates the probability of finding the target object across spatial elements in the DSG. We propose a novel probabilistic planning framework to search for the object using relational semantic knowledge. Our simulation analyses demonstrate that SEEK outperforms the classical planning and Large Language Models (LLMs)-based methods that are examined in this study in terms of efficiency for object-goal inspection tasks. We validated our approach on a physical legged robot in urban environments, showcasing its practicality and effectiveness in real-world inspection scenarios.
https://arxiv.org/abs/2405.09822
There has been a growing utilization of industrial robots as complementary collaborators for human workers in re-manufacturing sites. Such a human-robot collaboration (HRC) aims to assist human workers in improving the flexibility and efficiency of labor-intensive tasks. In this paper, we propose a human-aware motion planning framework for HRC to effectively compute collision-free motions for manipulators when conducting collaborative tasks with humans. We employ a neural human motion prediction model to enable proactive planning for manipulators. Particularly, rather than blindly trusting and utilizing predicted human trajectories in the manipulator planning, we quantify uncertainties of the neural prediction model to further ensure human safety. Moreover, we integrate the uncertainty-aware prediction into a graph that captures key workspace elements and illustrates their interconnections. Then a graph neural network is leveraged to operate on the constructed graph. Consequently, robot motion planning considers both the dependencies among all the elements in the workspace and the potential influence of future movements of human workers. We experimentally validate the proposed planning framework using a 6-degree-of-freedom manipulator in a shared workspace where a human is performing disassembling tasks. The results demonstrate the benefits of our approach in terms of improving the smoothness and safety of HRC. A brief video introduction of this work is available as the supplemental materials.
https://arxiv.org/abs/2405.09779
This paper proposes a method to combine reinforcement learning (RL) and imitation learning (IL) using dynamic, performance-based modulation of the learning signals. The proposed method combines RL with behavioral cloning (IL), or with corrective feedback in the action space (interactive IL, IIL), by dynamically weighting the losses to be optimized, taking into account the backpropagated gradients used to update the policy and the agent's estimated performance. In this manner, the RL and IL/IIL losses are combined by equalizing their impact on the policy's updates, while modulating this impact such that IL signals are prioritized at the beginning of the learning process; as the agent's performance improves, the RL signals become progressively more relevant, allowing a smooth transition from pure IL/IIL to pure RL. The proposed method is used to learn local planning policies for mobile robots, synthesizing IL/IIL signals online by means of a scripted policy. An extensive simulation-based evaluation of the method on this task shows empirically that it outperforms pure RL in terms of sample efficiency (achieving the same level of performance in the training environment with approximately 4 times fewer experiences), while consistently producing local planning policies with better performance metrics (an average success rate of 0.959 in an evaluation environment, outperforming pure RL by 12.5% and pure IL by 13.9%). Furthermore, the obtained local planning policies are successfully deployed in the real world without any major fine-tuning. The proposed method can extend existing RL algorithms and is applicable to other problems for which generating IL/IIL signals online is feasible. A video summarizing some of the real-world experiments can be found at this https URL.
https://arxiv.org/abs/2405.09760
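The performance-based modulation can be sketched as a scalar loss blend. This is a simplification: the paper equalizes the losses' impact via the backpropagated gradients, which is abstracted here into precomputed gradient norms:

```python
def combined_loss(rl_loss, il_loss, performance,
                  grad_norm_rl=1.0, grad_norm_il=1.0):
    """Blend RL and IL losses: each loss is first normalized by its
    gradient magnitude (equalizing impact on the policy update), then
    weighted so IL dominates early (low performance) and RL dominates
    late. `performance` is the agent's estimated performance in [0, 1]."""
    p = min(max(performance, 0.0), 1.0)
    rl_term = rl_loss / max(grad_norm_rl, 1e-8)
    il_term = il_loss / max(grad_norm_il, 1e-8)
    return p * rl_term + (1.0 - p) * il_term
```

At `performance = 0` the update is pure IL/IIL, at `performance = 1` it is pure RL, and intermediate values give the smooth transition described in the abstract.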
3D cameras have emerged as a critical source of information for applications in robotics and autonomous driving. These cameras provide robots with the ability to capture and utilize point clouds, enabling them to navigate their surroundings and avoid collisions with other objects. However, current standard camera evaluation metrics often fail to consider the specific application context. These metrics typically focus on measures like Chamfer distance (CD) or Earth Mover's Distance (EMD), which may not directly translate to performance in real-world scenarios. To address this limitation, we propose a novel metric for point cloud evaluation, specifically designed to assess the suitability of 3D cameras for the critical task of collision avoidance. This metric incorporates application-specific considerations and provides a more accurate measure of a camera's effectiveness in ensuring safe robot navigation.
https://arxiv.org/abs/2405.09755
For robotics applications where there is a limited number of (typically ego-centric) views, parametric representations such as neural radiance fields (NeRFs) generalize better than non-parametric ones such as Gaussian splatting (GS) to views that are very different from those in the training data; GS, however, can render much faster than NeRFs. We develop a procedure to convert back and forth between the two. Our approach achieves the best of both NeRFs (superior PSNR, SSIM, and LPIPS on dissimilar views, and a compact representation) and GS (real-time rendering and the ability to easily modify the representation); the computational cost of these conversions is minor compared to training the two from scratch.
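Of the three fidelity metrics named above, PSNR is the simplest: a log-scale measure of per-pixel mean squared error between a rendering and a held-out ground-truth view. A minimal sketch (SSIM and LPIPS require structural comparison and a learned network, respectively, and are omitted):

```python
import numpy as np

def psnr(rendered, reference, max_val=1.0):
    """Peak signal-to-noise ratio, in dB, between two images
    with pixel values in [0, max_val]. Higher is better."""
    mse = np.mean((rendered.astype(np.float64)
                   - reference.astype(np.float64)) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Evaluating PSNR on views dissimilar to the training set, as done above, probes exactly the generalization gap between the parametric and non-parametric representations.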
https://arxiv.org/abs/2405.09717
Spoken language interaction is at the heart of interpersonal communication, and people flexibly adapt their speech to different individuals and environments. It is surprising that robots, and by extension other digital devices, are not equipped to adapt their speech and instead rely on fixed speech parameters, which often hinder comprehension by the user. We conducted a speech comprehension study involving 39 participants who were exposed to different environmental and contextual conditions. During the experiment, the robot articulated words using different vocal parameters, and the participants were tasked with both recognising the spoken words and rating their subjective impression of the robot's speech. The experiment's primary outcome shows that spaces with good acoustic quality positively correlate with intelligibility and user experience. However, increasing the distance between the user and the robot degraded the user experience, while distracting background sounds significantly reduced speech recognition accuracy and user satisfaction. We next built an adaptive voice for the robot. For this, the robot needs to know how difficult it is for a user to understand spoken language in a particular setting. We present a prediction model that rates how annoying the ambient acoustic environment is and, consequently, how hard it is to understand someone in this setting. Then, we develop a convolutional neural network model to adapt the robot's speech parameters to different users and spaces, while taking into account the influence of ambient acoustics on intelligibility. Finally, we present an evaluation with 27 users, demonstrating superior intelligibility and user experience with adaptive voice parameters compared to a fixed voice.
https://arxiv.org/abs/2405.09708