Tensegrity robots, characterized by a synergistic assembly of rigid rods and elastic cables, form robust structures that are resistant to impacts. However, this design introduces complexities in kinematics and dynamics, complicating control and state estimation. This work presents a novel proprioceptive state estimator for tensegrity robots. The estimator initially uses the geometric constraints of 3-bar prism tensegrity structures, combined with IMU and motor encoder measurements, to reconstruct the robot's shape and orientation. It then employs a contact-aided invariant extended Kalman filter with forward kinematics to estimate the global position and orientation of the tensegrity robot. The state estimator's accuracy is assessed against ground truth data in both simulated environments and real-world tensegrity robot applications. It achieves an average drift percentage of 4.2%, comparable to the state estimation performance of traditional rigid robots. This state estimator advances the state of the art in tensegrity robot state estimation and has the potential to run in real-time using onboard sensors, paving the way for full autonomy of tensegrity robots in unstructured environments.
https://arxiv.org/abs/2410.24226
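The abstract does not spell out the filter equations, but the prediction step of a contact-aided invariant EKF propagates orientation, velocity, and position using IMU readings and the SO(3) exponential map. A minimal numpy sketch of that propagation (function names and the flat state layout are illustrative, not the paper's code):

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(w):
    """Rodrigues' formula: matrix exponential of a rotation vector."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3) + hat(w)
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def propagate(R, p, v, gyro, accel, dt, g=np.array([0.0, 0.0, -9.81])):
    """One IMU prediction step on (orientation, position, velocity)."""
    a_world = R @ accel + g          # body acceleration rotated into the world frame
    R_next = R @ so3_exp(gyro * dt)  # integrate angular rate on SO(3)
    v_next = v + a_world * dt
    p_next = p + v * dt + 0.5 * a_world * dt ** 2
    return R_next, p_next, v_next
```

In the full estimator, forward kinematics from the motor encoders and the contact constraints would supply the correction step that keeps the drift bounded.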
Powered ankle-foot prostheses can often reduce the energy cost of walking by assisting with push-off. However, focus on providing mechanical work may lead to ignoring or exacerbating common issues with chronic pain, irritation, pressure ulcer development, and eventual osteoarthritis in persons with amputation. This paper presents the design and validation of a novel transtibial prosthesis informed by predictive biomechanical simulations of gait which minimize a combination of user effort and interaction loading from the prosthesis socket. From these findings, the device was designed with a non-biomimetic anterior-posterior translation degree of freedom with a 10 cm range of motion which is primarily position-controlled to change the alignment of the prosthetic foot with the residual limb. The system is both mobile and tethered, with the batteries, actuators, and majority of electronics located in a small backpack. Mechanical loads are transmitted through cables to the prosthesis, minimizing the distal mass carriage required. We measured torque and force sensing accuracy, open loop actuator performance, closed loop torque and position control bandwidth, and torque and position tracking error during walking. The system is capable of producing up to 160 N-m of plantarflexion torque and 394 N of AP translation force with a closed loop control bandwidth of about 7 Hz in both degrees of freedom. Torque tracking during walking was accurate within about 10 N-m but position tracking was substantially affected by phase lag, possibly due to cable slack in the bidirectional mechanism. The prototype was capable of replicating our simulated prosthesis dynamics during gait and offers useful insights into the advantages and the practical considerations of using predictive biomechanical simulation as a design tool for wearable robots.
https://arxiv.org/abs/2410.24196
Robot learning holds tremendous promise to unlock the full potential of flexible, general, and dexterous robot systems, as well as to address some of the deepest questions in artificial intelligence. However, bringing robot learning to the level of generality required for effective real-world systems faces major obstacles in terms of data, generalization, and robustness. In this paper, we discuss how generalist robot policies (i.e., robot foundation models) can address these challenges, and how we can design effective generalist robot policies for complex and highly dexterous tasks. We propose a novel flow matching architecture built on top of a pre-trained vision-language model (VLM) to inherit Internet-scale semantic knowledge. We then discuss how this model can be trained on a large and diverse dataset from multiple dexterous robot platforms, including single-arm robots, dual-arm robots, and mobile manipulators. We evaluate our model in terms of its ability to perform tasks zero-shot after pre-training, to follow language instructions from people and from a high-level VLM policy, and to acquire new skills via fine-tuning. Our results cover a wide variety of tasks, such as laundry folding, table cleaning, and assembling boxes.
https://arxiv.org/abs/2410.24164
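The flow matching head described above regresses a velocity field over actions. A hedged, toy version of the training objective (the linear "policy" is a stand-in for the VLM-backed network, and all shapes and names are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_policy(params, x_t, t, obs):
    """Stand-in for the VLM-backed velocity network (hypothetical linear model)."""
    W, b = params
    feats = np.concatenate([x_t, [t], obs])
    return W @ feats + b

def flow_matching_loss(params, actions, obs):
    """Conditional flow matching: regress the network onto the straight-line
    velocity field between a noise sample and the target action."""
    x1 = actions                        # expert action (target sample)
    x0 = rng.standard_normal(x1.shape)  # noise sample
    t = rng.uniform()                   # random interpolation time
    x_t = (1.0 - t) * x0 + t * x1       # point on the probability path
    v_target = x1 - x0                  # straight-line velocity target
    v_pred = linear_policy(params, x_t, t, obs)
    return float(np.mean((v_pred - v_target) ** 2))
```

At inference time, actions would be produced by integrating the learned velocity field from a noise sample, which this sketch omits.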
The cooperative driving technology of Connected and Autonomous Vehicles (CAVs) is crucial for improving the efficiency and safety of transportation systems. Learning-based methods, such as Multi-Agent Reinforcement Learning (MARL), have demonstrated strong capabilities in cooperative decision-making tasks. However, existing MARL approaches still face challenges in terms of learning efficiency and performance. In recent years, Large Language Models (LLMs) have rapidly advanced and shown remarkable abilities in various sequential decision-making tasks. To enhance the learning capabilities of cooperative agents while ensuring decision-making efficiency and cost-effectiveness, we propose LDPD, a language-driven policy distillation method for guiding MARL exploration. In this framework, a teacher agent based on LLM trains smaller student agents to achieve cooperative decision-making through its own decision-making demonstrations. The teacher agent enhances the observation information of CAVs and utilizes LLMs to perform complex cooperative decision-making reasoning, which also leverages carefully designed decision-making tools to achieve expert-level decisions, providing high-quality teaching experiences. The student agent then refines the teacher's prior knowledge into its own model through gradient policy updates. The experiments demonstrate that the students can rapidly improve their capabilities with minimal guidance from the teacher and eventually surpass the teacher's performance. Extensive experiments show that our approach demonstrates better performance and learning efficiency compared to baseline methods.
https://arxiv.org/abs/2410.24152
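Distillation of the kind LDPD describes can be reduced to a behavior-cloning update that pushes the student's action distribution toward the teacher's demonstrated action. A small sketch under that simplification (a linear softmax student; the LLM teacher is abstracted into a fixed action label):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distill_step(W, obs, teacher_action, lr=0.5):
    """One gradient step pushing the student policy toward the teacher's
    demonstrated action (cross-entropy / behavior-cloning update)."""
    logits = W @ obs
    probs = softmax(logits)
    grad_logits = probs.copy()
    grad_logits[teacher_action] -= 1.0          # d(cross-entropy)/d(logits)
    W -= lr * np.outer(grad_logits, obs)        # gradient policy update
    return W, float(-np.log(probs[teacher_action]))
```

Repeated over the teacher's demonstrations, the student's cross-entropy falls and its greedy action matches the teacher's, after which on-policy RL could take over, as the paper's results suggest.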
In this work, we introduce general-purpose touch representations for the increasingly accessible class of vision-based tactile sensors. Such sensors have led to many recent advances in robot manipulation as they markedly complement vision, yet solutions today often rely on task- and sensor-specific handcrafted perception models. Collecting real data at scale with task-centric ground truth labels, like contact forces and slip, is a challenge further compounded by sensors of various form factors differing in aspects like lighting and gel markings. To tackle this we turn to self-supervised learning (SSL), which has demonstrated remarkable performance in computer vision. We present Sparsh, a family of SSL models that can support various vision-based tactile sensors, alleviating the need for custom labels through pre-training on 460k+ tactile images with masking and self-distillation in pixel and latent spaces. We also build TacBench to facilitate standardized benchmarking across sensors and models, comprising six tasks ranging from comprehending tactile properties to enabling physical perception and manipulation planning. In evaluations, we find that SSL pre-training for touch representation outperforms task- and sensor-specific end-to-end training by 95.1% on average over TacBench, and Sparsh (DINO) and Sparsh (IJEPA) are the most competitive, indicating the merits of learning in latent space for tactile images. Project page: this https URL
https://arxiv.org/abs/2410.24090
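Masking with self-distillation in latent space, as used to pre-train Sparsh, can be illustrated with a toy linear encoder: the student encodes a masked view and regresses the teacher's latents for the masked patches. Everything below (linear encoder, zero-masking, the EMA teacher frozen to fixed weights) is a simplification, not the actual ViT-based recipe:

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(W, patches):
    """Toy linear patch encoder standing in for the transformer backbone."""
    return patches @ W

def masked_distill_loss(W_student, W_teacher, patches, mask_ratio=0.5):
    """Self-distillation in latent space: the student sees a masked view and
    regresses the (frozen, EMA) teacher's latents for the masked patches."""
    n = len(patches)
    masked = rng.random(n) < mask_ratio
    view = patches.copy()
    view[masked] = 0.0                        # mask out tactile patches
    z_student = encode(W_student, view)
    z_teacher = encode(W_teacher, patches)    # teacher sees the full image
    if not masked.any():
        return 0.0
    return float(np.mean((z_student[masked] - z_teacher[masked]) ** 2))
```

The loss only scores masked positions, so the student must infer occluded tactile structure from context, which is the property that makes the learned representations transfer across sensors.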
Zero-Shot Object Goal Navigation (ZS-OGN) enables robots or agents to navigate toward objects of unseen categories without object-specific training. Traditional approaches often leverage categorical semantic information for navigation guidance, which struggles when objects are only partially observed or when detailed, functional representations of the environment are lacking. To resolve these two issues, we propose Geometric-part and Affordance Maps (GAMap), a novel method that integrates object parts and affordance attributes as navigation guidance. Our method includes a multi-scale scoring approach to capture the geometric-part and affordance attributes of objects at different scales. Comprehensive experiments conducted on the HM3D and Gibson benchmark datasets demonstrate improvements in Success Rate and Success weighted by Path Length, underscoring the efficacy of our geometric-part and affordance-guided navigation approach in enhancing robot autonomy and versatility, without any additional object-specific training or fine-tuning on the semantics of unseen objects or the locomotion of the robot.
https://arxiv.org/abs/2410.23978
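One plausible reading of the multi-scale scoring approach is to score geometric-part and affordance attributes on image crops at several scales and aggregate. The sketch below is an assumption-laden illustration (center crops, plain averaging, an arbitrary `score_fn`), not GAMap's actual scorer:

```python
import numpy as np

def multiscale_score(score_fn, image, scales=(1, 2, 4)):
    """Average an attribute score over center crops at several scales, so both
    small geometric parts and whole-object affordances contribute."""
    h, w = image.shape[:2]
    total = 0.0
    for s in scales:
        ch, cw = h // s, w // s              # crop size shrinks with scale
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = image[top:top + ch, left:left + cw]
        total += score_fn(crop)
    return total / len(scales)
```

In the paper's setting `score_fn` would be a learned attribute scorer and the scores would be projected into a map used for navigation; here a brightness mean stands in for it.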
Recent advances in Large Language Models (LLMs) have helped facilitate exciting progress for robotic planning in real, open-world environments. 3D scene graphs (3DSGs) offer a promising environment representation for grounding such LLM-based planners as they are compact and semantically rich. However, as the robot's environment scales (e.g., number of entities tracked) and the complexity of scene graph information increases (e.g., maintaining more attributes), providing the 3DSG as-is to an LLM-based planner quickly becomes infeasible due to input token count limits and attentional biases present in LLMs. Inspired by the successes of Retrieval-Augmented Generation (RAG) methods that retrieve query-relevant document chunks for LLM question and answering, we adapt the paradigm for our embodied domain. Specifically, we propose a 3D scene subgraph retrieval framework, called EmbodiedRAG, that we augment an LLM-based planner with for executing natural language robotic tasks. Notably, our retrieved subgraphs adapt to changes in the environment as well as changes in task-relevancy as the robot executes its plan. We demonstrate EmbodiedRAG's ability to significantly reduce input token counts (by an order of magnitude) and planning time (up to 70% reduction in average time per planning step) while improving success rates on AI2Thor simulated household tasks with a single-arm, mobile manipulator. Additionally, we implement EmbodiedRAG on a quadruped with a manipulator to highlight the performance benefits for robot deployment at the edge in real environments.
https://arxiv.org/abs/2410.23968
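Adapting RAG to a 3D scene graph amounts to embedding nodes, scoring them against the task query, and passing only the top-k subgraph to the planner. A minimal sketch (cosine scoring and the node/edge layout are my own simplifications of the retrieval step):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve_subgraph(node_embeds, edges, query_embed, k=2):
    """Return the k most task-relevant nodes plus the edges among them,
    mimicking RAG-style retrieval over a 3D scene graph."""
    scores = {n: cosine(e, query_embed) for n, e in node_embeds.items()}
    keep = set(sorted(scores, key=scores.get, reverse=True)[:k])
    sub_edges = [(u, v) for (u, v) in edges if u in keep and v in keep]
    return keep, sub_edges
```

Re-running this retrieval as the robot acts is what lets the subgraph track both environment changes and shifting task relevance, while the planner's prompt stays an order of magnitude smaller than the full graph.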
Observational learning is a promising approach to enable people without expertise in programming to transfer skills to robots in a user-friendly manner, since it mirrors how humans learn new behaviors by observing others. Many existing methods focus on instructing robots to mimic human trajectories, but motion-level strategies often pose challenges for skill generalization across diverse environments. This paper proposes a novel framework that allows robots to achieve a higher-level understanding of human-demonstrated manual tasks recorded in RGB videos. By recognizing the task structure and goals, robots generalize what they observe to unseen scenarios. We base our task representation on Shannon's Information Theory (IT), which is applied for the first time to manual tasks. IT helps extract the active scene elements and quantify the information shared between hands and objects. We exploit scene graph properties to encode the extracted interaction features in a compact structure and segment the demonstration into blocks, streamlining the generation of Behavior Trees for robot replication. Experiments validated the effectiveness of IT in automatically generating robot execution plans from a single human demonstration. Additionally, we provide HANDSOME, an open-source dataset of HAND Skills demOnstrated by Multi-subjEcts, to promote further research and evaluation in this field.
https://arxiv.org/abs/2410.23963
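The information shared between hands and objects can be quantified with Shannon mutual information over quantized interaction states. A small self-contained version (the discretization of hand/object trajectories into symbol streams is assumed, not taken from the paper):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Shannon mutual information (bits) between two discrete event streams,
    e.g. quantized hand states and object states from a demonstration."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        # p_joint * n * n / (cx * cy) == p(x,y) / (p(x) p(y))
        mi += p_joint * math.log2(p_joint * n * n / (px[x] * py[y]))
    return mi
```

High mutual information between a hand stream and an object stream flags that object as an active scene element, which is how an IT-based segmentation can pick out the interactions worth encoding in the scene graph.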
A new disturbance observer based control scheme is developed for a quadrotor under the concurrent disturbances from a lightweight elastic tether cable and a lumped vertical disturbance. This elastic tether is unusual as it creates a disturbance proportional to the multicopter's translational movement. This paper takes an observer-based approach to estimate the stiffness coefficient of the cable and uses the system model to update the estimates of the external forces, which are then compensated in the control action. Given that the tethered cable force affects both horizontal channels of the quadrotor and is also coupled with the vertical channel, the proposed disturbance observer is constructed to exploit the redundant measurements across all three channels to jointly estimate the cable stiffness and the vertical disturbance. A pseudo-inverse method is used to determine the observer gain functions, such that the estimation of the two quantities is decoupled and stable. Compared to standard disturbance observers which assume nearly constant disturbances, the proposed approach can quickly adjust its total force estimate as the tethered quadrotor changes its position or tautness of the tether. This is applied to two experiments - a tracking performance test where the multicopter moves under a constant tether strain, and an object extraction test. In the second test, the multicopter manipulates a nonlinear mechanism mimicking the extraction of a wedged object. In both cases, the proposed approach shows significant improvement over standard Disturbance Observer and Extended State Observer approaches. A video summary of the experiments can be found at this https URL.
https://arxiv.org/abs/2410.23929
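The joint estimation of cable stiffness and vertical disturbance from redundant three-axis measurements can be posed as a linear least-squares problem solved with a pseudo-inverse. The sketch below uses a deliberately simplified force model, r = k*s + d*e3, with a batch fit; the paper's observer instead runs recursively with designed gain functions, so this only illustrates the pseudo-inverse decoupling idea:

```python
import numpy as np

def estimate_params(displacements, residual_forces):
    """Jointly estimate tether stiffness k and vertical disturbance d from
    redundant 3-axis force residuals via a pseudo-inverse least-squares fit.
    Assumed model: residual r = k * s + d * e3, with s a known displacement."""
    e3 = np.array([0.0, 0.0, 1.0])
    rows, rhs = [], []
    for s, r in zip(displacements, residual_forces):
        for axis in range(3):
            rows.append([s[axis], e3[axis]])   # one scalar equation per axis
            rhs.append(r[axis])
    A = np.array(rows)
    k, d = np.linalg.pinv(A) @ np.array(rhs)   # decoupled least-squares solution
    return float(k), float(d)
```

Because the stiffness term scales with displacement while the disturbance term does not, stacking all three channels makes the two unknowns separately observable, mirroring the decoupling the paper achieves with its gain design.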
Robotic insertion tasks remain challenging due to uncertainties in perception and the need for precise control, particularly in unstructured environments. While humans seamlessly combine vision and touch for such tasks, effectively integrating these modalities in robotic systems is still an open problem. Our work presents an extensive analysis of the interplay between visual and tactile feedback during dexterous insertion tasks, showing that tactile sensing can greatly enhance success rates on challenging insertions with tight tolerances and varied hole orientations that vision alone cannot solve. These findings provide valuable insights for designing more effective multi-modal robotic control systems and highlight the critical role of tactile feedback in contact-rich manipulation tasks.
https://arxiv.org/abs/2410.23860
The decline of bee- and wind-based pollination systems in greenhouses, due to controlled environments and limited access, has boosted the importance of finding alternative pollination methods. Robot-based pollination systems have emerged as a promising solution, ensuring adequate crop yield even in challenging pollination scenarios. This paper presents a comprehensive review of the robotic pollinators currently employed in greenhouses. The review categorizes pollinator technologies into major categories such as air-jet, water-jet, linear actuator, ultrasonic wave, and air-liquid spray, each suitable for specific crop pollination requirements. However, these technologies are often tailored to particular crops, limiting their versatility. The advancement of science and technology has led to the integration of automated pollination technology, encompassing information technology, automatic perception, detection, control, and operation. This integration not only reduces labor costs but also fosters the ongoing progress of modern agriculture by refining technology, enhancing automation, and promoting intelligence in agricultural practices. Finally, the challenges encountered in the design of pollinators are addressed, and a forward-looking perspective is taken towards future developments, aiming to contribute to the sustainable advancement of this technology.
https://arxiv.org/abs/2410.23747
This paper underscores the importance of environmental monitoring, and specifically of freshwater ecosystems, which play a critical role in sustaining life and the global economy. Despite their importance, insufficient data availability prevents a comprehensive understanding of these ecosystems, thereby impeding informed decision-making concerning their preservation. Aerial-aquatic robots are identified as effective tools for freshwater sensing, offering rapid deployment and avoiding the need for ships and manned teams. To advance the field of aerial-aquatic robots, this paper conducts a comprehensive review of air-water transitions, focusing on the water entry strategies of existing prototypes. This analysis also highlights the safety risks associated with each transition and proposes a set of design requirements relating to robots' tasks, mission objectives, and safety measures. To further explore the proposed design requirements, we present a novel robot with VTOL capability, enabling seamless air-water transitions.
https://arxiv.org/abs/2410.23722
This work explores conditions under which multi-finger grasping algorithms can attain robust sim-to-real transfer. While numerous large datasets facilitate learning generative models for multi-finger grasping at scale, reliable real-world dexterous grasping remains challenging, with most methods degrading when deployed on hardware. An alternate strategy is to use discriminative grasp evaluation models for grasp selection and refinement, conditioned on real-world sensor measurements. This paradigm has produced state-of-the-art results for vision-based parallel-jaw grasping, but remains unproven in the multi-finger setting. In this work, we find that existing datasets and methods have been insufficient for training discriminative models for multi-finger grasping. To train grasp evaluators at scale, datasets must provide on the order of millions of grasps, including both positive and negative examples, with corresponding visual data resembling measurements at inference time. To that end, we release a new, open-source dataset of 3.5M grasps on 4.3K objects annotated with RGB images, point clouds, and trained NeRFs. Leveraging this dataset, we train vision-based grasp evaluators that outperform both analytic and generative modeling-based baselines on extensive simulated and real-world trials across a diverse range of objects. We show via numerous ablations that the key factor for performance is indeed the evaluator, and that its quality degrades as the dataset shrinks, demonstrating the importance of our new dataset. Project website at: this https URL.
https://arxiv.org/abs/2410.23701
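Discriminative grasp evaluation followed by refinement can be sketched as: score candidate grasps with the evaluator, keep the best, then locally perturb it and accept improvements. The logistic "evaluator" below is a toy stand-in for the paper's vision-based network, and the random-perturbation refinement is one simple choice among many:

```python
import numpy as np

def evaluate(W, grasp_feat):
    """Discriminative evaluator: logistic success probability of a grasp
    (toy linear model standing in for the vision-based network)."""
    return 1.0 / (1.0 + np.exp(-(W @ grasp_feat)))

def select_grasp(W, candidates, n_refine=20, step=0.1, rng=None):
    """Evaluator-guided selection and refinement: keep the best-scoring
    candidate, then hill-climb its parameters under the evaluator."""
    rng = rng or np.random.default_rng(0)
    best = max(candidates, key=lambda g: evaluate(W, g))
    for _ in range(n_refine):
        perturbed = best + step * rng.standard_normal(best.shape)
        if evaluate(W, perturbed) > evaluate(W, best):
            best = perturbed
    return best
```

The design point the abstract makes is that everything downstream hinges on the quality of `evaluate`, which is why training it needs millions of labeled positive and negative grasps with realistic visual conditioning.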
Humanoids exhibit wide variety in joint configuration, actuators, and degrees of freedom, resulting in different achievable movements and tasks for each type. In particular, musculoskeletal humanoids are developed to closely emulate human body structure and movement functions, consisting of a skeletal framework driven by numerous muscle actuators. The redundant arrangement of muscles relative to the skeletal degrees of freedom has been used to represent the flexible and complex body movements observed in humans. However, due to this flexible body and high degrees of freedom, modeling, simulation, and control become extremely challenging, limiting the feasible movements and tasks. In this study, we integrate the musculoskeletal humanoid Musashi with the wire-driven robot CubiX, capable of connecting to the environment, to form CubiXMusashi. This combination addresses the shortcomings of traditional musculoskeletal humanoids and enables movements beyond the capabilities of other humanoids. CubiXMusashi connects to the environment with wires and drives by winding them, successfully achieving movements such as pull-ups, rising from a lying pose, and mid-air kicking, which are difficult for Musashi alone. This concept demonstrates that various humanoids, not limited to musculoskeletal humanoids, can mitigate their physical constraints and acquire new abilities by connecting to the environment and driving through wires.
https://arxiv.org/abs/2410.23682
Careful robot manipulation in every-day cluttered environments requires an accurate understanding of the 3D scene, in order to grasp and place objects stably and reliably and to avoid mistakenly colliding with other objects. In general, we must construct such a 3D interpretation of a complex scene based on limited input, such as a single RGB-D image. We describe SceneComplete, a system for constructing a complete, segmented, 3D model of a scene from a single view. It provides a novel pipeline for composing general-purpose pretrained perception modules (vision-language, segmentation, image-inpainting, image-to-3D, and pose-estimation) to obtain high-accuracy results. We demonstrate its accuracy and effectiveness with respect to ground-truth models in a large benchmark dataset and show that its accurate whole-object reconstruction enables robust grasp proposal generation, including for a dexterous hand.
https://arxiv.org/abs/2410.23643
The development of large language models and vision-language models (VLMs) has resulted in the increasing use of robotic systems in various fields. However, the effective integration of these models into real-world robotic tasks is a key challenge. We developed a versatile robotic system called SuctionPrompt that utilizes prompting techniques of VLMs combined with 3D detections to perform product-picking tasks in diverse and dynamic environments. Our method highlights the importance of integrating 3D spatial information with adaptive action planning to enable robots to approach and manipulate objects in novel environments. In the validation experiments, the system selected suction points accurately in 75.4% of cases and achieved a 65.0% success rate in picking common items. This study highlights the effectiveness of VLMs in robotic manipulation tasks, even with simple 3D processing.
https://arxiv.org/abs/2410.23640
Tiny aerial robots show promise for applications like environmental monitoring and search-and-rescue but face challenges in control due to their limited computing power and complex dynamics. Model Predictive Control (MPC) can achieve agile trajectory tracking and handle constraints. Although current learning-based MPC methods, such as Gaussian Process (GP) MPC, improve control performance by learning residual dynamics, they are computationally demanding, limiting their onboard application on tiny robots. This paper introduces Tiny Learning-Based Model Predictive Control (LB MPC), a novel framework for resource-constrained micro multirotor platforms. By exploiting multirotor dynamics' structure and developing an efficient solver, our approach enables high-rate control at 100 Hz on a Crazyflie 2.1 with a Teensy 4.0 microcontroller. We demonstrate a 23% average improvement in tracking performance over existing embedded MPC methods, achieving the first onboard implementation of learning-based MPC on a tiny multirotor (53 g).
https://arxiv.org/abs/2410.23634
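Learning-based MPC of this flavor augments a nominal model with learned residual dynamics and optimizes controls against rollouts of the augmented model. A deliberately tiny shooting-style sketch (1-D double integrator, a hand-written "learned" residual, constant-control candidates; the paper's structured solver is far more efficient than this grid search):

```python
import numpy as np

def learned_residual(x, u):
    """Stand-in for the learned residual dynamics, e.g. unmodeled drag.
    A GP or small network would be fit to data; here it is hand-written."""
    return -0.1 * x[1] * abs(x[1])

def step(x, u, dt=0.05):
    """Nominal double-integrator model augmented with the residual term."""
    pos, vel = x
    acc = u + learned_residual(x, u)
    return np.array([pos + vel * dt, vel + acc * dt])

def mpc_control(x0, ref, horizon=10, candidates=np.linspace(-5, 5, 41)):
    """Shooting-style MPC: pick the constant control whose rollout under the
    augmented model best tracks the reference position."""
    def cost(u):
        x = x0.copy()
        c = 0.0
        for _ in range(horizon):
            x = step(x, u)
            c += (x[0] - ref) ** 2
        return c
    return float(min(candidates, key=cost))
```

Only the first control of the plan would be applied before re-solving at the next tick, which is what makes the 100 Hz solve rate on a microcontroller the hard part.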
This paper studies the multi-robot pursuit problem of how to coordinate a group of defending robots to capture a faster attacker before it enters a protected area. This operation is challenging for the defending robots due to the attacker's unknown avoidance strategy and higher speed, coupled with the defenders' limited communication capabilities. To solve this problem, we propose a parameterized formation controller that allows defending robots to adapt their formation shape using five adjustable parameters. Moreover, we develop an imitation-learning-based approach integrated with model predictive control to optimize these shape parameters. We make full use of these two techniques to enhance the capture capabilities of the defending robots through ongoing training. Both simulations and experiments are provided to verify the effectiveness and robustness of our proposed controller. Simulation results show that defending robots can rapidly learn an effective strategy for capturing the attacker, and moreover the learned strategy remains effective across varying numbers of defenders. Experimental results on real robot platforms further validate these findings.
https://arxiv.org/abs/2410.23586
Unlike most formation strategies, in which robots require unique labels to identify topological neighbors and satisfy predefined shape constraints, here we study the problem of identity-less distributed shape formation in homogeneous swarms, which is rarely studied in the literature. The absence of identities creates a unique challenge: how to design appropriate target formations and local behaviors that are suitable for identity-less formation shape control. To address this challenge, we propose the following novel results. First, to avoid using unique identities, we propose a dynamic formation description method and solve the formation consensus of robots in a locally distributed manner. Second, to handle identity-less distributed formations, we propose a fully distributed control law for homogeneous swarms based on locally sensed information. While existing methods are applicable to simple cases where the target formation is stationary, ours can tackle more general maneuvering formations such as translation, rotation, or even shape deformation. Both numerical simulations and flight experiments are presented to verify the effectiveness and robustness of our proposed formation strategy.
https://arxiv.org/abs/2410.23581
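An identity-less formation rule must work without pre-assigned robot-to-slot labels. One simple label-free behavior, purely for illustration (the paper's method additionally handles consensus on a dynamic formation description and maneuvering formations, which this greedy nearest-slot rule does not):

```python
import numpy as np

def identityless_step(positions, targets, gain=0.5):
    """Label-free shape control: each identical robot steps toward the nearest
    target slot rather than a pre-assigned one, so no unique IDs are needed."""
    new_positions = []
    for p in positions:
        nearest = min(targets, key=lambda t: np.linalg.norm(p - t))
        new_positions.append(p + gain * (nearest - p))
    return new_positions
```

When initial positions are spread out, each robot's nearest slot stays fixed and the swarm converges to the target shape; resolving the collisions and slot conflicts that arise in general is exactly what the paper's consensus machinery is for.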
This paper presents a novel reinforcement learning framework for trajectory tracking of unmanned aerial vehicles in cluttered environments using a dual-agent architecture. Traditional optimization methods for trajectory tracking face significant computational challenges and lack robustness in dynamic environments. Our approach employs deep reinforcement learning (RL) to overcome these limitations, leveraging 3D pointcloud data to perceive the environment without relying on memory-intensive obstacle representations like occupancy grids. The proposed system features two RL agents: one for predicting UAV velocities to follow a reference trajectory and another for managing collision avoidance in the presence of obstacles. This architecture ensures real-time performance and adaptability to uncertainties. We demonstrate the efficacy of our approach through simulated and real-world experiments, highlighting improvements over state-of-the-art RL and optimization-based methods. Additionally, a curriculum learning paradigm is employed to scale the algorithms to more complex environments, ensuring robust trajectory tracking and obstacle avoidance in both static and dynamic scenarios.
https://arxiv.org/abs/2410.23571