Abstract
Dual-arm robots have great application prospects in intelligent manufacturing due to their human-like structure when deployed with advanced intelligent algorithms. However, previous visuomotor policies suffer from perception deficiencies in environments where image features are impaired by conditions such as abnormal lighting, occlusion, and shadows. The Focal CVAE framework is proposed for RGB-D multi-modal data fusion to address this challenge. In this study, a mixed focal attention module is designed to fuse RGB images, which carry color features, with depth images, which carry 3D shape and structure information. This module highlights prominent local features and captures the relevance between RGB and depth via cross-attention. A saliency attention module is further proposed to improve computational efficiency and is applied in both the encoder and the decoder of the framework. We demonstrate the effectiveness of the proposed method via extensive simulations and experiments. The results show that bimanual manipulation performance is significantly improved across four real-world tasks at lower computational cost. Moreover, robustness is validated through experiments under different scenarios involving perception deficiencies, demonstrating the feasibility of the method.
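The abstract describes RGB features attending to depth features via cross-attention but gives no implementation detail. As a rough illustration of that fusion step only, here is a minimal single-head sketch in NumPy; the function names are hypothetical, the query/key/value projections are identities for brevity, and this is not the paper's actual module:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(rgb_tokens, depth_tokens):
    """Fuse modalities: each RGB token queries all depth tokens.

    rgb_tokens:   (N_rgb, d)  flattened RGB feature tokens (queries)
    depth_tokens: (N_dep, d)  flattened depth feature tokens (keys/values)
    Returns:      (N_rgb, d)  depth-informed features aligned to RGB tokens.
    Hypothetical simplification: identity Q/K/V projections, single head.
    """
    d = rgb_tokens.shape[-1]
    # Scaled dot-product similarity between RGB queries and depth keys.
    scores = rgb_tokens @ depth_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)       # rows sum to 1 over depth tokens
    return weights @ depth_tokens            # attention-weighted depth values

# Toy usage with random features standing in for CNN token maps.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 64))
depth = rng.standard_normal((16, 64))
fused = cross_attention(rgb, depth)          # shape (16, 64)
```

In a real module the projections would be learned and the fused output would typically be combined with the RGB stream (e.g., by residual addition) before entering the CVAE encoder; the sketch only shows where the RGB-depth relevance weighting comes from.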
URL
https://arxiv.org/abs/2404.17811