Abstract
We developed a robust solution for real-time 6D object pose detection in industrial applications by integrating FoundationPose, SAM2, and LightGlue, with no retraining required. Our approach addresses two key limitations of FoundationPose: the need for an initial object mask in the first frame, and tracking loss and automatic rotation with symmetric objects. The algorithm requires only a CAD model of the target object; during initial setup, the user clicks on the object's location in the live feed. The algorithm then automatically saves a reference image of the object and, in subsequent runs, uses LightGlue to match features between this reference image and the real-time scene, which provides the initial prompt for detection. Tested on the YCB dataset and on industrial components, with objects such as a bleach cleanser and gears, the algorithm demonstrated reliable 6D detection and tracking. Integrating SAM2 with FoundationPose mitigates common limitations such as tracking loss, sustaining continuous and accurate tracking under challenging conditions such as occlusion or rapid movement.
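The abstract does not say how the feature matches are turned into the initial prompt. As a minimal sketch of one plausible approach, assuming the matcher returns scene keypoint coordinates and per-match confidences, the confident matches can be reduced to a single click point and a rough box. The function name `prompt_from_matches` and the thresholds `min_matches` and `min_score` are hypothetical and do not come from the paper.

```python
import numpy as np


def prompt_from_matches(scene_kpts, scores, min_matches=8, min_score=0.5):
    """Reduce matched scene keypoints to a point prompt and a box prompt.

    scene_kpts: (N, 2) array of (x, y) pixel coordinates in the live frame.
    scores:     (N,) array of match confidences in [0, 1].
    Returns (point, box), or None when too few confident matches remain.
    All thresholds are illustrative assumptions, not values from the paper.
    """
    scene_kpts = np.asarray(scene_kpts, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Keep only matches the matcher is reasonably confident about.
    pts = scene_kpts[scores >= min_score]
    if len(pts) < min_matches:
        return None  # object probably not visible; caller can retry next frame

    # The per-axis median is robust to a minority of outlier matches.
    point = np.median(pts, axis=0)

    # A percentile box trims extreme points; with few matches a single
    # outlier can still stretch it, so treat the box as a rough hint.
    lo, hi = np.percentile(pts, [5, 95], axis=0)
    box = np.concatenate([lo, hi])  # x_min, y_min, x_max, y_max
    return point, box
```

The returned point could then stand in for the user's click when prompting a segmentation model. The median is used rather than the mean so that a few mismatched keypoints on the background do not drag the prompt off the object.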
URL
https://arxiv.org/abs/2409.19986