Abstract
With the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operations. However, this is a challenge due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot over multiple viewpoints. Multi-object tracking (MOT) algorithms can be categorized into two-stage and single-stage methods. Two-stage methods tend to be simpler to adapt and implement for custom applications, while single-stage methods offer a more complex end-to-end tracking approach that can yield better results in occluded situations at the cost of more training data. The potential advantages of single-stage methods over two-stage methods depend on the complexity of the sequence of viewpoints that a robot needs to process. In this work, we compare a 3D two-stage MOT algorithm, 3D-SORT, against a 3D single-stage MOT algorithm, MOT-DETR, in three different types of sequences with varying levels of complexity. The sequences represent simpler and more complex motions that a robot arm can perform in a tomato greenhouse. Our experiments in a tomato greenhouse show that the single-stage algorithm consistently yields better tracking accuracy, especially in the more challenging sequences where objects are fully occluded or not visible across several viewpoints.
URL
https://arxiv.org/abs/2404.12963