A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics

2024-04-19 15:45:41
David Rapado-Rincon, Akshay K. Burusa, Eldert J. van Henten, Gert Kootstra

Abstract

With the current demand for automation in the agro-food industry, accurately detecting and localizing relevant objects in 3D is essential for successful robotic operations. However, this is a challenge due to the presence of occlusions. Multi-view perception approaches allow robots to overcome occlusions, but a tracking component is needed to associate the objects detected by the robot over multiple viewpoints. Multi-object tracking (MOT) algorithms can be categorized into two-stage and single-stage methods. Two-stage methods tend to be simpler to implement and adapt to custom applications, while single-stage methods present a more complex end-to-end tracking method that can yield better results in occluded situations at the cost of more training data. The potential advantages of single-stage methods over two-stage methods depend on the complexity of the sequence of viewpoints that a robot needs to process. In this work, we compare a 3D two-stage MOT algorithm, 3D-SORT, against a 3D single-stage MOT algorithm, MOT-DETR, in three different types of sequences with varying levels of complexity. The sequences represent simpler and more complex motions that a robot arm can perform in a tomato greenhouse. Our experiments in a tomato greenhouse show that the single-stage algorithm consistently yields better tracking accuracy, especially in the more challenging sequences where objects are fully occluded or not visible across several viewpoints.
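
To make the two-stage idea concrete, the sketch below shows a generic association step: a separate detector is assumed to supply 3D object positions per viewpoint, and the tracker only links them to existing tracks. It is an illustration under stated assumptions, not the 3D-SORT implementation compared in the paper. The class name `GreedyTracker3D`, the `max_distance` gate, and the use of greedy nearest-neighbour matching on Euclidean distance are all hypothetical choices made for brevity; SORT-style trackers typically use Hungarian matching and a motion model instead.

```python
import math
from itertools import count


class GreedyTracker3D:
    """Toy two-stage tracker: detections come from a separate detector."""

    def __init__(self, max_distance=0.05):
        self.max_distance = max_distance  # gating threshold in metres (assumed value)
        self.tracks = {}                  # track id -> last known 3D position
        self._next_id = count(1)

    def update(self, detections):
        """Associate 3D detections with tracks; return track ids in input order."""
        # Candidate (distance, track id, detection index) triples inside the gate,
        # sorted so that the closest pairs are matched first.
        pairs = sorted(
            (math.dist(pos, det), tid, i)
            for tid, pos in self.tracks.items()
            for i, det in enumerate(detections)
            if math.dist(pos, det) <= self.max_distance
        )
        assigned = {}        # detection index -> track id
        used_tracks = set()
        for _, tid, i in pairs:
            if tid not in used_tracks and i not in assigned:
                assigned[i] = tid
                used_tracks.add(tid)
        # Unmatched detections start new tracks. Unmatched tracks are kept,
        # so an object hidden in some viewpoints can be matched again later.
        ids = []
        for i, det in enumerate(detections):
            tid = assigned.get(i)
            if tid is None:
                tid = next(self._next_id)
            self.tracks[tid] = tuple(det)
            ids.append(tid)
        return ids
```

Calling `update` once per viewpoint with positions expressed in a common world frame returns one track id per detection. Because association here relies on position alone, it degrades when objects are missing for many viewpoints, which is the situation where the abstract reports the single-stage method doing better.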

URL

https://arxiv.org/abs/2404.12963

PDF

https://arxiv.org/pdf/2404.12963.pdf

