Paper Reading AI Learner

A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance

2024-04-15 14:55:43
Eran Bamani, Eden Nissinman, Lisa Koenigsberg, Inbar Meir, Avishai Sintov

Abstract

Object recognition, commonly performed by a camera, is a fundamental requirement for robots to complete complex tasks. Some tasks require recognizing objects far from the robot's camera. A challenging example is Ultra-Range Gesture Recognition (URGR) in human-robot interaction, where the user exhibits directive gestures at a distance of up to 25 m from the robot. However, training a model to recognize barely visible objects at ultra-range requires the exhaustive collection of a large number of labeled samples. Generating synthetic training datasets is a recent remedy for the lack of real-world data, but existing generators fail to faithfully replicate the visual characteristics of distant objects in images. In this letter, we propose the Diffusion in Ultra-Range (DUR) framework, based on a diffusion model, to generate labeled images of distant objects in various scenes. The DUR generator receives a desired distance and class (e.g., gesture) and outputs a corresponding synthetic image. We apply DUR to train a URGR model on directive gestures in which fine details of the gesturing hand are difficult to distinguish. Compared to other types of generative models, DUR is superior both in fidelity and in the recognition success rate of the resulting URGR model. More importantly, training a DUR model on a limited amount of real data and then using it to generate synthetic data for training a URGR model outperforms training the URGR model directly on the real data. The synthetic-data-trained URGR model is also demonstrated in gesture-based guidance of a ground robot.
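The abstract describes the DUR generator as a conditional diffusion model: it takes a desired distance and a gesture class and samples a matching synthetic image. The paper's actual architecture is not given here, but the idea can be sketched as a standard conditional reverse-diffusion loop. Everything below is illustrative: the step count, noise schedule, and the `noise_predictor` stand-in (which in DUR would be a trained network) are assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of class- and distance-conditioned diffusion
# sampling in the spirit of the DUR generator. NOT the paper's model.

T = 50                                  # assumed number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)      # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def noise_predictor(x_t, t, distance, gesture_class):
    """Placeholder for the conditional noise estimator eps(x_t, t, d, c).
    In DUR this would be a trained network; here it is a deterministic
    stand-in that merely mixes the conditioning signals for illustration."""
    cond = np.tanh(distance / 25.0 + gesture_class)
    return 0.1 * x_t + 0.01 * cond

def sample(distance, gesture_class, shape=(8, 8), seed=0):
    """Reverse diffusion: start from Gaussian noise and iteratively
    denoise, conditioned on the desired distance (metres) and class."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps = noise_predictor(x, t, distance, gesture_class)
        # DDPM-style posterior mean, then add noise except at the last step.
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Request a sample at 20 m for (hypothetical) gesture class 3.
img = sample(distance=20.0, gesture_class=3)
print(img.shape)
```

Labeled training pairs for a URGR model would then follow by sweeping `sample` over the desired distances and classes, with each (distance, class) pair doubling as the label for the generated image.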


URL

https://arxiv.org/abs/2404.09846

PDF

https://arxiv.org/pdf/2404.09846.pdf

