Edge-guided Multi-domain RGB-to-TIR image Translation for Training Vision Tasks with Challenging Labels

Abstract

The scarcity of annotated thermal infrared (TIR) image datasets not only prevents TIR image-based deep learning networks from reaching performance comparable to that of their RGB counterparts, but also limits supervised learning for TIR image-based tasks with challenging labels. As a remedy, we propose a modified multi-domain RGB-to-TIR image translation model, focused on edge preservation, so that annotated RGB images with challenging labels can be employed. When applied to both synthetic and real-world RGB images, the proposed method preserves key details of the original image and leverages the optimal TIR style code to portray accurate TIR characteristics in the translated image. Using our translation model, we enable supervised learning of deep TIR image-based optical flow estimation and object detection: the end-point error in deep TIR optical flow estimation is reduced by 56.5% on average, and the best object detection mAP reaches 23.9%. Our code and supplementary materials are publicly available.
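
For readers unfamiliar with the optical flow metric quoted above, the following is a minimal sketch of the standard average end-point error (the mean Euclidean distance between predicted and ground-truth flow vectors) and of a relative reduction expressed in percent. The function names `average_endpoint_error` and `relative_reduction` and the toy values are illustrative assumptions, not taken from the authors' code or results.

```python
import math


def average_endpoint_error(pred_flow, gt_flow):
    """Mean Euclidean distance between predicted and ground-truth flow vectors.

    Both arguments are sequences of (u, v) displacement pairs, one per pixel.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(pred_flow) != len(gt_flow) or not gt_flow:
        raise ValueError("flows must be non-empty and of equal length")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred_flow, gt_flow)
    )
    return total / len(gt_flow)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy values only: two pixels, one predicted vector off by (3, 4).
toy_pred = [(3.0, 4.0), (0.0, 0.0)]
toy_gt = [(0.0, 0.0), (0.0, 0.0)]
toy_epe = average_endpoint_error(toy_pred, toy_gt)
```

Under this definition, a reported average reduction of 56.5% would mean that the end-point error after training on translated images is, on average, less than half of the baseline error.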

URL

https://arxiv.org/abs/2301.12689

PDF

https://arxiv.org/pdf/2301.12689.pdf