Abstract
Neural Style Transfer (NST) is a technique for applying the visual characteristics of one image to another while preserving structural content. Traditionally used for artistic transformations, NST has recently been adapted for tasks such as domain adaptation and data augmentation. This study investigates the use of NST for improving the training of animal facial landmark detectors. As a case study, we use a recently introduced Ensemble Landmark Detector for 48 anatomical cat facial landmarks and the CatFLW dataset it was trained on, and make three main contributions. First, we demonstrate that applying style transfer to cropped facial images rather than full-body images enhances structural consistency, improving the quality of the generated images. Second, replacing training images with style-transferred versions raised the challenge of annotation misalignment, but Supervised Style Transfer (SST), which selects style sources based on landmark accuracy, retained up to 98% of baseline accuracy. Finally, augmenting the dataset with style-transferred images further improved robustness, outperforming traditional augmentation methods. These findings establish semantic style transfer as an effective augmentation strategy for improving facial landmark detection models for animals and beyond. While this study focuses on cat facial landmarks, the proposed method can be generalized to other species and landmark detection models.
URL
https://arxiv.org/abs/2505.05640