Abstract
Recent progress in learning-by-synthesis has made it possible to train models on synthetic images, which can effectively reduce the cost of data collection and annotation. However, because the distribution of synthetic images differs from that of real images, the desired performance still cannot be achieved. Real images contain multiple forms of light orientation, whereas synthetic images have a uniform light orientation; these characteristics are typical of outdoor and indoor scenes, respectively. To address this problem, previous methods learned a model to improve the realism of synthetic images. In contrast, this paper takes the first step toward purifying real images: through a style transfer task, the distribution of outdoor real images is converted into that of indoor synthetic images, thereby reducing the influence of lighting. To this end, this paper proposes a real-time style transfer network that preserves the content information (e.g., gaze direction, pupil center position) of an input (real) image while inferring the style information (e.g., image color structure, semantic features) of a style (synthetic) image. In addition, the network accelerates model convergence and adapts to multi-scale images. Experiments using mixed (qualitative and quantitative) methods demonstrate the feasibility of purifying real images under complex lighting conditions. Qualitatively, the proposed method is compared with existing methods on a series of indoor and outdoor scenes from the LPW dataset. Quantitatively, the purified images are evaluated by training a gaze estimation model in a cross-dataset setting. The results show a significant improvement over the baseline method trained on raw real images.
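The abstract describes a network that keeps content information (gaze direction, pupil position) from the real image while matching style statistics of the synthetic image. The paper's actual architecture and losses are not given here, but the standard formulation such objectives build on is a weighted content loss plus a Gram-matrix style loss over feature maps. The sketch below is a generic NumPy illustration of that formulation, not the authors' method; the weights `alpha` and `beta` and the feature shapes are illustrative assumptions.

```python
import numpy as np

def gram_matrix(features):
    # features: (channels, height*width) activations from some network layer.
    # The Gram matrix captures channel-wise correlations, i.e. "style".
    c, hw = features.shape
    return features @ features.T / (c * hw)

def style_content_loss(content_feat, style_feat, generated_feat,
                       alpha=1.0, beta=10.0):
    # Content term: keep the generated features close to the real image's
    # features, preserving structure (gaze direction, pupil position).
    content_loss = np.mean((generated_feat - content_feat) ** 2)
    # Style term: match Gram statistics of the synthetic (style) image,
    # transferring its color structure without copying its content.
    style_loss = np.mean(
        (gram_matrix(generated_feat) - gram_matrix(style_feat)) ** 2
    )
    # alpha/beta trade off content preservation against style transfer
    # (illustrative values, not taken from the paper).
    return alpha * content_loss + beta * style_loss
```

If the generated features equal both the content and style features, the loss is zero; increasing `beta` pushes the output toward the uniform indoor lighting statistics of the synthetic style image.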
URL
https://arxiv.org/abs/1903.05820