Abstract
Photorealistic style transfer (PST) enables real-world color grading by transferring the colors of a reference image while preserving content structure. Existing methods mainly follow one of two approaches: generation-based methods, which prioritize stylistic fidelity at the cost of content integrity and efficiency, or global color transformation methods such as look-up tables (LUTs), which preserve structure but lack local adaptability. To bridge this gap, we propose the Spatial Adaptive 4D Look-Up Table (SA-LUT), which combines LUT efficiency with neural network adaptability. SA-LUT features: (1) a Style-guided 4D LUT Generator that extracts multi-scale features from the style image to predict a 4D LUT, and (2) a Context Generator that uses content-style cross-attention to produce a context map. This context map enables spatially adaptive adjustments, allowing the 4D LUT to apply precise color transformations while preserving structural integrity. To establish a rigorous evaluation framework for photorealistic style transfer, we introduce PST50, the first benchmark specifically designed for PST assessment. Experiments demonstrate that SA-LUT substantially outperforms state-of-the-art methods, achieving a 66.7% reduction in LPIPS score compared to 3D LUT approaches while maintaining real-time performance at 16 FPS for video stylization. Our code and benchmark are available at this https URL
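To make the idea of a context-indexed LUT concrete, the following is a minimal NumPy sketch of how a 4D LUT could be applied to an image. It is an illustration under stated assumptions, not the SA-LUT implementation: the function names (`identity_4d_lut`, `apply_4d_lut`), the axis order (context, R, G, B), the value ranges in [0, 1], and the use of quadrilinear interpolation are all hypothetical choices. In the paper the LUT and the context map are predicted by neural networks; here they are simply given as arrays.

```python
import numpy as np
from itertools import product


def identity_4d_lut(size, context_bins):
    """Hypothetical helper: a LUT mapping every color to itself for every context bin."""
    axis = np.linspace(0.0, 1.0, size)
    r, g, b = np.meshgrid(axis, axis, axis, indexing="ij")
    rgb = np.stack([r, g, b], axis=-1)  # (size, size, size, 3)
    return np.broadcast_to(rgb, (context_bins, size, size, size, 3)).copy()


def apply_4d_lut(image, context, lut):
    """Look up each pixel in a 4D LUT with quadrilinear interpolation.

    image:   (H, W, 3) float array, RGB in [0, 1]
    context: (H, W) float array in [0, 1], one value per pixel (assumed layout)
    lut:     (C, R, G, B, 3) float array, indexed by context then color
    """
    # Each pixel becomes a 4D coordinate: (context, r, g, b).
    coords = np.concatenate([context[..., None], image], axis=-1)
    dims = np.array(lut.shape[:4])
    pos = np.clip(coords, 0.0, 1.0) * (dims - 1)

    # Lower and upper grid corners plus the fractional offset between them.
    lo = np.clip(np.floor(pos).astype(int), 0, np.maximum(dims - 2, 0))
    hi = np.minimum(lo + 1, dims - 1)
    frac = pos - lo

    out = np.zeros(image.shape, dtype=np.float64)
    # Blend the 16 corners of the 4D cell that surrounds each pixel.
    for corner in product((0, 1), repeat=4):
        corner = np.array(corner)
        idx = np.where(corner == 1, hi, lo)
        weight = np.prod(np.where(corner == 1, frac, 1.0 - frac), axis=-1)
        out += weight[..., None] * lut[idx[..., 0], idx[..., 1], idx[..., 2], idx[..., 3]]
    return out


# Toy example: context 0 keeps colors, context 1 inverts them.
lut = identity_4d_lut(size=5, context_bins=2)
lut[1] = 1.0 - lut[1]
image = np.random.default_rng(0).random((4, 6, 3))
context = np.zeros((4, 6))
context[:, 3:] = 1.0  # right half of the image uses the inverting slice
stylized = apply_4d_lut(image, context, lut)
```

Because the context value is just a fourth lookup coordinate, two pixels with the same RGB value can receive different output colors when their context values differ, which a global 3D LUT cannot do. The per-pixel cost remains a fixed number of table reads, which is consistent with the efficiency argument made for LUT-based methods.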
URL
https://arxiv.org/abs/2506.13465