Paper Reading AI Learner

SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning

2026-02-09 12:20:33
Melany Yang, Yuhang Yu, Diwang Weng, Jinwei Chen, Wei Dong

Abstract

Photorealistic color retouching plays a vital role in visual content creation, yet manual retouching remains inaccessible to non-experts due to its reliance on specialized expertise. Reference-based methods offer a promising alternative by transferring the preset color of a reference image to a source image. However, these approaches often operate as novice learners, performing global color mappings derived from pixel-level statistics, without a true understanding of semantic context or human aesthetics. To address this issue, we propose SemiNFT, a Diffusion Transformer (DiT)-based retouching framework that mirrors the trajectory of human artistic training: beginning with rigid imitation and evolving into intuitive creation. Specifically, SemiNFT is first taught with paired triplets to acquire basic structural preservation and color mapping skills, and then advanced to reinforcement learning (RL) on unpaired data to cultivate nuanced aesthetic perception. Crucially, during the RL stage, to prevent catastrophic forgetting of old skills, we design a hybrid online-offline reward mechanism that anchors aesthetic exploration with structural review. Extensive experiments show that SemiNFT not only outperforms state-of-the-art methods on standard preset transfer benchmarks but also demonstrates remarkable intelligence in zero-shot tasks, such as black-and-white photo colorization and cross-domain (anime-to-photo) preset transfer. These results confirm that SemiNFT transcends simple statistical matching and achieves a sophisticated level of aesthetic comprehension. Our project can be found at this https URL.
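For readers unfamiliar with the "global color mappings derived from pixel-level statistics" that the abstract contrasts SemiNFT against, the following is a minimal sketch of one such baseline: matching each channel's mean and standard deviation to those of the reference. It is not the SemiNFT method and is not taken from the paper. The function names `channel_stats` and `transfer_global_stats`, the pixel representation (a list of RGB tuples in [0, 1]), and the `eps` parameter are all assumptions made for illustration.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Return a (mean, std) pair per channel for a list of (r, g, b) tuples."""
    channels = list(zip(*pixels))
    return [(fmean(c), pstdev(c)) for c in channels]


def transfer_global_stats(source, reference, eps=1e-6):
    """Shift and scale each source channel to match the reference statistics.

    Illustrative baseline only: every pixel receives the same affine map,
    so the result ignores semantic context entirely.
    """
    src_stats = channel_stats(source)
    ref_stats = channel_stats(reference)
    out = []
    for px in source:
        new_px = []
        for v, (s_mu, s_sd), (r_mu, r_sd) in zip(px, src_stats, ref_stats):
            # Center on the source mean, rescale the spread, recenter on the reference mean.
            mapped = (v - s_mu) * (r_sd / (s_sd + eps)) + r_mu
            # Clamp to the valid [0, 1] range.
            new_px.append(min(1.0, max(0.0, mapped)))
        out.append(tuple(new_px))
    return out


if __name__ == "__main__":
    source = [(0.2, 0.2, 0.2), (0.4, 0.4, 0.4), (0.6, 0.6, 0.6)]
    reference = [(0.5, 0.3, 0.1), (0.6, 0.4, 0.2), (0.7, 0.5, 0.3)]
    for px in transfer_global_stats(source, reference):
        print(px)
```

Because the same affine map is applied to every pixel regardless of what the pixel depicts, a mapping of this kind cannot, for example, treat skin and sky differently. That is the limitation the abstract attributes to statistics-based approaches.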


URL

https://arxiv.org/abs/2602.08582

PDF

https://arxiv.org/pdf/2602.08582.pdf

