Abstract
Personalized portrait synthesis, essential in domains such as social entertainment, has recently made significant progress. Methods based on per-person fine-tuning, such as LoRA and DreamBooth, can produce photorealistic outputs but must be trained on each individual's samples, which consumes time and resources and risks unstable results. Adapter-based techniques such as IP-Adapter freeze the foundation model's parameters and use a plug-in architecture to enable zero-shot inference, but their outputs often lack the naturalness and authenticity that portrait synthesis demands. In this paper, we introduce HyperLoRA, a parameter-efficient adaptive generation method that uses an adaptive plug-in network to generate LoRA weights, combining the superior performance of LoRA with the zero-shot capability of the adapter scheme. Through a carefully designed network structure and training strategy, we achieve zero-shot personalized portrait generation (supporting both single and multiple image inputs) with high photorealism, fidelity, and editability.
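The core idea of a plug-in network that emits LoRA weights can be illustrated with a toy sketch. This is not the HyperLoRA architecture, which the abstract does not specify: the class `ToyHyperNetwork`, the helper `lora_forward`, the single linear projection, and all dimensions are hypothetical choices made only for illustration, and the weights are random rather than trained. The sketch shows one forward pass in which an identity embedding is mapped to low-rank factors `A` and `B`, which then adjust a frozen weight matrix `W` without any per-person training.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class ToyHyperNetwork:
    """Hypothetical hypernetwork: one linear map from an identity
    embedding to the flattened LoRA factors A (rank x d_in) and
    B (d_out x rank). An assumption for illustration only."""

    def __init__(self, embed_dim, d_in, d_out, rank, seed=0):
        rng = random.Random(seed)
        self.d_in, self.d_out, self.rank = d_in, d_out, rank
        n_out = rank * d_in + d_out * rank
        self.proj = [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
                     for _ in range(n_out)]

    def __call__(self, embedding):
        flat = matvec(self.proj, embedding)
        split = self.rank * self.d_in
        a_flat, b_flat = flat[:split], flat[split:]
        # Reshape the flat output into the two low-rank factors.
        A = [a_flat[i * self.d_in:(i + 1) * self.d_in] for i in range(self.rank)]
        B = [b_flat[i * self.rank:(i + 1) * self.rank] for i in range(self.d_out)]
        return A, B


def lora_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x); W itself is never modified."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


# Toy dimensions (hypothetical); a real model would apply this per layer.
rng = random.Random(1)
d_in, d_out, rank, embed_dim = 8, 6, 2, 4
W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(d_out)]  # frozen base weight
hyper = ToyHyperNetwork(embed_dim, d_in, d_out, rank)

identity_embedding = [rng.gauss(0.0, 1.0) for _ in range(embed_dim)]  # stand-in for image features
A, B = hyper(identity_embedding)  # LoRA factors produced in a single forward pass
x = [rng.gauss(0.0, 1.0) for _ in range(d_in)]
y = lora_forward(W, A, B, x)
```

Because the update has the form `W + B A`, the generated factors could equally be merged into the base weight ahead of inference, as with ordinary LoRA weights; the sketch keeps them separate to make the frozen base model explicit.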
Abstract (translated)
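Personal portrait synthesis is essential in domains such as social entertainment and has recently made significant progress. Methods based on per-person fine-tuning (such as LoRA and DreamBooth) can generate photorealistic images but require training on each sample, which consumes considerable time and resources and carries a risk of instability. Adapter-based techniques (for example, IP-Adapter) freeze the base model parameters and adopt a plug-in architecture to achieve zero-shot inference, but they often lack naturalness and authenticity in portrait synthesis tasks. In this paper, we propose HyperLoRA, a parameter-efficient adaptive generation method that uses an adaptive plug-in network to generate LoRA weights, thereby combining the superior performance of LoRA with the zero-shot inference capability of the adapter scheme. Through a carefully designed network structure and training strategy, our method achieves zero-shot personalized portrait generation with high photorealism, fidelity, and editability (supporting single or multiple image inputs).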
URL
https://arxiv.org/abs/2503.16944