Abstract
Gaussian Splatting (GS) has recently emerged as an efficient representation for rendering 3D scenes from 2D images and has been extended to images, videos, and dynamic 4D content. However, applying style transfer to GS-based representations, especially beyond simple color changes, remains challenging. In this work, we introduce CLIPGaussians, the first unified style transfer framework that supports text- and image-guided stylization across multiple modalities: 2D images, videos, 3D objects, and 4D scenes. Our method operates directly on Gaussian primitives and integrates into existing GS pipelines as a plug-in module, without requiring large generative models or retraining from scratch. The CLIPGaussians approach enables joint optimization of color and geometry in 3D and 4D settings and achieves temporal coherence in videos, while preserving model size. We demonstrate superior style fidelity and consistency across all tasks, validating CLIPGaussians as a universal and efficient solution for multimodal style transfer.
URL
https://arxiv.org/abs/2505.22854