Paper Reading AI Learner

MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D

2024-11-04 17:59:39
Wei Cheng, Juncheng Mu, Xianfang Zeng, Xin Chen, Anqi Pang, Chi Zhang, Zhibin Wang, Bin Fu, Gang Yu, Ziwei Liu, Liang Pan

Abstract

Texturing is a crucial step in the 3D asset production workflow, which enhances the visual appeal and diversity of 3D assets. Despite recent advancements in Text-to-Texture (T2T) generation, existing methods often yield subpar results, primarily due to local discontinuities, inconsistencies across multiple views, and their heavy dependence on UV unwrapping outcomes. To tackle these challenges, we propose a novel generation-refinement 3D texturing framework called MVPaint, which can generate high-resolution, seamless textures while emphasizing multi-view consistency. MVPaint mainly consists of three key modules. 1) Synchronized Multi-view Generation (SMG). Given a 3D mesh model, MVPaint first generates multi-view images simultaneously by employing an SMG model, which yields coarse texturing results with unpainted regions due to missing observations. 2) Spatial-aware 3D Inpainting (S3I). To ensure complete 3D texturing, we introduce the S3I method, specifically designed to effectively texture previously unobserved areas. 3) UV Refinement (UVR). Furthermore, MVPaint employs a UVR module to improve texture quality in UV space, which first performs UV-space Super-Resolution, followed by a Spatial-aware Seam-Smoothing algorithm that corrects spatial texturing discontinuities caused by UV unwrapping. Moreover, we establish two T2T evaluation benchmarks: the Objaverse T2T benchmark and the GSO T2T benchmark, based on selected high-quality 3D meshes from the Objaverse dataset and the entire GSO dataset, respectively. Extensive experimental results demonstrate that MVPaint surpasses existing state-of-the-art methods. Notably, MVPaint can generate high-fidelity textures with minimal Janus issues and significantly enhanced cross-view consistency.
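
To make the three-stage design concrete, below is a minimal Python sketch of how the stages compose (SMG -> S3I -> UVR). All function names, the stub implementations, and the use of plain NumPy arrays as stand-ins for rendered views and UV textures are illustrative assumptions, not the authors' released API.

```python
# Hypothetical sketch of the MVPaint pipeline described in the abstract.
# Each stage is stubbed so the script runs end to end; real implementations
# would call a multi-view diffusion model, a 3D inpainter, and UV refiners.
import numpy as np


def smg_generate_views(mesh, prompt, n_views=6, res=512):
    """Stage 1 (SMG): jointly sample synchronized multi-view images with a
    multi-view diffusion model. Stubbed here with random images."""
    return np.random.rand(n_views, res, res, 3)


def backproject_to_uv(mesh, views, uv_res=1024):
    """Project the generated views into a coarse UV texture; texels that no
    view observed stay unpainted (NaN in this toy stand-in)."""
    uv = np.full((uv_res, uv_res, 3), np.nan)
    coarse = views.mean(axis=0)
    coarse = coarse.repeat(uv_res // coarse.shape[0], axis=0)
    coarse = coarse.repeat(uv_res // coarse.shape[1], axis=1)
    uv[: uv_res // 2] = coarse[: uv_res // 2]  # pretend half the atlas was seen
    return uv


def s3i_inpaint_unseen(mesh, uv_texture):
    """Stage 2 (S3I): fill texels missing from every view with spatial-aware
    3D inpainting. Stubbed with a constant fill."""
    filled = uv_texture.copy()
    filled[np.isnan(filled)] = 0.5
    return filled


def uvr_refine(mesh, uv_texture, scale=2):
    """Stage 3 (UVR): UV-space super-resolution followed by spatial-aware
    seam smoothing along UV-island borders. Stubbed with upsampling only."""
    return uv_texture.repeat(scale, axis=0).repeat(scale, axis=1)


def texture_mesh(mesh, prompt):
    views = smg_generate_views(mesh, prompt)  # 1) synchronized multi-view paint
    uv = backproject_to_uv(mesh, views)       #    coarse texture, holes remain
    uv = s3i_inpaint_unseen(mesh, uv)         # 2) complete unobserved regions
    return uvr_refine(mesh, uv)               # 3) refine in UV space


texture = texture_mesh("example.obj", "a weathered bronze statue")
print(texture.shape)  # (2048, 2048, 3)
```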

URL

https://arxiv.org/abs/2411.02336

PDF

https://arxiv.org/pdf/2411.02336.pdf

