Paper Reading AI Learner

MUJICA: Reforming SISR Models for PBR Material Super-Resolution via Cross-Map Attention

2025-08-13 13:34:39
Xin Du, Maoyuan Xu, Zhi Ying

Abstract

Physically Based Rendering (PBR) materials are typically characterized by multiple 2D texture maps, such as basecolor, normal, metallic, and roughness, which encode spatially-varying bi-directional reflectance distribution function (SVBRDF) parameters to model surface reflectance properties and microfacet interactions. Upscaling SVBRDF materials is valuable for modern 3D graphics applications. However, existing Single Image Super-Resolution (SISR) methods struggle with cross-map inconsistency, inadequate modeling of modality-specific features, and limited generalization due to data distribution shifts. In this work, we propose Multi-modal Upscaling Joint Inference via Cross-map Attention (MUJICA), a flexible adapter that reforms pre-trained Swin-transformer-based SISR models for PBR material super-resolution. MUJICA attaches seamlessly after the pre-trained and frozen SISR backbone, leveraging cross-map attention to fuse features while preserving the remarkable reconstruction ability of the pre-trained SISR model. Applied to SISR models such as SwinIR, DRCT, and HMANet, MUJICA improves PSNR, SSIM, and LPIPS scores while preserving cross-map consistency. Experiments demonstrate that MUJICA enables efficient training even with limited resources and delivers state-of-the-art performance on PBR material datasets.
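The core idea in the abstract is an adapter that sits after a frozen SISR backbone and lets the per-map features (basecolor, normal, metallic, roughness) attend to one another at each spatial position. A minimal numpy sketch of that cross-map attention step is shown below; the function name, tensor layout, and single-head formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_map_attention(feats):
    """Hypothetical single-head cross-map attention.

    feats: (M, N, C) array - M texture maps (e.g. basecolor, normal,
    metallic, roughness), N spatial positions, C feature channels,
    as produced by a frozen SISR backbone run on each map.
    At every position the M map features attend to one another;
    a residual connection keeps the backbone's original features.
    """
    M, N, C = feats.shape
    tokens = feats.transpose(1, 0, 2)                 # (N, M, C): M tokens per position
    scores = tokens @ tokens.transpose(0, 2, 1)       # (N, M, M) pairwise map affinities
    attn = softmax(scores / np.sqrt(C), axis=-1)      # normalize over the map axis
    fused = attn @ tokens                             # (N, M, C) fused map features
    return (tokens + fused).transpose(1, 0, 2)        # residual add, back to (M, N, C)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 8))               # 4 maps, 16 positions, 8 channels
out = cross_map_attention(feats)
print(out.shape)  # (4, 16, 8)
```

In a real adapter the queries, keys, and values would come from learned projections and the attention would run per window, but the sketch captures why the fusion enforces cross-map consistency: each map's features are re-expressed as a mixture over all maps at the same pixel.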


URL

https://arxiv.org/abs/2508.09802

PDF

https://arxiv.org/pdf/2508.09802.pdf

