Abstract
Although large multi-modality models (LMMs) have been widely explored and applied in quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given the strong performance and robustness of LMMs in low-level vision and quality assessment tasks, this study investigates the feasibility of imparting PCQA knowledge to LMMs through text supervision. To this end, we transform quality labels into textual descriptions during the fine-tuning phase, enabling LMMs to derive quality rating logits from 2D projections of point clouds. To compensate for the perceptual information lost in moving from the 3D domain to 2D projections, we also extract structural features. The quality logits and structural features are then combined and regressed into quality scores. Our experimental results confirm the effectiveness of the approach, demonstrating a novel integration of LMMs into PCQA that improves model understanding and assessment accuracy. We hope our contributions inspire further work on combining LMMs with PCQA and foster advances in 3D visual quality analysis and beyond.
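The pipeline described in the abstract (labels turned into text, rating logits fused with structural features, then regression to a score) can be illustrated with a minimal sketch. This is not the authors' implementation: the five rating words, the prompt wording, the 0 to 10 label range, the softmax step, and the ridge regressor are all assumptions made for illustration, and the function names are hypothetical. The LMM itself and the structural feature extractor are replaced by random arrays.

```python
import numpy as np

# Hypothetical five-level rating vocabulary (an assumption; the abstract
# does not list the words used in the textual descriptions).
LEVELS = ["bad", "poor", "fair", "good", "excellent"]


def label_to_text(mos, lo=0.0, hi=10.0):
    """Map a numeric quality label to one of five equal-width text levels.

    The [lo, hi] label range is an assumed example, not taken from the paper.
    """
    idx = int((mos - lo) / (hi - lo) * len(LEVELS))
    idx = min(max(idx, 0), len(LEVELS) - 1)  # clamp so mos == hi maps to the top level
    return f"The quality of the point cloud is {LEVELS[idx]}."


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


def fuse(rating_logits, structural_feats):
    """Concatenate rating probabilities with structural features per sample."""
    return np.concatenate([softmax(rating_logits), structural_feats], axis=-1)


def fit_regressor(X, y, ridge=1e-3):
    """Closed-form ridge regression with a bias column (a stand-in regressor)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + ridge * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)


def predict(w, X):
    """Apply the fitted weights to fused features."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 5))    # placeholder for LMM rating logits
    struct = rng.normal(size=(8, 3))    # placeholder for structural features
    scores = rng.uniform(0, 10, size=8)  # placeholder quality labels

    print(label_to_text(7.3))
    X = fuse(logits, struct)
    w = fit_regressor(X, scores)
    print(predict(w, X).round(2))
```

In this sketch the text labels would serve as fine-tuning targets, while the fused vector is what the final regressor sees; any learned regressor could take the place of the ridge fit.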
URL
https://arxiv.org/abs/2404.18203