Knowledge Guided Semi-Supervised Learning for Quality Assessment of User Generated Videos

2023-12-24 07:32:03
Shankhanil Mitra, Rajiv Soundararajan

Abstract

Perceptual quality assessment of user-generated content (UGC) videos is challenging because training requires large-scale human-annotated video datasets. In this work, we address this challenge by first designing a self-supervised Spatio-Temporal Visual Quality Representation Learning (ST-VQRL) framework to generate robust quality-aware features for videos. Then, we propose a dual-model Semi-Supervised Learning (SSL) method designed specifically for the Video Quality Assessment task (SSL-VQA), which relies on a novel knowledge transfer of quality predictions between the two models. Our SSL-VQA method uses the ST-VQRL backbone to deliver robust performance across various VQA datasets, including in cross-database settings, despite being trained with limited human-annotated videos. Our model improves on state-of-the-art performance by around 10% when trained only with limited data, and by around 15% when unlabelled data is also used in SSL. Source code and checkpoints are available at this https URL.

URL

https://arxiv.org/abs/2312.15425

PDF

https://arxiv.org/pdf/2312.15425.pdf

