Paper Reading AI Learner

Tiger200K: Manually Curated High Visual Quality Video Dataset from UGC Platform

2025-04-21 15:44:06
Xianpan Zhou

Abstract

The recent surge in open-source text-to-video generation models has significantly energized the research community, yet their dependence on proprietary training datasets remains a key constraint. While existing open datasets like Koala-36M employ algorithmic filtering of web-scraped videos from early platforms, they still lack the quality required for fine-tuning advanced video generation models. We present Tiger200K, a manually curated, high-visual-quality video dataset sourced from User-Generated Content (UGC) platforms. By prioritizing visual fidelity and aesthetic quality, Tiger200K underscores the critical role of human expertise in data curation, providing high-quality, temporally consistent video-text pairs for fine-tuning and optimizing video generation architectures through a simple but effective pipeline that includes shot boundary detection, OCR, border detection, motion filtering, and fine-grained bilingual captioning. The dataset will undergo ongoing expansion and be released as an open-source initiative to advance research and applications in video generative models. Project page: this https URL
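To make the curation pipeline mentioned in the abstract more concrete, below is a minimal sketch of what its automated stages could look like. This is an assumption-based illustration, not the authors' code: it uses OpenCV heuristics (HSV-histogram correlation for shot boundary detection, near-black edge bands for border detection, mean inter-frame difference as a motion filter), the file name `example_clip.mp4` and all thresholds are hypothetical, and the OCR and bilingual captioning stages are only noted in comments since the paper's abstract does not specify the tools used.

```python
"""Sketch of a Tiger200K-style curation pipeline (illustrative assumptions only)."""

import cv2
import numpy as np


def read_frames(path, stride=5):
    """Yield every `stride`-th frame of a video as a BGR array."""
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            yield frame
        idx += 1
    cap.release()


def shot_boundaries(frames, threshold=0.6):
    """Flag a shot cut when the HSV-histogram correlation between
    consecutive sampled frames drops below `threshold`."""
    cuts, prev_hist = [], None
    for i, frame in enumerate(frames):
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            cuts.append(i)
        prev_hist = hist
    return cuts


def has_border(frame, dark_thresh=10, band=0.05):
    """Detect letterboxing/pillarboxing as near-black bands at the frame edges."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    bh, bw = max(1, int(h * band)), max(1, int(w * band))
    edges = [gray[:bh], gray[-bh:], gray[:, :bw], gray[:, -bw:]]
    return any(e.mean() < dark_thresh for e in edges)


def motion_score(frames):
    """Mean absolute inter-frame difference; low values indicate near-static
    clips that a motion filter would discard."""
    diffs, prev = [], None
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(float(cv2.absdiff(prev, gray).mean()))
        prev = gray
    return float(np.mean(diffs)) if diffs else 0.0


if __name__ == "__main__":
    clip = "example_clip.mp4"  # hypothetical input path
    frames = list(read_frames(clip))
    print("shot cuts at sampled indices:", shot_boundaries(frames))
    print("border detected:", has_border(frames[0]) if frames else None)
    print("motion score:", motion_score(frames))
    # OCR-based filtering of clips with heavy burned-in text and fine-grained
    # bilingual captioning would follow as separate stages, before the final
    # manual curation pass described in the paper.
```

Clips that pass these automated checks would then be handed to human curators, which is the step the paper emphasizes as the main source of the dataset's quality.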

URL

https://arxiv.org/abs/2504.15182

PDF

https://arxiv.org/pdf/2504.15182.pdf

