Paper Reading AI Learner

Malicious or Benign? Towards Effective Content Moderation for Children's Videos

2023-05-24 20:33:38
Syed Hammad Ahmed, Muhammad Junaid Khan, H. M. Umer Qaisar, Gita Sukthankar

Abstract

Online video platforms receive hundreds of hours of uploads every minute, making manual content moderation impossible. Unfortunately, the most vulnerable consumers of malicious video content are children from ages 1-5 whose attention is easily captured by bursts of color and sound. Scammers attempting to monetize their content may craft malicious children's videos that are superficially similar to educational videos, but include scary and disgusting characters, violent motions, loud music, and disturbing noises. Prominent video hosting platforms like YouTube have taken measures to mitigate malicious content on their platform, but these videos often go undetected by current content moderation tools that are focused on removing pornographic or copyrighted content. This paper introduces our toolkit Malicious or Benign for promoting research on automated content moderation of children's videos. We present 1) a customizable annotation tool for videos, 2) a new dataset with difficult to detect test cases of malicious content and 3) a benchmark suite of state-of-the-art video classification models.

Abstract (translated)

在线视频平台每分钟接受数百小时的上传,使得手动内容 moderation 变得不可能。不幸的是,恶意视频内容的消费者主要是年龄在1-5岁的儿童,他们的注意力集中容易被色彩和声音的突然爆发所捕获。诈骗分子试图以此赚钱可能会制作恶意儿童视频,表面上与教育视频相似,但包括令人恐惧和恶心的角色、暴力行为、音量极高的音乐和令人不安的声音。像YouTube这样的知名视频平台已经采取措施减轻平台上的恶意内容,但这些视频往往被当前专注于删除色情或版权内容的 content moderation 工具所忽略。本文介绍了我们用于促进儿童视频自动内容 moderation 的研究的恶意或良性工具集。我们介绍了1)一个可自定义的视频标注工具、2)一个难以检测的恶意内容测试集和新的数据集,以及3)最先进的视频分类模型的基准套件。

URL

https://arxiv.org/abs/2305.15551

PDF

https://arxiv.org/pdf/2305.15551.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot