
HQ-MPSD: A Multilingual Artifact-Controlled Benchmark for Partial Deepfake Speech Detection

2025-12-15 06:18:43
Menglu Li, Majd Alber, Ramtin Asgarianamiri, Lian Zhao, Xiao-Ping Zhang

Abstract

Detecting partial deepfake speech is challenging because manipulations occur only in short regions while the surrounding audio remains authentic. However, existing detection methods are fundamentally limited by the quality of available datasets, many of which rely on outdated synthesis systems and generation procedures that introduce dataset-specific artifacts rather than realistic manipulation cues. To address this gap, we introduce HQ-MPSD, a high-quality multilingual partial deepfake speech dataset. HQ-MPSD is constructed using linguistically coherent splice points derived from fine-grained forced alignment, preserving prosodic and semantic continuity and minimizing audible and visual boundary artifacts. The dataset contains 350.8 hours of speech across eight languages and 550 speakers, with background effects added to better reflect real-world acoustic conditions. MOS evaluations and spectrogram analysis confirm the high perceptual naturalness of the samples. We benchmark state-of-the-art detection models through cross-language and cross-dataset evaluations, and all models experience performance drops exceeding 80% on HQ-MPSD. These results demonstrate that HQ-MPSD exposes significant generalization challenges once low-level artifacts are removed and multilingual and acoustic diversity are introduced, providing a more realistic and demanding benchmark for partial deepfake detection. The dataset is publicly available; the link is given on the arXiv page listed under URL.
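
The abstract says splice points come from forced alignment, so that a manipulated region starts and ends on a linguistic boundary. As a rough illustration only, and not the authors' pipeline, the sketch below replaces one aligned word in a genuine waveform with a synthetic segment and emits per-sample labels marking the manipulated region. The function name `splice_at_word`, the list-of-floats waveform representation, and the `(start, end)` timestamp format are all assumptions made for this example; a real system would also need to match loudness and prosody across the boundary, which is omitted here.

```python
from typing import List, Tuple


def splice_at_word(
    genuine: List[float],
    synthetic_word: List[float],
    word_span_s: Tuple[float, float],
    sample_rate: int,
) -> Tuple[List[float], List[int]]:
    """Replace one aligned word in `genuine` with `synthetic_word`.

    `word_span_s` is the (start, end) time of the word in seconds, as a
    forced aligner might report it (hypothetical input format).
    Returns the spliced waveform and a per-sample label (1 = synthetic).
    """
    start_s, end_s = word_span_s
    if not 0 <= start_s < end_s:
        raise ValueError("word span must satisfy 0 <= start < end")

    # Convert the aligned word boundaries from seconds to sample indices.
    start = round(start_s * sample_rate)
    end = min(round(end_s * sample_rate), len(genuine))
    if start >= len(genuine):
        raise ValueError("word span starts after the end of the audio")

    # Cut at the word boundaries and drop in the synthetic segment.
    spliced = genuine[:start] + synthetic_word + genuine[end:]

    # Mark which samples are synthetic, for segment-level detection targets.
    labels = [0] * start + [1] * len(synthetic_word) + [0] * (len(genuine) - end)
    return spliced, labels


# Toy usage: 2 s of "audio" at 8 Hz, replacing the word aligned to 0.5-1.0 s.
genuine = [0.0] * 16
fake_word = [1.0] * 6
spliced, labels = splice_at_word(genuine, fake_word, (0.5, 1.0), sample_rate=8)
```

Because the synthetic word need not have the same duration as the word it replaces, the spliced waveform can be longer or shorter than the original, and the labels are built from the new segment length rather than from the original span.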

URL

https://arxiv.org/abs/2512.13012

PDF

https://arxiv.org/pdf/2512.13012.pdf

