Paper Reading AI Learner

One LLM to Train Them All: Multi-Task Learning Framework for Fact-Checking

2026-01-16 13:44:25
Malin Astrid Larsson, Harald Fosen Grunnaleite, Vinay Setty

Abstract

Large language models (LLMs) are reshaping automated fact-checking (AFC) by enabling unified, end-to-end verification pipelines rather than isolated components. While large proprietary models achieve strong performance, their closed weights, complexity, and high costs limit sustainability. Fine-tuning smaller open-weight models for individual AFC tasks can help, but it requires multiple specialized models, again at high cost. We propose **multi-task learning (MTL)** as a more efficient alternative that fine-tunes a single model to perform claim detection, evidence ranking, and stance detection jointly. Using small decoder-only LLMs (e.g., Qwen3-4b), we explore three MTL strategies: classification heads, causal language modeling heads, and instruction tuning, and evaluate them across model sizes, task orders, and standard non-LLM baselines. While multi-task models do not universally surpass single-task baselines, they yield substantial improvements, achieving up to **44%**, **54%**, and **31%** relative gains for claim detection, evidence re-ranking, and stance detection, respectively, over zero-/few-shot settings. Finally, we also provide practical, empirically grounded guidelines to help practitioners apply MTL with LLMs for automated fact-checking.
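The instruction-tuning strategy the abstract mentions amounts to rendering all three AFC tasks into a shared prompt-completion format and interleaving them in one fine-tuning set. Below is a minimal sketch of that data-mixing step; the template wording, label sets, and helper names are assumptions for illustration, not the paper's actual prompts.

```python
import random

# Hypothetical instruction templates for the three AFC tasks that are
# fine-tuned jointly; the paper's exact prompts are not specified here.
TEMPLATES = {
    "claim_detection": (
        "Instruction: Does the following sentence contain a check-worthy claim? "
        "Answer Yes or No.\nSentence: {text}\nAnswer: {label}"
    ),
    "evidence_ranking": (
        "Instruction: Is this passage relevant evidence for the claim? "
        "Answer Relevant or Irrelevant.\nClaim: {text}\nPassage: {evidence}\nAnswer: {label}"
    ),
    "stance_detection": (
        "Instruction: What is the stance of the evidence toward the claim? "
        "Answer Supports, Refutes, or Neutral.\nClaim: {text}\nEvidence: {evidence}\nAnswer: {label}"
    ),
}

def build_mtl_dataset(examples, seed=0):
    """Render each (task, fields) pair with its template and shuffle,
    so a single causal-LM fine-tune sees all three tasks interleaved."""
    rendered = [TEMPLATES[task].format(**fields) for task, fields in examples]
    rng = random.Random(seed)
    rng.shuffle(rendered)
    return rendered

examples = [
    ("claim_detection",
     {"text": "The city cut its budget by 40% last year.", "label": "Yes"}),
    ("stance_detection",
     {"text": "Vaccines cause autism.",
      "evidence": "Large cohort studies find no link.", "label": "Refutes"}),
    ("evidence_ranking",
     {"text": "Inflation fell in 2023.",
      "evidence": "CPI data for 2023 shows a decline.", "label": "Relevant"}),
]
dataset = build_mtl_dataset(examples)
```

Each rendered string could then be tokenized and fed to a standard causal-LM trainer; the key design choice is that one model sees a shuffled mixture of all tasks rather than one dataset per specialized model.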


URL

https://arxiv.org/abs/2601.11293

PDF

https://arxiv.org/pdf/2601.11293.pdf

