Paper Reading AI Learner

Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator

2025-05-22 13:55:39
Beier Luo, Shuoyuan Wang, Yixuan Li, Hongxin Wei

Abstract

Post-training of large language models is essential for adapting pre-trained language models (PLMs) to align with human preferences and downstream tasks. While PLMs typically exhibit well-calibrated confidence, post-trained language models (PoLMs) often suffer from over-confidence, assigning high confidence to both correct and incorrect outputs, which can undermine reliability in critical applications. A major obstacle in calibrating PoLMs is the scarcity of labeled data for individual downstream tasks. To address this, we propose Disagreement-Aware Confidence Alignment (DACA), a novel unsupervised method to optimize the parameters (e.g., temperature $\tau$) in post-hoc confidence calibration. Our method is motivated by the under-confidence issue that arises from prediction disagreement between the PLM and PoLM when aligning their confidence via temperature scaling. Theoretically, the PLM's confidence underestimates the PoLM's prediction accuracy on disagreement examples, leading to a larger $\tau$ and producing under-confident predictions. DACA mitigates this by selectively using only agreement examples for calibration, effectively decoupling the influence of disagreement. In this manner, our method avoids an overly large $\tau$ in temperature scaling caused by disagreement examples, improving calibration performance. Extensive experiments demonstrate the effectiveness of our method, improving the average ECE of open-source and API-based LLMs (e.g., GPT-4o) by up to 15.08$\%$ on common benchmarks.
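The calibration idea described in the abstract can be sketched in a few lines: fit a single temperature $\tau$ for the PoLM by aligning its temperature-scaled confidence to the PLM's confidence, restricted to examples where the two models' predictions agree. The following is a minimal illustrative sketch, not the paper's implementation; the function name, the grid search over $\tau$, and the mean-absolute-gap alignment objective are all assumptions for illustration.

```python
import numpy as np

def softmax(logits, tau=1.0):
    # Temperature-scaled softmax; subtract the max for numerical stability.
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def daca_temperature(polm_logits, plm_conf, plm_pred,
                     taus=np.linspace(0.5, 5.0, 91)):
    """Pick a temperature for the PoLM by aligning its confidence with the
    PLM's confidence, using only agreement examples (the DACA idea as
    described in the abstract). Grid search is an illustrative choice.

    polm_logits: (n, k) PoLM logits on unlabeled examples
    plm_conf:    (n,)   PLM confidence on the same examples
    plm_pred:    (n,)   PLM predicted labels
    """
    polm_pred = polm_logits.argmax(axis=-1)
    agree = polm_pred == plm_pred  # drop disagreement examples
    best_tau, best_gap = 1.0, np.inf
    for tau in taus:
        conf = softmax(polm_logits[agree], tau).max(axis=-1)
        gap = np.abs(conf - plm_conf[agree]).mean()
        if gap < best_gap:
            best_tau, best_gap = tau, gap
    return best_tau
```

The key step is the `agree` mask: per the abstract, disagreement examples drag the fitted $\tau$ upward (the PLM's confidence understates the PoLM's accuracy there), so excluding them avoids under-confident scaled predictions.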


URL

https://arxiv.org/abs/2505.16690

PDF

https://arxiv.org/pdf/2505.16690.pdf

