Paper Reading AI Learner

HALT: Hallucination Assessment via Log-probs as Time series

2026-02-02 22:46:23
Ahmad Shapiro, Karan Taneja, Ashok Goel

Abstract

Hallucinations remain a major obstacle for large language models (LLMs), especially in safety-critical domains. We present HALT (Hallucination Assessment via Log-probs as Time series), a lightweight hallucination detector that uses only the top-20 token log-probabilities from LLM generations, treated as a time series. HALT combines a gated recurrent unit model with entropy-based features to learn model calibration bias, providing an extremely efficient alternative to large encoders. Unlike white-box approaches, HALT does not require access to hidden states or attention maps, relying only on output log-probabilities. Unlike black-box approaches, it operates on log-probs rather than surface-form text, which enables stronger domain generalization and compatibility with proprietary LLMs without requiring access to internal weights. To benchmark performance, we introduce HUB (Hallucination detection Unified Benchmark), which consolidates prior datasets into ten capabilities covering both reasoning tasks (Algorithmic, Commonsense, Mathematical, Symbolic, Code Generation) and general-purpose skills (Chat, Data-to-Text, Question Answering, Summarization, World Knowledge). While being 30x smaller, HALT outperforms Lettuce, a fine-tuned ModernBERT-base encoder, and achieves a 60x speedup on HUB. HALT and HUB together establish an effective framework for hallucination detection across diverse LLM capabilities.
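The core input to HALT is a per-token sequence of top-20 log-probabilities, from which uncertainty features such as entropy are derived before being fed to the recurrent model. As a rough illustration of that feature-extraction step, here is a minimal stdlib-only sketch; the function name and the exact feature set (entropy, top probability, probability margin) are assumptions for illustration, not the paper's specification:

```python
import math

def topk_entropy_features(topk_logprobs):
    """Per-step uncertainty features from top-k token log-probs.

    Hypothetical feature set (entropy, top prob, prob margin); the
    paper's exact features are not specified here.

    topk_logprobs: list of lists, one inner list of log-probabilities
    per generated token (e.g. the top-20 returned by an LLM API).
    Returns one (entropy, max_prob, margin) triple per token.
    """
    feats = []
    for step in topk_logprobs:
        # Renormalise the truncated distribution over the top-k tokens.
        probs = [math.exp(lp) for lp in step]
        z = sum(probs)
        probs = [p / z for p in probs]
        # Shannon entropy of the renormalised top-k distribution.
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        ranked = sorted(probs, reverse=True)
        # Gap between the most and second-most likely tokens.
        margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
        feats.append((entropy, ranked[0], margin))
    return feats
```

A confident generation step (one token taking almost all the mass) yields low entropy and a large margin, while a step with near-uniform top-k probabilities yields entropy near log(k); a sequence of such triples is the kind of time series a small GRU can consume.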


URL

https://arxiv.org/abs/2602.02888

PDF

https://arxiv.org/pdf/2602.02888.pdf

