FLAME: Factuality-Aware Alignment for Large Language Models

2024-05-02 17:54:54
Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

Abstract

Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e., hallucination). In this paper, we study how to make the LLM alignment process more factual, by first identifying factors that lead to hallucination in both alignment steps: supervised fine-tuning (SFT) and reinforcement learning (RL). In particular, we find that training the LLM on new knowledge or unfamiliar texts can encourage hallucination. This makes SFT less factual, as it trains on human-labeled data that may be novel to the LLM. Furthermore, reward functions used in standard RL can also encourage hallucination, because they guide the LLM to provide more helpful responses on a diverse set of instructions, often preferring longer and more detailed responses. Based on these observations, we propose factuality-aware alignment, comprising factuality-aware SFT and factuality-aware RL through direct preference optimization. Experiments show that our proposed factuality-aware alignment guides LLMs to output more factual responses while maintaining instruction-following capability.
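
The abstract names direct preference optimization (DPO) as the vehicle for the RL step. For readers unfamiliar with it, the following is a minimal sketch of the standard per-pair DPO loss, not the authors' factuality-aware variant. The function name, the parameter names, and the default beta=0.1 are illustrative assumptions and do not come from the paper; the inputs are assumed to be summed token log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") response under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair (illustrative sketch).

    All four log-probability arguments are assumed to be sequence-level
    log-probabilities. beta scales the implicit reward; 0.1 is an
    arbitrary placeholder, not a value taken from the paper.
    """
    # Implicit reward of each response: how much more likely the policy
    # makes it compared with the reference model.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    z = beta * (chosen_margin - rejected_margin)
    # Loss is -log(sigmoid(z)), computed in a numerically stable way.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The loss falls below log(2) once the policy favors the chosen response
# more strongly than the reference model does.
neutral = dpo_loss(-10.0, -10.0, -10.0, -10.0)
improved = dpo_loss(-9.0, -11.0, -10.0, -10.0)
print(neutral > improved)
```

Under this objective, which response is labeled "chosen" determines what the model learns to prefer. If longer, more detailed responses are consistently the chosen ones, the loss rewards them whether or not they are factual, which is the failure mode the abstract attributes to standard reward functions.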

URL

https://arxiv.org/abs/2405.01525

PDF

https://arxiv.org/pdf/2405.01525.pdf

