Paper Reading AI Learner

Enhancing Contextual Understanding in Large Language Models through Contrastive Decoding

2024-05-04 20:38:41
Zheng Zhao, Emilio Monti, Jens Lehmann, Haytham Assem

Abstract

Large language models (LLMs) tend to inadequately integrate input context during text generation, relying excessively on encoded prior knowledge in model parameters, potentially resulting in generated text with factual inconsistencies or contextually unfaithful content. LLMs utilize two primary knowledge sources: 1) prior (parametric) knowledge from pretraining, and 2) contextual (non-parametric) knowledge from input prompts. The study addresses the open question of how LLMs effectively balance these knowledge sources during the generation process, specifically in the context of open-domain question answering. To address this issue, we introduce a novel approach integrating contrastive decoding with adversarial irrelevant passages as negative samples to enhance robust context grounding during generation. Notably, our method operates at inference time without requiring further training. We conduct comprehensive experiments to demonstrate its applicability and effectiveness, providing empirical evidence showcasing its superiority over existing methodologies. Our code is publicly available at: this https URL.
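The core idea described in the abstract — contrasting the model's next-token distribution given the relevant context against its distribution given an adversarial irrelevant passage — can be illustrated with a minimal sketch. This is a generic contrastive-decoding scheme, not the authors' exact formulation; the function names, the toy logits, and the `alpha` weighting are all illustrative assumptions.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of raw scores.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def contrastive_decode(logits_ctx, logits_neg, alpha=1.0):
    """Pick the token whose probability the relevant context raises most
    relative to an adversarial irrelevant passage (hypothetical sketch).

    logits_ctx: next-token logits conditioned on the relevant context.
    logits_neg: next-token logits conditioned on an irrelevant passage.
    alpha: strength of the contrastive penalty (illustrative parameter).
    """
    lp_ctx = log_softmax(logits_ctx)
    lp_neg = log_softmax(logits_neg)
    scores = [c - alpha * n for c, n in zip(lp_ctx, lp_neg)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy 4-token vocabulary. Token 0 is favored by parametric prior knowledge
# (high under both conditions); token 1 is favored specifically by the
# relevant context. Plain greedy decoding on the context logits picks the
# prior-driven token 0, while the contrastive score picks token 1.
ctx = [2.0, 1.0, 0.5, 0.0]
neg = [2.0, -1.0, 0.5, 0.0]
greedy = max(range(len(ctx)), key=ctx.__getitem__)
contrastive = contrastive_decode(ctx, neg)
```

In this toy example the greedy choice is the prior-driven token, while the contrastive score rewards the token whose probability the context specifically raised — the intuition behind using irrelevant passages as negative samples for context grounding.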


URL

https://arxiv.org/abs/2405.02750

PDF

https://arxiv.org/pdf/2405.02750.pdf

