Paper Reading AI Learner

Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo

2024-04-26 17:18:32
Stephen Zhao, Rob Brekelmans, Alireza Makhzani, Roger Grosse

Abstract

Numerous capability and safety techniques of Large Language Models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference problems. In particular, we use learned twist functions to estimate the expected future value of the potential at each timestep, which enables us to focus inference-time computation on promising partial sequences. We propose a novel contrastive method for learning the twist functions, and establish connections with the rich literature of soft reinforcement learning. As a complementary application of our twisted SMC framework, we present methods for evaluating the accuracy of language model inference techniques using novel bidirectional SMC bounds on the log partition function. These bounds can be used to estimate the KL divergence between the inference and target distributions in both directions. We apply our inference evaluation techniques to show that twisted SMC is effective for sampling undesirable outputs from a pretrained model (a useful component of harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.
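The core mechanism described above — proposing tokens from the base model reweighted by a twist function that estimates the expected future value of the potential, then resampling to focus computation on promising partial sequences — can be sketched in a toy setting. Everything below is an illustrative assumption, not the paper's implementation: the paper learns twist functions for a real LM via a contrastive method, whereas this sketch uses a 3-token prefix-independent "language model" whose potential factorizes over tokens, so the exact expected-future-potential twist is available in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LENGTH, K = 3, 4, 512          # toy vocab size, sequence length, particle count
p_token = np.array([0.5, 0.3, 0.2])   # prefix-independent base model p(s_t)

def log_potential(seq):
    """log phi(s_{1:T}); the target is sigma(s) proportional to p(s) * phi(s)."""
    return 2.0 * np.sum(np.asarray(seq) == 0)

# log E_p[exp(2 * [token == 0])], used by the closed-form twist below
LOG_C = np.log(p_token @ np.exp(2.0 * (np.arange(VOCAB) == 0)))

def log_twist(seq):
    """Exact twist psi_t(s_{1:t}) = E[phi(s_{1:T}) | s_{1:t}] (closed form here;
    learned by the paper's contrastive method in the real setting)."""
    return log_potential(seq) + (LENGTH - len(seq)) * LOG_C

def twisted_smc():
    particles, log_Z = [[] for _ in range(K)], 0.0
    for t in range(1, LENGTH + 1):
        log_w, proposals = np.empty(K), []
        for k, prefix in enumerate(particles):
            # Twisted proposal q(s_t | prefix) ∝ p(s_t) * psi_t(prefix + [s_t]);
            # at the final step the potential phi itself replaces the twist.
            score = log_potential if t == LENGTH else log_twist
            log_u = np.log(p_token) + np.array(
                [score(prefix + [v]) for v in range(VOCAB)])
            m = log_u.max()
            log_norm = m + np.log(np.exp(log_u - m).sum())
            v = rng.choice(VOCAB, p=np.exp(log_u - log_norm))
            proposals.append(prefix + [v])
            # incremental weight: sum_v p(v) psi_t(prefix+[v]) / psi_{t-1}(prefix)
            log_w[k] = log_norm - (log_twist(prefix) if t > 1 else 0.0)
        m = log_w.max()
        log_Z += m + np.log(np.mean(np.exp(log_w - m)))  # log of mean weight
        w = np.exp(log_w - m); w /= w.sum()
        idx = rng.choice(K, size=K, p=w)                 # multinomial resampling
        particles = [proposals[i] for i in idx]
    return particles, log_Z

samples, log_Z_hat = twisted_smc()
```

Because the twists here are exact, the incremental weights are constant and `log_Z_hat` recovers the true log partition function (`LENGTH * LOG_C`) exactly. With imperfect learned twists, the SMC estimate `log_Z_hat` is a stochastic lower bound on log Z in expectation (by Jensen's inequality, since the Z estimate is unbiased) — one direction of the bidirectional bounds the abstract uses to evaluate inference quality.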

Abstract (translated)

Numerous capability and safety techniques for large language models (LLMs), including RLHF, automated red-teaming, prompt engineering, and infilling, can be cast as sampling from an unnormalized target distribution defined by a given reward or potential function over the full sequence. In this work, we leverage the rich toolkit of Sequential Monte Carlo (SMC) for these probabilistic inference problems. In particular, we use learned twist functions to estimate the expected future value of the potential at each timestep, which lets us focus inference-time computation on promising partial sequences. We propose a novel contrastive method for learning the twist functions, and establish connections with the rich literature on soft reinforcement learning. As a complementary application of our twisted SMC framework, we present methods for evaluating the accuracy of language model inference techniques using novel bidirectional SMC bounds on the log partition function. These bounds can be used to estimate the KL divergence between the inference and target distributions in both directions. We apply our inference evaluation techniques to show that twisted SMC is effective for sampling undesirable outputs from a pretrained model (a useful component of harmlessness training and automated red-teaming), generating reviews with varied sentiment, and performing infilling tasks.

URL

https://arxiv.org/abs/2404.17546

PDF

https://arxiv.org/pdf/2404.17546.pdf

