Controlled Text Generation via Language Model Arithmetic

2023-11-24 13:41:12
Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev

Abstract

As Large Language Models (LLMs) are deployed more widely, customization with respect to vocabulary, style and character becomes more important. In this work we introduce model arithmetic, a novel inference framework for composing and biasing LLMs without the need for model (re)training or highly specific datasets. In addition, the framework allows for more precise control of generated text than direct prompting and prior controlled text generation (CTG) techniques. Using model arithmetic, we can express prior CTG techniques as simple formulas and naturally extend them to new and more effective formulations. Further, we show that speculative sampling, a technique for efficient LLM sampling, extends to our setting. This enables highly efficient text generation with multiple composed models at only marginal overhead over a single model. Our empirical evaluation demonstrates that model arithmetic allows fine-grained control of generated text while outperforming the state of the art on the task of toxicity reduction.
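
The idea of composing models through a formula can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes that a composition is a weighted sum of each model's next-token log-probabilities, renormalized into a distribution, which the abstract does not spell out. The vocabulary, the two hand-written distributions (`base`, `toxic`) and the weights are hypothetical values chosen only for illustration.

```python
import math

def log_softmax(scores):
    """Normalise arbitrary scores into log-probabilities (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def compose(log_probs_per_model, weights):
    """Assumed composition rule: weighted sum of log-probabilities, renormalised.

    log_probs_per_model: one list of next-token log-probabilities per model,
                         all over the same vocabulary.
    weights:             one scalar per model; a negative weight steers the
                         result away from that model's preferences.
    """
    vocab_size = len(log_probs_per_model[0])
    combined = [
        sum(w * lp[i] for w, lp in zip(weights, log_probs_per_model))
        for i in range(vocab_size)
    ]
    return [math.exp(lp) for lp in log_softmax(combined)]

# Hypothetical three-token vocabulary and hand-written distributions.
vocab = ["nice", "rude", "okay"]
base = [math.log(p) for p in (0.5, 0.3, 0.2)]   # stand-in for a base LLM
toxic = [math.log(p) for p in (0.1, 0.8, 0.1)]  # stand-in for a toxic-prompted LLM

# Formula "base - 0.5 * toxic": bias the base model away from the toxic one.
steered = compose([base, toxic], weights=[1.0, -0.5])

for token, p_base, p_steered in zip(vocab, (0.5, 0.3, 0.2), steered):
    print(f"{token}: base={p_base:.3f} steered={p_steered:.3f}")
```

In this toy setting the probability of "rude" falls below its value under the base distribution, while the result remains a valid distribution. With real models the per-model log-probabilities would come from one forward pass each at every decoding step, which is the cost the abstract says speculative sampling helps to reduce.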

URL

https://arxiv.org/abs/2311.14479

PDF

https://arxiv.org/pdf/2311.14479.pdf
