Paper Reading AI Learner

Constraint Back-translation Improves Complex Instruction Following of Large Language Models

2024-10-31 17:42:26
Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

Abstract

Large language models (LLMs) struggle to follow instructions with complex constraints on format, length, etc. Following conventional instruction-tuning practice, previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, which limits the quality of the generated data. In this work, we find that existing datasets inherently contain implicit complex constraints and propose a novel data generation technique, constraint back-translation. Specifically, we take high-quality instruction-response pairs from existing datasets and use advanced LLMs only to add to the instructions complex constraints that the responses already meet, which naturally reduces cost and data noise. In the experiments, we adopt Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset, named CRAB. We show that post-training on CRAB improves the complex instruction-following ability of multiple backbone LLMs, as evaluated on extensive instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective during post-training. Our code, data, and models will be released to facilitate future research.
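The core idea can be sketched in a few lines: start from an existing instruction-response pair, test candidate constraints against the response, and append only the satisfied ones to the instruction. This is a minimal illustrative sketch; the paper uses Llama3-70B-Instruct to propose and back-translate constraints, whereas here the candidate constraints and their deterministic checks (`word_count_at_most`, `is_bulleted`) are hypothetical stand-ins.

```python
# Illustrative sketch of constraint back-translation: keep only the
# constraints the existing response already satisfies, then append
# them to the instruction. The paper uses an LLM (Llama3-70B-Instruct)
# for this step; the deterministic checks below are stand-ins.

def word_count_at_most(limit):
    """Constraint: the response must contain at most `limit` words."""
    return (f"Respond in at most {limit} words.",
            lambda resp: len(resp.split()) <= limit)

def is_bulleted():
    """Constraint: every non-empty line must be a '-' bullet."""
    return ("Format the answer as a bulleted list.",
            lambda resp: all(line.lstrip().startswith("-")
                             for line in resp.splitlines() if line.strip()))

def back_translate(instruction, response, candidate_constraints):
    """Build a complex instruction whose constraints the response already meets."""
    satisfied = [text for text, check in candidate_constraints if check(response)]
    if not satisfied:
        return instruction
    return instruction + " " + " ".join(satisfied)

instruction = "List two benefits of unit tests."
response = "- Catches regressions early\n- Documents expected behavior"
complex_instruction = back_translate(
    instruction, response, [word_count_at_most(20), is_bulleted()])
print(complex_instruction)
# The pair (complex_instruction, response) is a valid training example
# by construction: the response satisfies every appended constraint.
```

Because constraints are only ever added when the fixed response already meets them, no new response generation is needed, which is why the paper argues this reduces both cost and data noise.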


URL

https://arxiv.org/abs/2410.24175

PDF

https://arxiv.org/pdf/2410.24175.pdf

