Abstract
Large language models (LLMs) struggle to follow instructions with complex constraints on format, length, and other attributes. Following conventional instruction-tuning practice, previous works conduct post-training on complex instruction-response pairs generated by feeding complex instructions to advanced LLMs. However, even advanced LLMs cannot follow complex instructions well, which limits the quality of the generated data. In this work, we observe that existing datasets inherently contain implicit complex constraints and propose a novel data generation technique, constraint back-translation. Specifically, we take high-quality instruction-response pairs from existing datasets and use advanced LLMs only to add to the instructions complex constraints that the responses already satisfy, which naturally reduces cost and data noise. In our experiments, we adopt Llama3-70B-Instruct to back-translate constraints and create a high-quality complex instruction-response dataset named CRAB. We show that post-training on CRAB improves the complex instruction-following ability of multiple backbone LLMs, as evaluated on extensive instruction-following benchmarks. We further find that constraint back-translation also serves as a useful auxiliary training objective during post-training. Our code, data, and models will be released to facilitate future research.
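To make the idea concrete, the Python sketch below illustrates constraint back-translation as described in the abstract: constraints already satisfied by an existing response are elicited from an LLM and appended to the instruction, while the response itself is left unchanged. This is an illustrative sketch, not the authors' released pipeline; the prompt template and the `llm_generate` callable (e.g., a wrapper around Llama3-70B-Instruct) are assumptions.

from typing import Callable, Dict, List

# Hypothetical prompt template; the exact prompt used to build CRAB is not
# given in the abstract.
BACKTRANSLATION_PROMPT = """Given an instruction and its response, list constraints
(e.g., format, length, keywords) that the response ALREADY satisfies.
Return one constraint per line.

Instruction: {instruction}
Response: {response}
Constraints:"""


def back_translate_constraints(
    pairs: List[Dict[str, str]],
    llm_generate: Callable[[str], str],  # assumed LLM call, e.g. Llama3-70B-Instruct
) -> List[Dict[str, str]]:
    """Augment plain instruction-response pairs into complex ones by appending
    constraints that the existing response already meets. No new responses are
    generated, which is what keeps cost and data noise low."""
    augmented = []
    for pair in pairs:
        prompt = BACKTRANSLATION_PROMPT.format(
            instruction=pair["instruction"], response=pair["response"]
        )
        # Parse one constraint per non-empty line of the LLM output.
        constraints = [c.strip() for c in llm_generate(prompt).splitlines() if c.strip()]
        complex_instruction = (
            pair["instruction"] + "\n" + "\n".join(f"- {c}" for c in constraints)
        )
        # Keep the original response; only the instruction gains constraints.
        augmented.append(
            {"instruction": complex_instruction, "response": pair["response"]}
        )
    return augmented

The resulting pairs can then be used directly for post-training, since each added constraint is guaranteed (by construction) to be satisfied by the paired response.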
URL
https://arxiv.org/abs/2410.24175