Abstract
While Chain-of-Thought (CoT) prompting boosts Language Models' (LMs') performance on a gamut of complex reasoning tasks, the generated reasoning chain does not necessarily reflect how the model arrives at the answer (i.e., its faithfulness). We propose Faithful CoT, a faithful-by-construction framework that decomposes a reasoning task into two stages: Translation (Natural Language query $\rightarrow$ symbolic reasoning chain) and Problem Solving (reasoning chain $\rightarrow$ answer), using an LM and a deterministic solver respectively. We demonstrate the efficacy of our approach on 10 reasoning datasets from 4 diverse domains. It outperforms traditional CoT prompting on 9 of the 10 datasets, with average accuracy gains of 4.4 points on Math Word Problems, 1.9 on Planning, 4.0 on Multi-hop Question Answering (QA), and 18.1 on Logical Inference, under greedy decoding. Combined with self-consistency decoding, it achieves new state-of-the-art few-shot performance on 7 of the 10 datasets, showing a strong synergy between faithfulness and accuracy.
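The two-stage pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the symbolic reasoning chain for a math word problem is a snippet of Python code, and the deterministic solver is simply the Python interpreter. The `translate` step here is mocked with a hand-written chain for one example query; in the actual framework an LM would generate it.

```python
def translate(query: str) -> str:
    """Stage 1 (Translation): NL query -> symbolic reasoning chain.

    Mocked here: returns a hand-written chain for the example query.
    In Faithful CoT, a few-shot-prompted LM produces this chain."""
    # Hypothetical LM output for the example query below.
    return (
        "# Tom has 3 apples and buys 4 more; how many does he have?\n"
        "initial_apples = 3\n"
        "bought_apples = 4\n"
        "answer = initial_apples + bought_apples\n"
    )


def solve(chain: str) -> int:
    """Stage 2 (Problem Solving): run the chain with a deterministic
    solver. Here the 'solver' is just the Python interpreter, so the
    answer follows from the chain by construction."""
    scope: dict = {}
    exec(chain, scope)
    return scope["answer"]


query = "Tom has 3 apples and buys 4 more; how many does he have?"
chain = translate(query)
print(solve(chain))  # 7
```

Because the answer is computed by executing the chain itself, the chain cannot fail to reflect how the answer was derived, which is the sense in which the framework is faithful by construction.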
URL
https://arxiv.org/abs/2301.13379