Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation

Abstract

Ensuring that generated utterances are faithful to dialogue actions is crucial for Task-Oriented Dialogue Response Generation. Slot Error Rate (SER) only partially measures generation quality, because it assesses only utterances generated from non-categorical slots, whose values are expected to be reproduced exactly. Utterances generated from categorical slots, which are more variable, are not assessed by SER. We propose Schema-Guided Semantic Accuracy (SGSAcc) to evaluate utterances generated from both categorical and non-categorical slots by recognizing textual entailment. We show that SGSAcc can be applied to evaluate utterances generated from a wide range of dialogue actions in the Schema-Guided Dialogue (SGD) dataset with good agreement with human judgment. We also identify a previously overlooked weakness in generating faithful utterances from categorical slots in unseen domains. We show that prefix tuning applied to T5 generation can address this problem. We further build an ensemble of prefix-tuning and fine-tuning models that achieves the lowest SER reported and high SGSAcc on the SGD dataset.
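To make the contrast between the two metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes SER is computed as the fraction of utterances that fail to reproduce at least one non-categorical slot value verbatim (case-insensitive substring match), and that an entailment-based accuracy is the fraction of utterances judged to entail a hypothesis built from the dialogue action. The function names, the `entails` callable, and the data layout are all hypothetical; a real setup would plug in a trained textual-entailment model.

```python
from typing import Callable, Dict, List


def slot_error_rate(utterances: List[str],
                    slot_values: List[Dict[str, str]]) -> float:
    """Fraction of utterances missing at least one expected slot value.

    Assumption: a slot counts as correct only if its value appears
    verbatim (ignoring case) in the generated utterance.
    """
    if not utterances:
        return 0.0
    errors = 0
    for text, slots in zip(utterances, slot_values):
        lowered = text.lower()
        if any(value.lower() not in lowered for value in slots.values()):
            errors += 1
    return errors / len(utterances)


def entailment_accuracy(utterances: List[str],
                        hypotheses: List[str],
                        entails: Callable[[str, str], bool]) -> float:
    """Fraction of utterances (premises) that entail their hypothesis.

    `entails(premise, hypothesis)` is a hypothetical stand-in for a
    textual-entailment classifier; it is not defined by the source.
    """
    if not utterances:
        return 0.0
    hits = sum(1 for premise, hypothesis in zip(utterances, hypotheses)
               if entails(premise, hypothesis))
    return hits / len(utterances)
```

An exact-match check of this kind cannot score a categorical slot such as a boolean "has live music", since the surface form ("there is live music", "bands play there") varies. That is the gap an entailment-based check is meant to cover.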

URL

https://arxiv.org/abs/2301.12568

PDF

https://arxiv.org/pdf/2301.12568.pdf