Schema-Guided Semantic Accuracy: Faithfulness in Task-Oriented Dialogue Response Generation

2023-01-29 22:32:48
Jinghong Chen, Weizhe Lin, Bill Byrne

Abstract

Ensuring that generated utterances are faithful to dialogue actions is crucial for Task-Oriented Dialogue Response Generation. Slot Error Rate (SER) only partially measures generation quality in that it solely assesses utterances generated from non-categorical slots whose values are expected to be reproduced exactly. Utterances generated from categorical slots, which are more variable, are not assessed by SER. We propose Schema-Guided Semantic Accuracy (SGSAcc) to evaluate utterances generated from both categorical and non-categorical slots by recognizing textual entailment. We show that SGSAcc can be applied to evaluate utterances generated from a wide range of dialogue actions in the Schema Guided Dialogue (SGD) dataset with good agreement with human judgment. We also identify a previously overlooked weakness in generating faithful utterances from categorical slots in unseen domains. We show that prefix tuning applied to T5 generation can address this problem. We further build an ensemble of prefix-tuning and fine-tuning models that achieves the lowest SER reported and high SGSAcc on the SGD dataset.
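
The contrast between the two metrics can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: SER is taken here as the fraction of utterances that fail to reproduce at least one non-categorical slot value verbatim, and the entailment-based score is taken as the fraction of (utterance, hypothesis) pairs judged as entailed. The function names, the data layout, and the `entails` callback are hypothetical. How the paper builds hypotheses from the schema and which entailment model it uses are not described here, so the callback is left for the caller to supply.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data layout: (non-categorical slot -> value, generated utterance)
SlotExample = Tuple[Dict[str, str], str]
# Hypothetical data layout: (generated utterance as premise, hypothesis text)
EntailmentPair = Tuple[str, str]


def slot_error_rate(examples: List[SlotExample]) -> float:
    """Fraction of utterances that miss at least one slot value verbatim.

    Assumption: a case-insensitive substring match stands in for
    "reproduced exactly".
    """
    if not examples:
        return 0.0
    errors = 0
    for slots, utterance in examples:
        text = utterance.lower()
        if any(value.lower() not in text for value in slots.values()):
            errors += 1
    return errors / len(examples)


def entailment_accuracy(
    pairs: List[EntailmentPair],
    entails: Callable[[str, str], bool],
) -> float:
    """Fraction of pairs where the utterance entails the hypothesis.

    `entails` is a caller-supplied predicate, for example a wrapper around
    a textual entailment classifier; it is an assumption of this sketch.
    """
    if not pairs:
        return 0.0
    return sum(1 for premise, hyp in pairs if entails(premise, hyp)) / len(pairs)
```

An exact-match check of this kind cannot score a categorical slot such as a boolean "serves alcohol" flag, because the utterance may paraphrase the value instead of copying it; an entailment predicate can still judge such an utterance.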

URL

https://arxiv.org/abs/2301.12568

PDF

https://arxiv.org/pdf/2301.12568.pdf

