DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels

2021-04-18 07:35:15

Zhaohao Zeng, Tetsuya Sakai

arXiv_AI

Abstract
Abstract (translated)
URL
PDF

Abstract

We introduce a data set called DCH-2, which contains 4,390 real customer-helpdesk dialogues in Chinese and their English translations. DCH-2 also contains dialogue-level annotations and turn-level annotations obtained independently from either 19 or 20 annotators. The data set was built through our effort as organisers of the NTCIR-14 Short Text Conversation and NTCIR-15 Dialogue Evaluation tasks, to help researchers understand what constitutes an effective customer-helpdesk dialogue, and thereby build efficient and helpful helpdesk systems that are available to customers at all times. In addition, DCH-2 may be utilised for other purposes, for example, as a repository for retrieval-based dialogue systems, or as a parallel corpus for machine translation in the helpdesk domain.

Abstract (translated)

URL

https://arxiv.org/abs/2104.08755

PDF

https://arxiv.org/pdf/2104.08755.pdf