Abstract
Conditional variational autoencoders (CVAEs) have recently been used for diverse response generation by introducing latent variables to represent the relationship between a dialog context and its potential responses. However, the diversity of the responses generated by a CVAE model is limited by the oversimplified assumption of an isotropic Gaussian prior. We propose Dior-CVAE, a hierarchical CVAE model with an informative prior produced by a diffusion model. Dior-CVAE derives a series of layer-wise latent variables using an attention mechanism and infuses them into the corresponding decoder layers. We propose memory dropout during latent infusion to alleviate posterior collapse. The prior distribution of the latent variables is parameterized by a diffusion model, which introduces a multimodal distribution. Experiments on two popular open-domain dialog datasets indicate the advantages of our approach over previous Transformer-based variational dialog models for dialog response generation. We publicly release the code for reproducing Dior-CVAE and all baselines at this https URL.
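As a rough illustration of the memory-dropout idea described above (dropping entire latent vectors before they are infused into decoder layers, so the decoder cannot rely on any single latent), here is a minimal sketch. This is not the authors' implementation; the function name, shapes, and drop rate are all assumptions for illustration:

```python
import numpy as np

def memory_dropout(latents, p=0.2, rng=None):
    """Randomly drop whole latent (memory) vectors with probability p.

    latents: array of shape (num_vectors, dim); each row is one
    layer-wise latent vector to be infused into a decoder layer.
    Zeroing entire rows (rather than individual units) discourages
    the decoder from depending on any one latent, which is the
    intuition behind alleviating posterior collapse.
    Illustrative sketch only, not the paper's code.
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(latents.shape[0]) >= p  # one Bernoulli mask bit per vector
    return latents * keep[:, None]            # zero out the dropped rows

# toy usage: 4 latent vectors of dimension 3
z = np.ones((4, 3))
z_dropped = memory_dropout(z, p=0.5, rng=np.random.default_rng(0))
```

At inference time such a mask would typically be disabled (p=0), analogous to standard dropout.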
URL
https://arxiv.org/abs/2305.15025