Dior-CVAE: Diffusion Priors in Variational Dialog Generation

2023-05-24 11:06:52
Tianyu Yang, Thy Thy Tran, Iryna Gurevych

Abstract

Conditional variational autoencoders (CVAEs) have recently been used for diverse response generation: they introduce latent variables to represent the relationship between a dialog context and its potential responses. However, the diversity of the responses generated by a CVAE is limited by the oversimplified assumption of an isotropic Gaussian prior. We propose Dior-CVAE, a hierarchical CVAE model with an informative prior produced by a diffusion model. Dior-CVAE derives a series of layer-wise latent variables using an attention mechanism and infuses them into the corresponding decoder layers. We propose memory dropout in the latent infusion to alleviate posterior collapse. The prior distribution of the latent variables is parameterized by a diffusion model to introduce a multimodal distribution. Overall, experiments on two popular open-domain dialog datasets indicate the advantages of our approach over previous Transformer-based variational dialog models in dialog response generation. We publicly release the code for reproducing Dior-CVAE and all baselines at this https URL.
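
For orientation, the generic CVAE training objective that this line of work builds on can be sketched as the evidence lower bound below. The notation is an assumption made here for illustration ($c$ for the dialog context, $r$ for the response, $z$ for the latent variable); the paper's own formulation, including its hierarchical latents and diffusion prior, may differ.

```latex
\log p_\theta(r \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid r, c)}\big[\log p_\theta(r \mid z, c)\big]
  - \mathrm{KL}\big(q_\phi(z \mid r, c)\,\|\,p_\theta(z \mid c)\big)
```

In a vanilla CVAE the prior $p_\theta(z \mid c)$ is typically a simple Gaussian, which is the assumption the abstract criticizes; per the abstract, Dior-CVAE instead parameterizes the prior with a diffusion model so that it can be multimodal.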

URL

https://arxiv.org/abs/2305.15025

PDF

https://arxiv.org/pdf/2305.15025.pdf

