
Natural Language Generation with Neural Variational Models

2018-08-27 19:40:53
Hareesh Bahuleyan

Abstract

In this thesis, we explore the use of deep neural networks for natural language generation. Specifically, we implement two sequence-to-sequence neural variational models: the variational autoencoder (VAE) and the variational encoder-decoder (VED). VAEs for text generation are difficult to train because the Kullback-Leibler (KL) divergence term of the loss function tends to vanish to zero. We successfully train VAEs by applying optimization heuristics such as KL weight annealing and word dropout. We also demonstrate the effectiveness of the resulting continuous latent space through experiments such as random sampling, linear interpolation, and sampling from the neighborhood of the input. We argue that VAEs that are not designed appropriately can contain bypassing connections, which cause the latent space to be ignored during training. Using decoder hidden-state initialization as an example, we show experimentally that such bypassing connections degrade the VAE into a deterministic model, thereby reducing the diversity of generated sentences. We find that the traditional attention mechanism used in sequence-to-sequence VED models acts as a bypassing connection, thereby deteriorating the model's latent space. To circumvent this issue, we propose a variational attention mechanism in which the attention context vector is modeled as a random variable sampled from a distribution. We show empirically, using the automatic evaluation metrics entropy and distinct, that our variational attention model generates more diverse output sentences than the deterministic attention model. A qualitative analysis and a human evaluation study show that our model produces sentences that are of high quality and as fluent as those generated by the deterministic attention counterpart.
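The abstract names two heuristics used to keep the KL term from collapsing when training the sequence VAE: KL weight annealing and word dropout. The sketch below is a minimal illustration of both, not the authors' code; it assumes PyTorch, and the schedule length, keep probability, and function names are illustrative assumptions.

```python
# Minimal sketch of KL weight annealing and word dropout for a sequence VAE (PyTorch assumed).
import torch
import torch.nn.functional as F

def kl_weight(step, annealing_steps=10000):
    """Linear KL annealing: the weight on the KL term grows from 0 to 1 over the
    first `annealing_steps` updates (schedule shape and length are assumptions)."""
    return min(1.0, step / annealing_steps)

def vae_loss(logits, targets, mu, logvar, step, pad_idx=0):
    """ELBO-style loss: token-level reconstruction cross-entropy plus the annealed
    KL divergence between the approximate posterior N(mu, sigma^2) and N(0, I)."""
    recon = F.cross_entropy(logits.view(-1, logits.size(-1)),
                            targets.view(-1),
                            ignore_index=pad_idx, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl_weight(step) * kl

def word_dropout(decoder_inputs, unk_idx, keep_prob=0.75):
    """Randomly replace decoder input tokens with <unk> so the decoder cannot rely
    solely on teacher forcing and is pushed to use the latent code."""
    keep_mask = torch.rand(decoder_inputs.shape) < keep_prob
    return torch.where(keep_mask, decoder_inputs,
                       torch.full_like(decoder_inputs, unk_idx))
```

With this kind of objective, the KL term is effectively ignored early in training and only gradually penalized, which is what allows the latent space to carry information instead of collapsing to the prior.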


URL

https://arxiv.org/abs/1808.09012

PDF

https://arxiv.org/pdf/1808.09012.pdf

