Paper Reading AI Learner

Latent Space Secrets of Denoising Text-Autoencoders

2019-05-29 23:22:56
Tianxiao Shen, Jonas Mueller, Regina Barzilay, Tommi Jaakkola

Abstract

While neural language models have recently demonstrated impressive performance in unconditional text generation, controllable generation and manipulation of text remain challenging. Latent variable generative models provide a natural approach for control, but their application to text has proven more difficult than to images. Models such as variational autoencoders may suffer from posterior collapse or learning an irregular latent geometry. We propose to instead employ adversarial autoencoders (AAEs) and add local perturbations by randomly replacing/removing words from input sentences during training. Within the prior enforced by the adversary, structured perturbations in the data space begin to carve and organize the latent space. Theoretically, we prove that perturbations encourage similar sentences to map to similar latent representations. Experimentally, we investigate the trade-off between text-generation and autoencoder-reconstruction capabilities. Our straightforward approach significantly improves over regular AAEs as well as other autoencoders, and enables altering the tense/sentiment of sentences through simple addition of a fixed vector offset to their latent representation.
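The two mechanisms the abstract describes can be sketched briefly. Below is a minimal, hedged illustration: a `perturb` function that randomly removes or replaces words in an input sentence (the paper's local perturbations; the rates and function names here are illustrative, not the authors' exact settings), and the fixed-vector-offset trick for attribute manipulation, shown with placeholder latent codes.

```python
import random
import numpy as np

def perturb(words, p_drop=0.1, p_replace=0.1, vocab=None):
    """Local perturbation as described in the abstract: each word is
    independently removed with probability p_drop, or replaced by a
    random vocabulary word with probability p_replace. The rates are
    illustrative assumptions, not the paper's reported hyperparameters."""
    out = []
    for w in words:
        r = random.random()
        if r < p_drop:
            continue                          # remove this word
        elif r < p_drop + p_replace and vocab:
            out.append(random.choice(vocab))  # replace with a random word
        else:
            out.append(w)                     # keep the word unchanged
    return out

# Attribute manipulation by a fixed latent offset (placeholder data):
# estimate the offset as the difference between the mean latent code of
# sentences with the target attribute and those with the source attribute,
# then add it to a sentence's latent code before decoding.
rng = np.random.default_rng(0)
z_past = rng.standard_normal((100, 64))      # latents of past-tense sentences (placeholder)
z_present = rng.standard_normal((100, 64))   # latents of present-tense sentences (placeholder)
offset = z_past.mean(axis=0) - z_present.mean(axis=0)

z = rng.standard_normal(64)                  # latent code of one sentence
z_edited = z + offset                        # decoding z_edited would flip the tense
```

In practice `vocab` would be the model's word vocabulary and the latent codes would come from the trained encoder; the offset is computed once from labeled examples and reused for all sentences.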


URL

https://arxiv.org/abs/1905.12777

PDF

https://arxiv.org/pdf/1905.12777.pdf

