Pun Generation with Surprise

Abstract

We tackle the problem of generating a pun sentence given a pair of homophones (e.g., "died" and "dyed"). Supervised text generation is inappropriate due to the lack of a large corpus of puns, and even if such a corpus existed, mimicry is at odds with generating novel content. In this paper, we propose an unsupervised approach to pun generation using a corpus of unhumorous text and what we call the local-global surprisal principle: we posit that in a pun sentence, there is a strong association between the pun word (e.g., "dyed") and the distant context, as well as a strong association between the alternative word (e.g., "died") and the immediate context. This contrast creates surprise and thus humor. We instantiate this principle for pun generation in two ways: (i) as a measure based on the ratio of probabilities under a language model, and (ii) a retrieve-and-edit approach based on words suggested by a skip-gram model. Human evaluation shows that our retrieve-and-edit approach generates puns successfully 31% of the time, tripling the success rate of a neural generation baseline.
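The first instantiation, a measure based on a ratio of probabilities, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the co-occurrence table `COUNTS` is invented, the "language model" is a toy that treats context words independently with add-one smoothing, and the names `log_prob_ratio`, `local_global_ratio`, and `window` are hypothetical. The idea is to compare how strongly the immediate context favors the alternative word ("died") over the pun word ("dyed") with how strongly the whole sentence does.

```python
import math

# Hypothetical co-occurrence counts, invented for illustration only:
# COUNTS[target][context_word] = times the two words were seen together.
COUNTS = {
    "died": {"have": 5, "a": 5, "little": 4, "inside": 4},
    "dyed": {"have": 2, "a": 2, "food": 3, "coloring": 6},
}


def log_prob_ratio(pun, alt, context):
    """Return log p(alt | context) - log p(pun | context) under a toy model
    that treats context words independently and uses add-one smoothing.
    Positive values mean the context favors the alternative word."""
    total = 0.0
    for word in context:
        alt_count = COUNTS[alt].get(word, 0) + 1
        pun_count = COUNTS[pun].get(word, 0) + 1
        total += math.log(alt_count / pun_count)
    return total


def local_global_ratio(tokens, pun, alt, window=2):
    """Ratio of local to global log-probability ratios (an assumed scoring rule).
    Returns -1.0 when either quantity is not positive, i.e. when the
    local-versus-global contrast is absent."""
    i = tokens.index(pun)
    # Immediate context: up to `window` words on each side of the pun word.
    local_ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    # Distant context: every other word in the sentence.
    global_ctx = tokens[:i] + tokens[i + 1:]
    s_local = log_prob_ratio(pun, alt, local_ctx)
    s_global = log_prob_ratio(pun, alt, global_ctx)
    if s_local <= 0 or s_global <= 0:
        return -1.0
    return s_local / s_global


pun_sentence = (
    "i swallowed some food coloring and feel like i have dyed a little inside"
).split()
plain_sentence = "i feel like i have dyed a little inside".split()

pun_score = local_global_ratio(pun_sentence, "dyed", "died")
plain_score = local_global_ratio(plain_sentence, "dyed", "died")
```

With these made-up counts, `pun_score` comes out at roughly 2.35 and `plain_score` at roughly 0.65: the distant words "food coloring" support "dyed" and lower the global term, while the immediate context still favors "died". A real system would replace `COUNTS` with probabilities from a language model trained on non-humorous text.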

URL

https://arxiv.org/abs/1904.06828

PDF

https://arxiv.org/pdf/1904.06828.pdf