Contextual Compositionality Detection with External Knowledge Bases and Word Embeddings

2019-03-20 08:53:05
Dongsheng Wang, Qiuchi Li, Lucas Chaves Lima, Jakob Grue Simonsen, Christina Lioma

Abstract

When the meaning of a phrase cannot be inferred from the individual meanings of its words (e.g., hot dog), that phrase is said to be non-compositional. Automatic compositionality detection in multi-word phrases is critical in any application of semantic processing, such as search engines; failing to detect non-compositional phrases can notably hurt system effectiveness. Existing research treats phrases as either compositional or non-compositional in a deterministic manner. In this paper, we operationalize the viewpoint that compositionality is contextual rather than deterministic, i.e., that whether a phrase is compositional or non-compositional depends on its context. For example, the phrase "green card" is compositional when referring to a green-colored card, whereas it is non-compositional when it means permanent residence authorization. We address the challenge of detecting this type of contextual compositionality as follows: given a multi-word phrase, we enrich the word embedding representing its semantics with evidence about its global context (terms it often collocates with) as well as its local context (narratives where that phrase is used, which we call usage scenarios). We further extend this representation with information extracted from external knowledge bases. The resulting representation incorporates both localized context and more general usage of the phrase, and allows its compositionality to be detected in a non-deterministic and contextual way. Empirical evaluation of our model on a dataset of phrase compositionality, manually collected by crowdsourcing contextual compositionality assessments, shows that our model notably outperforms state-of-the-art baselines in detecting phrase compositionality.
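
To make the idea of context-dependent compositionality concrete, here is a minimal sketch. It is not the authors' model: it uses no knowledge base, and the three-dimensional vectors, the vocabulary, and the helper names (`EMB`, `mean_vector`, `cosine`, `compositionality_score`) are all invented for illustration. It only shows how the same phrase can score as more or less compositional depending on the words around it, by comparing the averaged vectors of the phrase's words with the averaged vectors of its local context.

```python
import math

# Toy 3-dimensional embeddings, invented purely for illustration.
# A real system would load pretrained word embeddings instead.
EMB = {
    "green": [0.9, 0.1, 0.0],
    "card": [0.1, 0.9, 0.0],
    "colored": [0.8, 0.2, 0.0],
    "paper": [0.3, 0.8, 0.1],
    "visa": [0.0, 0.1, 0.9],
    "residence": [0.0, 0.0, 1.0],
    "immigration": [0.1, 0.0, 0.9],
}


def mean_vector(words):
    """Average the embeddings of the known words in `words`."""
    vecs = [EMB[w] for w in words if w in EMB]
    if not vecs:
        raise ValueError("no known words to average")
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compositionality_score(phrase_words, context_words):
    """Similarity between the composed phrase vector and its local context.

    A high score suggests the literal (compositional) reading fits the
    context; a low score suggests a non-compositional reading.
    """
    return cosine(mean_vector(phrase_words), mean_vector(context_words))


# The same phrase in two hypothetical usage scenarios.
literal = compositionality_score(["green", "card"], ["colored", "paper"])
idiomatic = compositionality_score(
    ["green", "card"], ["visa", "residence", "immigration"]
)
print(f"literal context:   {literal:.3f}")
print(f"idiomatic context: {idiomatic:.3f}")
```

With these toy vectors, the literal context scores much higher than the immigration context. Averaging is only one possible composition function, and a fixed threshold on the score would be a further assumption; the abstract does not specify either.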


URL

https://arxiv.org/abs/1903.08389

PDF

https://arxiv.org/pdf/1903.08389.pdf

