Abstract
A recent advancement in the domain of biomedical Entity Linking is the development of powerful two-stage algorithms: an initial candidate retrieval stage that generates a shortlist of entities for each mention, followed by a candidate ranking stage. However, the effectiveness of both stages is inextricably dependent on computationally expensive components. Specifically, candidate retrieval via dense representation retrieval depends on hard negative samples, which require repeated forward passes and nearest-neighbour searches across the entire entity label set throughout training. In this work, we show that pairing a proxy-based metric learning loss with an adversarial regularizer provides an efficient alternative to hard negative sampling in the candidate retrieval stage. In particular, we show competitive performance on the recall@1 metric, thereby providing the option to leave out the expensive candidate ranking step. Finally, we demonstrate how the model can be used in a zero-shot setting to discover out-of-knowledge-base biomedical entities.
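To make the idea of a proxy-based loss concrete, the following is a minimal sketch, not the method from the paper. It assumes one learnable proxy vector per entity and a plain softmax cross-entropy over cosine similarities; the abstract does not specify the exact loss, and the adversarial regularizer is not shown. The function names `proxy_softmax_loss` and `recall_at_1` and the `temperature` parameter are hypothetical, introduced only for illustration.

```python
import numpy as np


def _l2_normalize(x):
    # Scale each row to unit length so that dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def proxy_softmax_loss(mentions, proxies, labels, temperature=0.1):
    """Mean cross-entropy of each mention against all entity proxies.

    Assumption: one proxy per entity. Every mention is compared with every
    proxy in a single matrix product, so no hard negative mining over the
    entity set is needed.
    """
    logits = _l2_normalize(mentions) @ _l2_normalize(proxies).T / temperature
    # Subtract the row maximum for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the gold entity for each mention.
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def recall_at_1(mentions, proxies, labels):
    """Fraction of mentions whose most similar proxy is the gold entity."""
    similarities = _l2_normalize(mentions) @ _l2_normalize(proxies).T
    return float((similarities.argmax(axis=1) == labels).mean())
```

In this sketch the cost of one training step scales with the number of proxies rather than with repeated nearest-neighbour searches over re-encoded entities, which is the kind of saving the abstract refers to. A high recall@1 means the top retrieved candidate is usually correct, which is why a separate ranking stage becomes optional.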
URL
https://arxiv.org/abs/2301.13318