Abstract
Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links). LBD also ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background knowledge and motivations that human scientists consider (e.g., to find a drug candidate without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a new modeling framework using retrieval of "inspirations" from a heterogeneous network of citations and knowledge graph relations, and create a new dataset derived from papers. In automated and human evaluations, our models improve over baselines, including powerful large language models (LLMs), but also reveal challenges on the road to building machines that generate new scientific knowledge.
URL
https://arxiv.org/abs/2305.14259