Abstract
We propose an ontology-enhanced model for sentence-based claim detection. We fused ontology embeddings from a knowledge base with BERT sentence embeddings to perform claim detection on the ClaimBuster and NewsClaims datasets. On these small, unbalanced datasets, our ontology-enhanced approach achieved the best results compared with other statistical and neural machine learning models. The experiments demonstrate that adding domain-specific features (either trained word embeddings or knowledge graph metadata) can improve traditional ML methods. In addition, adding domain knowledge in the form of ontology embeddings helps avoid the bias encountered in neural-network-based models, for example the pure BERT model's bias towards larger classes in our small corpus.
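The abstract does not say how the two embeddings are fused. As a minimal sketch, assuming fusion by simple concatenation of L2-normalized vectors (an assumption, not necessarily the authors' method), the combined feature vector for a downstream claim classifier could be built as follows. The function names `l2_normalize` and `fuse` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def fuse(sentence_emb: Sequence[float], ontology_emb: Sequence[float]) -> List[float]:
    """Concatenate normalized sentence and ontology embeddings.

    Concatenation is an assumed fusion strategy; the abstract does not
    specify which one the paper uses.
    """
    return l2_normalize(sentence_emb) + l2_normalize(ontology_emb)


# Toy stand-ins for real embeddings (a BERT-base sentence vector would
# have 768 dimensions; these short vectors are for illustration only).
sentence_emb = [3.0, 4.0, 0.0, 0.0]
ontology_emb = [0.0, 2.0]

fused = fuse(sentence_emb, ontology_emb)
print(len(fused))  # length is the sum of the two input lengths
```

Normalizing each part before concatenation keeps either embedding from dominating the other purely because of scale. The fused vector would then be passed to whatever classifier is used for the claim / non-claim decision.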
URL
https://arxiv.org/abs/2402.12282