Paper Reading AI Learner

Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document

2023-05-23 09:18:47
Xiangnan Chen, Juncheng Li, Duo Dong, Qian Xiao, Jun Lin, Xiaozhong Liu, Siliang Tang

Abstract

Visual relation extraction (VRE) aims to extract relations between entities from visually-rich documents. Existing methods usually predict relations for each entity pair independently based on entity features but ignore the global structure information, i.e., dependencies between entity pairs. The absence of global structure information may make the model struggle to learn long-range relations and easily predict conflicting results. To alleviate such limitations, we propose a GlObal Structure knowledge-guided relation Extraction (GOSE) framework, which captures dependencies between entity pairs in an iterative manner. Given a scanned image of the document, GOSE first generates preliminary relation predictions on entity pairs. Second, it mines global structure knowledge based on the prediction results of the previous iteration and further incorporates this global structure knowledge into entity representations. This "generate-capture-incorporate" schema is performed multiple times so that entity representations and global structure knowledge can mutually reinforce each other. Extensive experiments show that GOSE not only outperforms previous methods in the standard fine-tuning setting but also shows promising superiority in cross-lingual learning, and even yields stronger data-efficient performance in the low-resource setting.
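The iterative "generate-capture-incorporate" schema described in the abstract can be sketched in a few lines. The snippet below is a minimal toy illustration, not the paper's actual model: the function name `gose_iterate`, the bilinear scoring weights `W`, and the mean-pooled "structure" signal are all hypothetical stand-ins for the learned components GOSE would use.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gose_iterate(entity_feats, num_iters=3, rng=None):
    """Toy sketch of the generate-capture-incorporate loop.

    entity_feats: (n, d) array of entity representations.
    Returns an (n, n) matrix of relation scores in [0, 1].
    """
    rng = rng or np.random.default_rng(0)
    n, d = entity_feats.shape
    W = rng.normal(scale=0.1, size=(d, d))  # hypothetical pair-scoring weights
    reps = entity_feats.copy()
    for _ in range(num_iters):
        # 1. Generate: preliminary relation scores for every entity pair.
        scores = sigmoid(reps @ W @ reps.T)
        # 2. Capture: mine a crude "global structure" signal from the current
        #    predictions -- here, each entity's link-weighted neighborhood.
        structure = scores @ reps / n
        # 3. Incorporate: fold the structure signal back into entity reps,
        #    so the next iteration's predictions see pair dependencies.
        reps = reps + 0.5 * structure
    return scores
```

In the real framework the three steps are learned modules operating on entity pairs from a scanned document; the loop above only shows how predictions and representations can reinforce each other across iterations.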

URL

https://arxiv.org/abs/2305.13850

PDF

https://arxiv.org/pdf/2305.13850.pdf

