Beyond Text: Characterizing Domain Expert Needs in Document Research

2025-04-16 21:24:41
Sireesh Gururaja, Nupoor Gandhi, Jeremiah Milbauer, Emma Strubell

Abstract

Working with documents is a key part of almost any knowledge work, from contextualizing research in a literature review to reviewing legal precedent. Recently, as their capabilities have expanded, primarily text-based NLP systems have often been billed as able to assist or even automate this kind of work. But to what extent are these systems able to model these tasks as experts conceptualize and perform them now? In this study, we interview sixteen domain experts across two domains to understand their processes of document research, and compare them to the current state of NLP systems. We find that our participants' processes are idiosyncratic, iterative, and rely extensively on the social context of a document in addition to its content; existing approaches in NLP and adjacent fields that explicitly center the document as an object, rather than as merely a container for text, tend to better reflect our participants' priorities, though they are often less accessible outside their research communities. We call on the NLP community to more carefully consider the role of the document in building useful tools that are accessible, personalizable, iterative, and socially aware.

URL

https://arxiv.org/abs/2504.12495

PDF

https://arxiv.org/pdf/2504.12495.pdf

