Abstract
Working with documents is a key part of almost any knowledge work, from contextualizing research in a literature review to reviewing legal precedent. Recently, as their capabilities have expanded, primarily text-based NLP systems have often been billed as able to assist or even automate this kind of work. But to what extent are these systems able to model these tasks as experts conceptualize and perform them now? In this study, we interview sixteen domain experts across two domains to understand their processes of document research, and compare those processes to the current state of NLP systems. We find that our participants' processes are idiosyncratic, iterative, and rely extensively on the social context of a document in addition to its content; existing approaches in NLP and adjacent fields that explicitly center the document as an object, rather than as merely a container for text, tend to better reflect our participants' priorities, though they are often less accessible outside their research communities. We call on the NLP community to more carefully consider the role of the document in building useful tools that are accessible, personalizable, iterative, and socially aware.
URL
https://arxiv.org/abs/2504.12495