Abstract
The computational treatment of arguments on controversial issues has been subject to extensive NLP research, due to its envisioned impact on opinion formation, decision making, writing education, and the like. A critical task in any such application is the assessment of an argument's quality, but it is also particularly challenging. In this position paper, we start from a brief survey of argument quality research, where we identify the diversity of quality notions and the subjectiveness of their perception as the main hurdles towards substantial progress on argument quality assessment. We argue that the capabilities of instruction-following large language models (LLMs) to leverage knowledge across contexts enable a much more reliable assessment. Rather than merely being fine-tuned to chase leaderboards on assessment tasks, LLMs need to be instructed systematically with argumentation theories and scenarios as well as with ways to solve argument-related problems. We discuss the real-world opportunities and ethical issues emerging thereby.
URL
https://arxiv.org/abs/2403.16084