Abstract
Do Large Language Models (LLMs) hold positions that conflict with your country's values? Occasionally they do! However, existing works primarily focus on ethical reviews, failing to capture the diversity of national values, which encompass broader policy, legal, and moral considerations. Furthermore, current benchmarks that rely on spectrum tests using manually designed questionnaires are not easily scalable. To address these limitations, we introduce NaVAB, a comprehensive benchmark to evaluate the alignment of LLMs with the values of five major nations: China, the United States, the United Kingdom, France, and Germany. NaVAB implements a national value extraction pipeline to efficiently construct value assessment datasets. Specifically, we propose a modeling procedure with instruction tagging to process raw data sources, a screening process to filter value-related topics, and a generation process with a Conflict Reduction mechanism to filter out non-conflicting items. We conduct extensive experiments on various LLMs across countries, and the results provide insights that assist in identifying misaligned scenarios. Moreover, we demonstrate that NaVAB can be combined with alignment techniques to effectively reduce value concerns by aligning LLMs' values with those of the target country.
URL
https://arxiv.org/abs/2504.12911