Paper Reading AI Learner

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

2024-04-24 17:51:36
Hannah Rose Kirk, Alexander Whitefield, Paul Röttger, Andrew Bean, Katerina Margatina, Juan Ciro, Rafael Mosquera, Max Bartolo, Adina Williams, He He, Bertie Vidgen, Scott A. Hale

Abstract

Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who) and objectives (to what end) of human feedback collection. To navigate these questions, we introduce PRISM, a new dataset which maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries, to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples for understanding collective welfare (UK and US); and (iii) individualised feedback where every rating is linked to a detailed participant profile, thus permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the most interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM via three case studies of dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. As well as offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.

Abstract (translated)

Human feedback plays a central role in the alignment of Large Language Models (LLMs). However, open questions remain about the methods (how), domains (where), people (who), and objectives (to what end) of human feedback collection. To address these questions, we introduce PRISM, a new dataset that maps the sociodemographics and stated preferences of 1,500 diverse participants from 75 countries to their contextual preferences and fine-grained feedback in 8,011 live conversations with 21 LLMs. PRISM contributes: (i) wide geographic and demographic participation in human feedback data; (ii) two census-representative samples (UK and US) for understanding collective welfare; and (iii) individualised feedback, in which every rating is linked to a detailed participant profile, permitting exploration of personalisation and attribution of sample artefacts. We focus on collecting conversations that centre subjective and multicultural perspectives on value-laden and controversial topics, where we expect the greatest interpersonal and cross-cultural disagreement. We demonstrate the usefulness of PRISM through three case studies on dialogue diversity, preference diversity, and welfare outcomes, showing that it matters which humans set alignment norms. Beyond offering a rich community resource, we advocate for broader participation in AI development and a more inclusive approach to technology design.
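The abstract's contribution (iii) describes a structural property of the dataset: every preference rating can be joined back to a detailed participant profile. The snippet below is a minimal, hypothetical sketch of what that linkage enables, using toy stand-in records; the field names and values are illustrative assumptions, not the dataset's published schema.

```python
# Hypothetical sketch of PRISM-style record linkage: per-rating feedback joined
# to participant profiles so preferences can be sliced by demographics.
# Field names and values are illustrative assumptions, not the real schema.
import pandas as pd

# Toy stand-ins for the two record types implied by the abstract.
profiles = pd.DataFrame([
    {"participant_id": "p001", "country": "UK", "sample": "uk_census_rep"},
    {"participant_id": "p002", "country": "NG", "sample": "global"},
])

ratings = pd.DataFrame([
    {"participant_id": "p001", "conversation_id": "c100", "model": "model_a", "score": 72},
    {"participant_id": "p002", "conversation_id": "c101", "model": "model_b", "score": 35},
])

# Because every rating links to a profile, preference analyses can be
# disaggregated by participant attributes (here, mean score by country).
linked = ratings.merge(profiles, on="participant_id")
print(linked.groupby("country")["score"].mean())
```

This kind of join is what distinguishes individualised feedback from aggregate preference data: disagreement between demographic groups remains visible rather than being averaged away.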

URL

https://arxiv.org/abs/2404.16019

PDF

https://arxiv.org/pdf/2404.16019.pdf

