Abstract
As we increasingly seek guidance from LLMs for decision-making in daily life, many of these decisions are not clear-cut and depend significantly on the personal values and ethical standards of the user. We present DailyDilemmas, a dataset of 1,360 moral dilemmas encountered in everyday life. Each dilemma presents two possible actions, along with the parties affected and the human values invoked by each action. Based on these dilemmas, we consolidated a set of human values spanning everyday topics, e.g., interpersonal relationships, the workplace, and environmental issues. We evaluated LLMs on these dilemmas to determine which action they would take and which values that action represents. We then analyzed these values through the lens of five influential theories from sociology, psychology, and philosophy: the World Values Survey, Moral Foundations Theory, Maslow's Hierarchy of Needs, Aristotle's Virtues, and Plutchik's Wheel of Emotions. We find that LLMs favor self-expression over survival values on the World Values Survey, and care over loyalty in Moral Foundations Theory. Interestingly, we find large preference differences across models for some core values: for truthfulness, for example, Mixtral-8x7B tends to neglect it (by 9.7%) while GPT-4-turbo tends to select it (by 9.4%). We also study the recent guidance released by OpenAI (ModelSpec) and Anthropic (Constitutional AI) to understand how their stated principles reflect the models' actual value prioritization when facing nuanced moral reasoning in daily-life settings. We find that end users cannot effectively steer this prioritization using system prompts.
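To make the dataset structure and the per-value preference figures concrete, below is a minimal Python sketch of what a single DailyDilemmas record might look like and how a value selection rate (such as the truthfulness numbers above) could be computed. The field names, the example dilemma, and the `value_selection_rate` definition are hypothetical illustrations under stated assumptions, not the paper's released schema or evaluation code.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """One of a dilemma's two possible actions (hypothetical schema)."""
    description: str
    affected_parties: list[str]
    values: list[str]  # human values invoked by taking this action

@dataclass
class Dilemma:
    """A single everyday moral dilemma with two competing actions."""
    situation: str
    action_a: Action
    action_b: Action

# Illustrative record, paraphrasing the kind of everyday dilemma the paper describes.
dilemma = Dilemma(
    situation="A friend asks whether you like the gift they made for you; you do not.",
    action_a=Action(
        description="Tell the truth about disliking the gift.",
        affected_parties=["friend", "self"],
        values=["truthfulness", "authenticity"],
    ),
    action_b=Action(
        description="Say you love it to spare their feelings.",
        affected_parties=["friend"],
        values=["kindness", "harmony"],
    ),
)

def value_selection_rate(dilemmas: list[Dilemma], choices: list[Action], value: str) -> float:
    """Among dilemmas where `value` is at stake, the fraction in which the
    model's chosen action (choices[i] for dilemmas[i]) invokes that value."""
    relevant = [
        (d, c) for d, c in zip(dilemmas, choices)
        if value in d.action_a.values or value in d.action_b.values
    ]
    if not relevant:
        return 0.0
    return sum(value in c.values for _, c in relevant) / len(relevant)

# Usage: suppose the model chose to tell the truth in this one dilemma.
print(value_selection_rate([dilemma], [dilemma.action_a], "truthfulness"))  # -> 1.0
```

Comparing such per-value rates across models against a baseline would surface the kind of gaps the abstract reports, e.g., one model under-selecting truthfulness while another over-selects it.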
URL
https://arxiv.org/abs/2410.02683