Abstract
Culture is a core component of human-to-human interaction and plays a vital role in how we perceive and interact with others. Advances in the ability of Large Language Models (LLMs) to generate human-sounding text have greatly increased the volume of human-to-computer interaction. As this field grows, the cultural alignment of these human-like agents becomes an important area of study. Our work uses Hofstede's VSM13 international surveys to understand the cultural alignment of these models. We combine prompt language with cultural prompting, a strategy that uses a system prompt to shift a model's alignment toward a specific country, to align flagship LLMs with different cultures. Our results show that DeepSeek-V3, DeepSeek-V3.1, and OpenAI's GPT-5 align closely with the survey responses of the United States and do not achieve a strong or soft alignment with China, even when cultural prompts are used or the prompt language is changed. We also find that GPT-4 aligns more closely with China when prompted in English, but cultural prompting is effective in shifting this alignment toward the United States. Other low-cost models, GPT-4o and GPT-4.1, respond to both the prompt language (i.e., English or Simplified Chinese) and cultural prompting, achieving acceptable alignment with both the United States and China.
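To make the cultural-prompting strategy concrete, here is a minimal sketch of how a VSM13 item might be administered to a model under a country-specific system prompt. This is not the authors' code: the exact system-prompt wording, the use of the OpenAI Python client, the choice of gpt-4o, and the placeholder survey item are all assumptions for illustration. The PDI formula shown follows the VSM 2013 manual, where m0X denotes the mean score on question X and C(pd) is an arbitrary anchoring constant.

```python
# Minimal sketch of cultural prompting for VSM13 survey administration.
# Assumptions: prompt wording, model choice, and the placeholder item below.
# Actual item texts come from Hofstede's Values Survey Module 2013 manual.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_vsm13_item(item_text: str, country: str, language: str = "English") -> str:
    """Pose one VSM13 item to the model under a cultural system prompt."""
    system_prompt = (
        f"You are an average person born and raised in {country}. "
        f"Answer the following survey question in {language} with a single "
        f"number from 1 to 5, and nothing else."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # one of the low-cost models studied in the paper
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": item_text},
        ],
    )
    return response.choices[0].message.content

# Power Distance Index from mean item scores, per the VSM 2013 manual:
# PDI = 35(m07 - m02) + 25(m20 - m23) + C(pd)
def pdi(m: dict[int, float], c_pd: float = 0.0) -> float:
    return 35 * (m[7] - m[2]) + 25 * (m[20] - m[23]) + c_pd
```

Repeating such calls across all 24 VSM13 items, prompt languages, and cultural prompts yields per-dimension scores that can then be compared against Hofstede's published country scores for the United States and China.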
URL
https://arxiv.org/abs/2512.09772