WildChat: 1M ChatGPT Interaction Logs in the Wild

Abstract

Chatbots such as GPT-4 and ChatGPT are now serving millions of users. Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request headers. From this, we compiled WildChat, a corpus of 1 million user-ChatGPT conversations, which consists of over 2.5 million interaction turns. We compare WildChat with other popular user-chatbot interaction datasets, and find that our dataset offers the most diverse user prompts, contains the largest number of languages, and presents the richest variety of potentially toxic use-cases for researchers to study. In addition to timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses, alongside request headers. This augmentation allows for more detailed analysis of user behaviors across different geographical regions and temporal dimensions. Finally, because it captures a broad range of use cases, we demonstrate the dataset's potential utility in fine-tuning instruction-following models. WildChat is released at this https URL under AI2 ImpACT Licenses.

Abstract (translated)

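Chatbots such as GPT-4 and ChatGPT now serve millions of users. Although they are widely used, there is still no public dataset showing how real users actually use these tools. To fill this gap, we offered online users free access to ChatGPT on the condition that they consent to the anonymous collection of their chat transcripts and request headers. From this we compiled WildChat, a corpus of 1 million user-ChatGPT conversations comprising more than 2.5 million interaction turns. Comparing WildChat with other popular user-chatbot interaction datasets, we find that it offers the most diverse user prompts, covers the largest number of languages, and presents the richest variety of potentially toxic use cases for researchers to study. Beyond the timestamped chat transcripts, we enrich the dataset with demographic data, including state, country, and hashed IP addresses. This makes it possible to analyze in more detail how user behavior differs across geographical regions and over time. Finally, because it covers a broad range of use cases, we demonstrate the dataset's potential value for fine-tuning instruction-following models. WildChat is released at this https URL under the AI2 ImpACT Licenses.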

URL

https://arxiv.org/abs/2405.01470

PDF

https://arxiv.org/pdf/2405.01470.pdf
