Abstract
We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large language models (LLMs). QSTN enables robust evaluation of questionnaire presentation, prompt perturbations, and response generation methods. Our extensive evaluation ($>40$ million survey responses) shows that question structure and response generation methods have a significant impact on the alignment of generated survey responses with human answers, and that comparable alignment can be obtained at a fraction of the compute cost. In addition, we offer a no-code user interface that allows researchers to set up robust experiments with LLMs without programming knowledge. We hope that QSTN will support the reproducibility and reliability of LLM-based research in the future.
URL
https://arxiv.org/abs/2512.08646