Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome

Abstract

Tailoring persuasive conversations to specific users and to the outcome of interest yields better persuasion results. However, existing persuasive conversation systems rely on persuasive strategies and struggle to adjust dialogues dynamically to the evolving states of individual users during an interaction. This limitation restricts a system's ability to deliver flexible or dynamic conversations and leads to suboptimal persuasion outcomes. In this paper, we present a novel approach that tracks a user's latent personality dimensions (LPDs) during an ongoing persuasive conversation and generates tailored counterfactual utterances based on these LPDs to optimize the overall persuasion outcome. In particular, our proposed method leverages a Bi-directional Generative Adversarial Network (BiCoGAN) in tandem with a Dialogue-based Personality Prediction Regression (DPPR) model to generate counterfactual data. This enables the system to formulate alternative persuasive utterances that are better suited to the user. Subsequently, we use the D3QN model to learn policies that optimize the selection of system utterances on the counterfactual data. Experimental results on the PersuasionForGood dataset show that our approach outperforms the existing method, BiCoGAN. The cumulative rewards and Q-values produced by our method surpass ground-truth benchmarks, demonstrating the efficacy of counterfactual reasoning and LPDs for optimizing reinforcement learning policies in online interactions.
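The abstract does not spell out the D3QN architecture. As a rough illustration only, the sketch below assumes the common reading of D3QN as a dueling double deep Q-network and shows its two defining computations in plain Python: the dueling aggregation of a state value and per-action advantages into Q-values, and the double-DQN bootstrap target. The function names, the interpretation of actions as candidate system utterances, and the example numbers are hypothetical and not taken from the paper.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


def double_dqn_target(reward: float, gamma: float, done: bool,
                      next_q_online: List[float],
                      next_q_target: List[float]) -> float:
    """Double-DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        # No bootstrapping past the end of a dialogue.
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


if __name__ == "__main__":
    # Hypothetical numbers: three candidate system utterances (actions).
    print(dueling_q_values(1.0, [0.0, 2.0, 4.0]))
    print(double_dqn_target(1.0, 0.5, False, [0.2, 0.9, 0.1], [3.0, 2.0, 5.0]))
```

In a full system the value and advantage streams would come from a neural network over the dialogue state, possibly augmented with the predicted LPDs; that part is omitted here because the abstract gives no details.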

URL

https://arxiv.org/abs/2404.13792

PDF

https://arxiv.org/pdf/2404.13792.pdf
