Abstract
Recent advances in Emotional Support Conversation (ESC) have improved emotional support generation by fine-tuning Large Language Models (LLMs) via Supervised Fine-Tuning (SFT). However, common psychological errors still persist. While Direct Preference Optimization (DPO) shows promise in reducing such errors through pairwise preference learning, its effectiveness in ESC tasks is limited by two key challenges: (1) Entangled data structure: Existing ESC data inherently entangles psychological strategies and response content, making it difficult to construct high-quality preference pairs; and (2) Optimization ambiguity: Applying vanilla DPO to such entangled pairwise data leads to ambiguous training objectives. To address these issues, we introduce Inferential Preference Mining (IPM) to construct high-quality preference data, forming the IPM-PrefDial dataset. Building upon this data, we propose a Decoupled ESC framework inspired by Gross's Extended Process Model of Emotion Regulation, which decomposes the ESC task into two sequential subtasks: strategy planning and empathic response generation. Each subtask is first trained via SFT and then enhanced by DPO to align with psychological preferences. Extensive experiments demonstrate that our Decoupled ESC framework outperforms joint optimization baselines, reducing preference bias and improving response quality.
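For reference, the standard DPO objective the abstract builds on (Rafailov et al., 2023) is shown below; the paper's exact variant for the decoupled subtasks is not given in the abstract, so this is only the general formulation. Here x is the dialogue context, y_w the preferred (chosen) output, y_l the rejected output, \pi_{\mathrm{ref}} the frozen SFT reference policy, and \beta a temperature hyperparameter:

\[
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
\]

In the decoupled setting described above, this loss would presumably be applied separately to strategy-level pairs (for strategy planning) and response-level pairs (for empathic response generation), rather than to entangled strategy-plus-response pairs.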
URL
https://arxiv.org/abs/2505.16995