Abstract
Generating realistic human motions that respond naturally to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speech-driven gestures and object interactions separately, and the lack of integrated, comprehensive datasets limits their real-world applicability. To overcome this, we introduce InteracTalker, a novel framework that integrates prompt-based, object-aware interactions with co-speech gesture generation. We achieve this with a multi-stage training process that learns a unified embedding space for motion, speech, and prompts. To support this, we curate a rich human-object interaction dataset by augmenting an existing text-to-motion dataset with detailed object interaction annotations. Our framework uses a Generalized Motion Adaptation Module that enables independent training for each motion condition, with the resulting adaptations combined dynamically during inference. To address the imbalance between heterogeneous conditioning signals, we propose an adaptive fusion strategy that dynamically reweights the conditioning signals during diffusion sampling. InteracTalker unifies these previously separate tasks and outperforms prior methods, including gesture-focused diffusion methods, in both co-speech gesture generation and object-interaction synthesis, yielding realistic, object-aware full-body motions with improved flexibility and control.
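The abstract does not spell out how the adaptive fusion strategy reweights the conditioning signals, so the following is only an illustrative sketch of one way such reweighting could look inside a classifier-free-guidance sampling step. Everything in it is an assumption made for illustration rather than the authors' method: the function name `adaptive_fusion`, the `base_scale` parameter, and in particular the rule of weighting each condition inversely to the norm of its guidance residual.

```python
import math


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def adaptive_fusion(eps_uncond, eps_conds, base_scale=2.5, tiny=1e-8):
    """Fuse several conditional noise predictions into one guided prediction.

    eps_uncond: unconditional noise prediction (list of floats).
    eps_conds:  dict mapping a condition name (e.g. "speech", "prompt")
                to its conditional noise prediction of the same length.
    Returns (guided_prediction, per_condition_weights).
    """
    # Guidance residual of each condition relative to the unconditional branch.
    residuals = {
        name: [c - u for c, u in zip(eps, eps_uncond)]
        for name, eps in eps_conds.items()
    }

    # Hypothetical reweighting rule (an assumption, not taken from the paper):
    # weight each condition inversely to its residual norm, so a signal with
    # a large residual cannot drown out a weaker one.
    inverse = {name: 1.0 / (l2_norm(r) + tiny) for name, r in residuals.items()}
    total = sum(inverse.values())
    weights = {name: v / total for name, v in inverse.items()}

    # Standard multi-condition guidance, but with the adaptive weights.
    guided = list(eps_uncond)
    for name, r in residuals.items():
        for i, x in enumerate(r):
            guided[i] += base_scale * weights[name] * x
    return guided, weights


# Toy example: the speech residual is five times larger than the prompt one.
guided, weights = adaptive_fusion(
    [0.0, 0.0],
    {"speech": [3.0, 4.0], "prompt": [0.0, 1.0]},
)
```

In this toy case the weaker prompt signal receives the larger weight, so both conditions contribute residuals of equal magnitude to the guided prediction. In a real sampler the weights would be recomputed at every denoising step, which is what makes the reweighting dynamic.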
URL
https://arxiv.org/abs/2512.12664