Abstract
The acquisition cost of large, annotated motion datasets remains a critical bottleneck for skeleton-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models offer a compelling, scalable source of synthetic data, their training objectives, which emphasize general artistic motion, and their dataset structures differ fundamentally from HAR's requirement for kinematically precise, class-discriminative actions. This disparity creates a significant domain gap, leaving generalist T2M models ill-equipped to generate motions suitable for HAR classifiers. To address this challenge, we propose KineMIC (Kinetic Mining In Context), a transfer learning framework for few-shot action synthesis. KineMIC adapts a T2M diffusion model to an HAR domain under the hypothesis that semantic correspondences in the text-encoding space can provide soft supervision for kinematic distillation. We operationalize this via a kinetic-mining strategy that leverages CLIP text embeddings to establish correspondences between sparse HAR labels and T2M source data. This process guides fine-tuning, transforming the generalist T2M backbone into a specialized few-shot Action-to-Motion generator. We validate KineMIC using HumanML3D as the source T2M dataset and a subset of NTU RGB+D 120 as the target HAR domain, randomly selecting just 10 samples per action class. Our approach generates significantly more coherent motions and provides a robust data-augmentation source that yields a +23.1 percentage-point accuracy improvement. Animated illustrations and supplementary materials are available at (this https URL).
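The kinetic-mining step described above amounts to a nearest-neighbor retrieval in a shared text-embedding space: each sparse HAR label is matched against T2M caption embeddings by cosine similarity. The sketch below illustrates that retrieval with random placeholder vectors standing in for real CLIP text embeddings; `mine_correspondences` is a hypothetical helper for illustration, not the paper's implementation.

```python
import numpy as np

def mine_correspondences(label_emb, caption_embs, k=3):
    """Return indices and scores of the k T2M captions most similar
    to a single HAR label embedding, by cosine similarity."""
    a = label_emb / np.linalg.norm(label_emb)
    B = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
    sims = B @ a                   # cosine similarity to every caption
    top = np.argsort(-sims)[:k]    # indices of the k highest similarities
    return top, sims[top]

# Toy demo: placeholder embeddings (a real pipeline would encode HAR labels
# and T2M captions with CLIP's text encoder instead).
rng = np.random.default_rng(0)
caption_embs = rng.normal(size=(100, 512))
# Simulate an HAR label whose embedding is nearly identical to caption 42.
label_emb = caption_embs[42] + 0.01 * rng.normal(size=512)
idx, scores = mine_correspondences(label_emb, caption_embs, k=5)
```

The retrieved caption indices (and, optionally, their similarity scores as soft weights) can then drive which T2M source samples supervise the few-shot fine-tuning.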
URL
https://arxiv.org/abs/2512.11654