Abstract
Humans exhibit complex motions that vary depending on the task they are performing, the interactions they engage in, and subject-specific preferences. Forecasting future poses from the history of previous motions is therefore a challenging task. This paper presents an innovative auxiliary-memory-powered deep neural network framework for improved modelling of historical knowledge. Specifically, we disentangle subject-specific, task-specific, and other auxiliary information from the observed pose sequences and utilise these factorised features to query the memory. A novel Multi-Head knowledge retrieval scheme leverages these factorised feature embeddings to perform multiple querying operations over the historical observations captured within the auxiliary memory. Moreover, our proposed dynamic masking strategy makes this feature disentanglement process dynamic rather than fixed. Two novel loss functions are introduced to encourage diversity within the auxiliary memory while ensuring the stability of its contents, so that the memory can locate and store salient information that aids the long-term prediction of future motion, irrespective of data imbalances or the diversity of the input data distribution. With extensive experiments on two public benchmarks, Human3.6M and CMU-Mocap, we demonstrate that these design choices collectively allow the proposed approach to outperform current state-of-the-art methods by significant margins: more than 17% on the Human3.6M dataset and more than 9% on the CMU-Mocap dataset.
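The abstract describes the Multi-Head retrieval scheme and the two memory losses only at a high level. The sketch below illustrates one plausible way such a scheme could work, assuming standard scaled dot-product attention with one head per factorised embedding (for example subject, task, and auxiliary queries). All function names are hypothetical, and `diversity_penalty` and `stability_penalty` are generic stand-ins (mean pairwise cosine similarity between slots, and mean squared change between memory snapshots); the paper's actual loss formulations are not given in the abstract.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read_head(query: Vector, memory: List[Vector]) -> Vector:
    """One retrieval head: scaled dot-product attention over the memory slots."""
    d = len(query)
    weights = softmax([dot(query, slot) / math.sqrt(d) for slot in memory])
    # Readout is the attention-weighted sum of the memory slots.
    return [sum(w * slot[j] for w, slot in zip(weights, memory)) for j in range(d)]


def multi_head_read(queries: List[Vector], memory: List[Vector]) -> Vector:
    """Hypothetical: query the memory once per factorised embedding, concatenate readouts."""
    out: Vector = []
    for q in queries:
        out.extend(read_head(q, memory))
    return out


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def diversity_penalty(memory: List[Vector]) -> float:
    """Hypothetical stand-in: mean pairwise cosine similarity between slots.
    Lower values mean the slots store more distinct content."""
    k = len(memory)
    if k < 2:
        raise ValueError("need at least two memory slots")
    sims = [cosine(memory[i], memory[j]) for i in range(k) for j in range(i + 1, k)]
    return sum(sims) / len(sims)


def stability_penalty(memory: List[Vector], previous: List[Vector]) -> float:
    """Hypothetical stand-in: mean squared change of the slots between two updates."""
    diffs = [(a - b) ** 2 for new, old in zip(memory, previous) for a, b in zip(new, old)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Three toy queries standing in for subject, task, and auxiliary embeddings.
    readout = multi_head_read([[2.0, 0.0], [0.0, 2.0], [1.0, -1.0]], memory)
    print(len(readout))  # 6: three heads x two dimensions
```

In this sketch each head sees the same memory but a different factorised query, so each head can retrieve different slots; a training loop would add the two penalties, suitably weighted, to the prediction loss.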
Abstract (translated)
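Humans exhibit complex motions that depend on the task they are performing, the interactions they engage in, and subject-specific preferences. Therefore, forecasting future poses based on the history of previous motions is a challenging task. This paper proposes an innovative auxiliary-memory-driven deep learning framework for improved modelling of historical knowledge. Specifically, we disentangle subject-specific, task-specific, and other auxiliary information from the observed pose sequences and use these factorised features to query the memory. A novel Multi-Head knowledge retrieval scheme uses these factorised feature embeddings to perform multiple querying operations within the auxiliary memory, and uses these embeddings to locate and store salient information that aids the long-term prediction of future motion, irrespective of data imbalance or the diversity of the input data distribution. Through extensive experiments on two public benchmark datasets, Human3.6M and CMU-Mocap, we demonstrate that these design choices collectively allow the method to outperform current state-of-the-art methods by significant margins: more than 17% on the Human3.6M dataset and more than 9% on the CMU-Mocap dataset.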
URL
https://arxiv.org/abs/2305.11394