Abstract
In this paper, we introduce MIDGET, a MusIc conditioned 3D Dance GEneraTion model built on a dance-motion Vector Quantised Variational AutoEncoder (VQ-VAE) and a Motion Generative Pre-Training (GPT) model, which generates vibrant, high-quality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook, based on the Motion VQ-VAE model, that stores different human pose codes; 2) a Motion GPT model that generates pose codes conditioned on the outputs of music and motion encoders; 3) a simple framework for music feature extraction. We compare our method with existing state-of-the-art models and perform ablation experiments on AIST++, the largest publicly available music-dance dataset. Experiments demonstrate that our proposed framework achieves state-of-the-art performance in motion quality and in alignment with the music.
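The memory codebook described in component 1) works by vector quantisation: a continuous pose feature is replaced by the index of its nearest codebook entry, and the GPT model then predicts sequences of such indices. A minimal sketch of that lookup step, assuming a toy 2-D feature and a hand-made three-entry codebook (MIDGET's actual codebook is learned end-to-end and far larger), could look like:

```python
# Toy vector-quantisation lookup. The feature dimension, codebook size,
# and entries below are illustrative assumptions, not the paper's values.

def quantise(feature, codebook):
    """Return the index of the codebook entry nearest to `feature` (L2)."""
    def sq_dist(entry):
        return sum((f - e) ** 2 for f, e in zip(feature, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))

# Hypothetical codebook holding three pose codes.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(quantise([0.9, 0.1], codebook))  # nearest entry is index 1
```

In a full VQ-VAE, the decoder maps the chosen entry back to a pose, and a commitment loss keeps encoder outputs close to their assigned codes.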
URL
https://arxiv.org/abs/2404.12062