Paper Reading AI Learner

Music-Driven Group Choreography

2023-03-22 06:26:56
Nhat Le, Thang Pham, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

Abstract

Music-driven choreography is a challenging problem with a wide variety of industrial applications. Recently, many methods have been proposed to synthesize dance motions from music for a single dancer. However, generating dance motion for a group remains an open problem. In this paper, we present $\rm AIOZ-GDANCE$, a new large-scale dataset for music-driven group dance generation. Unlike existing datasets that only support single-dancer motions, our new dataset contains group dance videos, hence supporting the study of group choreography. We propose a semi-autonomous labeling method with humans in the loop to obtain the 3D ground truth for our dataset. The proposed dataset consists of $16.7$ hours of paired music and 3D motion from in-the-wild videos, covering $7$ dance styles and $16$ music genres. We show that naively applying a single-dance generation technique to create group dance motion may lead to unsatisfactory results, such as inconsistent movements and collisions between dancers. Based on our new dataset, we propose a new method that takes an input music sequence and a set of 3D positions of the dancers to efficiently produce multiple group-coherent choreographies. We propose new evaluation metrics for measuring group dance quality and perform intensive experiments to demonstrate the effectiveness of our method.
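
The abstract notes that naive group generation can produce collisions between dancers, and that the paper introduces group-quality metrics. The paper's actual metric definitions are not given here; as a rough, hedged illustration only, the sketch below flags frames where any two dancers' root joints come closer than a chosen radius. The function name `collision_rate`, the input layout, and the 0.4 m threshold are assumptions for this example, not the authors' definitions.

```python
import numpy as np

def collision_rate(root_positions, radius=0.4):
    """Fraction of frames in which any two dancers are closer than `radius`.

    root_positions: array of shape (num_dancers, num_frames, 3) holding each
    dancer's 3D root-joint trajectory. `radius` (metres) is an assumed
    proximity threshold, not a value taken from the paper.
    """
    num_dancers, num_frames, _ = root_positions.shape
    collided = np.zeros(num_frames, dtype=bool)
    for i in range(num_dancers):
        for j in range(i + 1, num_dancers):
            # Per-frame Euclidean distance between dancer i and dancer j.
            dist = np.linalg.norm(root_positions[i] - root_positions[j], axis=-1)
            collided |= dist < radius
    return collided.mean()

# Toy usage: two dancers moving on parallel lines 0.3 m apart, so every frame
# falls below the assumed 0.4 m threshold and the rate is 1.0.
t = np.linspace(0.0, 1.0, 120)
dancer_a = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=-1)
dancer_b = dancer_a + np.array([0.0, 0.3, 0.0])
print(collision_rate(np.stack([dancer_a, dancer_b])))  # -> 1.0
```

A real group-dance metric would likely operate on full body meshes or multiple joints rather than root positions alone; this sketch only conveys the idea of penalizing overly close dancers.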

Abstract (translated)

Music-driven choreography is a challenging problem with a wide variety of industrial applications. In recent years, many methods have been proposed to synthesize dance motions from music for a single dancer, but generating dance motion for a group remains an open problem. In this paper, we present $\rm AIOZ-GDANCE$, a new large-scale dataset for music-driven group dance generation. Unlike existing datasets that only support single-dancer motions, our new dataset contains group dance videos and therefore supports the study of group choreography. We propose a semi-autonomous labeling method with humans in the loop to obtain the 3D ground truth for our dataset. The proposed dataset consists of 16.7 hours of paired music and 3D motion from in-the-wild videos, covering 7 dance styles and 16 music genres. We show that naively applying a single-dance generation technique to create group dance motion may lead to unsatisfactory results, such as inconsistent movements and collisions between dancers. Based on our new dataset, we propose a new method that takes an input music sequence and a set of dancers' 3D positions to efficiently produce multiple group-coherent choreographies. We propose new evaluation metrics for measuring group dance quality and conduct extensive experiments to demonstrate the effectiveness of our method.

URL

https://arxiv.org/abs/2303.12337

PDF

https://arxiv.org/pdf/2303.12337.pdf
