LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search

2024-04-22 10:20:41
Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette

Abstract

Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, they have limitations: Evolutionary Algorithms require complicated designs, posing challenges in control and achieving realistic sound generation. Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method to combine Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
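The abstract describes the method only at a high level, so the following is a minimal sketch of a generic novelty search over latent vectors, not the authors' implementation. The functions `decode` and `embed` are hypothetical stand-ins for the RAVE decoder and the VGGish feature extractor; the k-nearest-neighbour novelty score, the Gaussian mutation, the archive update, and all parameter names are assumptions borrowed from standard novelty search rather than details confirmed by the paper.

```python
import numpy as np

def decode(z):
    """Hypothetical stand-in for a RAVE decoder: latent vector -> audio signal."""
    t = np.linspace(0.0, 1.0, 256)
    freqs = 1.0 + np.arange(z.size)
    # Weighted sum of sinusoids, one per latent dimension; result has shape (256,).
    return np.sin(2 * np.pi * np.outer(freqs, t)).T @ z

def embed(audio):
    """Hypothetical stand-in for VGGish: audio signal -> feature vector."""
    return np.abs(np.fft.rfft(audio))[:16]

def novelty(features, reference, k=3):
    """Mean distance from each feature vector to its k nearest neighbours in reference."""
    d = np.linalg.norm(features[:, None, :] - reference[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, :k].mean(axis=1)

def lvns_sketch(latent_dim=8, pop_size=16, generations=10, sigma=0.3, k=3, seed=0):
    """Assumed loop: mutate latents, score novelty in feature space, keep the most novel."""
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, latent_dim))
    # The archive stores feature vectors of sounds seen so far.
    archive = np.stack([embed(decode(z)) for z in population])
    for _ in range(generations):
        # Gaussian mutation in latent space; sigma plays the role of a mutation parameter.
        children = population + rng.normal(scale=sigma, size=population.shape)
        feats = np.stack([embed(decode(z)) for z in children])
        scores = novelty(feats, archive, k)
        # Keep the most novel half; each survivor seeds two offspring next generation.
        keep = np.argsort(scores)[::-1][: pop_size // 2]
        population = np.repeat(children[keep], 2, axis=0)
        archive = np.concatenate([archive, feats[keep]])
    return population, archive

population, archive = lvns_sketch()
```

In this sketch, selection rewards distance from previously seen sounds in feature space rather than closeness to a target, which is the defining idea of novelty search. Swapping the stand-ins for a real pre-trained generator and audio embedding model would leave the loop structure unchanged.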

URL

https://arxiv.org/abs/2404.14063

PDF

https://arxiv.org/pdf/2404.14063.pdf

