Soundify: Matching Sound Effects to Video

Abstract

In the art of video editing, sound is really half the story. A skilled video editor overlays sounds, such as effects and ambients, over footage to add character to an object or immerse the viewer within a space. However, through formative interviews with professional video editors, we found that this process can be extremely tedious and time-consuming. We introduce Soundify, a system that matches sound effects to video. By leveraging labeled, studio-quality sound effects libraries and extending CLIP, a neural network with impressive zero-shot image classification capabilities, into a "zero-shot detector", we are able to produce high-quality results without resource-intensive correspondence learning or audio generation. We encourage you to have a look at, or better yet, have a listen to the results at this https URL.
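The zero-shot classification idea that the abstract builds on can be illustrated with a minimal sketch: given one embedding for a video frame and one embedding per sound-effect label, rank the labels by cosine similarity and turn the scores into probabilities with a softmax. This is not Soundify's actual pipeline, which the abstract does not detail. The function names, the temperature value, and the 3-dimensional toy vectors are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_labels(frame_embedding, label_embeddings, temperature=0.01):
    """Rank labels by softmax over cosine similarity to the frame.

    `temperature` is a hypothetical scaling value chosen for this sketch.
    Returns a list of (label, probability) pairs, best match first.
    """
    sims = {
        label: cosine(frame_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(sims.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    probs = [(label, e / total) for label, e in exps.items()]
    return sorted(probs, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real CLIP embeddings (assumed values).
label_embeddings = {
    "car engine": [0.9, 0.1, 0.0],
    "rain": [0.0, 0.9, 0.2],
    "footsteps": [0.1, 0.2, 0.9],
}
frame_embedding = [0.8, 0.2, 0.1]

ranked = rank_labels(frame_embedding, label_embeddings)
for label, prob in ranked:
    print(f"{label}: {prob:.4f}")
```

With labels taken from a sound effects library, the top-ranked label would indicate which effect to pull for a given frame. Because the labels are compared as embeddings rather than learned classes, no audio-visual correspondence training is needed for this step.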

URL

https://arxiv.org/abs/2112.09726

PDF

https://arxiv.org/pdf/2112.09726.pdf