We propose V2Meow, a video-to-music generation system that produces high-quality music audio for a diverse range of video input types. Trained on 5,000 hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow synthesizes high-fidelity music audio waveforms conditioned solely on pre-trained, general-purpose visual features extracted from video frames, with optional style control via text prompts. Both qualitative and quantitative evaluations show that our model outperforms several existing music generation systems in terms of visual-audio correspondence and audio quality.