Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations

Songwriting is often driven by multimodal inspirations, such as imagery, narratives, or existing music, yet songwriters remain …

Yewon Kim, Sung-Ju Lee, Chris Donahue

Vision Language Models Are Few-Shot Audio Spectrogram Classifiers

We demonstrate that vision language models (VLMs) are capable of recognizing the content in audio recordings when given corresponding …

Satvik Dixit, Laurie M. Heller, Chris Donahue

Local Deployment of Large-Scale Music AI Models on Commodity Hardware

We present MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on …

Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue

Just Label the Repeats for In-The-Wild Audio-to-Score Alignment

We propose an efficient workflow for high-quality offline alignment of in-the-wild performance audio and corresponding sheet music …

Irmak Bukey, Michael Feffer, Chris Donahue

Hookpad Aria: A Copilot for Songwriters

We present Hookpad Aria, a generative AI system designed to assist musicians in writing Western pop songs. Our system is seamlessly …

Chris Donahue, Shih-Lun Wu, Yewon Kim, Dave Carlton, Ryan Miyakawa, John Thickstun

Towards Music-Aware Virtual Assistants

We propose the concept of music-aware virtual assistants, where speech notifications are modified to resemble a voice singing in …

Alexander Wang, David Lindlbauer, Chris Donahue

Do Music Generation Models Encode Music Theory?

Music foundation models possess impressive music generation capabilities. When people compose music, they may infuse their …

Megan Wei, Michael Freeman, Chris Donahue, Chen Sun

The Impact of Element Ordering on LM Agent Performance

There has been a surge of interest in language model agents that can navigate virtual environments such as the web or desktop. To …

Wayne Chi, Ameet Talwalkar, Chris Donahue

Adaptive Accompaniment with ReaLchords

Jamming requires coordination, anticipation, and collaborative creativity between musicians. Current generative models of music produce …

Yusong Wu, Tim Cooijmans, Kyle Kastner, Adam Roberts, Ian Simon, Alexander Scarlatos, Chris Donahue, Cassie Tarakajian, Shayegan Omidshafiei, Aaron Courville, Pablo Samuel Castro, Natasha Jaques, Cheng-Zhi Anna Huang

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation

We propose V2Meow, a video-to-music generation system capable of producing high-quality music audio for a diverse range of video input …

Kun Su, Judith Yue Li, Qingqing Huang, Dima Kuzmin, Joonseok Lee, Chris Donahue, Fei Sha, Aren Jansen, Yu Wang, Mauro Verzetti, Timo I. Denk