Chris Donahue
Vision Language Models
Vision Language Models Are Few-Shot Audio Spectrogram Classifiers
We demonstrate that vision language models (VLMs) are capable of recognizing the content in audio recordings when given corresponding …
Satvik Dixit, Laurie M. Heller, Chris Donahue
arXiv