We propose V2Meow, a video-to-music generation system that produces high-quality music audio for a diverse range of video input types. Trained on 5,000 hours of music audio clips paired with video frames mined from in-the-wild music videos, V2Meow synthesizes high-fidelity music audio waveforms conditioned solely on pre-trained, general-purpose visual features extracted from video frames, with optional style control via text prompts. Both qualitative and quantitative evaluations show that our model outperforms several existing music generation systems in terms of visual-audio correspondence and audio quality.