Microsoft VibeVoice speech-to-text model

2 items · 2 sources · updated 2h ago · trend 2

┌─ summary ─────────────────────────────┐

Microsoft released VibeVoice, an open-source, MIT-licensed speech-to-text model that combines speech recognition with speaker diarization. In one user's notes, the 5.71GB 4-bit MLX conversion transcribed one hour of audio in approximately 9 minutes on an M5 MacBook, using about 60GB of RAM at peak.

┌─ items (2) ───────────────────────────┐

[HN] Hacker News (1)

VibeVoice: Open-source frontier voice AI

HN: open source AI · tosh · ▲458 · 1d

[BSKY] Bluesky (1)

Microsoft's MIT licensed VibeVoice speech-to-text model (think Whisper with speaker diarization) is really good - my notes on running the 5.71GB 4bit MLX conversion on an M5 MacBook, using about 60GB of RAM at peak and transcribing 1hr of …

@simonw · @simonwillison.net · ▲144 · 1d