I built VibeWhisper because I was using Claude Code all day and the bottleneck wasn't thinking — it was typing. I tried an existing voice-to-text tool, loved the workflow, then saw the price: $20/month for a thin wrapper around OpenAI Whisper where 10 minutes of API usage costs $0.06. So I built my own in a weekend.
How it works: Hold a configurable global hotkey (default: Right Command), speak, release. Transcribed text appears in whatever text field has focus — VS Code, Terminal, Slack, browser, anything. No clipboard hijacking — it uses the macOS Accessibility API to inject text directly at the cursor position (same APIs screen readers use).
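For the curious, the injection approach looks roughly like this. This is a hedged sketch, not VibeWhisper's actual source: it assumes the focused element accepts writes to kAXSelectedTextAttribute (most native text fields do), and it requires Accessibility permission under System Settings > Privacy & Security > Accessibility:

```swift
import ApplicationServices

// Insert text at the caret of whatever UI element currently has focus.
func injectText(_ text: String) {
    let systemWide = AXUIElementCreateSystemWide()
    var focused: CFTypeRef?
    let err = AXUIElementCopyAttributeValue(
        systemWide, kAXFocusedUIElementAttribute as CFString, &focused)
    guard err == .success, let focused else { return }

    // Writing kAXSelectedTextAttribute replaces the current selection,
    // or inserts at the cursor when nothing is selected.
    AXUIElementSetAttributeValue(
        focused as! AXUIElement,
        kAXSelectedTextAttribute as CFString,
        text as CFTypeRef)
}
```

The nice property of this over clipboard paste is that it never clobbers whatever the user had copied.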
Two transcription engines:
Cloud: OpenAI Whisper API via your own key (~$0.006/min, fastest and most accurate)
Local: WhisperKit running on-device via CoreML/Apple Neural Engine. 10 models from Tiny (50 MB) to Large V3 Turbo (1.6 GB). Your voice never leaves your Mac. No API key needed, no internet needed. The app auto-recommends the best local model for your hardware.
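For a sense of what local mode is built on: WhisperKit's README shows transcription in a few lines. This sketch follows that example; the exact call shape varies by WhisperKit version, and the audio path is a placeholder:

```swift
import WhisperKit

Task {
    // Downloads the default model on first run, then transcribes a file
    // on-device via CoreML. No network needed after the model is cached.
    let pipe = try? await WhisperKit()
    let text = try? await pipe?.transcribe(audioPath: "recording.wav")?.text
    print(text ?? "")
}
```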
Tech details for the curious:
Native Swift/SwiftUI, no Electron. Uses less RAM than a Chrome tab.
CGEvent taps for global keyboard hooks
Accessibility API for text injection
API key stored in macOS Keychain
Universal binary (arm64 + x86_64), macOS 14+
No backend, no accounts, no telemetry
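A minimal sketch of the CGEvent-tap part, assuming Right Command (keycode 54) as the hotkey and a listen-only tap so no events are swallowed (again, this needs Accessibility/Input Monitoring permission and isn't the shipping code):

```swift
import CoreGraphics
import Foundation

// Right Command press/release arrives as .flagsChanged, not keyDown/keyUp.
let mask = CGEventMask(1 << CGEventType.flagsChanged.rawValue)
let tap = CGEvent.tapCreate(
    tap: .cgSessionEventTap,
    place: .headInsertEventTap,
    options: .listenOnly,  // observe only; never block the event stream
    eventsOfInterest: mask,
    callback: { _, _, event, _ in
        if event.getIntegerValueField(.keyboardEventKeycode) == 54 {
            let pressed = event.flags.contains(.maskCommand)
            print(pressed ? "start recording" : "stop + transcribe")
        }
        return Unmanaged.passUnretained(event)
    },
    userInfo: nil)!

let source = CFMachPortCreateRunLoopSource(kCFAllocatorDefault, tap, 0)
CFRunLoopAddSource(CFRunLoopGetCurrent(), source, .commonModes)
CFRunLoopRun()
```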
Pricing: $19 one-time. No subscription. 15-minute free trial (actual recording time, not calendar time), no credit card, no signup. If you use cloud mode, you pay OpenAI directly at their rate. If you use local mode, it costs nothing after purchase.
I use it myself 2-3 hours a day. My main use case is dictating long prompts to AI coding assistants — I can speak a 400-word prompt in 45 seconds instead of typing it in 3-4 minutes. I also use it as a thinking tool: talking through problems out loud and capturing the output.
Website: https://vibewhisper.dev
Happy to answer any technical questions.
Voice-to-text with a push-to-talk workflow makes a lot of sense for developers who don't want constant background listening.
Curious how the local mode compares to the cloud version in latency and accuracy.