Show HN: Free Alternative to Wispr Flow, Superwhisper, and Monologue
130 points
7 hours ago
| 27 comments
| github.com
anvevoice
28 minutes ago
[-]
The latency bottleneck with local post-processing is interesting. Sending a screenshot to a vision LLM is expensive, but you don't actually need full image understanding for most context-aware corrections.

A lighter approach: use the accessibility API (NSAccessibility on macOS) to grab the focused app's text content — window title, selected text, nearby field labels, recipient names in mail composers. That gives you ~90% of the useful context as a small text prompt that a 1-3B parameter local model (like Qwen2.5-1.5B or Phi-3-mini) can process in under 500ms on Apple Silicon's Neural Engine.

The screenshot path is only really needed for non-standard UIs where text isn't programmatically accessible. Splitting the pipeline into a fast text-context path (common case) and a fallback vision path would get you sub-2s end-to-end locally, while still handling edge cases gracefully.
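A minimal sketch of that split in Python (the context fields, function names, and prompt wording here are hypothetical, not from any of the apps discussed in this thread):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FocusContext:
    """Context scraped from the focused window via the accessibility API."""
    window_title: str
    selected_text: Optional[str] = None
    field_labels: tuple = ()

def build_correction_prompt(transcript: str, ctx: Optional[FocusContext]) -> dict:
    """Route to the fast text path when accessible text exists,
    otherwise flag the request for the slower vision fallback."""
    if ctx is not None and (ctx.selected_text or ctx.field_labels or ctx.window_title):
        context_lines = [f"Window: {ctx.window_title}"]
        if ctx.selected_text:
            context_lines.append(f"Selected: {ctx.selected_text}")
        for label in ctx.field_labels:
            context_lines.append(f"Field: {label}")
        return {
            "path": "text",  # small local LLM, fast common case
            "prompt": "Correct this transcript using the context:\n"
                      + "\n".join(context_lines)
                      + f"\nTranscript: {transcript}",
        }
    # Non-standard UI with no programmatically accessible text:
    # fall back to the screenshot + vision-model path.
    return {"path": "vision", "transcript": transcript}
```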

This is essentially the same pattern used in assistive technology — screen readers have solved the "what's on screen" problem without vision models for decades.

reply
digitalbase
6 hours ago
[-]
Was searching for this just this morning and settled on https://handy.computer/
reply
d4rkp4ttern
2 hours ago
[-]
Big fan of Handy, and it's cross-platform as well. Parakeet V3 gives the best experience, with very fast and accurate-enough transcriptions when talking to AIs that can read between the lines. It does have stuttering issues, though. My primary use of these is when talking to coding agents.

But a few weeks ago someone on HN pointed me to Hex, which also supports Parakeet V3, and incredibly enough, it is even faster than Handy because it's a native macOS-only app that leverages CoreML and the Neural Engine for extremely quick transcriptions. Long ramblings transcribed in under a second!

It’s now my favorite fully local STT for macOS:

https://github.com/kitlangton/Hex

reply
zachlatta
6 hours ago
[-]
I just learned about Handy in this thread and it looks great!

I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.

This fixes misspelled names if you're replying to an email / makes sure technical terms are spelled right / etc.

The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of <1 second with Groq.

reply
sipjca
2 hours ago
[-]
There's an open PR in the repo, which will be merged, that adds this support. Post-processing is an optional feature, and when using it, end-to-end latency can still easily be under 3 seconds.
reply
zachlatta
1 hour ago
[-]
That’s awesome! The specific thing that was causing the long latency was the image LLM call to describe the current context. I’m not sure if you’ve tested Handy’s post-processing with images or if there’s a technique to get image calls to be faster locally.

Thank you for making Handy! It looks amazing, and I wish I'd found it before making FreeFlow.

reply
lemming
4 hours ago
[-]
Could you go into a little more detail about the deep context - what does it grab, and which model is used to process it? Are you also using a groq model for the transcription?
reply
zachlatta
2 hours ago
[-]
It takes a screenshot of the current window and sends it to Llama on Groq, asking it to describe what you're doing and pull out any key info, like names with their spelling.

You can go to Settings > Run Logs in FreeFlow to see the full pipeline run on each request, with the exact prompt and LLM response, so you can see exactly what is sent and returned.
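
For reference, a vision call like this generally looks something like the following. The payload shape follows the OpenAI-compatible chat format that Groq exposes; the model name and prompt wording here are illustrative, not FreeFlow's actual values:

```python
import base64

def build_context_request(screenshot_png: bytes,
                          model: str = "llama-vision-model") -> dict:
    """Build an OpenAI-compatible chat payload asking a vision model
    to describe the window and extract names with exact spelling."""
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "model": model,  # placeholder: substitute Groq's current vision model
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe what the user is doing and list any names "
                         "or technical terms with their exact spelling."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
```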

reply
stavros
6 hours ago
[-]
As a very happy Handy user: indeed, it doesn't do that. It will be interesting to see if this works better; I'll give FreeFlow a shot, thanks!
reply
gurjeet
1 hour ago
[-]
Thanks for the recommendation! I picked the smallest model (Moonshine Base @ 58MB), and it works great for transcribing English.

Surprisingly, it produced a better output (at least I liked its version) than the recommended but heavy model (Parakeet V3 @ 478 MB).

reply
smcleod
45 minutes ago
[-]
Handy is nothing short of fantastic, really brilliant when combined with Parakeet v2!
reply
vogtb
5 hours ago
[-]
Handy rocks. I recently had minor surgery on my shoulder that required me to be in a sling for about a month, and I thought I'd give Handy a try for dictating notes and so on. It works phenomenally well for most speech-to-text use cases - homonyms included.
reply
irrationalfab
4 hours ago
[-]
Handy is genuinely great and it supports Parakeet V3. It’s starting to change how I "type" on my computer.
reply
hendersoon
6 hours ago
[-]
Yes, I also use Handy. It supports local transcription via Nvidia Parakeet TDT2, which is extremely fast and accurate. I also use Gemini 2.5 Flash Lite for post-processing via the free AI Studio API (post-processing is optional and can also use a locally hosted LM).
reply
stavros
6 hours ago
[-]
I use handy as well, and love it.
reply
p0w3n3d
6 hours ago
[-]
There's also an offline-running app called VoiceInk for macOS. No need for Groq or external AI.

https://github.com/Beingpax/VoiceInk

reply
parhamn
6 hours ago
[-]
+1, my experience improved quite a bit when I switched to the Parakeet model; they should definitely use that as the default.
reply
zackify
6 hours ago
[-]
My favorite too. I use the Parakeet model.
reply
sathish316
3 hours ago
[-]
To build your own STT (speech-to-text) tool with a local model and modify it, just ask Claude Code to build it for you with this workflow:

F12 → sox for recording → temp.wav → faster-whisper → pbcopy → notify-send to know what’s happening

https://github.com/sathish316/soupawhisper

I found a Linux version with a similar workflow and forked it to build the Mac version. It took less than 15 minutes to ask Claude to modify it as per my needs:

F12 Press → arecord (ALSA) → temp.wav → faster-whisper → xclip + xdotool

https://github.com/ksred/soupawhisper

Thanks to faster-whisper and local models using quantization, I use it in all places where I was previously using Superwhisper in Docs, Terminal etc.
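
The Mac workflow above boils down to a few dozen lines. A rough sketch (the function names are mine, and it assumes macOS with sox, faster-whisper, and pbcopy available):

```python
import subprocess
import tempfile

def join_segments(segments) -> str:
    """Collapse faster-whisper segments into one clipboard-ready string."""
    return " ".join(seg.text.strip() for seg in segments)

def record_and_transcribe(seconds: int = 10) -> str:
    """One pass of the F12 workflow: record with sox's `rec`,
    transcribe with faster-whisper, copy the result via pbcopy."""
    from faster_whisper import WhisperModel  # deferred: heavy import
    with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
        # `rec out.wav trim 0 N` records N seconds from the default mic
        subprocess.run(["rec", tmp.name, "trim", "0", str(seconds)], check=True)
        model = WhisperModel("base", compute_type="int8")  # quantized for speed
        segments, _info = model.transcribe(tmp.name)
        text = join_segments(segments)
        subprocess.run(["pbcopy"], input=text.encode(), check=True)
    return text
```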

reply
vittore
59 minutes ago
[-]
For macOS I found https://github.com/rselbach/jabber and have lately been using that, but on iOS I still need a replacement.
reply
drooby
4 hours ago
[-]
I just vibe coded my own NaturalReader replacement. The subscription was $110/year... and I just canceled it.

Chatterbox TTS (from Resemble AI) does the voice generation, WhisperX gives word-level timestamps so you can click any word to jump, and FastAPI ties it all together with SSE streaming so audio starts playing before the whole thing is done generating.

There's a ~5s buffer up front while the first chunk generates, but after that each chunk streams in faster than realtime. So playback rarely stalls.

It took about 4 hours today... wild.
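
The chunked streaming is the core of why this feels fast. Stripped of the FastAPI/SSE plumbing, it's essentially a generator like the following (the names are mine, and `synthesize` stands in for a TTS call such as Chatterbox):

```python
def stream_tts_chunks(sentences, synthesize):
    """Yield one audio chunk per sentence as soon as it's synthesized.
    Wrapped in an SSE response, this lets the client start playback
    after the first chunk instead of waiting for the whole document."""
    for index, sentence in enumerate(sentences):
        audio = synthesize(sentence)  # e.g. a Chatterbox TTS call
        yield {"index": index, "audio": audio}
```

As long as each chunk synthesizes faster than it plays back, the only user-visible wait is the first chunk's generation time.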

reply
kombinar
7 hours ago
[-]
Sounds like there's plenty of interest in these kinds of tools. I'm not a huge fan of API transcription given how great local models are.

I built https://github.com/bwarzecha/Axii to keep EVERYTHING local and be fully open source - it can easily be used at any company. No data sent anywhere.

reply
vesterde
6 hours ago
[-]
Since many are asking about apps with similar capabilities: I'm very happy with MacWhisper. Has Parakeet, near-instant transcription of my lengthy monologues. All local.

Edit: Ah but Parakeet I think isn’t available for free. But very worthwhile single purchase app nonetheless!

reply
muratsu
5 hours ago
[-]
For those using something like this daily, what key combinations do you use to record and cancel? I'm using my Caps Lock right now but was curious about others.
reply
adanto6840
2 hours ago
[-]
Great question. I'd love to know if anyone has had any success with handheld buttons/bluetooth remotes or similar, too.
reply
qingcharles
3 hours ago
[-]
Someone told me the other day that I should use a foot pedal, and then I remembered I already had an Elgato one under my desk, connected to my Stream Deck. I got it very cheap used on eBay. So that's an option too.
reply
michaelbuckbee
3 hours ago
[-]
I have a Stream Deck and made a dedicated button for this. So I tap the button, speak, and then tap it again, and it pastes into wherever my cursor was.

And then I set the button right below that as the Enter key, so it feels mostly hands-off the keyboard.

reply
Doman
4 hours ago
[-]
Scroll Lock is a really good key for that, in my opinion. If your keyboard doesn't have it exposed, you can use a remapping program like https://github.com/jtroo/kanata
reply
Brajeshwar
2 hours ago
[-]
Can you please teach me how to use the CAPS LOCK key as a push-to-talk?
reply
atestu
3 hours ago
[-]
Right option. Push to talk
reply
tacotime
2 hours ago
[-]
I also use the right option key on Mac, never miss it.
reply
threekindwords
3 hours ago
[-]
i've used macwhisper (paid), superwhisper (paid), and handy (free) but now prefer hex (free):

https://github.com/kitlangton/Hex

for me it strikes the balance of good, fast, and cheap for everyday transcription. macwhisper is overkill, superwhisper too clever, and handy too buggy. hex fits just right for me (so far)

reply
shostack
1 hour ago
[-]
Tried to use it: installed it, enabled permissions, downloaded the Parakeet model for English, and then it crashed every time I released the button after dictating. Completely unusable.
reply
knob
4 hours ago
[-]
This thread is a beautiful glimpse of our near future: more and more custom-coded software. Takes me back to the days of the late '90s. Loving this!
reply
corlinp
5 hours ago
[-]
I created Voibe which takes a slightly different direction and uses gpt-4o-transcribe with a configurable custom prompt to achieve maximum accuracy (much better than Whisper). Requires your own OpenAI API key.

https://github.com/corlinp/voibe

I do see the name has since been taken by a paid service... shame.

reply
rabf
4 hours ago
[-]
https://github.com/rabfulton/Auriscribe

My take for X11 Linux systems. Small and low dependency except for the model download.

reply
dcreater
2 hours ago
[-]
Why do people feel the need to market as a "free alternative to xyz" when it's a basic utility? I take it as an instant signal that the dev is a copycat, mostly interested in getting stars and eyeballs rather than making a genuinely useful, high-quality product.

Just use handy: https://github.com/cjpais/Handy

reply
egonschiele
2 hours ago
[-]
Really good to know Handy exists; it's the first I'm hearing about it. I use a speech-to-text app that I built for myself, and I know at least one co-worker pays $10 a month for (I think) Wispr. I think it's possible there was no intention to market, and the creator simply didn't know about Handy, just like me.
reply
Fidelix
7 hours ago
[-]
macOS only. May this help you skip a click.
reply
spelk
6 hours ago
[-]
Whispering [0] is Windows compatible and has gotten a lot better on Windows despite being extremely rough around the edges at first.

[0] https://github.com/EpicenterHQ/epicenter

reply
9999gold
4 hours ago
[-]
Not sure why you got downvoted. I wish this was a tag or something.
reply
sonu27
7 hours ago
[-]
Nice! I vibe coded the same thing this weekend, but for OpenAI, and it's less polished: https://github.com/sonu27/voicebardictate
reply
manmal
6 hours ago
[-]
Also look into voxtral, their new model is good and half the price if you can live without streaming.
reply
johnbatch
4 hours ago
[-]
Do any of these work as an iOS keyboard to replace the awful voice transcription Apple is currently shipping?
reply
copperx
2 hours ago
[-]
utter (utter.to) does.
reply
lemming
6 hours ago
[-]
Is it possible to customise the key binding? Most of these services let you customise the binding, and also support a toggle mode in addition to push-to-talk.
reply
spelk
6 hours ago
[-]
Does anyone know of an effective alternative for Android?
reply
uncharted9
5 hours ago
[-]
I have been using VoiceFlow. It works incredibly well and uses Groq to transcribe using the Whisper V3 Turbo model. You can also use it in an offline scenario with an on-device model, but I am mostly connected to the internet whenever I am transcribing.
reply
jskherman
6 hours ago
[-]
Check out the FUTO Keyboard or FUTO Voice Input apps. They only use the Whisper models so far, though.
reply
xnx
5 hours ago
[-]
Does the Android keyboard transcription not work for your needs?
reply
arcologies1985
7 hours ago
[-]
Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
reply
zachlatta
7 hours ago
[-]
I love this idea, and originally planned to build it using local models, but to have post-processing (that's where you get correctly spelled names when replying to emails / etc), you need to have a local LLM too.

If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.

Some day!

reply
s0l
7 hours ago
[-]
https://github.com/cjpais/Handy

It’s free and offline

reply
zachlatta
6 hours ago
[-]
Wow, Handy looks really great and super polished. Demo at https://handy.computer/
reply
SomaticPirate
5 hours ago
[-]
Seeing this thread, it sounds like a blog post comparing the offerings would be useful.
reply
copperx
3 hours ago
[-]
Good idea at first glance, but it would get outdated in hours.
reply
baxtr
6 hours ago
[-]
Is there a tool that preserves the audio? I want both, the transcript and the audio.
reply
heyalexej
5 hours ago
[-]
Quick glance: FreeFlow already saves WAV recordings for every transcript to ~/Lib../App../FreeFlow/audio/, with UUIDs linking them to pipeline history entries in CoreData. Audio files are automatically deleted when their associated history entries are deleted, though. Should be a quick fix. I recently did the same for hyprvoice, for debugging and auditing.
reply
hodanli
5 hours ago
[-]
title lacks: for Mac
reply
DevX101
5 hours ago
[-]
Anything similar for iOS?
reply
copperx
3 hours ago
[-]
Utter uses your OpenAI key (~$1/month). https://utter.to/. Has an iPhone app.
reply
Zopieux
3 hours ago
[-]
Saved you a click: Mac only, and it actually uses Groq; local inference was too slow.

Won't be free if Groq starts charging.

reply
_blackhawk_
5 hours ago
[-]
Spokenly?
reply