Show HN: Whispering – Open-source, local-first dictation you can trust
170 points | 4 hours ago | 20 comments | github.com
Hey HN! Braden here, creator of Whispering, an open-source speech-to-text app.

I really like dictation. For years, I relied on transcription tools that were almost good, but they were all closed-source. Even many of the ones that claimed to be “local” or “on-device” were black boxes that left me wondering where my audio really went.

So I built Whispering. It’s open-source, local-first, and most importantly, transparent with your data. All your data is stored locally on your device. For me, the features were good enough that I left my paid tools behind (I used Superwhisper and Wispr Flow before).

Productivity apps should be open-source and transparent with your data, but they also need to match the UX of paid, closed-software alternatives. I hope Whispering is near that point. I use it for several hours a day, from coding to thinking out loud while carrying pizza boxes back from the office.

Here’s an overview: https://www.youtube.com/watch?v=1jYgBMrfVZs, and here’s how I personally am using it with Claude Code these days: https://www.youtube.com/watch?v=tpix588SeiQ.

There are plenty of transcription apps out there, but I hope Whispering adds some extra competition from the OSS ecosystem (one of my other OSS favorites is Handy https://github.com/cjpais/Handy). Whispering has a few tricks up its sleeve, like a voice-activated mode for hands-free operation (no button holding), and customizable AI transformations with any prompt/model.
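For the curious, the hands-free voice-activated mode boils down to simple voice activity detection. Here's a minimal, hypothetical energy-threshold sketch (illustrative only, not Whispering's actual implementation): frames above an RMS threshold count as speech, and a short "hangover" of trailing silence closes each segment.

```python
def rms(frame):
    """Root-mean-square energy of one audio frame (a list of samples)."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def segment_speech(frames, threshold=0.1, hangover=2):
    """Return (start, end) frame-index pairs of detected speech runs.

    A run opens when a frame's energy crosses the threshold and closes
    after more than `hangover` consecutive quiet frames.
    """
    segments = []
    start = None   # index where the current speech run began
    silent = 0     # consecutive quiet frames seen inside a run
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent > hangover:  # enough trailing silence: close the run
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:          # speech ran to the end of the input
        segments.append((start, len(frames)))
    return segments
```

Real implementations use proper VADs (and the threshold/hangover values here are made up), but this is the core loop that replaces holding a button.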

Whispering used to be in my personal GH repo, but I recently moved it as part of a larger project called Epicenter (https://github.com/epicenter-so/epicenter), which I should explain a bit...

I’m basically obsessed with local-first open-source software. I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it.
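To make that concrete, here's a tiny hypothetical sketch of the layout (the schema and file names are illustrative, not Epicenter's actual format): every note lands both as plaintext you can grep and as a SQLite row any tool can query.

```python
import sqlite3
from pathlib import Path

def save_transcript(root, name, text):
    """Write a note as both a plaintext file and a row in a shared SQLite index."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    # Plaintext copy: readable by any editor, greppable, future-proof.
    (root / f"{name}.md").write_text(text)
    # SQLite index: structured queries across the whole shared memory.
    db = sqlite3.connect(root / "index.db")
    with db:  # commits the transaction on success
        db.execute("CREATE TABLE IF NOT EXISTS notes (name TEXT PRIMARY KEY, body TEXT)")
        db.execute("INSERT OR REPLACE INTO notes VALUES (?, ?)", (name, text))
    db.close()

def search(root, term):
    """Return the names of notes whose body contains `term`."""
    db = sqlite3.connect(Path(root) / "index.db")
    rows = db.execute(
        "SELECT name FROM notes WHERE body LIKE ?", (f"%{term}%",)
    ).fetchall()
    db.close()
    return [r[0] for r in rows]
```

The point of the dual write is that any app (or a plain `grep`) can read the folder, while apps that want structure share the SQLite file.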

Whispering is the first app in this effort. It’s not there yet regarding memory, but it’s getting there. I’ll probably write more about the bigger picture soon, but mainly I just want to make software and let it speak for itself (no pun intended in this case!), so this is my Show HN for now.

I just finished college and was about to move back with my parents and work on this instead of getting a job…and then I somehow got into YC. So my current plan is to cover my living expenses and use the YC funding to support maintainers, our dependencies, and people working on their own open-source local-first projects. More on that soon.

Would love your feedback, ideas, and roasts. If you would like to support the project, star it on GitHub here (https://github.com/epicenter-so/epicenter) and join the Discord here (https://go.epicenter.so/discord). Everything’s MIT licensed, so fork it, break it, ship your own version, copy whatever you want!

chrisweekly
1 hour ago
[-]
> "I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it."

Yes! This. I have almost no experience w/ tts, but if/when I explore the space, I'll start w/ Whispering -- because of Epicenter. Starred the repo, and will give some thought to other apps that might make sense to contribute there. Bravo, thanks for publishing these and sharing, and congrats on getting into YC! :)

reply
spullara
11 minutes ago
[-]
If you do want to then ALSO have a cloud version, you can just use the AgentDB API, upload the databases there, and change where the SQL runs.
reply
wkcheng
2 hours ago
[-]
Does this support using the Parakeet model locally? I'm a MacWhisper user and I find that Parakeet is way better and faster than Whisper for on-device transcription. I've been using push-to-transcribe with MacWhisper through Parakeet for a while now and it's quite magical.
reply
braden-w
3 hours ago
[-]
For those checking out the repo this morning, I'm in the middle of a release that adds Whisper C++ support!

https://github.com/epicenter-so/epicenter/pull/655

After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)

reply
marcodiego
1 hour ago
[-]
> I’m basically obsessed with local-first open-source software.

We all should be.

reply
ayushrodrigues
38 minutes ago
[-]
I've been interested in a tool like this for a while. I've tried Wispr Flow and Aqua Voice, but wanted to use my own API key and store more context locally. How does the data get stored, and how can I access it?
reply
michael-sumner
1 hour ago
[-]
How does this compare to VoiceInk, which is also open-source, has been around much longer, and supports all the features that you have? https://github.com/Beingpax/VoiceInk
reply
phainopepla2
47 minutes ago
[-]
One thing that immediately stands out is VoiceInk is macOS only, while Whispering supports Linux and Windows in addition to macOS
reply
oulipo
11 minutes ago
[-]
I really like VoiceInk!

For the Whispering dev: would it be possible to set "right shift" as a toggle? Also, do it like VoiceInk, which is:

- either a short right-shift press -> it starts, and a short right-shift press again stops it

- or a long right-shift press (e.g. held for at least 0.5s) -> it starts and stops as soon as you release right shift

it's quite convenient

another really cool feature would be the same "mini-recorder" that pops up on screen like VoiceInk when you record; once you're done it would display the current transcript and any of your "transformation" actions, and let you choose which one (or multiple) you want to apply, each time pasting the result in the clipboard

reply
satvikpendem
44 minutes ago
[-]
Are these all just Whisper wrappers? I don't get it; the underlying model still isn't as good as paid custom models from companies. Is there an actual open-source / open-weights alternative to Whisper for speech-to-text? I know only of Parakeet.
reply
oulipo
14 minutes ago
[-]
Really nice!

For OsX there is also the great VoiceInk which is similar and open-source https://github.com/Beingpax/VoiceInk/

reply
jiehong
6 minutes ago
[-]
Very similar and works well. It’s bring-your-own-API-key if you want/need. Also works with local Whisper.
reply
dumbmrblah
2 hours ago
[-]
I’ve been using Whispering for about a year now, and it has really changed how I interact with the computer. I make sure to buy mice and keyboards with programmable hotkeys so that I can use the shortcuts for Whispering. I can’t go back to regular typing at this point; it just feels super inefficient. Thanks again for all your hard work!
reply
mrs6969
2 hours ago
[-]
Am I not getting it correctly? It says local is possible, but I can't find any information about how to run it without an API key.

I get the Whisper models, and then do what? How do I run it on a device without internet? There's no documentation about it...

reply
braden-w
1 hour ago
[-]
Commented this earlier, but I'm in the middle of a release that adds Whisper C++ support! https://github.com/epicenter-so/epicenter/pull/655

After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)

reply
rpdillon
1 hour ago
[-]
The docs are pretty clear that you need to use speaches if you want entirely local operation.

https://speaches.ai/

reply
yunohn
7 minutes ago
[-]
It’s not very clear, rather just a small mention. Given OP’s extensive diatribe about local-first, the fact that it prefers online providers is quite a big miss tbh.
reply
tummler
1 hour ago
[-]
Related, just as a heads up. I've been using this for 100% local offline transcription for a while, works well: https://github.com/pluja/whishper
reply
dllthomas
18 minutes ago
[-]
Can it tell voices apart?
reply
glial
2 hours ago
[-]
This is wonderful, thank you for sharing!

Do you have any sense of whether this type of model would work with children's speech? There are plenty of educational applications that would value a privacy-first locally deployed model. But, my understanding is that Whisper performs pretty poorly with younger speakers.

reply
solarkraft
3 hours ago
[-]
Cool! I just started becoming interested in local transcription myself.

If you add Deepgram listen API compatibility, you can do live transcription via either Deepgram (duh) or OWhisper: https://news.ycombinator.com/item?id=44901853

(I haven’t gotten the Deepgram JS SDK working with it yet, currently awaiting a response by the maintainers)

reply
braden-w
3 hours ago
[-]
Thank you for checking it out! Coincidentally, it's on the way:

https://github.com/epicenter-so/epicenter/pull/661

In the middle of a huge release that sets up FFmpeg integration (OWhisper needs very specifically formatted files), but hoping to add this after!

reply
Johnny_Bonk
3 hours ago
[-]
Great work! I've been using Willow Voice, but I think I will migrate to this (much cheaper). They do have great UI/UX, though: just hit a key to start recording and the context goes into whatever text input you want. I haven't installed Whispering yet but will do so. P.S
reply
braden-w
2 hours ago
[-]
Amazing, thanks for giving it a try! Let me know how it goes and feel free to message me any time :) Happy to add any features that you miss from closed-source alternatives!
reply
newman314
2 hours ago
[-]
Does Whispering support semantic correction? I was unable to find confirmation while doing a quick search.
reply
braden-w
2 hours ago
[-]
Hmm, we support prompts at two levels: 1. the model level (Whisper supports a "prompt" parameter that sometimes works) and 2. the transformations level (inject the transcribed text into a prompt and get the output from an LLM of your choice). Unsure how else semantic correction could be implemented, but always open to expanding the feature set greatly over the next few weeks!
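As a rough sketch, a transformation is just template substitution plus a model call; `apply_transformation` and the `call_llm` stand-in below are hypothetical names for illustration, not Whispering's actual API:

```python
def apply_transformation(transcript, template, call_llm):
    """Fill a prompt template with the raw transcript, then let an LLM rewrite it.

    `call_llm` is whatever provider/model the user configured: it takes the
    final prompt string and returns the model's text output.
    """
    prompt = template.replace("{{transcript}}", transcript)
    return call_llm(prompt)
```

So a "fix grammar" transformation is just a template like `"Fix the grammar, keep the meaning: {{transcript}}"` pointed at your model of choice.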
reply
joshred
2 hours ago
[-]
They might not know how Whisper works. I suspect the answer to their question is 'yes', and the reason they can't find a straightforward answer in your project is that the answer is so obvious to you that it's hardly worth documenting.

Whisper transcription transforms audio data into LLM-style text output. The transcripts generally have proper casing and punctuation, and can usually stick to a specific domain based on the surrounding context.

reply
ideashower
1 hour ago
[-]
Is there speaker detection?
reply
random3
1 hour ago
[-]
are there any non-Whisper-based voice models/tech/APIs?
reply
satisfice
2 hours ago
[-]
Windows Defender says it is infected.
reply
braden-w
38 minutes ago
[-]
Ahh, that's unfortunate. This is most likely related to the Rust `enigo` crate, which we use to write text at the cursor. You can see the lines in question here: https://github.com/epicenter-so/epicenter/blob/60f172d193d88...

If it's still an issue, feel free to build it locally on your machine to ensure your supply chain is clean! I'll add more instructions in the README in the future.

reply
barryfandango
2 hours ago
[-]
I'm no expert, but since it acts as a keyboard wedge it's likely to be unpopular with security software.
reply
sa-code
2 hours ago
[-]
This needs to be higher, the installer on the README has a trojan.
reply
fencepost
26 minutes ago
[-]
More details please? Which installer?

--- 7.3.0 ---

This release popped up just a few minutes ago, so here are VirusTotal results for the 7.3.0 EXE and MSI installers:

EXE (still running behavior checks but Arctic Wolf says Unsafe and AVG & Avast say PUP): https://www.virustotal.com/gui/file/816b21b7435295d0ac86f6a8...

MSI nothing flags immediately, still running behavior checks (https://www.virustotal.com/gui/file/e022a018c4ac6f27696c145e...)

--- 7.2.2/7.2.1 below ---

I do note one bit of weirdness: the Windows downloads show 7.2.2, but the download links themselves are 7.2.1. 7.2.1 is also what shows on the release from 3 days ago, even though it's numbered 7.2.2.

I didn't check the Mac or Linux installers, but for Windows VirusTotal flags nothing on the 7.2.1/7.2.2 MSI (https://www.virustotal.com/gui/file/7a2d4fec05d1b24b7deda202...) and 3 flags on the EXE (ArcticWolf Unsafe, AVG & Avast PUP) (https://www.virustotal.com/gui/file/a30388127ad48ca8a42f9831...)

reply
hexfish
35 minutes ago
[-]
What does Virustotal say?
reply
codybontecou
2 hours ago
[-]
Now we just need text to speech so we can truly interact with our computers hands free.
reply
PyWoody
59 minutes ago
[-]
If you're on Mac, you can use `say`, e.g.,

    say "This is a test message" --voice="Bubbles"
EDIT: I'm having way too much fun with this lol

    say "This is a test message" --voice="Organ"
    say "This is a test message" --voice="Good News"
    say "This is a test message" --voice="Bad News"
    say "This is a test message" --voice="Jester"
reply
braden-w
42 minutes ago
[-]
LOL that's pretty funny, thank you for the share!
reply