I really like dictation. For years, I relied on transcription tools that were almost good, but they were all closed-source. Even many of the ones that claimed to be “local” or “on-device” were still black boxes that left me wondering where my audio really went.
So I built Whispering. It’s open-source, local-first, and most importantly, transparent with your data. All your data is stored locally on your device. For me, the features were good enough that I left my paid tools behind (I used Superwhisper and Wispr Flow before).
Productivity apps should be open-source and transparent with your data, but they also need to match the UX of paid, closed-source alternatives. I hope Whispering is near that point. I use it for several hours a day, from coding to thinking out loud while carrying pizza boxes back from the office.
Here’s an overview: https://www.youtube.com/watch?v=1jYgBMrfVZs, and here’s how I personally am using it with Claude Code these days: https://www.youtube.com/watch?v=tpix588SeiQ.
There are plenty of transcription apps out there, but I hope Whispering adds some extra competition from the OSS ecosystem (one of my other OSS favorites is Handy https://github.com/cjpais/Handy). Whispering has a few tricks up its sleeve, like a voice-activated mode for hands-free operation (no button holding), and customizable AI transformations with any prompt/model.
Whispering used to be in my personal GH repo, but I recently moved it as part of a larger project called Epicenter (https://github.com/epicenter-so/epicenter), which I should explain a bit...
I’m basically obsessed with local-first open-source software. I think there should be an open-source, local-first version of every app, and I would like them all to work together. The idea of Epicenter is to store your data in a folder of plaintext and SQLite, and build a suite of interoperable, local-first tools on top of this shared memory. Everything is totally transparent, so you can trust it.
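To make the "folder of plaintext and SQLite" idea concrete, here is a minimal sketch. It is not Epicenter's actual layout: the file names, the `transcripts` table, and both function names are hypothetical and chosen only for illustration. Each transcript is written as a plain `.txt` file, and a small SQLite index in the same folder lets any other tool find it with ordinary SQL.

```python
import sqlite3
from pathlib import Path


def save_transcript(root: Path, name: str, text: str) -> Path:
    """Write a transcript as a plaintext file and record it in a SQLite index."""
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{name}.txt"
    path.write_text(text, encoding="utf-8")

    # Hypothetical layout: the index sits next to the text files,
    # so the whole store is a single inspectable folder.
    conn = sqlite3.connect(root / "index.db")
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS transcripts ("
                "name TEXT PRIMARY KEY, path TEXT NOT NULL, chars INTEGER NOT NULL)"
            )
            conn.execute(
                "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)",
                (name, str(path), len(text)),
            )
    finally:
        conn.close()
    return path


def list_transcripts(root: Path) -> list[tuple[str, int]]:
    """Read the shared index; any other local tool could run the same query."""
    conn = sqlite3.connect(root / "index.db")
    try:
        return conn.execute(
            "SELECT name, chars FROM transcripts ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
```

Because both halves are standard formats, the data stays readable with a text editor and the `sqlite3` command-line shell even if the app that wrote it goes away.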
Whispering is the first app in this effort. It’s not there yet regarding memory, but it’s getting there. I’ll probably write more about the bigger picture soon, but mainly I just want to make software and let it speak for itself (no pun intended in this case!), so this is my Show HN for now.
I just finished college and was about to move back with my parents and work on this instead of getting a job…and then I somehow got into YC. So my current plan is to cover my living expenses and use the YC funding to support maintainers, our dependencies, and people working on their own open-source local-first projects. More on that soon.
Would love your feedback, ideas, and roasts. If you would like to support the project, star it on GitHub here (https://github.com/epicenter-so/epicenter) and join the Discord here (https://go.epicenter.so/discord). Everything’s MIT licensed, so fork it, break it, ship your own version, copy whatever you want!
Yes! This. I have almost no experience w/ tts, but if/when I explore the space, I'll start w/ Whispering -- because of Epicenter. Starred the repo, and will give some thought to other apps that might make sense to contribute there. Bravo, thanks for publishing these and sharing, and congrats on getting into YC! :)
https://github.com/epicenter-so/epicenter/pull/655
After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)
We all should be.
For the Whispering dev: would it be possible to set "right shift" as a toggle? Also, could it work like VoiceInk, which does this:
- either a short right-shift press to start, and another short right-shift press to stop
- or a long right-shift press (e.g. held for at least 0.5s) to start, after which it just waits for you to release right shift to stop
it's quite convenient
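The two behaviors described in this request can be sketched as one small state machine. This is an illustration only, not how VoiceInk or Whispering implement it: the `HotkeyRecorder` class and its method names are hypothetical, and it assumes recording begins on key-down (so no audio is lost while waiting to see whether the press is short or long). The 0.5s threshold comes from the comment.

```python
from dataclasses import dataclass
from typing import Optional

HOLD_THRESHOLD_S = 0.5  # presses held at least this long count as "long"


@dataclass
class HotkeyRecorder:
    recording: bool = False
    _pressed_at: Optional[float] = None
    _started_by_this_press: bool = False

    def key_down(self, now: float) -> None:
        if self._pressed_at is not None:
            return  # ignore OS key auto-repeat while the key is held
        self._pressed_at = now
        if self.recording:
            # Already recording in toggle mode: this press stops it.
            self.recording = False
            self._started_by_this_press = False
        else:
            # Assumption: start immediately, decide short vs. long on release.
            self.recording = True
            self._started_by_this_press = True

    def key_up(self, now: float) -> None:
        if self._pressed_at is None:
            return
        held = now - self._pressed_at
        self._pressed_at = None
        if self._started_by_this_press and held >= HOLD_THRESHOLD_S:
            # Long press: behaves like push-to-talk, release stops recording.
            self.recording = False
        # Short press: leave recording on until the next press (toggle).
        self._started_by_this_press = False
```

Timestamps are passed in as plain floats (seconds), so the logic can be exercised without a real keyboard hook.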
another really cool feature would be to have the same "mini-recorder" that pops up on screen like VoiceInk when you record, and once you're done it would display the current transcript and any of your "transformation" actions, and let you choose which one (or multiple) you want to apply, each time putting the result on the clipboard
For macOS there is also the great VoiceInk, which is similar and open-source: https://github.com/Beingpax/VoiceInk/
I download the Whisper models, and then what? How do I run this on a device without internet? There's no documentation about it...
After this pushes, we'll have far more extensive local transcription support. Just fixing a few more small things :)
Do you have any sense of whether this type of model would work with children's speech? There are plenty of educational applications that would value a privacy-first locally deployed model. But, my understanding is that Whisper performs pretty poorly with younger speakers.
If you add Deepgram listen API compatibility, you can do live transcription via either Deepgram (duh) or OWhisper: https://news.ycombinator.com/item?id=44901853
(I haven’t gotten the Deepgram JS SDK working with it yet; currently awaiting a response from the maintainers)
https://github.com/epicenter-so/epicenter/pull/661
In the middle of a huge release that sets up FFMPEG integration (OWhisper needs very specifically formatted files), but hoping to add this after!
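For readers wondering what "very specifically formatted files" might involve: the exact format OWhisper expects is not stated here, so the sketch below assumes the common Whisper-friendly target of 16 kHz, mono, 16-bit PCM WAV. The function names are hypothetical; the FFmpeg flags (`-ar`, `-ac`, `-c:a pcm_s16le`, `-y`) are standard ones.

```python
import subprocess
from pathlib import Path
from typing import Union

PathLike = Union[str, Path]


def ffmpeg_to_wav_cmd(
    src: PathLike, dst: PathLike, sample_rate: int = 16000, channels: int = 1
) -> list[str]:
    """Build an FFmpeg command that re-encodes `src` as 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input in any format FFmpeg can decode
        "-ar", str(sample_rate), # resample (assumed target: 16 kHz)
        "-ac", str(channels),    # downmix (assumed target: mono)
        "-c:a", "pcm_s16le",     # signed 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: PathLike, dst: PathLike) -> None:
    """Run the conversion; requires the `ffmpeg` binary on PATH."""
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.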
Whisper for transcription tries to transform audio data into LLM output. The transcripts generally have proper casing, punctuation and can usually stick to a specific domain based on the surrounding context.
If it's still an issue, feel free to build it locally on your machine to ensure your supply chain is clean! I'll add more instructions in the README in the future.
---7.3.0--- This release popped up just a few minutes ago, so here are the VirusTotal results for the 7.3.0 EXE and MSI installers:
EXE (still running behavior checks, but Arctic Wolf says Unsafe and AVG & Avast say PUP): https://www.virustotal.com/gui/file/816b21b7435295d0ac86f6a8...
MSI (nothing flags immediately, still running behavior checks): https://www.virustotal.com/gui/file/e022a018c4ac6f27696c145e...
---7.2.2/7.2.1 below--- I do note one bit of weirdness: the Windows downloads show 7.2.2 but the download links themselves are 7.2.1. 7.2.1 is also what shows on the release from 3 days ago even though it's numbered 7.2.2.
I didn't check the Mac or Linux installers, but for Windows VirusTotal flags nothing on the 7.2.1/7.2.2 MSI (https://www.virustotal.com/gui/file/7a2d4fec05d1b24b7deda202...) and 3 flags on the EXE (ArcticWolf Unsafe, AVG & Avast PUP) (https://www.virustotal.com/gui/file/a30388127ad48ca8a42f9831...)
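The VirusTotal file links above are keyed by the file's SHA-256 hash, so anyone can check that the installer they downloaded is the same file that was scanned. A minimal sketch follows; the function name and the example file name in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of_file("Whispering-setup.exe")` (a placeholder file name) gives a 64-character hex string that can be pasted into VirusTotal's search box or compared against the hash in a report URL.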
say "This is a test message" --voice="Bubbles"
EDIT: I'm having way too much fun with this lol
say "This is a test message" --voice="Organ"
say "This is a test message" --voice="Good News"
say "This is a test message" --voice="Bad News"
say "This is a test message" --voice="Jester"