How does your project's accuracy compare to Whisper-based models?
However, Wispr Flow does post-processing, so its output might be more useful, as it removes fluff from your speech.
I think it's possible to implement fast, local post-processing using the Gemma models. So I will give it a shot. If it works, then the output will be as good as the best paid options available.
Needless to say, if you speak very precisely, then my project is all that you need. It's almost 100% accurate; I haven't seen a mistake yet (crazy, I know).
https://www.caard.net/profile/propheciple/ce562ce7-a75d-4a0e...
Its UX isn't as good as this one's, but it should help unless you find a better option.