I have to say there's one exception for me, and that's Whisper: I actually do use Whisper a lot. But I just don't use local LLMs. They're really, really bad compared to the models running on cloud GPUs.
And I don't know why, because to me a speech-to-text model seems much harder to build than a model that just generates text.
But it seems they really can't close that gap and get these models running well on consumer hardware. So I keep going back to cloud LLMs, privacy concerns aside.
On the other hand, we need to talk specifics: bad compared to what, measured how, and on which benchmark?