Disfluencies aren’t necessarily bad even if the word starts with “dis”!
I also don't care for writing that could have been made a lot more concise. It's a lot of work to make things shorter, but I think it's worthwhile.
To me they just indicate lack of confidence on the part of the speaker.
It's... the exact opposite?
The main (attempted) use of ummms is to keep the speech going through a pause. And the main complaint is exactly that: it breaks the listener's focus and doesn't give any respite.
If you speak with disfluencies, you probably didn't sufficiently rehearse your speech. If you didn't rehearse enough, you probably didn't put much effort into writing it either, so why should I put much effort into listening? It's the same principle as AI slop.
Many people can speak off the cuff fluently and confidently, avoiding "like", "um", and other filler words. And even if you're not speaking fluently, leaving silences as punctuation is more effective, IMO.
Many impressive speakers I've met actually cite Toastmasters! So their obsessive zeal actually does work.
More rehearsal works too, but it can leave a speech "sounding too rehearsed".
> It leaves um, uh, er and elongated versions (ummmm, uhhhhh) alone. Those sound like fillers but they’re doing real work in the sentence, and cutting them automatically would change what someone said. The rule erm follows: only remove things that are sound, not language.
> It also doesn’t touch repeated words, false starts, or long thinking pauses. Those aren’t noise on top of the speech; they are the speech, just messier than the speaker would like. Cleaning them up is an editorial decision about which take to keep, and erm doesn’t have an opinion about that.
Think about it. Cleaning these things-that-can-be-just-sounds-but-can-also-very-much-be-load-bearing up is an editorial decision. At the very least, you need to judge based on the surrounding content whether the removal of an um would change the meaning at all; and I don’t think text alone is adequate for that.
Something's already gone wrong here. Uh and er refer to the same sound. Uh is the American spelling. Er is British; to them a following "r" like that is just a kind of vowel.
(Also, in case it wasn’t clear: I was quoting from the start of the article in that sentence.)
But not in any other sense.
> in case it wasn’t clear: I was quoting from the start of the article in that sentence.
You don't seem to be quoting from the article at all, actually. You've combined two different sentences in a way that grossly misrepresents what the article says. But that's not really relevant to the point here.
While it's a commercial product with a subscription, I spent a long time on the free tier not even hitting their limits until I started using it so extensively that I wanted to pay for it.
And I've used Whisper in the past, mostly for tinkering. I tried it for a couple of use cases but haven't touched the base project in a while. But I do regularly use Faster-Whisper-XXL, an open source project based on Whisper, for subtitle generation.
For subtitle generation, though, I decided to support the project, and I mainly use Faster-Whisper-XXL Pro, the non-public build made for donors to the open source project.
The extra features smooth out the subtitle editing process substantially. Add "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 --ff_vocal_extract mb-roformer --vad_method pyannote_v3" to the CLI parameters (and sometimes --realign) and you have much less cleanup to do in SubtitleEdit or Tero Subtitler afterwards.
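If you script this, a small helper keeps the flag set in one place. This is only a sketch: the flags are copied verbatim from the comment above, but the executable name `faster-whisper-xxl`, the positional input path, and the `build_command` helper are my assumptions, so check them against your own install before running anything.

```python
import shlex

# Flags quoted from the comment above; not verified against any particular build.
EXTRA_FLAGS = (
    "--roformer_overlap 0.125 --roformer_vram 16 --best_of 15 "
    "--ff_vocal_extract mb-roformer --vad_method pyannote_v3"
)


def build_command(media_path, realign=False, executable="faster-whisper-xxl"):
    """Assemble an argument list; the executable name is an assumption."""
    cmd = [executable, media_path] + shlex.split(EXTRA_FLAGS)
    if realign:
        # The comment mentions adding --realign only some of the time.
        cmd.append("--realign")
    return cmd
```

The result is a plain list, so it can be handed to `subprocess.run(cmd, check=True)` once the executable name and path are confirmed.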
Ideally it would slice the video in the timeline without actually removing anything, so you can scrub through your video and try with and without each disfluency (thank you - awesome word) & decide case by case which to keep!
A trivial example is "umm... well... (sigh) okay" versus just "okay". Not okay!
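To make the "slice without removing" idea concrete, here is a rough sketch of a non-destructive cut list. Nothing here comes from erm or any real editor; `Segment`, `kept_ranges`, and the timestamps are hypothetical names and numbers I made up for illustration. Every slice stays on the timeline with a `keep` flag, and the export ranges are only computed at the end.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float          # seconds
    end: float            # seconds
    label: str
    is_disfluency: bool = False
    keep: bool = True     # toggled by the editor; nothing is ever deleted


def kept_ranges(segments):
    """Merge adjacent kept segments into (start, end) ranges for export."""
    ranges = []
    for seg in segments:
        if not seg.keep:
            continue
        if ranges and abs(ranges[-1][1] - seg.start) < 1e-9:
            # Contiguous with the previous kept range: extend it.
            ranges[-1] = (ranges[-1][0], seg.end)
        else:
            ranges.append((seg.start, seg.end))
    return ranges


# Hypothetical timestamps for the "umm... well... (sigh) okay" example.
timeline = [
    Segment(0.0, 1.2, "umm...", is_disfluency=True),
    Segment(1.2, 2.0, "well...", is_disfluency=True),
    Segment(2.0, 2.9, "(sigh)", is_disfluency=True),
    Segment(2.9, 3.5, "okay"),
]
```

With every flag left on, the export is one continuous range. Switching off the three disfluencies leaves only the bare "okay", and because the slices are still on the timeline you can flip any of them back and listen again before deciding.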