Ask HN: Why is uptalk intonation so prevalent in ChatGPT voices?
48 points | 8 days ago | 18 comments | HN
I’ve tried asking it to use an even tone with less of the annoying uptalk, but lately it just continues this way. It hurts to listen to.
slillibri
8 days ago
They are still working on a realistic vocal fry?
mikrl
8 days ago
When can I get the GPT that sounds like Boomhauer crossed with Gerald from Clarkson's Farm?
throwaway889900
8 days ago
Just gotta pipe it through a granular synth afterwards
AStonesThrow
8 days ago
The new office mate’s prank will be to switch your AI’s voice to a Kardashian, Fran Drescher, Pauly Shore...

Plus I am sure that LLM engines could command a premium by shrewdly licensing such talent as James Earl Jones, Majel Barrett, or Milla Jovovich?

But it seems like the current trend is novel/generic voices in order to avoid suits/fees and pioneer new territory. Isn’t Siri’s personality a recognizable celebrity by now?

seydor
7 days ago
they can use Sam's voice
ctrlp
8 days ago
Why is it so prevalent in people generally?
rglover
8 days ago
Because they think it makes them sound Smart? Because it's safer to fit in with the crowd than to Not? Because having a genuine personality is Difficult?
ctrlp
8 days ago
I would assume the opposite. People use it to sound dumb, not smart, so as to sound non-threatening, but also so as to sound non-assertive. It's used to defuse potential conflict or perceived disagreeableness. Generally it's the affect of a late-stage, conflict-averse institutionalism that punishes assertive or dominant behaviors. Uptalk is the "I'm showing my belly" of English affects.
nativeit
8 days ago
More damning reasons for why we should simply go back to literally showing our bellies.
_DeadFred_
8 days ago
The point of language is to have shared communication. That people adopt language standards isn't a moral failing, and accent is a legitimate part of it as it's more organic/instinctual to human interaction than grammar/diction classes.
rglover
8 days ago
True, but this particular style was born out of a region where people are largely known to be smug and arrogant about their intelligence/"betterness" relative to other people. Not sure I'd call it a moral failing as much as a marker of who to avoid (much like a dangerous plant having bright colors or scents).
ctrlp
8 days ago
It's not an accent, it's an affect.
alabastervlog
8 days ago
As far as I can tell, it's an "I am not done talking yet" thing in many contexts. That's why people employing it usually drop the uptalk for their last sentence in a string of sentences (unless it's an actual question).

I reckon it took off because of telephones.

AStonesThrow
8 days ago
I freakin' hate telephones, and I always have, and the Dowager Countess of Grantham was correct in surmising that they are torture devices of the highest caliber.

I always try to prefer virtual meetings with video and all the trimmings, especially screen-share capability. If not, in-person. Telephoning is like a last resort. I've pondered using TTY/TDD even though I have no physical impairments, but most agencies seem hostile to that sort of barrier and most of the time, I find myself needing to pass voice-recognizing gatekeeper AIs first.

When I was promoted to "sysadmin" in my first job they set me at a desk with a computer and a PBX telephone. (It was an office at a gov't defense contractor...) The computer I could relate to on an intimate level; the telephone scared the living bejesus out of me, and I prayed it would never ring.

A disembodied voice in your ear, especially one that interrupts everything you're doing and demands a real-time conversation across space and time: that's incredibly rude, yet it's been normalized for 100 years. I have PTSD and other disabilities, and I can get completely rattled and downright hostile/aggro when someone calls me and I'm not prepared: I'm not at my desk, I have no notes in front of me; even if I'm relaxing at home, I just can't deal. I may answer the phone while on a bus or train, but I certainly won't sustain a conversation that way. Outgoing calls are sometimes OK but often end in disaster as I become increasingly frustrated.

Now it's not uncommon for someone who's standing at my front door to telephone me. What the hell, I'm less than 3 yards away in here, please don't send me across Cyberspace when you're manifesting in meatspace...

On a phone call I have no idea what office I've reached, what is a person's job title or role with a company, what their face looks like, when they may be taking a breath or using other physical cues to aid the conversation. So much metadata is lost in phone conversations that they are wholly dehumanized and require a revolution in etiquette, just to keep the peace.

Here's your uptalk origin: https://music.youtube.com/watch?v=Qb21lsCQ3EM&si=3j3bYd_0rPa... "Valley Girl, she's a Valley Girl!"

I'm not autistic and I can read expressions and body language, but perhaps autists are the evolutionary result of disembodied telephone conversations taking out the human element from verbal communication. It is simply amazing that even with apps and ubiquitous Internet and website portals that I still require the telephone to get important business done (and I'm always, always ticking "it's something else/other/Live Agent/Oh god this is an exception to all your rules please send me a human being").

muzani
8 days ago
It turns a sentence into a suggestion rather than a command.
chc4
8 days ago
nothing much, what's uptalk with you?
nativeit
8 days ago
I'm having a rough couple of months, I'll be honest. [that should be read in the least sincere "my pleasure to serve you at the window" voice you can muster]
swyx
8 days ago
Made in California. next question
hbarka
8 days ago
There’s no shortage of YouTube videos about the uptalk or upspeak annoyance, but I found this one from 1994. It seems to have spread from California Valley Girl-speak and then nationwide to college campuses. How does LLM training get influenced (weighted) by pop culture speaking styles?

https://youtu.be/z756L_CkakU

swyx
8 days ago
pretrain data + rlhf
ryandrake
8 days ago
I wish it were only a California thing. The Valley Girl uptalk/vocal fry thing seems to have spread across the country. Turn on the local news station in any region of the country and you'll hear it. Everyone is for some reason trying to sound like the Real Housewives Of Orange County.
alabastervlog
8 days ago
NPR's even been full of it for more than a decade now. I think at some point (the '00s?) they really relaxed their elocution standards for hosts & reporters.

It definitely makes reporting feel trustworthy and serious? When almost every statement sounds like a tentative question?

TRiG_Ireland
8 days ago
"Vocal fry" is not really a thing, as David Peterson explains. https://www.youtube.com/watch?v=qIJyEc07w2Q
PaulHoule
8 days ago
I like vocal fry as a vocal adjustment because (1) you can do it with no talent, and (2) since it is atonal you can use it and have no risk of your vocal tone slipping upward because you're under stress.
saltcured
8 days ago
More like trained by a certain generation and socioeconomic stratum...

I've gotten old enough to now wonder if my dialect sounds like something from another world and era to younger folks in my region.

That's the way I felt about most of the Hollywood actors I heard from before Technicolor was the norm.

orblivion
8 days ago
In San Francisco I had a coworker originally from Italy who used upspeak while speaking in an Italian accent.
hbarka
8 days ago
How did they manage that? The Italian accent is beautifully affirmative, with confident downspeak when making a statement.
carabiner
8 days ago
Made in California? Next question?
barbazoo
8 days ago
Made in California! Next question!
nottorp
8 days ago
Oh c'mon. Californians are too expensive for training AI. Maybe it's from some Malaysian accent?
wongarsu
8 days ago
Unless you train on twitch streamers. There seems to be an unspoken rule that any successful streamer has to move to LA. If you train on youtubers you get a surprising Mormon bias instead.

I would be surprised if the majority of the training data is licensed from the speakers.

muzani
8 days ago
Malaysian here. Uptalk is used more to turn a sentence into a suggestion. Something like "Hey, there's one dim sum left," to suggest that I'm taking this but you can challenge it. I could see why ChatGPT would adopt it. It's trying to be polite.

Often it's in a tonal particle, "One dim sum left meh." But it's possible in trying to artificially combine tone and text, the uptalk is moved up.

But the telltale sign of a Malaysian accent is that it's clipped. Instead of "I don't like that idea," it becomes "Don't like it." ChatGPT may be written American, so as an accent, it would sound closer to, "I- don't like, that idea."

And sentences often end in an elongated manner, "I wrote that essay you wanted~". The elongated endings are quite common in many SEA accents as well, especially Thai.

nottorp
8 days ago
Oh sorry. Didn't mean to pick on Malaysians specifically. I was just pointing out that training would be outsourced somewhere.

Like how ChatGPT's written English is, or was, close to Nigerian business English...

muzani
8 days ago
No offense taken. It got me curious. I actually do train AI to code in Singaporean English in my spare time, so I try to be aware, lol.

I'm not aware of voice training though. xAI outsources lots of stuff to Malaysia. Google has some too, but it has had this kind of data, for TTS, STT, and Google Translate, for a decade or so.

breckinloggins
8 days ago
This is a highly problematic comment?

/s

uoaei
8 days ago
What are you getting at? What's the joke?
ViktorRay
8 days ago
I believe OpenAI wants ChatGPT to have a tone that is more casual and less professional or uptight than it was before...

And so ChatGPT relies on the training data to know what that means, which leads it to talk like this, since this is what the training data is filled with.

gtirloni
3 days ago
I'm sure your intonation sounds annoying to some other generation too.
ergonaught
8 days ago
Just telling it "Avoid upward inflection" or "Use a flat tone" prevents the lilt for me. Perhaps it doesn't "stick"? Perhaps it varies by voice.
hbarka
8 days ago
It partially works but it doesn’t “stick”, as you say. I’ve tried setting it on Preferences but it isn’t consistent.
luluthefirst
8 days ago
It generates more engagement than a monotonous tone.
psygn89
8 days ago
If only there was something in between the two.
muzani
8 days ago
AI accents are incredibly good these days, especially ElevenLabs. ChatGPT is not a leader in this. I once spent about $20 on it just because I like the sound of its voice.
treetalker
8 days ago
No doubt the next feature will be starting every paragraph, if not every sentence, with "tsk!" (that wet clicking sound many people do with their tongue and teeth as they inhale before speaking). Pure anecdata, but I notice women doing it much more than men, and it has made its way into newscasts.
rawgabbit
8 days ago
I personalized mine to "Cove" and instructed it to speak with a Scottish accent, slow and warm. I am hoping to refine it so it will more closely sound like Sean Connery, but this is the best I could do.
qoez
8 days ago
Jarvis, less vocal fry please
parisisles
8 days ago
I'm not sure why, but the voices in voice mode are different from the voices used to read responses aloud in "normal" mode. The latter are far better / less annoying.
hn_user82179
8 days ago
I haven't noticed anything "off" with the ChatGPT voices. But I know I personally do speak with uptalk intonation as well
bradgranath
7 days ago
Because they trained them on vloggers, TikTokers, and podcasters without asking any of them for permission.
carabiner
8 days ago
I don't see the problem with uptalk?
cosinetau
8 days ago
If we can't create an artificial out group, who are we gonna dunk on?
mhurron
8 days ago
Some people have real problems with change.
4b11b4
8 days ago
yeah, like, i dunno, the thing where, you know what I mean?
SoftTalker
8 days ago
Prompt it to talk like an NPR announcer.
floren
8 days ago
Lip smacks and audible intakes of breath?
adrianmonk
8 days ago
I believe that's from the Neumann U87 microphones that are standard at NPR. You can get a similar sound if you don't mind spending $3600 on a microphone.

https://current.org/2015/06/a-top-audio-engineer-explains-np...

JTbane
8 days ago
I listen to NPR during my commute and these tics have a calming effect

Is this ASMR?

paulcole
7 days ago
> It hurts to listen to.

Does it really? Isn’t this the kind of thing it would be good to practice tuning out rather than complaining about?

mvdtnz
8 days ago
Americans. Ever listen to an American podcast?
alabastervlog
8 days ago
Do they also add "right?" all the time for no reason? Need that for an accurate rendering.
netsharc
8 days ago
"I'm going to go ahead and...". "Why don't you go ahead and...".

People who use this moronic phrase should go ahead and jump off a tall building...

mvdtnz
8 days ago
And start the answer to every question with "So..." or "Yeah, so...".
rectang
8 days ago
Or if you're a politician, "Listen...", or "Look...".
cstrahan
7 days ago
George Carlin once gave a masterclass on how to talk like a politician at the National Press Club: https://www.youtube.com/watch?v=jG9rkiMe2nA