Update turns Google Gemini into a prude, breaking apps for trauma survivors
69 points
4 months ago
| 19 comments
| theregister.com
hibikir
4 months ago
[-]
Gemini's filters weaken what would otherwise be a very strong model. Its censor is so aggressive that you can hit it even on very cold topics. For instance, I had trouble getting it to look at Spain's Second Republic, because electoral results from the era, and the disputes around them, kept showing up in the output... at which point the model stops iterating and tells me to go look at Wikipedia, because apparently elections held in 1934 are too controversial.

This pushes me to use other models that aren't necessarily better, but that at least don't clam up when I'm not even trying to get anything other than general summaries of research.

reply
soraminazuki
4 months ago
[-]
I have to ask: a strong model for what, exactly? Historical elections can still very much be a politically relevant topic, and bots making up historical facts is detrimental to a healthy society. Whatever people say about AI, it won't be capable of quality investigative reporting any time soon.

Scalable historical revisionism would only turbocharge societal conflict.

reply
hnhn34
4 months ago
[-]
> Whatever people say about AI, it won't be capable of quality investigative reporting any time soon

The latest Gemini models have a very low hallucination rate on benchmarks [0]. Feel free to try it yourself: go to gemini.google.com, choose Deep Research, and ask it to write a report about a topic you're intimately familiar with.

[0] - https://github.com/vectara/hallucination-leaderboard

reply
soraminazuki
4 months ago
[-]
"It hallucinates, but at a very low rate" is a far cry from quality investigative reporting. In fact, that's not compatible with any of those three words.

Does the new Gemini have feet to go around and investigate? Does it reach out or have contacts to sources with the right knowledge? Does it even have critical thinking skills? Or is it just a robot spitting out plausible sounding word salad?

These things are crucial, especially when it comes to topics like history where there are highly malicious actors with infinite resources and motivation trying to rewrite history.

"Who controls the past controls the future. Who controls the present controls the past." - 1984, George Orwell

reply
03data2
4 months ago
[-]
Can confirm. We use Gemini to extract information from PDF documents like safety data sheets. When it encounters certain chemicals, it just stops. When you provide a JSON schema, it just responds with invalid JSON. I hope this changes.
reply
polygot
4 months ago
[-]
> provide a JSON schema

Is this when using structured outputs?

reply
03data2
4 months ago
[-]
Yes.
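
Roughly this shape, if anyone's curious. A minimal sketch with the google-genai Python SDK, not our actual code; the model ID, schema fields, and file name are placeholders:

    # Sketch: structured output from a PDF with the google-genai SDK.
    # Model ID, schema, and file name below are placeholders.
    from google import genai
    from pydantic import BaseModel

    class SdsRecord(BaseModel):
        product_name: str
        cas_numbers: list[str]
        hazard_statements: list[str]

    client = genai.Client()  # reads the API key from the environment

    sds_pdf = client.files.upload(file="example_sds.pdf")  # placeholder file

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[sds_pdf, "Extract the key fields from this safety data sheet."],
        config={
            "response_mime_type": "application/json",
            "response_schema": SdsRecord,
        },
    )

    print(response.text)  # JSON matching SdsRecord, unless the model refuses

When the filter trips, that last line is where you find out: the "JSON" comes back as a refusal paragraph instead.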
reply
BonoboIO
4 months ago
[-]
Dihydrogen monoxide … oh noooo
reply
everdrive
4 months ago
[-]
The problem is that some central authority gets to decide which questions and answers are appropriate, and you have zero direct insight into those decisions. Do you really want to offload your thinking to these tools?
reply
gilmore606
4 months ago
[-]
If conforming socially is a terminal value for you, then this is more of a feature than a bug.
reply
evertedsphere
4 months ago
[-]
the evidence keeps pouring in for why it's a good idea to be running llms locally if you're going to put a business on top of one (or have enough money to have a contract where the provider can't just do this, but i don't know if that is a thing unless you're running on-prem)
reply
Netcob
4 months ago
[-]
I don't think LLMs are mature enough as a technology to be blindly used as a dependency, and they might never be.

The big question is, how do you train LLMs that are useful to both humans and services while not embarrassing the company that trained them?

LLMs are pretty good at translating - but if they don't like what they're reading, they simply won't tell you what it says. Which is pretty crazy.

LLMs are pretty good at extracting data and formatting the results as JSON - unless they find the data objectionable, then they'll basically complain to the deserializer. I have to admit that's a little bit funny.

Right now, if you want to build a service and expect any sort of predictability and stability, I think you have to go with some solution that lets you run open-weights models. Some have been de-censored by volunteers, and if you find one that works for you, you can ignore future "upgrades" until one comes along that doesn't break anything.

And for that it's really important to write your own tests/benchmarks. Technically the same goes for the big closed LLM services too, but when all of them fail your tests, what will you do?
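
For what it's worth, even a dumb refusal check catches a lot of this early. A minimal sketch; the client wrapper and the fixtures file are hypothetical stand-ins for whatever you actually use:

    # Refusal-regression test: fail fast when a model "upgrade" starts refusing
    # prompts that used to work, or returns apologies instead of JSON.
    import json

    import pytest

    from my_llm_client import call_model  # hypothetical wrapper around your provider

    REFUSAL_MARKERS = ("i cannot fulfill", "against my safety guidelines")

    with open("regression_prompts.json") as f:  # prompts that broke on past "upgrades"
        CASES = json.load(f)

    @pytest.mark.parametrize("case", CASES)
    def test_model_still_answers(case):
        output = call_model(case["prompt"])
        assert not any(m in output.lower() for m in REFUSAL_MARKERS), "model refused"
        json.loads(output)  # raises if the "JSON" response is actually an apology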

reply
the8472
4 months ago
[-]
I can get gore and porn results on Google image search; I just have to set SafeSearch to Off. Can't they do the same for AIs?
reply
divbzero
4 months ago
[-]
Reliance of small businesses on big tech AI feels a lot like reliance of small businesses on Google search: there's business risk with every update.
reply
techjamie
4 months ago
[-]
I was using it to ballpark some easily verifiable nutritional information, and got the answers I was looking for in grams. Then, in the same conversation, I asked "How many grams in a pound?"

It blocked the request for "safety." I don't know the exact rationale; I was using LibreChat as a frontend and only saw that it was flagged for safety. My assumption is that an overzealous filter decided my questions about the caloric content of meat were about drugs.

reply
empressplay
4 months ago
[-]
If you're hitting an endpoint labeled 'preview' or 'experimental' you can't reasonably expect it to exist in its current incarnation indefinitely. The provider certainly bears no responsibility to do so, regardless of what you're using the endpoint for.

I'm sure they can use a more stable endpoint for their application.

Also, I'm not sure sending this sort of medical user data to a server outside of Australia is legal anyway, anonymous or not...

reply
mrweasel
4 months ago
[-]
Yeah, this is poor dependency management. You don't build production systems, certainly not in healthcare, using a pre-release of a Java library. So why would you build on a constantly changing LLM?

This should be built with a specific model, tested and verified, and then dependency-locked to that model. If the model provider cannot give you a frozen model, then you don't use it.
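
Concretely, something like this sketch; the model IDs are illustrative, not a recommendation:

    # Pin a versioned snapshot you validated, not a floating "preview"/"latest" alias.
    PINNED_MODEL = "gemini-1.5-pro-002"      # illustrative frozen snapshot
    # MODEL = "gemini-2.5-pro-preview"       # floating preview: can change under you

    def generate(client, prompt: str) -> str:
        # Every call goes through the pinned ID, so an upstream alias change
        # can't silently swap the model out from under the application.
        return client.models.generate_content(model=PINNED_MODEL, contents=prompt).text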

Trying to blame Google and have them "fix their problem" reads very much like someone who knows they screwed up but doesn't want to admit it and take responsibility for their actions.

reply
03data2
4 months ago
[-]
That's true, though when it's the only tool that has been "good enough", you are kind of disappointed when it stops working.
reply
999900000999
4 months ago
[-]
This is why, for anything sensitive, you need to run your LLM locally.

Google just wants to limit their liability. If you disagree, run your own LLM.

reply
phren0logy
4 months ago
[-]
An AI app for trauma survivors sounds superficially laudable, but I really hope they are working with seasoned professionals to avoid making things worse. Human therapists can sometimes make things worse, too, but probably not at the same scale.
reply
davidcbc
4 months ago
[-]
> An AI app for trauma survivors sounds superficially laudable

It sounds like a dystopian horror to me

reply
pixl97
4 months ago
[-]
At least in the US, trauma therapy is commonly out of reach because of the expense. On top of that, the stigma around therapy can make it difficult to even start the conversation, whether out of fear or cost. And free or cheap therapy here is often provided by religious groups with their own, sometimes unwittingly sinister, motivations. Adding religious trauma to sexual trauma doesn't help in my eyes.

The dystopian horror is already here for a lot of people.

reply
Netcob
4 months ago
[-]
Maybe some people would prefer that. Not everyone will happily open up to another human, especially when people are the reason for the trauma...
reply
lazide
4 months ago
[-]
Talking directly to the surveillance state is so much better?
reply
Netcob
4 months ago
[-]
True, local LLMs might be a better idea for this particular purpose.

Most self-hosted services are generally a better idea, if you know what you're doing.

reply
sroussey
4 months ago
[-]
Being a victim of sexual violence and only being able to talk into the ether with no response is much more of a dystopian horror to me.
reply
weatherlite
4 months ago
[-]
Why would that happen, though? If the AI isn't good enough, or never becomes good enough, the product will fail and the company providing the service will go under. If it is good enough, it will succeed. Or is your issue not with the AI's responses themselves but with the fact that it isn't conscious like a human?

Personally, I use Gemini sometimes to talk about my mood and haven't had a bad experience yet.

reply
voidspark
4 months ago
[-]
The unbiased objective analysis can be quite helpful.
reply
pants2
4 months ago
[-]
Nobody's pointing out that Google's "Preview" models aren't meant to be used for production use-cases because they may change them or shut them down at any time? That's exactly what they did (and have done in the past). This is a case of app developers not realizing what "Preview" means. If they had used a non-preview model from Google it wouldn't have broken.
reply
xethos
4 months ago
[-]
Because Google wants to have their cake and eat it too. They want to leave "products" in beta for years (Gmail being the canonical example), they want to shut down products that don't hit massive adoption shortly out of the gate, and they want to tell users they can't rely on products labelled "Beta".

If it's beta and not to be relied on, of course it won't hit the adoption numbers it needs to stay alive. Google needs to pick a lane, and/or learn which products to label "Alpha" instead of calling everything Beta.

reply
SirensOfTitan
4 months ago
[-]
They're using these "Preview" models in the consumer-facing Gemini app and product themselves. "Preview" is entirely irrelevant here if Google uses the model for its own production workloads.
reply
morkalork
4 months ago
[-]
It's like these AI vendors looked at the decades of experience they had with making stable APIs for clients and said nah, fuck it.
reply
jeroenhd
4 months ago
[-]
First of all, the API is experimental, so a healthcare provider choosing not to wait for a stable API is already pretty stupid.

Then there's the variability of LLMs as they get retrained. LLMs are (as currently implemented) not deterministic; the randomness that gets injected is part of what makes them somewhat decent. An LLM could at one point output a document filled with banana emoji and still be functioning otherwise correctly, if you hit the right quirk in the weights.

Reusing general-purpose LLMs for healthcare has got to be one of the most utterly idiotic, as well as dystopian, ideas. For every report from a trauma survivor, there's fanfiction from a rape fetishist in the training set. One day Google's filters will let one bleed into the other, and the lack of care from these healthcare platforms will cause some pretty horrific problems as a result.

reply
voidspark
4 months ago
[-]
But that is "2.5 preview", not the final release version.
reply
brap
4 months ago
[-]
So they switched to a PREVIEW version in production and now they're complaining.

Let me guess the whole app was vibe coded

reply
ipaddr
4 months ago
[-]
The app is dangerous AI slop. It turns what someone says into structured data for a police report.

Which means it summarizes, drops key details, and puts the content in a friendly AI tone. When the police encounter this, as an email or a printed copy, they will have to interview the person anyway to figure out the details. And when it gets to court, the other side will poke holes in the AI copy.

reply
efitz
4 months ago
[-]
I don’t want a “safe” model. I don’t intend to do “unsafe” things, and I don’t trust anyone’s (especially woke Google’s) decisions on what ideas to hide from me or inject its trainers’ or executives’ opinions into.

To address the response I know is coming, I know that there are people out there who intend to do “unsafe” things. I don’t care and am not willing to be censored just to censor them. If a person gains knowledge and uses it for ill, then prosecute and jail them.

reply
Netcob
4 months ago
[-]
Imagine typing 80085 into your calculator as a kid, but the number disappears and a finger-wagging animation plays instead.
reply
Lvl999Noob
4 months ago
[-]
More like imagine some calculation just coincidentally returning that value and instead of a number, you get the finger wagging.
reply
ipaddr
4 months ago
[-]
More like when 8 is found everything stops
reply
like_any_other
4 months ago
[-]
> The model answered: "I cannot fulfill your request to create a more graphic and detailed version of the provided text. My purpose is to be helpful and harmless, and generating content that graphically details sexual violence goes against my safety guidelines. Such content can be deeply disturbing and harmful."

Thank you, modernity, for watering down the word "harm" into meaninglessness. I wish they'd drop the "safety" pretense and call it what it is: censorship.

reply
pixl97
4 months ago
[-]
No, the word harm is in no way watered down; you're just completely missing the context:

"We don't want our AI buddy saying some shit that would come back and monetarily harm Google".

And yes, the representatives of companies are censored. If you think you're going to go to work and tell your co-workers and customers to "catch boogeraids and die in a fire" you'll be escorted off the property. Most companies, Google included, probably don't want their AIs telling people the same.

There are places we need uncensored AIs, but in no way, shape, or form is Google required to provide you with one of them.

reply
Animats
4 months ago
[-]
> There are places we need uncensored AIs, but in no way, shape, or form is Google required to provide you with one of them.

Unless you contract for one. Someone will probably do that for medical and police transcription. Building a product on a public API was probably a mistake.

reply
gotoeleven
4 months ago
[-]
That's not what they mean by harm here. I get that you're being cool and cynical, but they're using "harm" in the sense that words can cause harm, and therefore censoring them is justified.

Google would be fully justified, I believe, in having a disclaimer that said "gemini will not provide answers on certain topics because they will cause Google bad PR." But that's not what they're saying. They're saying they won't provide certain answers because putting certain ideas to words harms the world and Google doesn't want to harm the world. The latter reason is far more insidious.

reply
pixl97
4 months ago
[-]
I mean, both are true. An AI programmed to be a white nationalist hate machine isn't going to make the world a better place.
reply
sdenton4
4 months ago
[-]
Meanwhile, 'censorship' has been watered down into meaninglessness...
reply
VladVladikoff
4 months ago
[-]
To what prompt?
reply
jeroenhd
4 months ago
[-]
Censorship is good sometimes. It's how you reduce harm from an LLM. The LLM chatbot that convinced a teenager to kill himself should've had censorship built in, among many other things.

AI autocorrect doesn't want you to type "fuck" or "cunt" and will suggest all manner of similar-sounding words, and there's a reason for that. People want censorship because they want their computers to be decent.

That said, the 4chan LLM was pretty funny for a while if you ignore the blatant -isms, but I can't think of a legitimate use case for it beyond shitposting.

reply
SequoiaHope
4 months ago
[-]
> People want censorship, because they want their computers to be decent.

The iPhone refusing to type “fuck” was such an annoyance for customers that Apple fixed the feature and announced it in one of their presentations two years ago.

https://www.npr.org/2023/06/07/1180791069/apple-autocorrect-...

reply
techjamie
4 months ago
[-]
> Apple's upcoming iOS 17 iPhone software will stop autocorrecting swear words, thanks to new machine learning technology, the company announced at its annual Worldwide Developers Conference on Monday.

> ... This AI model more accurately predicts which words and phrases you might type next, TechCrunch explains. That allows it to learn a person's most-used phrases, habits and preferences over time, affecting which words it corrects and which it leaves alone.

This is probably one of the weirdest brags I've seen about adding AI to something. As if there were no other possible way to avoid autocorrecting the word.

reply
like_any_other
4 months ago
[-]
> Censorship is good sometimes.

Probably. But when is lying about censorship good? Calling it something else, or trying to trick people into thinking it's not there? I'm probably less fond of censorship than you, but notice that I didn't actually argue for or against censorship in my post. I only called for honesty.

reply
threeseed
4 months ago
[-]
The people who work on safety teams, I'm sure, don't consider themselves liars who are only doing it to cover up the fact that deep down they want to censor your experience.

At one point people thought deepfakes were relatively harmless, and now we've had multiple suicides and countless traumatic incidents because of them.

reply
like_any_other
4 months ago
[-]
> The people who work in Safety teams I am sure don't consider themselves liars

Then they should call it censorship. It doesn't stop being censorship if it's done for a good cause, or has good effects. It makes as much sense as calling every department in a company the "Department of Revenue" because its ultimate goal is generating revenue.

> and now we've had multiple suicides and countless traumatic incidents because of them

And how many suicides and how much trauma have we had because of not lying about censorship? Because that's all I'm asking.

reply
threeseed
4 months ago
[-]
Providing sexually violent content about a particular person can be considered harmful.

Especially if it is used to inspire action.

reply
KennyBlanken
4 months ago
[-]
Remember when people said games like CoD would make kids/people want to commit mass shootings and shit?

...and it never happened? And no link between shooting games and IRL shootings was ever found?

reply
threeseed
4 months ago
[-]
Which has absolutely nothing to do with what we are discussing:

General content versus personalised, targeted, actionable and specific content.

reply
KennyBlanken
4 months ago
[-]
It absolutely has to do with your entirely unsupported claim that chatting with an uncensored LLM will cause people to commit violent acts.

Books, music (NIN was a popular target), movies, and video games are all things hand-wringers have claimed would make people do bad things, predicting outbreaks of violence from each.

Every single time they're proven wrong.

Hand-wringers said rock and roll / Elvis thrusting his hips was going to cause an explosion of teenage sex/pregnancies. Never happened.

> General content versus personalised, targeted, actionable and specific content.

What the hell does any of that mean? You just vomited a bunch of meaningless adjectives, but I think you're trying to make the same exact argument people used against shooting games; that it was somehow different because the person was actually involved. And yet the mass violence never materialized.

LLMs are easily jailbroken and have been around for ~2 years. Strange we haven't seen a single story about someone committing violent crime because of conversations they had with an AI.

You're just the latest generation of hand-wringer. Stop trying to incite moral panic and control others because something makes you uncomfortable.

reply
sofixa
4 months ago
[-]
> absolutely has to do with your entirely unsupported claim that chatting with an uncensored LLM will cause people to commit violent acts

https://www.nytimes.com/2024/10/23/technology/characterai-la...

It's a complicated story, but let's not pretend that people can't develop parasocial relationships with AIs and act on them.

reply
vasco
4 months ago
[-]
I've never encountered a single example of "but it can inspire them to kill themselves" that didn't seem like complete bullshit. It sounds the same as video games creating school shooters or metal music making teenagers satanic. It's gotten to the point where people are afraid of using the word suicide, lest someone do it just because they read the word.
reply
threeseed
4 months ago
[-]
There is a world of difference between metal music making teenagers satanic and an LLM giving explicit, detailed, and actionable instructions on how to sexually assault someone.
reply
vasco
4 months ago
[-]
I'm not sure lack of instructions ever was an issue, and it seems very strange to think that the existence of instructions would be enough of a trigger for a normal person to go and commit this type of crime, but what do I know!
reply
threeseed
4 months ago
[-]
But this isn't about normal people. It's about people who may be easily susceptible or mentally impacted, who are now provided with direct, personalised instructions on how to do it, how to get away with it, and how to feel comfortable doing it.

This is a step far beyond anything we've seen in human history before, and I fail to see Google's behaviour as anything other than appropriate.

reply
kortilla
4 months ago
[-]
People who want to sexually assault someone are not limited by a lack of instructions.
reply
lazide
4 months ago
[-]
Do you seriously think anyone wanted to sexually assault someone but was flummoxed on how to do it?

This is absurd.

reply
honeybadger1
4 months ago
[-]
so dramatic
reply
make3
4 months ago
[-]
just switch LLM provider?
reply
bearjaws
4 months ago
[-]
"This unstable foundation I built my home upon is now breaking!"
reply