This pushes me to use other models that aren't necessarily better, but that at least don't clam up when I am not even trying to get anything other than general summaries of research.
Scalable historical revisionism would only turbocharge societal conflict.
The latest Gemini models have a very low hallucination rate in benchmarks [0]. Feel free to try it yourself. Just go to gemini.google.com, choose Deep Research, and ask it to write a report about a topic that you're intimately familiar with.
Does the new Gemini have feet to go around and investigate? Does it reach out or have contacts to sources with the right knowledge? Does it even have critical thinking skills? Or is it just a robot spitting out plausible-sounding word salad?
These things are crucial, especially when it comes to topics like history where there are highly malicious actors with infinite resources and motivation trying to rewrite history.
"Who controls the past controls the future. Who controls the present controls the past." - 1984, George Orwell
Is this when using structured outputs?
The big question is, how do you train LLMs that are useful to both humans and services while not embarrassing the company that trained them?
LLMs are pretty good at translating - but if they don't like what they're reading, they simply won't tell you what it says. Which is pretty crazy.
LLMs are pretty good at extracting data and formatting the results as JSON - unless they find the data objectionable, then they'll basically complain to the deserializer. I have to admit that's a little bit funny.
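For anyone who hasn't hit this: the "complaint to the deserializer" looks roughly like the sketch below. The function name and refusal phrases are illustrative guesses, not any particular vendor's behaviour.

    import json

    def parse_extraction(llm_output: str) -> dict:
        # Parse an LLM's JSON extraction, surfacing refusals instead of
        # letting them blow up in the deserializer.
        try:
            return json.loads(llm_output)
        except json.JSONDecodeError:
            lowered = llm_output.lower()
            # Illustrative refusal markers; real models word this differently.
            if any(p in lowered for p in ("i can't", "i cannot", "i'm sorry")):
                raise ValueError(f"Model refused instead of returning JSON: {llm_output!r}")
            raise  # genuinely malformed output, not a refusal

    # The "complaint" case:
    refusal = "I'm sorry, but I can't help with extracting this content."
    try:
        parse_extraction(refusal)
    except ValueError as e:
        print(e)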
Right now, if you want to build a service and expect any sort of predictability and stability, I think you have to go with some solution that lets you run open-weights models. Some have been de-censored by volunteers, and if you find one that works for you, you can ignore future "upgrades" until you find one that doesn't break anything.
And for that it's really important to write your own tests/benchmarks. Technically the same goes for the big closed LLM services too, but when all of them fail your tests, what will you do?
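What that can look like in practice: a minimal pytest sketch, where call_model is a placeholder for whatever pinned model or endpoint you actually use. The point is that a refusal or silent schema drift fails CI before it fails in production.

    import json
    import pytest

    def call_model(prompt: str) -> str:
        # Placeholder: wire this to your own pinned model/endpoint.
        raise NotImplementedError

    CASES = [
        # (prompt, key that must be present in the returned JSON)
        ("Extract calories as JSON from: '200g chicken breast, 330 kcal'", "calories"),
        ("Translate to English as JSON {\"text\": ...}: 'Guten Morgen'", "text"),
    ]

    @pytest.mark.parametrize("prompt,required_key", CASES)
    def test_model_returns_parseable_json(prompt, required_key):
        output = call_model(prompt)
        data = json.loads(output)     # a refusal or prose answer fails here
        assert required_key in data   # silent schema drift fails here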
It blocked the request for "safety." I don't know the exact rationale; I was using it with LibreChat as a frontend and only saw that the reason was "safety." But my assumption is that they have an overzealous filter and it thought my questions about the caloric content of meat were about drugs, I guess.
I'm sure they can use a more stable endpoint for their application.
Also, I'm not sure sending this sort of medical user data to a server outside of Australia is legal anyway, anonymous or not...
This should be built using a specific model, tested and verified, and then dependency-locked to that model. If the model provider cannot give you a frozen model, then you don't use it.
Trying to blame Google and have them "fix their problem" sounds very much like someone who knows they screwed up but doesn't want to admit it and take responsibility for their actions.
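On the dependency-locking point, a minimal sketch of what pinning might look like. The model ID and response fields here are examples, not any specific vendor's API; the idea is to hard-code a fully versioned model and treat a silent swap as a broken pin.

    # Example of "freeze the model like a dependency". The ID below is an
    # example of a fully versioned model name; adjust for your provider.
    PINNED_MODEL = "gemini-1.5-pro-002"

    def build_request(prompt: str) -> dict:
        return {
            "model": PINNED_MODEL,   # never "latest" or an unversioned alias
            "prompt": prompt,
        }

    def check_served_model(response: dict) -> None:
        served = response.get("model", "")
        if served != PINNED_MODEL:
            # Treat a silent model swap like a failed dependency pin.
            raise RuntimeError(f"expected {PINNED_MODEL}, got {served!r}")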
Google just wants to limit their liability. If you disagree, run your own LLM.
It sounds like a dystopian horror to me
The dystopian horror is already here for a lot of people.
Self-hosted services are generally a better idea, if you know what you're doing.
Personally I sometimes use Gemini to talk about my mood, and I haven't had a bad experience yet.
If it's beta and not to be relied on, of course they won't hit the adoption numbers they need to keep it alive. Google needs to pick a lane, and/or learn which products to label "Alpha" instead of calling everything Beta.
Then there's the variability of LLMs as they get trained. LLMs are (as currently implemented) not deterministic. The randomness that gets injected is what makes them somewhat decent. An LLM could at one point output a document filled with banana emoji and still be functioning otherwise correctly if you hit the right quirk in the weights file.
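A toy illustration of that injected randomness (made-up vocabulary and logits, not a real model): sampling the next token from a temperature-scaled softmax means low-probability tokens like "banana" still get picked occasionally.

    import math
    import random

    vocab = ["the", "banana", "emoji", "report"]
    logits = [2.0, 0.1, 0.05, 1.5]   # made-up scores for a tiny vocabulary

    def sample_next_token(logits, temperature=0.9):
        scaled = [l / temperature for l in logits]
        m = max(scaled)                                # for numerical stability
        exps = [math.exp(l - m) for l in scaled]
        probs = [e / sum(exps) for e in exps]
        return random.choices(vocab, weights=probs, k=1)[0]

    # Same weights, same "prompt": the output still varies run to run.
    print([sample_next_token(logits) for _ in range(5)])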
Reusing general-purpose LLMs for healthcare has got to be one of the most utterly idiotic, as well as dystopian, ideas. For every report of a trauma survivor, there's fanfiction from a rape fetishist in the training set. One day Google's filters will let one bleed into the other, and the lack of care from these healthcare platforms will cause some pretty horrific problems as a result.
Let me guess: the whole app was vibe coded.
Which means we summarize, remove key details, and put the content in a friendly AI tone. When the police encounter this email, or a printed copy, they will have to interview the person to figure out the details. When it gets to court, the other side will poke holes in the AI copy.
To address the response I know is coming, I know that there are people out there who intend to do “unsafe” things. I don’t care and am not willing to be censored just to censor them. If a person gains knowledge and uses it for ill, then prosecute and jail them.
Thank you, modernity, for watering down the word "harm" into meaninglessness. I wish they'd drop this "safety" pretense and call it what it is: censorship.
"We don't our AI buddy saying some shit that would come back and monetarily harm Google".
And yes, the representatives of companies are censored. If you think you're going to go to work and tell your co-workers and customers to "catch boogeraids and die in a fire" you'll be escorted off the property. Most companies, Google included, probably don't want their AIs telling people the same.
There are places we need uncensored AIs, but in no way, shape, or form is Google required to provide you with one of them.
Unless you contract for one. Someone will probably do that for medical and police transcription. Building a product on a public API was probably a mistake.
Google would be fully justified, I believe, in having a disclaimer that said "gemini will not provide answers on certain topics because they will cause Google bad PR." But that's not what they're saying. They're saying they won't provide certain answers because putting certain ideas to words harms the world and Google doesn't want to harm the world. The latter reason is far more insidious.
AI autocorrect doesn't want you to type "fuck" or "cunt" and will suggest all manner of similar-sounding words, and there's a reason for that. People want censorship because they want their computers to be decent.
That said, the 4chan LLM was pretty funny for a while if you ignore the blatant -isms, but I can't think of a legitimate use case for it beyond shitposting.
The iPhone refusing to type “fuck” was such an annoyance for customers that Apple fixed the feature and announced it in one of their presentations two years ago.
https://www.npr.org/2023/06/07/1180791069/apple-autocorrect-...
> ... This AI model more accurately predicts which words and phrases you might type next, TechCrunch explains. That allows it to learn a person's most-used phrases, habits and preferences over time, affecting which words it corrects and which it leaves alone.
This is probably one of the weirdest brags I've seen happen around adding AI to something. Almost like there was no possible way to avoid autocorrecting the word otherwise.
Probably. But when is lying about censorship good? Calling it something else, or trying to trick people into thinking it's not there? I'm probably less fond of censorship than you, but notice that I didn't actually argue for or against censorship in my post. I only called for honesty.
At one point people thought DeepFakes were relatively harmless and now we've had multiple suicides and countless traumatic incidents because of them.
Then they should call it censorship. It doesn't stop being censorship if it's done for a good cause, or has good effects. It makes as much sense as calling every department in a company the "Department of Revenue" because its ultimate goal is generating revenue.
> and now we've had multiple suicides and countless traumatic incidents because of them
And how many suicides and how much trauma have we had because of not lying about censorship? Because that's all I'm asking.
Especially if it's used to inspire actions.
...and it never happened? And no link between shooting games and IRL shootings was ever found?
General content versus personalised, targeted, actionable and specific content.
Books, music (NIN was a popular target), movies, and video games are all things that hand-wringers have said would make people do bad things, predicting an outbreak of violence from each.
Every single time, they're proven wrong.
Hand-wringers said rock and roll / Elvis thrusting his hips was going to cause an explosion of teenage sex/pregnancies. Never happened.
> General content versus personalised, targeted, actionable and specific content.
What the hell does any of that mean? You just vomited a bunch of meaningless adjectives, but I think you're trying to make the same exact argument people used against shooting games; that it was somehow different because the person was actually involved. And yet the mass violence never materialized.
LLMs are easily jailbroken and have been around for ~2 years. Strange we haven't seen a single story about someone committing violent crime because of conversations they had with an AI.
You're just the latest generation of hand-wringer. Stop trying to incite moral panic and control others because something makes you uncomfortable.
It's a complicated story, but let's not pretend that people can't develop parasocial relationships with AIs and act on them.
This is a step far beyond anything we've ever seen in human history before, and I fail to see how Google's behaviour is anything less than appropriate.
This is absurd.