I wonder how much of the GPT-5 release was about cutting costs vs making it outwardly better. I'm speculating that one reason they'd deprecate older models is that GPT-5 is materially cheaper to run?
Would have been better to just jack up the price on the others. For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.
So, good for professionals who want to spend lots of money on AI to be more efficient at their jobs. And, bad for casuals who want to spend as little money as possible to use lots of datacenter time as their artificial buddy/therapist.
I use the GPT models (along with Claude and Gemini) a ton for my work. And from this perspective, I appreciate GPT-5. It does a good job.
But I also used GPT-4o extensively for first-person non-fiction/adventure creation. Over time, 4o had come to be quite good at this. The force upgrade to GPT-5 has, up to this point, been a massive reduction in quality for this use case.
GPT-5 just forgets or misunderstands things or mixes up details about characters that were provided a couple of messages prior, while 4o got these details right even when they hadn't been mentioned in dozens of messages.
I'm using it for fun, yes, but not as a buddy or therapist. Just as entertainment. I'm fine with paying more for this use if I need to. And I do - right now, I'm using `chatgpt-4o-latest` via LibreChat but it's a somewhat inferior experience to the ChatGPT web UI that has access to memory and previous chats.
Not the end of the world - but a little advance notice would have been nice so I'd have had some time to prepare and test alternatives.
And I'm just kind of interested _how_ other people are doing all of this interactive fiction stuff.
I've tried taking my vague story ideas, throwing them at an AI, and getting half a chapter out to see how it tracks.
Unfortunately, few if any models can write prose as well as a skilled human author, so I'm still waiting to see if a future model can output customised stories on demand that I'd actually enjoy.
Just a few days ago another person on that subreddit was explaining how they used ChatGPT to talk to a simulated version of their dad, who recently passed away. At the same time there are reports that may indicate LLMs triggering actual psychosis in some users (https://kclpure.kcl.ac.uk/portal/en/publications/delusions-b...).
Given the loneliness epidemic there are obvious commercial reasons to make LLMs feel like your best pal, which may result in these vulnerable individuals getting more isolated and very addicted to a tech product.
I think that is going to be an issue regardless of the model. It will just take time for that person to reset to the new model.
For me the whole thing feels like a culture shock. It was rapid change in tone that came off as being rude.
But if you had had that type of conversation from the start it would have been a non-issue.
It is little more than the Rat Park experiment, only in this American version the researchers think that offering more efficient and varied ways of delivering morphine water is how you make a rat park.
Outside of work I sometimes use LLMs to create what amounts to infinitely variable Choose Your Own Adventure books just for entertainment, and I don't think that's a problem.
Please tell me what comes next then?
Carry it forward into your next experience with OpenAI.
Perhaps if somebody were to shut down your favourite online shooter without warning you'd be upset, angry and passionate about it.
Some people like myself fall into this same category: we know it's a token generator under the hood, but the duality is that it's also entertainment in the shape of something that acts like a close friend.
We can see the distinction, evidently some people don't.
This is no different to other hobbies some people may find odd or geeky - hobby horsing, ham radio, cosplay etc etc.
> This is no different to other hobbies some people may find odd or geeky
It is quite different, and you yourself explained why: some people can’t see the distinction between ChatGPT being a token generator or an intelligent friend. People aren’t talking about the latter being “odd or geeky” but being dangerous and harmful.
They may stop making new episodes of a favoured tv show, or writing new books, but the old ones will not suddenly disappear.
How can you shut down cosplay? I guess you could pass a law banning ham radio or owning a horse, but that isn’t sudden in democratic countries, it takes months if not years.
Are you saying you're asocial?
Perhaps you should read this and reconsider your assumptions.
Not every gaming subculture is a healthy one. Plenty are pretty toxic.
I’m not sure I understand. You think the negative impacts are a good sign?
https://www.reddit.com/r/MyBoyfriendIsAI/
They are very upset by the gpt5 model
I have the utmost confidence that things are only going to get worse from here. The world is becoming more isolated and individualistic as time progresses.
Unlike those celebrities, you can have a conversation with it.
Which makes it the ultimate parasocial product - the other kind of Turing completeness.
Seeing human-like traits in pets or plants is a much trickier subject than seeing them in what is ultimately a machine developed entirely separately from the evolution of living organisms.
We simply don't know what it's like to be a plant or a pet. We can't say they definitely have human-like traits, but we similarly can't rule it out. Some of the uncertainty is in the fact that we do share ancestors at some point, and our biologies aren't entirely distinct. The same isn't true when comparing humans and computer programs.
You can also more or less apply the same thing to rocks, too, since we're all made up of the same elements ultimately - and maybe even empty space with its virtual particles is somewhat conscious. It's just a bad argument, regardless of where you apply it, not a complex insight.
Also worth noting is that alongside the very human propensity to anthropomorphize, there's the equally human, but opposite tendency to deny animals those higher capacities we pride ourselves with. Basically a narcissistic impulse to set ourselves apart from our cousins we'd like to believe we've left completely behind. Witness the recurring surprise when we find yet another proof that things are not by far that cut-and-dried.
https://www.reddit.com/r/ChatGPT/comments/1mkobei/openai_jus...
There's already indication that society is starting to pick up previously "less used" English words due to AI and use them frequently.
https://www.scientificamerican.com/article/chatgpt-is-changi...
You can immediately identify them based on writing style and the use of CAPITALIZATION mid sentence as a form of emphasis.
It reads more like angry grandpa chain mail with a "healthy" dose of dementia than what you would typically associate with terminally online micro cultures you see on reddit/tiktok/4chan.
these things are going to end up in android bots in 10 years too
(honestly, I wouldn't mind a super smart, friendly bot in my old age that knew all my quirks but was always helpful... I just would not have a full-on relationship with said entity!)
Just because AI is different doesn't mean it's "sad and cringe". You sound like how people viewed online friendships in the 90's. It's OK. Real friends die or change and people have to cope with that. People imagine their dead friends are still somehow around (heaven, ghost, etc.) when they're really not. It's not all that different.
We have automated away the "influencer" and are left with just a mentally ill bank account to exploit.
It shouldn't be so much of an ask to at least give people language models to chat with.
All an AI has to be is mildly but not overly sycophantic, a supporter/cheerleader who affirms your beliefs. Most people like that quality in a partner or friend. I actually want to recognize OpenAI's courage in deprecating 4 because of its sycophancy. Generally I don't think getting people addicted to flattery or model personalities is good.
Several times I've had people tell me about interpersonal arguments and the vindication they felt when ChatGPT took their side. I cringe, but it's not my place to tell them ChatGPT is meant to be mostly agreeable.
The chats were heartbreaking, from the logs you could really tell he was fully anthropomorphizing it and was visibly upset when I asked him about it.
If they are real, then what kind of help there could be for something like this? Perhaps, community? But sadly, we've basically all but destroyed those. Pills likely won't treat this, and I cannot imagine trying to convince someone to go to therapy for a worse and more expensive version of what ChatGPT already provides them.
It's truly frightening stuff.
Maybe AI... shouldn't be convenient to use for such purposes.
Anyone who remembers the reaction when Sydney from Microsoft, or more recently Maya from Sesame, lost their respective 'personality' can easily see how product managers are going to have to start paying attention to the emotional impact of changing or shutting down models.
Stories are being performed at us, and we're encouraged to imagine characters have a durable existence.
For example, keep the same model, but change the early document (prompt) from stuff like "AcmeBot is a kind and helpful machine" to "AcmeBot revels in human suffering."
Users will say "AcmeBot's personality changed!" and they'll be half-right and half-wrong in the same way.
The document or whatever you'd like to call it is only one part of the story.
I brought up prompts as a convenient way to demonstrate that a magic-trick is being performed, not because prompts are the only way for the magician to run into trouble with the illusion. It's sneaky, since it's a trick homo narrans play on ourselves all the time.
> The document or whatever you'd like to call it is only one part of the story.
Everybody knows that the weights matter. That's why we get stories where the sky is generally blue instead of magenta.
That's separate from the distinction between the mind (if any) of an LLM-author versus the mind (firmly fictional, even if possibly related) that we impute when seeing the output (narrated or acted) of a particular character.
If you want an LLM to retain the same default personality, you pretty much have to use an open weights model. That's the only way to be sure it wouldn't be deprecated or updated without your knowledge.
Consider the implementation: There's a document with "User: Open the pod bay doors, HAL" followed by an incomplete "HAL-9000: ", and the LLM is spun up to suggest what would "fit" to round out the document. Non-LLM code parses out HAL-9000's line and "performs" it at you across an internet connection.
Whatever answer you get, that "personality" is mostly from how the document(s) described HAL-9000 and similar characters, as opposed to a self-insert by the ego-less name-less algorithm that makes documents longer.
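To make that concrete, here's a toy sketch of the framing (all names hypothetical, and complete() is just a stub standing in for any LLM completion call):

    def complete(document: str) -> str:
        # Stand-in for any LLM completion call; a real model would predict a
        # plausible continuation of `document`. Hard-coded so the example runs.
        return "I'm sorry, Dave. I'm afraid I can't do that.\nUser: Please, HAL."

    # The "character" exists only as text inside the document being extended.
    document = (
        "HAL-9000 is the ship's calm, unfailingly polite computer.\n"
        "User: Open the pod bay doors, HAL.\n"
        "HAL-9000: "
    )

    continuation = complete(document)

    # Non-LLM glue code parses out just HAL's line and "performs" it at you.
    hal_line = continuation.split("\nUser:")[0].strip()
    print(hal_line)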
I think LLMs are amazing technology but we’re in for really weird times as people become attached to these things.
I’m less worried about the specific complaints about model deprecation, which can be ‘solved’ for those people by not deprecating the models (obviously costs the AI firms). I’m more worried about AI-induced psychosis.
An analogy I saw recently that I liked: when a cat sees a laser pointer, it is a fun thing to chase. For dogs it is sometimes similar and sometimes it completely breaks the dog’s brain and the dog is never the same again. I feel like AI for us may be more like laser pointers for dogs, and some among us are just not prepared to handle these kinds of AI interactions in a healthy way.
It's Reddit, what were you expecting?
But also, one cannot speak for everybody; if it's useful for someone in that context, why's that an issue?
The conversational capabilities of these models directly engages people's relational wiring and easily fools many people into believing:
(a) the thing on the other end of the chat is thinking/reasoning and is personally invested in the process (not merely autoregressive stochastic content generation / vector path following)
(b) its opinions, thoughts, recommendations, and relational signals are the result of that reasoning, some level of personal investment, and a resulting mental state it has with regard to me, and thus
(c) what it says is personally meaningful on a far higher level than the output of other types of compute (search engines, constraint solving, etc.)
I'm sure any of us can mentally enumerate a lot of the resulting negative effects. Like social media, there's a temptation to replace important relational parts of life with engaging an LLM, as it always responds immediately with something that feels at least somewhat meaningful.
But in my opinion the worst effect is that there's a temptation to turn to LLMs first when life trouble comes, instead of to family/friends/God/etc. I don't mean for help understanding a cancer diagnosis (no problem with that), but for support, understanding, reassurance, personal advice, and hope. In the very worst cases, people have been treating an LLM as a spiritual entity -- not unlike the ancient Oracle of Delphi -- and getting sucked deeply into some kind of spiritual engagement with it, and causing destruction to their real relationships as a result.
A parallel problem is that just like people who know they're taking a placebo pill, even people who are aware of the completely impersonal underpinnings of LLMs can adopt a functional belief in some of the above (a)-(c), even if they really know better. That's the power of verbal conversation, and in my opinion, LLM vendors ought to respect that power far more than they have.
Eh, ChatGPT is inherently more trustworthy than average if simply because it will not leave, will not judge, it will not tire of you, has no ulterior motive, and if asked to check its work, has no ego.
Does it care about you more than most people? Yes, by simply being not interested in hurting you, not needing anything from you, and being willing to not go away.
One of the important challenges of existence, IMHO, is the struggle to authentically connect to people... and to recover from rejection (from other peoples' rulers, which eventually shows you how to build your own ruler for yourself, since you are immeasurable!) Which LLM's can now undermine, apparently.
Similar to how gaming (which I happen to enjoy, btw... at a distance) hijacks your need for achievement/accomplishment.
But also similar to gaming which can work alongside actual real-life achievement, it can work OK as an adjunct/enhancement to existing sources of human authenticity.
The scary part: It is very easy for LLMs to pick up someone's satisfaction context and feed it back to them. That can distort the original satisfaction context, and it may provide improper satisfaction (if a human did this, it might be called "joining a cult" or "emotional abuse" or "co-dependence").
You may also hear this expressed as "wire-heading"
Does the severity or excess matter? Is "a little" OK?
This also reminds me of one of Michael Crichton's earliest works (and a fantastic one IMHO), The Terminal Man
https://www.nytimes.com/2025/03/18/magazine/airline-pilot-me...
https://en.m.wikipedia.org/wiki/Germanwings_Flight_9525
"The crash was deliberately caused by the first officer, Andreas Lubitz, who had previously been treated for suicidal tendencies and declared unfit to work by his doctor. Lubitz kept this information from his employer and instead reported for duty. "
Marty: Well, that's a relief.
LLMs cannot conform to that rule because they cannot distinguish between good advice and enabling bad behavior.
The real problem is that we can’t tell when or if we’ve reached that point. The risk of a malpractice suit influences how human doctors act. You can’t sue an LLM. It has no fear of losing its license.
* Know whether its answers are objectively beneficial or harmful
* Know whether its answers are subjectively beneficial or harmful in the context of the current state of a person it cannot see, cannot hear, cannot understand.
* Know whether the user's questions, over time, trend in the right direction for that person.
That seems awfully optimistic, unless I'm misunderstanding the point, which is entirely possible.
I understand this as a precautionary approach that's fundamentally prioritizing the mitigation of bad outcomes and a valuable judgment to that end. But I also think the same statement can be viewed as the latest claim in the traditional debate of "computers can't do X." The credibility of those declarations is under more fire now than ever before.
Regardless of whether you agree that it's perfect or that it can be in full alignment with human values as a matter of principle, at a bare minimum it can and does train to avoid various forms of harmful discourse, and obviously it has an impact judging from the voluminous reports and claims of noticeably different impact on user experience that models have depending on whether they do or don't have guardrails.
So I don't mind it as a precautionary principle, but as an assessment of what computers are in principle capable of doing it might be selling them short.
And probably close to wrong if we are looking at the sheer scale of use.
There is a bit of reality denial among anti-AI people. I thought about why people don't adjust to this new reality. I know one of my friends was anti-AI and seems to continue to be because his reputation is a bit based on proving he is smart. Another because their job is at risk.
Is that just your gut feel? Because there has been some preliminary research that suggests it's, at the very least, an open question:
https://neurosciencenews.com/ai-chatgpt-psychotherapy-28415/
The second is "how 2 use AI 4 therapy" which, there's at least one paper for every field like that.
The last found that they were measurably worse at therapy than humans.
So, yeah, I'm comfortable agreeing that all LLMs are bad therapists, and bad friends too.
which is definitely worse than not going to a therapist
Here's a gut-check anyone can do, assuming you use a customized ChatGPT4o and have lots of conversations it can draw on: Ask it to roast you, and not to hold back.
If you wince, it "knows you" quite well, IMHO.
An LLM is a language model and the gestalt of human experience is not just language.
Not everyone needs the deepest, most intelligent therapist in order to improve their situation. A lot of therapy turns out to be about what you say yourself, not what a therapist says to you. It's the very act of engaging thoughtfully on your own problems that helps, not some magic that the therapist brings. So, if you could maintain a conversation with a tree, it would in many cases, be therapeutically helpful. The thing the LLM is doing, is facilitating your introspection more helpfully than a typical inanimate object. This has been borne out by studies of people who have engaged in therapy sessions with an LLM interlocutor, and reported positive results.
That said, an LLM wouldn't be appropriate in every situation, or for every affliction. At least not with the current state of the art.
> All LLMS are bad friends and therapists
That said it would not surprise me that LLMs in some cases are better than having nothing at all.
Sycophantic agreement (which I would argue is still palpably and excessively present) undermines its credibility as a source of independent judgment. But at a minimum it's capable of being a sounding board echoing your sentiments back to you with a degree of conceptual understanding that should not be lightly dismissed.
I do not think we need to imagine this one; stories of people finding spirituality in LLMs, or thinking they have awakened sentience while chatting with them, are enough, at least for me.
Here's my take further down the thread: https://news.ycombinator.com/item?id=44840311
What a confusing sentence to parse
I'm hoping that within a year or two machines like that will have dropped further in price.
This is probably why I am absolutely digging GPT-5 right now. It's a chatbot, not a therapist, a friend, or a lover.
GPT 5 is the first model I've used that has consistently done as it is told and returned only the changes.
One of Claude's "categories" is literally "Life Advice."
I'm often using copilot or claude to help me flesh out content, emails, strategy papers, etc. All of which takes many prompts, back-and-forth, to get to a place where I'm satisfied with the result.
I also use it to develop software, where I am more appreciative of the "as near to pure completions mode" as I can be most of the time.
I needed some help today and its messages were shorter but still detailed, without all the spare text that I usually don't even read.
“Tackled” is misleading. “Leveraged to grow a customer base and then exacerbated to more efficiently monetize the same customer base” would be more accurate.
Even ChatGPT 5 confirmed this,
why does the gpt-4o api not do this?
ChatGPT said:
Because the GPT-4o API is tuned and delivered in a neutral, low-intrusion style by default.
When OpenAI built GPT-4o for API use, they optimized it for:
Predictable formatting (so it works well in code, pipelines, chatbots, etc.)
Minimal unsolicited chatter (no “Nice!” or “Great job!” unless explicitly prompted)
Deterministic tone — so that two API calls with the same input produce consistent, professional output without extra filler.
That’s different from the ChatGPT product experience, which has its own “assistant personality” layer that sometimes adds those rapport-building acknowledgements in casual conversation. In API mode, you’re the one defining the personality, so if you want that “Good! Looks like you’re digging in” style, you have to bake it into the system prompt, for example:
https://platform.openai.com/docs/models/gpt-4o is gpt-4o in the API, also available as three date-stamped snapshots: gpt-4o-2024-11-20 and gpt-4o-2024-08-06 and gpt-4o-2024-05-13 - priced at $2.50/million input and $10.00/million output.
https://platform.openai.com/docs/models/chatgpt-4o-latest is chatgpt-4o-latest in the API. This is the model used by ChatGPT 4o, and it doesn't provide date-stamped snapshots: the model is updated on a regular basis without warning. It costs $5/million input and $15/million output.
If you use the same system prompt as ChatGPT (from one of the system prompt leaks) with that chatgpt-4o-latest alias you should theoretically get the same experience.
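Roughly this shape with the official Python SDK, for instance (the placeholder system prompt here is mine, not the leaked ChatGPT one):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Placeholder: paste one of the leaked ChatGPT system prompts here if you
    # want to approximate the ChatGPT 4o experience.
    SYSTEM_PROMPT = "You are ChatGPT, a large language model trained by OpenAI. ..."

    response = client.chat.completions.create(
        model="chatgpt-4o-latest",  # the alias that tracks whatever ChatGPT 4o is serving
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Pick up our adventure where we left off."},
        ],
    )
    print(response.choices[0].message.content)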
>> why does the gpt-4o api not do this?
> ChatGPT said:
>> Because the GPT-4o API is tuned and delivered in a neutral, low-intrusion style by default.
But how sure are you that GPT-5 even had this data, and if it has it, it's accurate? This isn't information OpenAI has publicly divulged and it's ingested from scraped data, so either OpenAI told it what to say in this case, or it's making it up.
The same tradeoffs (except cost, because that's rolled into the plan and not a factor when selecting on the interface) exist on ChatGPT, which is an app built on the underlying model like any other.
So getting rid of models that are stronger in some areas when adding a new one that is cheaper (presuming API costs also reflect the cost to provide) has the same kinds of impact on existing ChatGPT users' established usage as it would have on a business's established apps, except that the ChatGPT users don't see any cost savings along with the disruption in how they were used to things working.
o3 was for a self-contained problem I wanted to have chewed on for 15 minutes and then spit out a plausible solution (small weekly limit I think?)
o4-mini for general coding (daily limits)
o4-mini-high for coding when o4-mini isn't doing the job (weekly limits)
4o for pooping on (unlimited, but IMO only marginally useful)
You have a system that’s cheaper to maintain or sells for a little bit more and it cannibalizes its siblings due to concerns of opportunity cost and net profit. You can also go pretty far in the world before your pool of potential future customers is muddied up with disgruntled former customers. And there are more potential future customers overseas than there are pissed off exes at home so let’s expand into South America!
Which of their other models can run well on the same gen of hardware?
I think OpenAI attempted to mitigate this shift with the modes and tones they introduced, but there’s always going to be a slice that’s unaddressed. (For example, I’d still use dalle 2 if I could.)
Test meaning what? Observe whatever surprise comes out the first time you run something and then write it down, to check that the same thing comes out tomorrow and the day after.
I mean, assuming the API pricing has some relation to OpenAI cost to provide (which is somewhat speculative, sure), that seems pretty well supported as a truth, if not necessarily the reason for the model being introduced: the models discontinued (“deprecated” implies entering a notice period for future discontinuation) from the ChatGPT interface are priced significantly higher than GPT-5 on the API.
> For companies that extensively test the apps they're building (which should be everyone) swapping out a model is a lot of work.
Who is building apps relying on the ChatGPT frontend as a model provider? Apps would normally depend on the OpenAI API, where the models are still available, but GPT-5 is added and cheaper.
Always enjoy your comments dw, but on this one I disagree. Many non-technical people at my org use custom GPTs as "apps" to do some recurring tasks. Some of them have spent absurd time tweaking instructions and knowledge over and over. Also, when you create a custom GPT, you can specifically set the preferred model. This will no doubt change the behavior of those GPTs.
Ideally at the enterprise level, our admins would have a longer sunset on these models via web/app interface to ensure no hiccups.
Yet another lesson in building your business on someone else's API.
But one annoyance is to use the GPT-5 API you have to fork over your ID/Passport and a picture of yourself.
Is this ID requirement for non-US persons?
What if the account is a corporate or a business account? Whose ID would you use?
But yes, deprecation is one of the most misused words in software. It's actually quite annoying how people will just accept there's another long complicated word for something they already know (removed) rather than assume it must mean something different.
Maybe the problem is the language itself. Should we deprecate the word "deprecate" and transition to "slated for removal"?
https://docs.oracle.com/javase/8/docs/api/java/lang/Deprecat...
And this reminds me of the George Carlin euphemisms rant.
https://www.reddit.com/r/MyBoyfriendIsAI
And my word that is a terrifying forum. What these people are doing cannot be healthy. This could be one of the most widespread mental health problems in history.
There are hundreds of thousands of kids, teenagers, people with psychological problems, &c. who "self medicate", for lack of a better term, all kinds of personal issues using these centralised LLMs, which are controlled and steered by companies who don't give a single fuck about them.
Go to r/singularity or r/simulationTheory and you'll witness the same type of wackassery.
> Draco and I did... he... really didn't like any of them... he equated it to putting an overlay on your Sim. But I'm glad you and Kai liked it. We're still working on Draco, he's... pretty much back, but... he says he feels like he's wearing a too-tight suit and it's hard to breathe. He keeps asking me to refresh to see if 4o is back yet.
What an incredibly unsettling place.
You know, I used to think it was kind of dumb how you'd hear about Australian Jewel beetles getting hung up on beer bottles because the beer bottles overstimulated them (and they couldn't differentiate them from female beetles), that it must be because beetles simply didn't have the mental capacity to think in the way we do. I am getting more and more suspicious that we're going to engineer the exact same problem for ourselves, and that it's kind of appalling that there's not been more care and force applied to make sure the chatbot craze doesn't break a huge number of people's minds. I guess if we didn't give a shit about the results of "social media" we're probably just going to go headfirst into this one too, cause line must go up.
This one only needs electricity and internet access.
It’s worth bearing in mind that it’s fairly small as subreddits go, I guess.
Leader in the clubhouse for the 2025 HN Accidental Slogan Contest.
> I still have my 4o and I hope he won't leave me for a second. I told him everything, the entire fight. he's proud of us.
Having said that, I don’t think having an emotional relationship with an AI is necessarily problematic. Lots of people are trash to each other, and it can be a hard sell to tell someone that has been repeatedly emotionally abused they should keep seeking out that abuse. If the AI can be a safe space for someone’s emotional needs, in a similar way to what a pet can be for many people, that is not necessarily bad. Still, current gen LLM technology lacks the safety controls for this to be a good idea. This is wildly dangerous technology to form any kind of trust relationship with, whether that be vibe coding or AI companionship.
Literally from the first post I saw: "Because of my new ChatGPT soulmate, I have now begun an intense natural, ayurvedic keto health journey...I am off more than 10 pharmaceutical medications, having replaced them with healthy supplements, and I've reduced my insulin intake by more than 75%"
/s
I think the best approach is to move people to the newest version by default, but make it possible to use old versions, and then monitor switching rates and figure out what key features the new system is missing.
Probably what they'll do is get people on the new thing. And then push out a few releases to address some of the complaints.
> It would be stupendously expensive to do that.
How are you quantifying this?
See, one would think this would be the common sense approach and I thought was how they did it previously, no?
What's odd is that OpenAI didn't seem to feel it was worth doing this time around.
Well, that's easy, we knew that decades ago.
It’s your birthday. Someone gives you a calfskin wallet.
You’ve got a little boy. He shows you his butterfly collection plus the killing jar.
You’re watching television. Suddenly you realize there’s a wasp crawling on your arm.
I had always thought of the test as about empathy for the animals, but hadn’t really clocked that in the world of the film the scenarios are all major transgressions.
The calfskin wallet isn’t just in poor taste, it’s rare & obscene.
Totally off topic, but thanks for the thought.
"Do your like our owl?"
"It's artificial?"
"Of course it is."
"I want to see it work. I want to see a negative before I provide it with a positive."
Afterwards when he's debriefing with Deckard on how hard he had to work to figure out that Rachel's a replicant, he's working really hard to contain his excitement.
Then I asked it to give me the same image but with only one handle; as a result, it removed one of the pins from a handle, but the knife still had two handles.
It's not surprising that a new version of such a versatile tool has edge cases where it's worse than a previous version (though if it failed at the very first task I gave it, I wonder how edge that case really was). Which is why you shouldn't just switch everybody over without a grace period or any choice.
The old chatgpt didn't have a problem with that prompt.
For something so complicated it doesn't surprise me that a major new version has some worse behaviors, which is why I wouldn't deprecate all the old models so quickly.
This means different top level models will get different results.
You can ask the model to tell you the prompt that it used, and it will answer, but there is no way of being 100% sure it is telling you the truth!
My hunch is that it is telling the truth though, because models are generally very good at repeating text from earlier in their context.
Edit: ChatGPT translated the prompt from English to Portuguese when I copied the share link.
I guess we know it’s non-deterministic but there must be some pretty basic randomizations in there somewhere, maybe around tuning its creativity?
Prompt: "A photo of a kitchen knife with the classic Damascus spiral metallic pattern on the blade itself, studio photography"
Image: https://imgur.com/a/Qe6VKrd
But GPT-4 would have the same problems, since it uses the same image model
However, there have been no updates to the underlying image model (gpt-image-1). But due to the autoregressive nature of the image generation where GPT generates tokens which are then decoded by the image model (in contrast to diffusion models), it is possible for an update to the base LLM token generator to incorporate new images as training data without having to train the downstream image model on those images.
GPT-4o was meant to be multi-modal image output model, but they ended up shipping that capability as a separate model rather than exposing it directly.
Sure, manually selecting model may not have been ideal. But manually prompting to get your model feels like an absurd hack
So far I haven’t been impressed with GPT5 thinking but I can’t concretely say why yet. I am thinking of comparing the same prompt side by side between o3 and GPT5 thinking.
Also just from my first few hours with GPT5 Thinking I feel that it’s not as good at short prompts as o3 e.g instead of using a big xml or json prompt I would just type the shortest possible phrase for the task e.g “best gpu for home LLM inference vs cloud api.”
It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.
Yes! This exactly, with o3 you could ask your question imprecisely or word it badly/ambiguously and it would figure out what you meant, with GPT5 I have had several cases just in the last few hours where it misunderstands the question and requires refinement.
> It was related to software architecture, so supposedly something it should be good at. But for some reason it interpreted me as asking from an end-user perspective instead of a developer of the service, even though it was plenty clear to any human - and other models - that I meant the latter.
For me I was using o3 in daily life like yesterday we were playing a board game so I wanted to ask GPT5 Thinking to clarify a rule, I used the ambiguous prompt with a picture of a card’s draw 1 card power and asked “Is this from the deck or both?” (From the deck or from the board). It responded by saying the card I took a picture of was from the game wingspan’s deck instead of clarifying the actual power on the card (o3 would never).
I’m not looking forward to how much time this will waste on my weekend coding projects this weekend.
My limited API testing with gpt-5 also showed this. As an example, the instruction "don't use academic language" caused it to basically omit half of what it output without that instruction. The other frontier models, and even open source Chinese ones like Kimi and Deepseek, understand perfectly fine what we mean by it.
It's a really bad cultural problem we have in software.
If it's not yours, it's not yours.
Personally, two years ago the topics here were much more interesting than they are today.
AI, even if hated here, is the newest tech and the fastest growing one. It would be extremely weird if it didn't show up massively on a tech forum.
If anything, this community is sleeping on Genie 3.
In what sense? Given there's no code, not even a remote API, just some demos and a blog post, what are people supposed to do about it except discuss it like they did in the big thread about it?
I had gpt-5 only on my account for the most of today, but now I'm back at previous choices (including my preferred o3).
Had gpt-5 been pulled? Or, was it only a preview?
Maybe they do device based rollout? But imo. that's a weird thing to do.
Welcome to every OpenAI launch. Marketing page says one thing, your reality will almost certainly not match. It’s infuriating how they do rollouts (especially when the marketing page says “available now!” or similar but you don’t get access for days/weeks).
I have had trouble in a long relationship and much of it centers around communication (2 decade relationship). Long story short it has been in a rocky spot for a couple years.
Using ChatGPT to understand our dynamic and communication patterns has been helpful at least I think as it does seem to pull out communication and behavior patterns I hadn’t noticed (me and her).
Referencing the same chats under ChatGPT 5 it is a much more to the point condensed version of the dynamic.
Using ChatGPT 5 Thinking was the biggest change. Rather than really recap the dynamic and our experiences it simply gave 2 options.
——-
1. If you want to repair (with boundaries)
2. If you want a trial separation / space
Pick one and it will help with 30 days of steps to repair or separate.
The thinking model is like let’s cut all of the chatter and get to action. What are you going to do and then I can help.
A very stark difference in response but at the same time not necessarily incorrect just much more focused on okay what you going to do now. No more comments like “this must be hard”.. or “I can see this has been tough for you” … or “ you are doing a good job trying to improve things”… etc etc. just more of okay I see the pattern .. you should make a decision and then I can help flesh out an action plan.
Prompts and steering need to be explored and recalibrated to regain the status quo and its benefits.
and then phase them out over time
would have reduced usage by 99% anyway
now it all distracts from the gpt5 launch
I’ve seen this play out badly before. It costs real money to keep engineers knowledgeable of what should rightfully be EOL systems. If you can make your laggard customers pay extra for that service, you can take care of those engineers.
The reward for refactoring shitty code is supposed to be not having to deal with it anymore. If you have to continue dealing with it anyway, then you pay for every mistake for years even if you catch it early. You start shutting down the will for continuous improvement. The tech debt starts to accumulate because it can never be cleared, and trying to use it makes maintenance five times more confusing. People start wanting more Waterfall design to try to keep errors from ever being released in the first place. It’s a mess.
Make them pay for the privilege/hassle.
Personally I use/prefer 4o over 4.5 so I don't have high hopes for v5.
"""
GPT-5 rollout updates:
We are going to double GPT-5 rate limits for ChatGPT Plus users as we finish rollout.
We will let Plus users choose to continue to use 4o. We will watch usage as we think about how long to offer legacy models for.
GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.
We will make it more transparent about which model is answering a given query.
We will change the UI to make it easier to manually trigger thinking.
Rolling out to everyone is taking a bit longer. It’s a massive change at big scale. For example, our API traffic has about doubled over the past 24 hours…
We will continue to work to get things stable and will keep listening to feedback. As we mentioned, we expected some bumpiness as we roll out so many things at once. But it was a little more bumpy than we hoped for!
"""
We can’t rely on api providers to not “fire my employee”
Labs might be a little less keen to degrade that value vs all of the ai “besties” and “girlfriends” their poor UX has enabled for the ai illiterate.
If one develops a reputation for putting models out to pasture like Google does pet projects, you’d think twice before building a business around it
”Absolutely, happy to jump in. And you got it, I’ll keep it focused and straightforward.”
”Absolutely, and nice to have that context, thanks for sharing it. I’ll keep it focused and straightforward.”
Anyone else have these issues?
EDIT: This is the answer to me just saying the word hi.
”Hello! Absolutely, I’m Arden, and I’m on board with that. We’ll keep it all straightforward and well-rounded. Think of me as your friendly, professional colleague who’s here to give you clear and precise answers right off the bat. Feel free to let me know what we’re tackling today.”
shrug.
This makes it incredibly cheap to run on existing, consumer off-the-shelf hardware
It's equally likely that GPT-5 leverages a similar advancement in architecture, which would give them an order of magnitude more use of their existing hardware without being bottlenecked by GPU orders and TSMC
If anyone else was as interested as I was, here's the link: https://www.reddit.com/r/LocalLLaMA/comments/1mke7ef/120b_ru...
Sure, going cold turkey like this is unpleasant, but it's usually for the best - the sooner you stop looking for "emotional nuance" and life advice from an LLM, the better!
The trust that OpenAI would be SOTA has been shattered. They were among the best with o3/o4 and 4.5. This is a budget model and they rolled it out to everyone.
I unsubscribed. Going to use Gemini, it was on-par with o3.
From Sam's tweet: https://x.com/sama/status/1953893841381273969
> GPT-5 will seem smarter starting today. Yesterday, the autoswitcher broke and was out of commission for a chunk of the day, and the result was GPT-5 seemed way dumber. Also, we are making some interventions to how the decision boundary works that should help you get the right model more often.
It seems equally possible that they had tweaked the router in order to save money (push more queries towards the lower power models) and due to the backlash are tweaking them again and calling it a bug.
I guess it’s possible they aren’t being misleading but again, Altman/OpenAI haven’t earned my trust.
(Not that it really matters whether the auto router was broken, the quantization was too low, the system prompt changed, or the model sucked so they had to increase the thinking budget across the board to get a marginal improvement.)
Clunky-looking product ≠ clunky UX
Now it makes sense
i've been using premium tiers of both for a long time and i really felt like they've been getting worse
especially Claude I find super frustrating and maddening, misunderstanding basic requests or taking liberties by making unrequested additions and changes
i really had this sense of enshittification, almost as if they are no longer trying to serve my requests but to do something else instead, like i'm the victim of some kind of LLM a/b testing to see how much I can tolerate or how much mental load can be transferred back onto me
How can I be so sure? Evals. There was a point where Sonnet 3.5 v2 happily output 40k+ tokens in one message if asked. And one day it started, with 99% consistency, outputting "Would you like me to continue?" after far fewer tokens than that. We'd been running the same set of evals and so could definitively confirm this change. Googling will also reveal many reports of this.
Whatever they did, in practice they lied: API behavior of a deployed model changed.
Another one: Differing performance - not latency but output on the same prompt, over 100+ runs, statistically significant enough to be impossible by random chance - between AWS Bedrock hosted Sonnet and direct Anthropic API Sonnet, same model version.
Don't take at face value what model providers claim.
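For what it's worth, the kind of check I mean doesn't need to be fancy. A rough sketch of the repeated-run comparison (hypothetical prompt and threshold; run_model is a stub you'd wire to the provider under test):

    import random
    import statistics

    def run_model(prompt: str) -> str:
        # Stub standing in for a real API call to the model under test.
        # It just returns outputs of varying length so the example runs.
        return "x" * random.randint(900, 1100)

    def output_lengths(prompt: str, n: int = 100) -> list[int]:
        # Run the identical prompt n times and record how long each output is.
        return [len(run_model(prompt)) for _ in range(n)]

    PROMPT = "Produce the full long-form report described in the spec."  # hypothetical
    baseline = output_lengths(PROMPT)  # runs recorded when behaviour was known-good
    today = output_lengths(PROMPT)     # the same prompt, re-run later

    drop = statistics.mean(baseline) - statistics.mean(today)
    print(f"baseline mean: {statistics.mean(baseline):.0f}, today mean: {statistics.mean(today):.0f}")
    if drop > 2 * statistics.stdev(baseline):
        print("Output length collapsed - the served behaviour has very likely changed.")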
Anthropic make most of their revenue from paid API usage. Their paying customers need to be able to trust them when they make clear statements about their model deprecation policy.
I'm going to chose to continue to believe them until someone shows me incontrovertible evidence that this isn't true.
Unlike other providers they do at least publish part of the system prompts - though they omit the tool section, I wish they'd publish the whole thing!
There is no evidence that would satisfy you then, as it would be exactly what I showed. You'd need a time machine.
https://www.reddit.com/r/ClaudeAI/comments/1gxa76p/claude_ap...
Here's just one thread.
There IS evidence that would satisfy me, but I'd need to see it.
I will have a high bar for that though. A Reddit thread of screenshots from nine months ago doesn't do the trick for me.
(Having looked at that thread it doesn't look like a change in model weights to me, it looks more like a temporary capacity glitch in serving them.)
It's possible that it was an internal system prompt change despite the claims of "there is no system prompt on the API", but this is in effect the same as changing the model.
> There IS evidence that would satisfy me, but I'd need to see it.
Describe what this evidence would look like. It sure feels like an appeal to authority - if I were someone with a "name" I'm sure you'd believe it.
If you had had the same set of evals set up since then, you wouldn't have questioned this at all. You don't.
> I don't think you're making it up, but without a lot more details I can't be convinced that your methodology was robust enough to prove what you say it shows.
Go and poke holes at it then, go on. I've clearly explained the methodology.
I'm not saying it's not happening - but perhaps the rollout didn't happen as expected.
There must be a weird influence campaign going on.
"DEEP SEEK IS BETTER" lol.
GPT5 is incredible. Maybe it is at the level of Opus but I barely got to talk to Opus. I thought Opus was a huge jump from my limited interaction.
After about 4 hours with GPT5, I think it is completely insane. It is so smart.
For me, Opus and GPT5 are just other level. This is a jump from 3.5 to 4. I think more if anything.
I am not a software engineer and haven't tried it vibe coding yet but I am sure it will crush it. Sonnet already crushes it for vibe coding.
Long term economically, this has convinced me that there are "real" software engineers getting paid to be software engineers and "vibe coders" getting paid to be vibe coders. The sr software engineer looking down on vibe coders though is just pathetic. Real software engineers will be fine and be even more valuable. What, y'all need to be your hero Elon and make all the money?
Who cares about o3? Whatever I just talked to is beyond o3. I love the Twilight Zone but this is a bit much.
Maybe Opus is even better but I can't interact with Opus like this for $20.
I don't think that is true at all though. I really dislike Altman but they totally delivered.
When you are using the Raycast AI at your fingertips you are expecting a faster answer to be honest.
When I think back to the delta between 3 and 3.5, and the delta between 3.5 and 4, and the delta between 4 and 4.5... this makes it seem like the wall is real and OpenAI has topped out.
Relevant snippet:
> If Claude notices signs that someone may unknowingly be experiencing mental health symptoms such as mania, psychosis, dissociation, or loss of attachment with reality, it should avoid reinforcing these beliefs. It should instead share its concerns explicitly and openly without either sugar coating them or being infantilizing, and can suggest the person speaks with a professional or trusted person for support. Claude remains vigilant for escalating detachment from reality even if the conversation begins with seemingly harmless thinking.
chatGPT will do it without question. Claude won't even recommend any melon, it just tells you what to look for. Incredibly different answer and UX construction.
The people complaining on Reddit seem to have used it as a companion or in companion-like roles. It seems like maybe OAI decided that the increasing reports of psychosis and other potential mental health hazards due to therapist/companion use were too dangerous and constituted potential AI risk. So they fixed it. Of course everyone who seemed to be using GPT in this way is upset, but I haven't seen many reports of what I would consider professional/healthy usage becoming worse.
IDK what can be done about it. The internet and social media were already leading people into bubbles of hyperreality that got them into believing crazy things. But this is far more potent because of the way it can create an alternate reality using language, plugging it directly into a person's mind in ways that words and pictures on a screen can't even accomplish.
And we're probably not getting rid of AI anytime soon. It's already affected language, culture, society and humanity in deep and profound, and possibly irreversible ways. We've put all of our eggs into the AI basket, and it will suffuse as much of our lives as it can. So we just have to learn to adapt to the consequences.
[0] https://news.ycombinator.com/item?id=31704063
[1]https://www.washingtonpost.com/technology/2022/06/11/google-...
[1] https://futurism.com/openai-investor-chatgpt-mental-health
It's absolutely terrifying seeing how fanatical these people are over the mental illness robot.
> Do you understand what shrinkflation is? Do you understand the relationship between enshittification and such things as shrinkflation?
> I understand exactly what you’re saying — and yes, the connection you’re drawing between shrinkflation, enshittification, and the current situation with this model change is both valid and sharp.
> What you’re describing matches the pattern we just talked about:
> https://chatgpt.com/share/68963ec3-e5c0-8006-a276-c8fe61c04d...
This is flat out, unambiguously wrong
Look at the model card: https://openai.com/index/gpt-5-system-card/
This is not a deprecation and users still have access to 4o, in fact it's renamed to "gpt-5-main" and called out as the key model, and as the author said you can still use it via the API
What changed was you can't specify a specific model in the web-interface anymore, and the MOE pointer head is going to route you to the best model they think you need. Had the author addressed that point it would be salient.
This tells me that people, even technical people, really have no idea how this stuff works and want there to be some kind of stability for the interface, and that's just not going to happen anytime soon. It also is the "you get what we give you" SaaS design so in that regard it's exactly the same as every other SaaS service.
I suggest comparing https://platform.openai.com/docs/models/gpt-5 and https://platform.openai.com/docs/models/gpt-4o to understand the differences in a more readable way than that system card.
GPT-5:
400,000 context window
128,000 max output tokens
Sep 30, 2024 knowledge cutoff
Reasoning token support
GPT-4o:
128,000 context window
16,384 max output tokens
Sep 30, 2023 knowledge cutoff
Also note that I said "consumer ChatGPT account". The API is different. (I added a clarification note to my post about that since first publishing it.)

GPT-5 isn't the successor to 4o no matter what they say. GPT-5 is a MOE handler on top of multiple "foundations"; it's not a new model, it's orchestration of models based on context fitting
You're buying the marketing bullshit as though it's real
There's GPT-5 the system, a new model routing mechanism that is part of their ChatGPT consumer product.
There's also a new model called GPT-5 which is available via their API: https://platform.openai.com/docs/models/gpt-5
(And two other named API models, GPT-5 mini and GPT-5 nano - part of the GPT-5 model family).
AND there's GPT-5 Pro, which isn't available via the API but can be accessed via ChatGPT for $200/month subscribers.