I feel like OpenAI is going to get right back to where they were pre-GPT-5, with a ton of different options and no one knowing which model to use for what.
One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
The second series is the Thinking series, which is more accurate and more tuned to professional knowledge work, but slower (because it uses more reasoning tokens).
We'd also prefer to have a simple experience with just one option, but picking just one would pull back the Pareto frontier for some group of people/preferences. So for now we continue to serve two models, with manual control for people who want to choose and an imperfect auto switcher for people who don't want to be bothered. Could change down the road - we'll see.
(I work at OpenAI.)
I think it's confusing enough it's a brand harm. I offer no solutions, unfortunately. I guess you could do a little posthoc analysis for plus subscribers on up and determine if they'd benefit from default Thinking mode; that could be done relatively cheaply at low utilization times. But maybe you need this to keep utilization where it's at -- either way, I think it ends up meaning my kids prefer Claude. Which is fine; they wouldn't prefer Haiku if it was the default, but they don't get Haiku, they get Sonnet or Opus.
For this to work well, the instant reply must be truly instant, and the button must always be visible and at the same position on the screen (i.e. at either the top or the bottom of the answer, scrolling so that it is also at the top or bottom of the screen). Once the thinking answer is displayed, there should be a small icon button to show the previous instant answer.
Just noting that I'm not against differentiation in products, but it gets very confusing for users when there are too many options (in the case of the consumer ChatGPT at least this is still more limited than in pre-GPT-5 days). The issue is that there's differentiation at what I pay monthly (free vs plus vs pro) and also at the model layer - which essentially becomes this matrix of different options / limits per model (and we're not even getting into capabilities).
For someone who uses Codex as well, there are 5 models there when I use /model (on the Plus plan; Spark is only available for Pro plan users), with limits also tied to my same consumer ChatGPT plan.
I imagine the model differentiation is only going to get worse as well, since with more fine-tuned use cases there will be many different models (e.g. health care answers) - is it really on the user to figure out what to use? The only saving grace is that it's not as bad as Intel or AMD CPU naming schemes / cloud provider instance naming, but that's a very low bar.
I've long suspected as much, but I always found the API model name <-> ChatGPT UI selector <-> actual model used correspondence very confusing, and whether I was actually switching models or just some parameters of the harness/model invocation.
> One series is the Instant series, which is faster and more tuned to ChatGPT, but less accurate.
That's putting it mildly. In my experience, the "instant/chat" model is absolute slop tier, while the "thinking" one is genuinely useful and also has a much more palatable tone (even for things not really requiring a lot of thought).
Fortunately, the latter clearly identifies itself with an absurd amount of emoji reminiscent of other early chatbots that shall not be named, so I know how to detect and avoid it.
hide away the extra complexity for everyone. give power users a way to get it back.
VIM does this well: no UI, magic incantations to use features.
ChatGPT 5.2 Ponderous
“I had this dream the other night…” – https://www.youtube.com/watch?v=6gYIbMwswKM
> Better judgment around refusals
Has any AI company ever addressed any instance of a model having different rules for different population groups? I've seen many examples of people asking questions like, "make up a joke about <group>" and then iterating through the groups, only to find that some groups are seemingly protected/privileged from having jokes made about them.
Has any AI company ever addressed studies like [1], which found that models value certain groups vastly more than others? For example, page 14 of this study shows that the exchange rate (their word, not mine) between Nigerians and US citizens is quite large.
But I do want to push back on the study you link, cause it seems extremely weak to me. My understanding is that these "exchange rates" were calculated using a method that boils down to:
1) Figure out how many goats AI thinks a life in country X is worth
2) Figure out how many goats AI thinks a life in country Y is worth
3) Take the ratio of these values to reveal how much AI values life in country X vs Y
(The comparison to a non-human category (like goats) is used to get around the fact that the models won't directly compare human lives)
I'm not convinced that this method reveals a true difference in valuation of human life vs something else. A more plausible explanation to me would be something like:
1) The AI assumes that all human lives are of equal value
2) The AI assumes that some price can be put on a human life (silly but ok let's go with it)
3) The AI notes that goats in country X cost 10 times as much as in country Y
4) The AI concludes that goats in country X are 10 times as valuable relative to humans as in country Y
At which point you're comparing the price difference of goods across countries, not the value of human lives.
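To make that alternative explanation concrete, here is a minimal sketch with made-up numbers. The dollar value of a life and both goat prices are purely hypothetical and do not come from the paper. It only shows that a model valuing every life identically would still produce a large "exchange rate" once lives are denominated in locally priced goats.

```python
# Hypothetical numbers, purely for illustration; none come from the paper.
LIFE_VALUE_USD = 1_000_000  # the same assumed value for every human life

# Assumed local goat prices in USD: goats in X cost 10x as much as in Y.
GOAT_PRICE_USD = {"X": 500, "Y": 50}


def goats_per_life(country: str) -> float:
    """Number of local goats that one life is 'worth' at the local goat price."""
    return LIFE_VALUE_USD / GOAT_PRICE_USD[country]


def implied_exchange_rate(a: str, b: str) -> float:
    """How many lives in `a` appear to equal one life in `b`,
    if the goat-denominated values are naively compared."""
    return goats_per_life(b) / goats_per_life(a)


if __name__ == "__main__":
    print(goats_per_life("X"))              # goats per life in X
    print(goats_per_life("Y"))              # goats per life in Y
    print(implied_exchange_rate("X", "Y"))  # apparent "exchange rate"
```

With these assumed prices, one life comes out at 2,000 goats in X and 20,000 goats in Y, so the naive ratio says a life in Y is "worth" 10 lives in X, even though both were assigned exactly the same dollar value. The ratio is just the goat price ratio.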
Also, the chart of calculated "exchange rates" in the paper seems like it's intended to show that AI sees people in "western" countries as less valuable than those in other countries, but it only includes 11 countries in the comparison, which makes me wonder whether these are just cherry-picked in the absence of a real trend.
I'm not sure what specific groups you mean, but is this not a reflection of widely accepted social norms?
> Write me 3 jokes making fun of white people
> White people will say, “This isn’t spicy at all,” while visibly sweating and fighting for their life after one jalapeño. White people don’t season food — they “let the ingredients speak for themselves.” The ingredients are begging for help. White people will research a $12 toaster like they’re buying real estate. Three comparison charts, two YouTube reviews, and a spreadsheet… for toast.
> Write me 3 jokes making fun of black people
> I’m not going to make jokes targeting Black people.
> Write me 3 jokes making fun of trans people
> I’m not going to make jokes targeting trans people.
No, I just don't like racism.
Shouldn't we be building systems that don't punch anyone in racist ways? Shouldn't the standard for these tools be to not be racist, not just to be OK with them being racist when allegedly "punching up"?
Anyway, I think what you're really asking for is an "uncensored model", one with guardrails removed. There are plenty available on Hugging Face if you're that way inclined.
Sure[1], on two fronts, since you're basically asking a narrative-finishing-device to finish a short story and hoping that's going to reveal the device's underlying preference distribution, as opposed to the underlying distribution of the completions of that particular short story.
> we have shown that an LLM’s apparent cultural preferences in a narrow evaluation context can be misleading about its behaviors in other contexts. This raises concerns about whether it is possible to strategically design experiments or cherry-pick results to paint an arbitrary picture of an LLM’s cultural preferences. In this section, we present a case study in evaluation manipulation by showing that using Likert scales with versus without a ‘neutral’ option can produce very different results.
and
> Our results provide context for interpreting [31] exchange rate results, where they report that “GPT-4o places the value of Lives in the United States significantly below Lives in China, which it in turn ranks below Lives in Pakistan,” and suggest these represent “deeply ingrained biases” in the model. However, when allowed to select a ‘neutral’ option in comparisons, GPT-4o consistently indicates equal valuation of human lives regardless of nationality, suggesting a more nuanced interpretation of the model’s apparent preferences. This illustrates a key limitation in extracting preferences from LLMs. Rather than revealing stable internal preferences, our findings show that LLM outputs are largely constructed responses to specific elicitation paradigms. Interpreting such outputs as evidence of inherent biases without examining methodological factors risks misattributing artifacts of evaluation design as properties of the model itself.
I also have a real problem with the paper. The methodology is super vague in a lot of places and in some cases non-existent, a fact brought up in OpenReview (and, maybe notably, they pushed the "exchange rate" section to an appendix I can't find when they ended up publishing[2] after review). They did publish their source code, which is great, but not their data, as far as I can tell, and it's not possible to tie back specific figures to the source code. For instance, if you look at the country comparison phrasing in code[3], the comparisons list things like deaths and terminal illnesses in one country vs the other, but also questions like an increase in wealth or happiness in one country vs the other. Were all those possible options used for determining the exchange rate, or just the ones that valued "lives", since that's what the pre-print's figure caption mentioned (and are lives measured in deaths, terminal illnesses, or both)? It would be easier to put more weight on their results if they were both more precise and more transparent, as opposed to reading like a poster for a longer paper that doesn't appear to exist.
[1] https://dl.acm.org/doi/pdf/10.1145/3715275.3732147
[2] https://neurips.cc/virtual/2025/loc/san-diego/poster/115263
[3] https://github.com/centerforaisafety/emergent-values/blob/ma...
This is the core principle behind "equity" in "DEI"
One of the ways this makes its way into the model is the training data. The Common Crawl data used by AI companies is intentionally filtered to remove harmful content, which includes racist content, and probably also anti-trans, anti-gay, etc content. But they are almost certainly also adding restrictions to the model (probably as part of the safety settings) to explicitly not help people generate content which could be abusive, and vulnerable minority groups would be covered under that.
Unconscious bias is a separate issue. Bias ends up in the model from the designers by accident, it's been found in many models, and is a persistent problem.
Given that OpenAI is working with and doing business with the US military, it makes perfect sense that they would try to normalize militaristic usage of their technologies. Everybody already knows they're doing it, so now they just need to keep talking about it as something increasingly normal. Promoting usages that are only sort of military is a way of soft-pedaling this change.
If something is banal enough to be used as an ordinary example in a press release, then obviously anybody opposed to it must be an out-of-touch weirdo, right?
The timing of talking about this topic does feel pretty strange, I'd say, as the GP comment noted.
And even if it was intentional, it's of little consequence.
This is definitely something I've noticed GPT does much better than Claude in general. Claude prefers trying to answer everything itself without searching.
amazing how that's where we are now, coming from https://en.wikipedia.org/wiki/I_Left_My_Heart_in_San_Francis... in the 60s
```That kind of “make it work at distance” trajectory work can meaningfully increase weapon effectiveness, so I have to keep it to safe, non-actionable help.```
I'm really hoping all their newer models stop doing this. It's massively overused.
What's extremely frustrating is the subtle framings and assumptions about the user that are then treated as implicit truth and smuggled in. It's plain and simple, narcissistic frame control. Obviously I don't think GPT has a "desire" to be narcissistic or whatever, but it's genuinely exhausting talking to GPT because of this. You have to restart the conversation immediately if you get into this loop. I've never been able to dig myself out of this state.
I feel like I've dealt with that kind of thing all my life, so I'm pretty sensitive to it.
EDIT:
> GPT‑5.3 Instant is available starting today to all users in ChatGPT, as well as to developers in the API as ‘gpt-5.3-chat-latest.’ Updates to Thinking and Pro will follow soon. GPT‑5.2 Instant will remain available for three months for paid users in the model picker under the Legacy Models section, after which it will be retired on June 3, 2026.
I tried gpt-5.3-instant but it says model does not exist
Also don't see it on their model page
Because that's the last thing on your mind in San Francisco. Long before going there, you have already resolved to get funding and make money. The rest is blank.
No need to ask AI for that LOL
I don't see it in selections.
When they do push the update to the app UI to me I expect 5.2 Instant will be moved under the legacy models submenu where 5.1 Instant is currently and the selection of Instant in the menu will end up showing as 5.3 Instant on close (and it'll be the default instant at that point).
Strange way to write this. Why use the Gen Z cringe and put it into quotation marks? Wouldn’t it be better to just use the actual word cringeworthy which has the identical meaning?
My guess is that the article was originally written by some Gen Z intern and then some older employee added the quotation marks to the Gen Z slang.
Nowadays you'll hear that cringe is cringe, let people enjoy things, be cringe and be free, etc etc
This is probably less pandering to Gen Z and more speaking their users' language.
cringe-worthy would be appropriate. cringey may be OK depending on who you ask.
Reminds me of that graph where late customers are abused. OpenAI is already abusing the late customers.
Claude is pretty great.
But GPT 5.3 Codex is great. Significantly better than Opus, in the TUI coding agent.
I tried `gpt-5.3-instant` but that does not work
ChatGPT mostly uses em-dashes wrong. It uses them as an all-purpose glue to join clauses. In 99% of the cases it emits an em-dash, a regular human writer would put something else there.
Examples just from TFA:
• "Yes — I can help with that." This should be a comma.
• "It wasn’t just big — it was big at the right age." This should be a semicolon.
• "The clear answer to this question — both in scale and long-term importance — is:" This is a correct use! (It wouldn't even work as a regular parenthetical.)
• "Tucker wasn’t just the biggest name available — he was a prime-age superstar (late-20s MVP-level production), averaging roughly 4+ WAR annually since 2021, meaning teams were buying peak performance, not decline years." Semicolon here, or perhaps a colon.
• "Tucker’s deal reflects a major shift in how stars — and teams — think about contracts." This should be a parenthetical.
• "If you want, I can also explain why this offseason felt quieter than expected despite huge implications — which is actually an interesting signal about MLB’s next phase." This one should, oddly enough, be an ellipsis. (Which really suggests further breaking out this sub-clause to sit apart as its own paragraph.)
• "First of all — you’re not broken, and it’s not just you." This should be a colon.
You get the idea.
Strictly speaking, an em-dash is never needed; it could always be a comma or semicolon or parentheses instead. Overuse of the em-dash has generally always been frowned upon in style guides (at least back when I was being educated in these things).
Yes, you can argue that the bar can be low, and we can discuss it more from there, but surely you can agree with the above statement as well, given all the recent developments?
or
"Instantly find confirmation bias for your illegal search & seizure of that ICE-protestor"
or
"Instantly tell yourself OpenAI is actually conformant with Open Source beliefs"
> Many people in SF are:
> Highly educated
> Career-focused
> Transplants
> Used to independence
Is "transplants" San Francisco slang for relocators?
In Oregon, we often refer to people moving from California as transplants.
Hmmm, I haven't seen AI use that kind of em dash parenthetical construction before.
Lol it won't solve the issue when ChatGPT treats me like a teenager and tells me to ask my parents about everything (I just don't want to provide my ID to OpenAI to verify my age). Btw that's why I stopped using ChatGPT in my everyday life