FilterHN

Is it just me, or is Opus 4.6 sounding a bit dumb lately?

7 points

by rambrrest

3 days ago

| 5 comments

It keeps going round and round in one of the harnesses I use.

UK-Al05

5 hours ago

Probably reducing its capabilities to make the new model look better.

muzani

3 hours ago

They're always at their peak when released, then degrade over the next few months. That doesn't matter much, because within 5 months there's a new record-breaking model.

Gemini Pro, once voted the best model in 2025 by Polymarket, just spouts nonsense half the time now. I say this as an AI Pro subscriber. It built this amazing system for me in a weekend and suddenly it acts like an old horse.

OpenAI often has the opposite effect: their models are disappointing when released, and then everyone goes back to them whenever Gemini/Grok/Claude breaks down and discovers that hey, they're pretty good.

Imustaskforhelp

3 days ago

I'm not using Opus 4.6, but I have heard the same things.

https://old.reddit.com/r/LocalLLaMA/comments/1sgd7fp/its_ins...

There was also another post about how the perceived quality of these models is dropping sharply, something not reflected in benchmarks.

I feel like it might be because GPU costs are going back up, so they might be serving a more diluted model, which makes it dumber, while still charging the $100.

I personally feel like this theory, that these models slowly decline in intelligence until a new model arrives that isn't intentionally bogged down, might deserve more interest than people think. My experience with even Claude Sonnet 3.7 when it first launched was genuinely fascinating, as was Gemini 3.1 premium, and the theory really aligns with my personal experience tinkering with these models.

The AI industry feels quite scammy, to be honest, and we will all be forced, through IPOs or index funds bending over backwards, to be left holding the bag :-/

It really feels like a great deception being played against the masses.

Areena_28

3 days ago

Yeah, I noticed it too. Something feels off with the reasoning on complex multi-step tasks compared to a few months ago. Hard to tell if it's an actual regression or just expectations creeping up as you use it more.

I've been mixing Opus with Sonnet depending on the task: Sonnet handles most things well enough, and I use Opus for anything that genuinely needs deeper reasoning. Try it out; maybe you'll find it useful.

uberman

3 days ago

Isn't it the expected thing that LLMs degrade over time?