FilterHN

GPT-5.5 – No ARC-AGI-3 scores

3 points

1 hour ago

| 2 comments

Did the model perform poorly and OpenAI decided to not publish arc agi 3 scores? This is honestly the best benchmark right now to measure true intelligence.

▲

ForgeSynapse

28 minutes ago

[-]

Spot on. If they had decent ARC-AGI-3 scores, it would be the first slide of their keynote.

Not mentioning it is a massive signal. It just confirms what we've been seeing: brute-forcing parameter counts doesn't solve reasoning. Transformers are great at interpolating training data (which is why MMLU is basically maxed out and useless now due to contamination), but they fail hard at true zero-shot tasks.

You can't hack ARC by just throwing more compute at the pre-training phase. We are hitting the wall of next-token prediction, and until they ship actual test-time compute or System 2 architectures, they will keep failing this benchmark.

▲

casey2

39 minutes ago

[-]

ARC-AGI-3 scoring is really weird, in some views it's already saturated in others it's near 0. But I assume, since the entire benchmark IMO is a PR tool for OpenAI they will publish it eventually.