Erdos 281 solved with ChatGPT 5.2 Pro (twitter.com)
188 points by nl 5 hours ago | 21 comments
xeeeeeeeeeeenu
4 hours ago
[-]
> no prior solutions found.

This is no longer true: a prior solution has just been found[1], so the LLM proof has been moved to Section 2 of Terence Tao's wiki[2].

[1] - https://www.erdosproblems.com/forum/thread/281#post-3325

[2] - https://github.com/teorth/erdosproblems/wiki/AI-contribution...

reply
nl
3 hours ago
[-]
Interesting that in Terence Tao's words: "though the new proof is still rather different from the literature proof)"

And even odder that the proof was by Erdos himself and yet he listed it as an open problem!

reply
TZubiri
3 hours ago
[-]
Maybe it was in the training set.
reply
magneticnorth
3 hours ago
[-]
I think that was Tao's point, that the new proof was not just read out of the training set.
reply
rzmmm
1 hour ago
[-]
The model has multiple layers of mechanisms to prevent carbon copy output of the training data.
reply
Den_VR
40 minutes ago
[-]
Unfortunately.
reply
TZubiri
1 hour ago
[-]
forgive the skepticism, but this translates directly to "we asked the model pretty please not to do it in the system prompt"
reply
mikaraento
53 minutes ago
[-]
That might be somewhat ungenerous unless you have more detail to provide.

I know that at least some LLM products explicitly check output for similarity to training data to prevent direct reproduction.

reply
ffsm8
1 hour ago
[-]
It's mind-boggling if you think about the fact that they're essentially "just" statistical models

It really contextualizes the old wisdom of Pythagoras that everything can be represented as numbers / math is the ultimate truth

reply
GrowingSideways
17 minutes ago
[-]
How so? Truth is naturally an a priori concept; you don't need a chatbot to reach this conclusion.
reply
efskap
49 minutes ago
[-]
Would it really be infeasible to take a sample and do a search over an indexed training set? Maybe a Bloom filter can be adapted.
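Roughly what I have in mind, as a minimal sketch (it assumes offline access to the training corpus; all names and parameters here are made up, not any vendor's actual pipeline):

    import hashlib

    class BloomFilter:
        def __init__(self, size_bits=1 << 27, num_hashes=5):
            self.size = size_bits
            self.k = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item):
            for i in range(self.k):
                h = hashlib.blake2b(f"{i}:{item}".encode(), digest_size=8).digest()
                yield int.from_bytes(h, "big") % self.size

        def add(self, item):
            for p in self._positions(item):
                self.bits[p // 8] |= 1 << (p % 8)

        def __contains__(self, item):
            return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def ngrams(text, n=8):
        words = text.split()
        return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

    def overlap_score(bf, output, n=8):
        # Fraction of the output's n-grams the filter has (probably) seen before.
        grams = ngrams(output, n)
        return sum(g in bf for g in grams) / len(grams) if grams else 0.0

An overlap score near 1.0 would flag likely verbatim reproduction. Bloom filters only give false positives, so a hit means "maybe seen before", never "definitely novel".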
reply
cubefox
1 hour ago
[-]
This illustrates how unimportant this problem is. A prior solution did exist, but apparently nobody knew because people didn't really care about it. If progress can be had by simply searching for old solutions in the literature, then that's good evidence the supposed progress is imaginary. And this is not the first time this has happened with an Erdős problem.

A lot of pure mathematics seems to consist in solving neat logic puzzles without any intrinsic importance. Recreational puzzles for very intelligent people. Or LLMs.

reply
MattGaiser
1 hour ago
[-]
There is still enormous value in cleaning up the long tail of somewhat important stuff. One of the great benefits of Claude Code to me is that smaller issues no longer rot in backlogs, but can be at least attempted immediately.
reply
cubefox
53 minutes ago
[-]
The difference is that Claude Code actually solves practical problems, but pure (as opposed to applied) mathematics doesn't. Moreover, a lot of pure mathematics seems to be not just useless, but also without intrinsic epistemic value, unlike science. See https://news.ycombinator.com/item?id=46510353
reply
jstanley
42 minutes ago
[-]
Applications for pure mathematics can't necessarily be known until the underlying mathematics is solved.

Just because we can't imagine applications today doesn't mean there won't be applications in the future which depend on discoveries that are made today.

reply
teiferer
20 minutes ago
[-]
It's hard to know beforehand. Like with most foundational research.

My favorite example is number theory. Before cryptography came along it was pure math, an esoteric branch just for number nerds. Turns out, super applicable later on.

reply
amazingman
23 minutes ago
[-]
It's unclear to me what point you are making.
reply
doctoboggan
4 hours ago
[-]
Can anyone give a little more color on the nature of Erdos problems? Are these problems that many mathematicians have spent years tackling with no result? Or do some of the problems evade scrutiny and go unattempted most of the time?

EDIT: After reading a link someone else posted to Terence Tao's wiki page, he has a paragraph that somewhat answers this question:

> Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools. Unfortunately, it is hard to tell in advance which category a given problem falls into, short of an expert literature review. (However, if an Erdős problem is only stated once in the literature, and there is scant record of any followup work on the problem, this suggests that the problem may be of the second category.)

from here: https://github.com/teorth/erdosproblems/wiki/AI-contribution...

reply
QuesnayJr
1 hour ago
[-]
Erdos was an incredibly prolific mathematician, and one of his quirks was that he liked to collect open problems and state new open problems as a challenge to the field. He attached bounties to many of the problems, from $5 to $10,000.

The problems are a pretty good metric for AI, because the easiest ones at least meet the bar of "a top mathematician didn't know how to solve this off the top of his head" and the hardest ones are major open problems. As AI progresses, we will see it slowly climb the difficulty ladder.

reply
pessimist
4 hours ago
[-]
From Terry Tao's comments in the thread:

"Very nice! ... actually the thing that impresses me more than the proof method is the avoidance of errors, such as making mistakes with interchanges of limits or quantifiers (which is the main pitfall to avoid here). Previous generations of LLMs would almost certainly have fumbled these delicate issues.

...

I am going ahead and placing this result on the wiki as a Section 1 result (perhaps the most unambiguous instance of such, to date)"
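For readers unfamiliar with that pitfall, a classic example where interchanging limits fails (standard analysis, nothing specific to this problem):

\(a_{n,m} = \frac{n}{n+m}, \qquad \lim_{m\to\infty}\lim_{n\to\infty} a_{n,m} = 1 \ne 0 = \lim_{n\to\infty}\lim_{m\to\infty} a_{n,m}.\)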

The pace of change in math is going to be something to watch closely. Many minor theorems will fall. Next major milestone: Can LLMs generate useful abstractions?

reply
radioactivist
4 hours ago
[-]
Seems like someone dug something up from the literature on this problem (see the top comment on the erdosproblems.com thread):

"On following the references, it seems that the result in fact follows (after applying Rogers' theorem) from a 1936 paper of Davenport and Erdos (!), which proves the second result you mention. ... In the meantime, I am moving this problem to Section 2 on the wiki (though the new proof is still rather different from the literature proof)."

reply
sequin
4 hours ago
[-]
FWIW, I just gave Deepseek the same prompt and it solved it too (much faster than the 41m of ChatGPT). I then gave both proofs to Opus and it confirmed their equivalence.

The answer is yes. Assume, for the sake of contradiction, that there exists an \(\epsilon > 0\) such that for every \(k\), there exists a choice of congruence classes \(a_1^{(k)}, \dots, a_k^{(k)}\) for which the set of integers not covered by the first \(k\) congruences has density at least \(\epsilon\).

For each \(k\), let \(F_k\) be the set of all infinite sequences of residues \((a_i)_{i=1}^\infty\) such that the uncovered set from the first \(k\) congruences has density at least \(\epsilon\). Each \(F_k\) is nonempty (by assumption) and closed in the product topology (since it depends only on the first \(k\) coordinates). Moreover, \(F_{k+1} \subseteq F_k\) because adding a congruence can only reduce the uncovered set. By the compactness of the product of finite sets, \(\bigcap_{k \ge 1} F_k\) is nonempty.

Choose an infinite sequence \((a_i) \in \bigcap_{k \ge 1} F_k\). For this sequence, let \(U_k\) be the set of integers not covered by the first \(k\) congruences, and let \(d_k\) be the density of \(U_k\). Then \(d_k \ge \epsilon\) for all \(k\). Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\). However, by hypothesis, for any choice of residues, the uncovered set has density \(0\), a contradiction.

Therefore, for every \(\epsilon > 0\), there exists a \(k\) such that for every choice of congruence classes \(a_i\), the density of integers not covered by the first \(k\) congruences is less than \(\epsilon\).

\boxed{\text{Yes}}

reply
CGamesPlay
2 hours ago
[-]
> I then gave both proofs to Opus and it confirmed their equivalence.

You could have just rubber-stamped it yourself, for all the mathematical rigor it holds. The devil is in the details, and the smallest problem unravels the whole proof.

reply
yosefk
1 hour ago
[-]
How dare you question the rigor of the venerable LLM peer review process! These are some of the most esteemed LLMs we are talking about here.
reply
Davidzheng
1 hour ago
[-]
"Since \(U_{k+1} \subseteq U_k\), the sets \(U_k\) are decreasing and periodic, and their intersection \(U = \bigcap_{k \ge 1} U_k\) has density \(d = \lim_{k \to \infty} d_k \ge \epsilon\)."

Is this enough? Let $U_k$ be the set of integers whose remainder mod $6^n$ is greater than or equal to $2^n$ for all $1 < n < k$. The density of each $U_k$ is more than $1/2$, I think, but not that of the intersection (which is empty), right?
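(For what it's worth, the numbers seem to check out: the set $\{m : m \bmod 6^n \ge 2^n\}$ has density $1 - (2/6)^n = 1 - 3^{-n}$, so by a union bound the density of $U_k$ is at least $1 - \sum_{n=2}^{k-1} 3^{-n} \ge 5/6$; yet any fixed $m$ fails the condition once $2^n > m$, since then $m \bmod 6^n = m < 2^n$, so the intersection is empty.)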

reply
Klover
2 hours ago
[-]
Here's kimi-k2-thinking with the reasoning block included: https://www.kimi.com/share/19bcfe2e-d9a2-81fe-8000-00002163c...
reply
nsoonhui
3 hours ago
[-]
I am not familiar with the field, but is there any chance that DeepSeek is just memorizing the existing solution? Or is it different?

https://news.ycombinator.com/item?id=46664976

reply
utopiah
2 hours ago
[-]
Sure, but if so, wouldn't ChatGPT 5.2 Pro also be "just memorizing the existing solution"?
reply
nsoonhui
2 hours ago
[-]
No it's not, you can refer to my link and subsequent discussion.
reply
utopiah
2 hours ago
[-]
I don't see what's related there, but anyway, unless you have access to information from within OpenAI, I don't see how you can claim what was or wasn't in the training data of ChatGPT 5.2 Pro.

For DeepSeek, by contrast, you could, but not for a non-open model.

reply
nsoonhui
2 hours ago
[-]
I am basing that on Terence Tao's comment here: https://news.ycombinator.com/item?id=46665168

It says that the OpenAI proof is different from the one published in the literature.

Whereas for the DeepSeek proof, I don't know enough of the math to judge whether it is the same as the published one.

That was what I meant.

reply
logicchains
1 hour ago
[-]
Opus isn't a good choice for anything math-related; it's worse at math than the latest ChatGPT and Gemini Pro.
reply
amluto
4 hours ago
[-]
I find it interesting that, as someone utterly unfamiliar with ergodic theory, Dini’s theorem, etc, I find Deepseek’s proof somewhat comprehensible, whereas I do not find GPT-5.2’s proof comprehensible at all. I suspect that I’d need to delve into the terminology in the GPT proof if I tried to verify Deepseek’s, so maybe GPT’s is being more straightforward about the underlying theory it relies on?
reply
carbocation
4 hours ago
[-]
The erdosproblems thread itself contains comments from Terence Tao: https://www.erdosproblems.com/forum/thread/281
reply
energy123
1 hour ago
[-]
A surprising % of these LLM proofs are coming from amateurs.

One wonders if some professional mathematicians are instead choosing to publish LLM proofs without attribution for career purposes.

reply
kristopolous
1 hour ago
[-]
It's probably from the perennial observation

"This LLM is kinda dumb in the thing I'm an expert in"

reply
Davidzheng
1 hour ago
[-]
I'm actually not sure what the right attribution method would be. I'd lean towards a single line in the acknowledgements? Because you can use it, for example, at every lemma during brainstorming, but it's unclear whether the right convention is to thank it at every lemma...

Anecdotally, I, as a math postdoc, think that GPT 5.2 is qualitatively much stronger than anything else I've used. Its rate of hallucinations is low enough that I don't feel like the default assumption for any solution has to be that it is trying to hide a mistake somewhere. Compared with Gemini 3, whose failure mode when it can't solve something is always to pretend it has a solution by "lying"/omitting steps/making up theorems, etc., GPT 5.2 usually fails gracefully, and when it makes a mistake it more often than not can admit it when pointed out.

reply
redbluered
4 hours ago
[-]
Has anyone verified this?

I've "solved" many math problems with LLMs, with LLMs giving full confidence in subtly or significantly incorrect solutions.

I'm very curious here. The Open AI memory orders and claims about capacity limits restricting access to better models are interesting too.

reply
bpodgursky
4 hours ago
[-]
Terence Tao gave it the thumbs up. I don't think you're going to do better than that.
reply
bparsons
3 hours ago
[-]
It's already been walked back.
reply
energy123
3 hours ago
[-]
Not in the sense of being a "subtly or significantly incorrect solution".
reply
dust42
1 hour ago
[-]
Personally, I'd prefer it if the AI models would start with a proof of their own statements. Time and again, SOTA frontier models have told me: "Now you have 100% correct code ready for production in enterprise quality." Then I run it and it crashes. Or maybe the AI is just being tongue-in-cheek?

Case in point: I just wanted to give z.ai a try and buy some credits. I used Firefox with uBlock and the payment didn't go through. I tried again with Chrome and no adblock, but now there is an error: "Payment Failed: p.confirmCardPayment is not a function." The irony is that this is certainly vibe-coded with z.ai, which tries to sell me on how good they are but then isn't able to conclude the sale.

And we will get lots more of this in the future. LLMs are a fantastic new technology, but even more fantastically over-hyped.

reply
becquerel
1 hour ago
[-]
You get AIs to prove their code is correct in precisely the same ways you get humans to prove their code is correct. You make them demonstrate it through tests or evidence (screenshots, logs of successful runs).
reply
ashleyn
4 hours ago
[-]
I guess the first question I have is whether these problems solved by LLMs are just low-hanging fruit that human researchers either didn't get around to or didn't show much interest in, or whether there's some actual beef here to the idea that LLMs can independently conduct original research and solve hard problems.
reply
utopiah
2 hours ago
[-]
That's the first warning from the wiki : <<Erdős problems vary widely in difficulty (by several orders of magnitude), with a core of very interesting, but extremely difficult problems at one end of the spectrum, and a "long tail" of under-explored problems at the other, many of which are "low hanging fruit" that are very suitable for being attacked by current AI tools.>> https://github.com/teorth/erdosproblems/wiki/AI-contribution...
reply
dyauspitr
4 hours ago
[-]
There is still value in letting these LLMs loose on the periphery and knocking out all the low-hanging fruit humanity hasn't had the time to get around to. Also, I don't know this, but if it is a problem on the Erdős list, I presume people have tried to solve it at least a little before it made it onto the list.
reply
utopiah
2 hours ago
[-]
Is there though? If they are "solved" (as in, the tickbox marks them as such through some validation process, e.g. another model confirming, a formal proof passing, etc.), but no human actually learns anything from them, what's the benefit? Completing a list?

I believe the ones that are NOT studied are that way precisely because they are seen as uninteresting. Even if they were solved in an interesting way, if nobody sees the proofs because there are just too many of them and they are, again, not considered valuable, then I don't see what is gained.

reply
niemandhier
1 hour ago
[-]
Is there explainability research for this type of model application? E.g. a sparse autoencoder or something similar but more modern.

I would love to know which concepts are active in the deeper layers of the model while it is generating the solution.

Is there a concept of "epsilon" or "delta"?

What are their projections onto each other?
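A minimal sketch of the kind of probe I mean, assuming one can capture a layer's activations (the shapes, names, and numbers here are all made up):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        # Toy SAE: map a layer's activations into an overcomplete, sparse
        # "feature" basis and back. Training (not shown) would minimize
        # reconstruction error plus an L1 penalty on the feature activations.
        def __init__(self, d_model, d_features):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_features)
            self.decoder = nn.Linear(d_features, d_model)

        def forward(self, x):
            f = torch.relu(self.encoder(x))      # sparse "concept" activations
            return self.decoder(f), f

    sae = SparseAutoencoder(d_model=768, d_features=8192)
    acts = torch.randn(4, 768)                   # stand-in for captured activations
    recon, features = sae(acts)
    top_features = features.topk(5, dim=-1).indices   # most active "concepts" to inspect

Whether any of those features behave like an "epsilon" or a "delta", and how they project onto each other (e.g. cosine similarity between the corresponding decoder columns), is exactly what this would let you ask.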

reply
a_tartaruga
4 hours ago
[-]
Out of curiosity why has the LLM math solving community been focused on the Erdos problems over other open problems? Are they of a certain nature where we would expect LLMs to be especially good at solving them?
reply
krackers
4 hours ago
[-]
I guess they are at a difficulty that's not too hard (unlike the Millennium Prize problems), are fairly tightly scoped (unlike open-ended research), and have some gravitas (so it's not some obscure theorem that's only unproven because of its lack of noteworthiness).
reply
Davidzheng
1 hour ago
[-]
I actually don't think the reason is that they are easier than other open math problems. I think it's more that they are "elementary" in the sense that the problems usually don't require a huge amount of domain knowledge to state.
reply
becquerel
1 hour ago
[-]
People like checking items off of lists.
reply
beders
2 hours ago
[-]
Has anyone confirmed the solution is not in the training data? Otherwise it is just a bit of information retrieval, LLM-style. No intelligence necessary.
reply
dernett
4 hours ago
[-]
This is crazy. It's clear that these models don't have human intelligence, but it's undeniable at this point that they have _some_ form of intelligence.
reply
brendyn
3 hours ago
[-]
If LLMs weren't created by us but were instead something discovered in another species' behaviour, it would be 100% labelled intelligence.
reply
qudat
4 hours ago
[-]
My take is that a huge part of human intelligence is pattern matching. We just didn’t understand how much multidimensional geometry influenced our matches
reply
keeda
3 hours ago
[-]
Yes, it could be that intelligence is essentially a sophisticated form of recursive, brute-force pattern matching.

I'm beginning to think the Bitter Lesson applies to organic intelligence as well, because basic pattern matching can be implemented relatively simply using very basic mathematical operations like multiply-and-accumulate, and so it can scale through massive parallelization of relatively simple building blocks.
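In code terms the building block really is tiny; a toy sketch (pure Python, just to make the point):

    def dot(a, b):
        # multiply-accumulate: the primitive everything else is stacked on
        acc = 0.0
        for x, y in zip(a, b):
            acc += x * y
        return acc

    # A higher score means a better "match" between two embedding vectors;
    # the hardware just runs billions of these in parallel.
    dot([0.9, 0.1, 0.0], [1.0, 0.0, 0.0])   # 0.9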

reply
bob1029
1 hour ago
[-]
Intelligence is almost certainly a fundamentally recursive process.

The ability to think about your own thinking over and over as deeply as needed is where all the magic happens. Counterfactual reasoning occurs every time you pop a mental stack frame. By augmenting our stack with external tools (paper, computers, etc.), we can extend this process as far as it needs to go.

LLMs start to look a lot more capable when you put them into recursive loops with feedback from the environment. A trillion tokens worth of "what if..." can be expended without touching a single token in the caller's context. This can happen at every level as many times as needed if we're using proper recursive machinery. The theoretical scaling around this is extremely favorable.

reply
sdwr
4 hours ago
[-]
I don't think it's accurate to describe LLMs as pattern matching. Prediction is the mechanism they use to ingest and output information, and they end up with a (relatively) deep model of the world under the hood.
reply
visarga
2 hours ago
[-]
The "pattern matching" perspective is true if you zoom in close enough, just like "protein reactions in water" is true for brains. But if you zoom out you see both humans and LLMs interact with external environments which provide opportunity for novel exploration. The true source of originality is not inside but in the environment. Making it be all about the model inside is a mistake, what matters more than the model is the data loop and solution space being explored.
reply
D-Machine
3 hours ago
[-]
"Pattern matching" is not sufficiently specified here for us to say if LLMs do pattern matching or not. E.g. we can say that an LLM predicts the next token because that token (or rather, its embedding) is the best "match" to the previous tokens, which form a path ("pattern") in embedding space. In this sense LLMs are most definitely pattern matching. Under other formulations of the term, they may not be (e.g. when pattern matching refers to abstraction or abstracting to actual logical patterns, rather than strictly semantic patterns).
reply
keeda
3 hours ago
[-]
Yes, the world model building is achieved via pattern matching and happens during ingestion and training, but that is also part of the intelligence.
reply
DrewADesign
3 hours ago
[-]
Which is even more true for humans.
reply
csomar
1 hour ago
[-]
Intelligence is hallucination that happens to produce useful results in the real world.
reply
eru
2 hours ago
[-]
Well, AlphaGo and Stockfish can beat you at their games. Why shouldn't these models beat us at math proofs?
reply
_fizz_buzz_
13 minutes ago
[-]
Chess and Go have very restrictive rules. It seems a lot more obvious to me why a computer can beat a human at them. They have a huge advantage just by being able to calculate very deep lines in a very short time. I actually find it impressive how long humans were able to beat computers at Go. Math proofs seem a lot more open-ended to me.
reply
thfuran
1 hour ago
[-]
AlphaGo and Stockfish were specifically designed and trained to win at those games.
reply
Davidzheng
1 hour ago
[-]
And we can train models specifically on math proofs? I think the only difference is that math is bigger...
reply
threethirtytwo
3 hours ago
[-]
I don't think they will ever have human intelligence. It will always be an alien intelligence.

But I think the trend line unmistakably points to a future where it can be MORE intelligent than a human in exactly the colloquial way we define "more intelligent".

The fact that one of the greatest mathematicians alive has a page for this and is seriously benchmarking it shows how likely he believes this is to happen.

reply
altmanaltman
4 hours ago
[-]
Depends on what you mean by "intelligence", "human intelligence", and "human".
reply
ekianjo
4 hours ago
[-]
It's pattern matching. Which is actually what we measure in IQ tests, just saying.
reply
jadenpeterson
4 hours ago
[-]
There's some nuance. IQ tests measure pattern matching and, in an underlying way, other facets of intelligence - memory, for example. How well can an LLM 'remember' a thing? Sometimes Claude will perform compaction when its context window reaches 200k tokens, and then it seems a little colder to me, but maybe that's just my imagination. I'm kind of a "power user".
reply
rurban
4 hours ago
[-]
I call it matching. Pattern matching had a different meaning.
reply
ekianjo
3 hours ago
[-]
What are you referring to? LLMs are neural networks at their core, and even the simplest neural networks are all about reproducing patterns observed during training.
reply
rurban
1 hour ago
[-]
You need to understand the difference between general matching and pattern matching. Maybe you should have read more of the older AI books. An LLM is a general fuzzy matcher. A pattern matcher is an exact matcher using an abstract language, the "pattern". A general matcher uses a distance function instead; no pattern needed.

E.g. you want to find a subimage in a big image, possibly rotated, scaled, tilted, distorted, with noise. You cannot do that with a pattern matcher, but you can do it with a general matcher, such as a fuzzy matcher, an LLM.

Or you want to evaluate a Go position on a Go board. An LLM is perfect for that, because you don't need to come up with a special language to describe Go positions (older chess programs did that); you just train the model on whether the position is good or bad, and this can be fully automated via existing literature and later by playing against itself. You train the matcher not via patterns but via a function (win or lose).
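A toy contrast in code, if it helps (the examples and threshold are arbitrary):

    import re
    from difflib import SequenceMatcher

    def pattern_match(pattern, text):
        # Exact matching against an abstract pattern language (here, a regex).
        return re.search(pattern, text) is not None

    def fuzzy_match(query, text, threshold=0.8):
        # General matching via a similarity/distance function; no pattern language.
        return SequenceMatcher(None, query, text).ratio() >= threshold

    pattern_match(r"gr[ae]y cat", "a grey cat")   # True: the pattern describes the target exactly
    fuzzy_match("grey cat", "gray cat")           # True: merely close under the similarity measure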

reply
TZubiri
3 hours ago
[-]
As someone who doesn't understand this shit, and given how it's always the experts who fiddle with the LLMs to get good outputs, it feels natural to attribute the intelligence to the operator (or the training set) rather than to the LLM itself.
reply
wewxjfq
58 minutes ago
[-]
The LLMs that take 10 attempts to un-zero-width a <div>, telling me that every single change totally fixed the problem, are cracking the hardest math problems again.
reply
renewiltord
1 hour ago
[-]
It's funny. In some kind of twisted variant of Cunningham's Law, we have:

> the best way to find a previous proof of a seemingly open problem on the internet is not to ask for it; it's to post a new proof

reply
jrflowers
3 hours ago
[-]
Narrator: The solution had already appeared several times in the training data
reply
IAmGraydon
3 hours ago
[-]
This is showing as unresolved here, so I'm assuming something was retracted.

https://mehmetmars7.github.io/Erdosproblems-llm-hunter/probl...

reply
nl
3 hours ago
[-]
I think that just hasn't been updated.
reply
logicallee
1 hour ago
[-]
How did they do it? Was a human using the chat interface? Did they just type out the problem and immediately, on the first reply, receive a complete solution (one-shot), or what was the human's role? What was ChatGPT's thinking time?
reply
phelm
1 hour ago
[-]
reply
logicallee
1 hour ago
[-]
Very interesting. ChatGPT reasoned about it for 41 minutes! Also, this was one-shot, i.e. ChatGPT produced its complete proof from a single prompt, with no further replies from the human (rather than a chat where the human guided it further).
reply
magicalist
3 hours ago
[-]
Funny seeing Silicon Valley bros commenting "you're on fire!" to Neel when it appears he copied and pasted the problem verbatim into ChatGPT and it did literally all the other work here:

https://chatgpt.com/share/696ac45b-70d8-8003-9ca4-320151e081...

reply
ares623
4 hours ago
[-]
This must be what it feels like to be a CEO and have someone tell you they've solved coding.
reply
mikert89
4 hours ago
[-]
I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close
reply
sekai
7 minutes ago
[-]
> I have 15 years of software engineering experience across some top companies. I truly believe that ai will far surpass human beings at coding, and more broadly logic work. We are very close

Coding was never the hard part of software development.

reply
anonzzzies
4 hours ago
[-]
HN will be the last place to admit it; people here seem to be holding out with the vague 'I tried it and it came up with crap', while many of us are shipping software without touching (much) code anymore. I have written code for over 40 years, and this is nothing like no-code or whatever else was supposedly 'replacing programmers' before; this is clearly different, judging from the people who cannot code with a gun to their heads but are still shipping apps. It does not really matter if anyone believes me or not: I am making more money than ever, with fewer people than ever, delivering more than ever.

We are very close.

(by the way; I like writing code and I still do for fun)

reply
utopiah
2 hours ago
[-]
Both can be correct: you might be making a lot of money using the latest tools, while others who work on very different problems have tried the same tools and found them just not good enough.

The ability to make money proves you found a good market, it doesn't prove that the new tools are useful to others.

reply
fc417fc802
3 hours ago
[-]
> holding out with the vague 'I tried it and it came up with crap'

Isn't that a perfectly reasonable metric? The topic has been dominated by hype for at least the past 5 if not 10 years. So when you encounter the latest in a long line of "the future is here the sky is falling" claims, where every past claim to date has been wrong, it's natural to try for yourself, observe a poor result, and report back "nope, just more BS as usual".

If the hyped future does ever arrive then anyone trying for themselves will get a workable result. It will be trivially easy to demonstrate that naysayers are full of shit. That does not currently appear to be the case.

reply
danielbln
2 hours ago
[-]
What topic are you referring to? ChatGPT release was just over 3 years ago. 5 years ago we had basic non-instruct GPT-3.
reply
fc417fc802
2 hours ago
[-]
Wasn't transformer 2017? There's been constant AI hype since at least that far back and it's only gotten worse.

If I release a claim once a month that armageddon will happen next month, and then after 20 years it finally does, are all of my past claims vindicated? Or was I spewing nonsense the entire time? What if my claim was the next big pandemic? The next 9.0 earthquake?

reply
danielbln
2 hours ago
[-]
Transformers were 2017, and they had some impact on translation (which was in no way overstated), but it took GPT-2 and 3 to kick things off in earnest, and the real hype machine started with ChatGPT.

What you are doing, however, is dismissing the outrageous progress in NLP, and by extension code generation, of the last few years just because people over-hype it.

People over-hyped the Internet in the early 2000s, yet here we are.

reply
fc417fc802
2 hours ago
[-]
Well I've been seeing an objectionable amount of what I consider to be hype since at least transformers.

I never dismissed the actual verifiable progress that has occurred. I objected specifically to the hype. Are you sure you're arguing with what I actually said as opposed to some position that you've imagined that I hold?

> People over hyped the Internet in the early 2000s, yet here we are.

And? Did you not read the comment you are replying to? If I make wild predictions and they eventually pan out does that vindicate me? Or was I just spewing nonsense and things happened to work out?

"LLMs will replace developers any day now" is such a claim. If it happens a month from now then you can say you were correct. If it doesn't then it was just hype and everyone forgets about it. Rinse and repeat once every few months and you have the current situation.

reply
visarga
2 hours ago
[-]
But the trend line is less ambiguous: models have gotten better year over year, much, much better.
reply
fc417fc802
2 hours ago
[-]
I don't dispute that the situation is rapidly evolving. It is certainly possible that we could achieve AGI in the near future. It is also entirely possible that we might not. Claims such as that AGI is close or that we will soon be replacing developers entirely are pure hype.

When someone says something to the effect of "LLMs are on the verge of replacing developers any day now" it is perfectly reasonable to respond "I tried it and it came up with crap". If we were actually near that point you wouldn't have gotten crap back when you tried it for yourself.

reply
daxfohl
4 hours ago
[-]
They already do. What they suck at is common sense. Unfortunately good software requires both.
reply
anonzzzies
3 hours ago
[-]
Most people also suck at common sense, including most programmers, hence most programmers do not write good software to begin with.
reply
523-asf1
3 hours ago
[-]
Even a 20 year old Markov chain could produce this banality.
reply
marktl
4 hours ago
[-]
Or is it fortunate (for a short period, at least)?
reply
523-asf1
3 hours ago
[-]
Gotta make sure that the investors read this message in an Erdos thread.
reply
AtlasBarfed
2 hours ago
[-]
Is this comment written by AI?
reply
user3939382
3 hours ago
[-]
They can only code to specification, which is where even teams of humans get lost. Without a much smarter architecture for AI (LLMs as-is are a joke), that needle isn't going to move.
reply