LLMs Are Currently Not Helpful at All for Math Research: Hamkins
27 points | 1 day ago | 9 comments | officechai.com
demirbey05
1 day ago
[-]
It's hard to cut through the AI hype when there are billions of dollars at stake. I usually trust negative comments more, as long as the person isn't trying to sell a course. Even though Terence Tao is a respected scientist, I wonder if his recent comments are driven by a need for funding due to federal cuts. I’ve had similar experiences with LLMs—whenever I ask them about hard math or RL theory, they almost always give me the wrong answers.
reply
ben_w
1 day ago
[-]
I also care more about the failure modes than the successes, although in my case, it's because I keep finding them exceptionally useful at software development, and I:

1. Don't want to use them where they suck.

Think normalisation of deviance: "the problems haven't affected me therefore they don't exist" is a way to get really badly burned.

2. Want to train up in things they will still suck at by the time I've learned whatever it is.

I find LLMs seem kinda bad at writing sheet music, and Suno is kinda bad at weird instructions (like Stable Diffusion for images), but I expect them to get good before I can.

I also find them inconsistent at non-Euclidean problems: sometimes they can, sometimes they can't. I have absolutely no idea how to monetise that, but even if I could, "inconsistent" is itself an improvement on "cannot ever", which is what SOTA was a few years ago.

reply
j16sdiz
1 day ago
[-]
> The frustration, for Hamkins, goes beyond mere incorrectness—it’s the nature of the interaction itself that proves problematic.

I would assume pairing LLMs with a formal proof system would help a lot. At the very least, the system can know what is incorrect without the lengthy debates that frustrate him most.

This won't help the system discover or solve new math, but it would make the experience far more bearable.
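To make that concrete, here's a toy Lean 4 sketch of my own (not from the article): when a suggested step doesn't hold, the checker rejects it immediately, so there's nothing to debate.

    -- Toy sketch (my own example, assuming Lean 4 syntax): if an LLM first
    -- proposed `rfl` here, Lean would reject it on the spot, since the two
    -- sides are not definitionally equal for arbitrary `a` and `b`.
    example (a b : Nat) : a + b = b + a := by
      exact Nat.add_comm a b   -- accepted: this closes the goal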

reply
112233
1 day ago
[-]
That itself may be tricky. Suppose your proof system tells you that the proof is correct — how do you verify it is a proof of the assertion you want, unless you have written the "test harness" by hand beforehand? At least in programming, formally checking that the code does exactly what is required is orders of magnitude harder than simply writing code that compiles.
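A toy Lean 4 sketch of that trap (my own example, hypothetical theorem name): the kernel certifies the "proof" below, but the statement is vacuous because its hypothesis can never hold, so "proof accepted" tells you nothing unless someone audits the statement itself.

    -- Hypothetical example: the checker certifies this, yet it proves nothing
    -- useful, because the hypothesis `1 = 2` is impossible and any conclusion
    -- at all follows from it. Only auditing the statement catches that.
    theorem looks_strong (h : 1 = 2) : ∀ n : Nat, n = 0 := by
      intro n
      exact absurd h (by decide)   -- derive the goal from the impossible hypothesis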
reply
MaysonL
1 day ago
[-]
Terry Tao has found them helpful in the past[0], and I’d trust him over Hamkins.

[0] https://terrytao.wordpress.com/2025/12/08/the-story-of-erdos...

reply
watchthedemo
1 day ago
[-]
For the sake of rigor, it appears he found it interesting that someone else had found them helpful in the past, rather than that "Terry Tao [himself] has found them helpful in the past."

> This was one further addition to a recent sequence of examples where an Erdős problem had been automatically solved in one fashion or another by an AI tool.[Aristotle] Like the previous cases, the proof turned out to not be particularly novel

He concludes:

> One striking feature of this story for me is how important it was to have a diverse set of people, literature, and tools to attack this problem. To be able to state and prove the precise formula for {c(n)} required multiple observations

Which looks like he wants to focus the praise for the solution on the "diverse set of people", relegating any "praise" for LLMs to the obfuscated and ambiguous category of "tools", behind the literature.

reply
pfdietz
1 day ago
[-]
Tao has also used "Good Old Fashioned AI" (in the form of Mace4/Prover9) for math research.

https://terrytao.wordpress.com/2025/12/09/the-equational-the...

reply
measurablefunc
1 day ago
[-]
They do different kinds of mathematics. Tao works mainly in analytic number theory & that's why he got a Fields medal. Hamkins works in foundations of mathematics, logic, & computability theory.
reply
sph
1 day ago
[-]
Isaac Newton was deep into alchemy and the occult. Even very smart people suffer from confirmation bias, the Dunning-Kruger effect & co.
reply
NitpickLawyer
1 day ago
[-]
> “I’ve played around with it and I’ve tried experimenting, but I haven’t found it helpful at all. Basically zero. It’s not helpful to me. And I’ve used various systems and so on, the paid models and so on.”

Eh... I don't know. It's hard for me to believe such absolutist takes, especially since other prominent mathematicians (e.g. Tao) have spoken highly of the help and value they get from these systems.

I also see this kind of take in my field, especially from "senior" people with 20+ years of experience. The problem is that, when pressed, their "trying it out" is often the most basic, naive, technically atrocious kind of testing. "I tried coding with it but it's useless" -> they tried once, 3 years ago, in a chatgpt web interface, and never touched it since.

I think there's a lot of disconnect between the overhypers and the deniers. Both are wrong, IMO, and unfortunately both are extremely vocal while also being stubborn and unwilling to try different things. But my gut feeling is that if in 2025/26 someone says "basically zero", they are wrong. Or naive. Or making strawman arguments. And ignorable.

reply
mcv
1 day ago
[-]
I've been using it very recently. Not for mathematics, but for programming. And while Claude Opus is much more likely to admit a mistake ("You're absolutely correct!" instead of "That's fine") when I correct it, it does require correcting, and has been incapable of grasping complex problems. I can't trust it to produce correct code for complex problems, because when I did, the solution turned out to be wrong. Plausible looking, and certainly producing some results, but not the correct ones.

It has still been useful, because parts of the problem have been solved before, and in some cases it can accurately reproduce those parts, which has been massively helpful. And it can help me understand some aspects of the problem. But it can't solve it for me. It has shown that that's simply beyond its capabilities.

reply
20kHz
1 day ago
[-]
Well folks, Tao says he finds AI useful so I guess we can flag this and other stories that say anything to the contrary, pack it up and go home.
reply
pfdietz
1 day ago
[-]
Not only said, but demonstrated. That counts for quite a lot.
reply
watchthedemo
1 day ago
[-]
I think you are missing the sarcasm of the parent comment.

I am also fairly certain you have yet to actually watch Tao "demonstrate" his interaction with these technologies, because if you had, you would know that in those demonstrations he formally verifies theorems that already have proofs and, even more importantly, that he had already formally verified twice before, 'off camera'. Despite these two realities, he still spends a lot of time correcting the LLMs or completely ignoring their suggestions, while expressing a mostly polite appraisal of the tech's use and capability.

Of course the hype machine takes his politeness, which has more to do with his specific personality than anything to do with the tech, as universal endorsement for the tech and all of its externalities.

I called it when he started getting in bed with these malevolent tech companies: they were laundering their misdeeds through his reputation, and here we are.

reply
pfdietz
1 day ago
[-]
Unless you claim Tao is outright lying, his statements say he's getting objective value out of AI tools. They are helping him accelerate actual math research.

Math seems like an ideal target for AI, as it provides a firm basis for identifying correctness (has something actually been proved?). I think the consequences of this are going to be enormous for mathematics, including the conversion of the entire historical math literature to formalized, checked proofs. Once that is done, once all that training data is available, the capabilities of math AIs are IMO likely to be extraordinary. Add to that the sort of AlphaGo-style self-training as AI provers work on new problems.

reply
watchthedemo
1 day ago
[-]
> Unless you claim Tao is outright lying, his statements say he's getting objective value out of AI tools. They are helping him accelerate actual math research.

I lack the personal relationship with Tao necessary to determine whether his words are truthful or otherwise (namely, for me, paid for), but you also failed to provide "his statements" that say either that "he's getting objective value out of AI tools" (however one may quantify such a statement) or that "they are helping him accelerate actual math research", choosing instead to simply claim those words as Tao's own.

All of the future-tense-coded "I think", "going to be", "Once that is done", and "IMO likely" in your second paragraph seem to be in conflict, at the very least tense-wise, with your GP comment's past tense of "Not only said, but demonstrated."

reply
pfdietz
1 day ago
[-]
I think you're just engaging in passive-aggressive denial here. There's no arguing with that, so I'm going to stop. Good day!
reply
watchthedemo
1 day ago
[-]
Just regular ole skepticism in response to claims without substantive evidence.
reply
erelong
1 day ago
[-]
As the kids would say, sounds like a "skill issue". More kindly: this user hasn't developed a use for the tools for his specific use cases, which is fine; however, an implication that the tools aren't useful for other people seems to contradict others' reports of how they have been able to find uses for them.
reply
jupitr
1 day ago
[-]
This is quite seriously missing the point: computational theorem-provers and contributions to mathlib (for Lean, anyway) allow for checking "correctness". The LLM is one of the many ways to search for tactics that help complete a proof for a theorem not yet in mathlib.

The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers. Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.
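As a minimal Lean 4 illustration of what "done" means here (my own toy example, not from the thread): a `sorry` placeholder compiles but is flagged as incomplete, so a proof search only counts as finished once every goal is closed without it.

    -- Not done: `sorry` lets this compile, but Lean flags the proof as incomplete.
    theorem toy_not_done (n : Nat) : n + 0 = n := by
      sorry

    -- Done: the kernel accepts this because the goal is actually closed
    -- (`n + 0` reduces to `n` by definition).
    theorem toy_done (n : Nat) : n + 0 = n := by
      rfl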

reply
watchthedemo
1 day ago
[-]
> The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers.

Your "in the context" is doing a lot of heavy lifting here, see:

- The Math Is Haunted : https://news.ycombinator.com/item?id=44739315 : https://overreacted.io/the-math-is-haunted/

- Three ways formally verified code can go wrong in practice : https://news.ycombinator.com/item?id=45555727 : https://buttondown.com/hillelwayne/archive/three-ways-formal...

- Breaking “provably correct” Leftpad : https://news.ycombinator.com/item?id=45492274 : https://lukeplant.me.uk/blog/posts/breaking-provably-correct...

- what does a lean proof prove? : https://news.ycombinator.com/item?id=46286605 : https://supaiku.com/what-does-a-lean-proof-prove

> Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.

I agree "efficient search over all combination of proof tactics" will be a defining moment in the field. I disagree these LLMs are it.

reply
imtringued
1 day ago
[-]
For me personally LLMs have made it way easier to discover obscure or unfamiliar math concepts. It's way harder to get into a topic if you have to read through hundreds of pages and then wonder if it was time well spent, compared to just finding the bits that are interesting to you and then getting deeper into the topic as you need.
reply
panny
1 day ago
[-]
"It is difficult to get a man to understand something when his salary depends on his not understanding it."

Well, this definitely won't make the front page despite upvotes. Too many people here are heavily invested and cannot stand for it.

reply
fragmede
1 day ago
[-]
It's currently #15 on the front page.

https://hnrankings.info/46496026/

reply
panny
1 day ago
[-]
I believe you, but also believe me when I tell you it is not on my frontpage.

ctrl-F llm

https://i.imgur.com/D5K6b9c.png

Not this story.

edit: Hi fragmede, I'd reply but Nazi @dang says I'm posting too fast again. I guess you won't be working on that fully free chromebook hardware either. I'll keep trying anyway. I'm not motivated by money like you guys. Keep chasing that carrot however.

reply
fragmede
1 day ago
[-]
HN rankings put it on the front page for 15 minutes; now it's dropped to 44th. Point is, not everything is a vast conspiracy against you, however much easier it would be for you to believe the world is run by some shadowy conspiracy, that there's someone in charge of things.
reply