When a codebase gets too large, you eventually can't understand all of it. Even code I wrote myself, I can't fully grasp it.
In those cases, we usually write tests.
But when tests get too big, we end up writing tests for the tests.
Eventually, it feels like we're heading into an era of proofs for proofs.
For me, this problem usually unfolds like this:
1.I can't trust SDKs or Stack Overflow code.
2.So I write tests.
3.But I can't trust the tests either.
4.So I use test coverage, mutation testing, property testing, and fuzzing.
5.If that's still not enough, I add formal verification.
6.And then the problem becomes: can I trust the verifier?
That's how it ends up. Wouldn't human work shift toward verifying the verification systems themselves?
> A clear explanation can be found in Alex Kontorovich’s account of his own learning curve with formalized mathematics. In a nutshell: Mathlib, the dominant Lean library, is a human-curated formalization of an ever-growing fraction of existing human mathematics. It exposes clean APIs and abstractions, without which no autoformalization could take place. By contrast, Math Inc’s autoformalized proof of Viazovska’s results exposes no intelligible interface. Who in their right mind would merge a 200,000-line unaudited vibe-coded blob into the master branch of global human science?
https://davidbessis.substack.com/p/the-fall-of-the-theorem-e...
So why don;t they use AI to write Lean programs? That should make the AI-proofs more readily human undersrndable.
Anyone who understands type theory and how theorem provers work? It's sort of akin to saying "how do you know that a massive C++ program that compiles to machine code compiled to the correct machine code that will actually run and it's just not a random string of bits!?!?!", you know because the compilation would have failed otherwise(this is different than saying the program behaves correctly, but that's precisely the difference between formal proofs and compiled programs).
The entire argument both you and Bessis are implying is that mathematics must be human intelligible. But there's absolutely no reason to assume that every mathematical statement must have a human intelligible representation. There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
Just because people who don't want this to be true, and I can understand the motivation, doesn't mean that it isn't still the case.
It remains all of those things, sitting alone in its own repository of 200kLOC; what benefit comes from merging it into mathlib?
> There is also not reason to assume that if we restrict ourselves to the subset of mathematical statements that are human intelligible that this is of any use.
This is obviously silly:
Things that aren’t human intelligible aren’t human usable, so the restriction is necessary to have a collection of things humans can utilize.
This is objectively false, people use things every single day they don't understand. We still have plenty of things about the world we don't understand but still find useful.
You are saying anything we know to be the case, but cannot understand why cannot be used? Can we just stop sleeping because we haven't reasoned why sleep is necessary even though we know it is necessary? I mean we still don't really understand gravity (we know how but not why)
They approximately use them with varying degrees of success, but also mistakes, broken inferences, etc.
My exact point is that your view reduces our ability to do mathematics to that broken, flawed usage and thereby undermines its utility for logical precision: mathematics is only useful because we cleanly understand it.
When you try to use mathematics without understanding, you cause disasters: stock market crashes from mispricing options, Amazon’s 2018 hiring freeze from misallocating $1B, etc.
Note: neither of your examples (sleep, gravity) are things that people intentionally use. They just happen to people.
I think it’s very telling you couldn’t think of an example.
Reasoning by analogy...
There are of course all the computer-assisted proofs (see 4 color theorem), as well as the partially-assisted ones (see Viazovska et al on packing problems in dimensions 8, 24). But even finding a solution numerically, then rigorously verifying its properties can leave a lingering sense of incompleteness, of a gap in understanding. I like this one quote by (allegedly) Wigner that illustrates it well:
"It is nice to know that the computer understands the problem, but I would like to understand the problem, too."
I don’t see that LLMs will fundamentally change this, but rather accelerate the speed of mathematical research.
Some computer generated proofs might of course be hard to understand, but at least their existence gives another data point work with.
Doing Mathematics is more than proving something, that’s just the end of a long road spent pondering at one’s desk about how things could work out.
But why should it be the case that this is always possible?
It's entirely reasonable that the set of useful mathematical proofs is a proper superset of human intelligible useful proofs.
In fact, to argue the contrary would imply there is something incredibly remarkable about human cognition.
Just that the set of proofs a human can interpret and the set of statements a human can understand overlap; conversely, you require that the statements/theorems humans can understand be a larger class than the proofs they can understand.
To me, it’s not obvious which of those should be true:
- can we only understand theorems for which we comprehend their proof?
- or can we understand theorems despite not comprehending the proof structure?
Within the mathematics community, opinions differ. But you’re elevating your perspective on that question into a law, without any evidence.
I don't know what your distinction between "understand" and "comprehend" but my point was not about these words, but about being "useful" and being "understandable".
I'm saying there's no relationship between a mathematical statement being useful and it being understandable.
If it is true that "understanding is a prerequisite for usefulness" (where "understanding" means that a statement can be proven in a way that is intelligible to humans) was a property of mathematical expressions, then this fact would certainly be useful (we could exclude any statements that no human understand from the world of useful mathematical expression). But, by that definition, we would need to understand that statement, so you would need to be able to prove that "understanding is a prerequisite for usefulness" in a human intelligible way.
Now what I just wrote is in itself not a proof that we can't know, but proving the above statement would involve expressing the claim in a mathematically verifiable way that was also understandable by humans, which does imply something remarkable about human cognition (something that would be intelligible no less!)
But a montage about weight lifting does not a body builder make.
If you don’t understand the problem you can’t be sure that the computer does.
But I also definitely don't understand the problem if I can't get the computer to understand it, with tests.
In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.
With all of these questions in the air, epistemology might be making a comeback.
> In some sense I always considered programming to be more trustworthy than maths arguments without the certainty of a solver proof.
But programming is a subset of mathematics. They are both formal languages. I suspect the trustworthiness is more in your comfort level than the ability to verifyType theory can also be an independent synthetic foundation atop which you build mathematics.
A fundamental property of any formal proof is that it can be checked by a fairly stupid machine, automatically, because every step is a simple mechanical operation that names one of a handful of axioms and refers to a handful of earlier steps, the truth of which has already been established. So while coming up with a proof may require genius-level thinking, checking an existing fully fleshed out proof is simple -- just potentially very tedious because of the sheer number of steps.
That said, a typical human-written proof omits many steps considered "obvious" to a trained mathematician. Converting this to a formal proof involves interpreting what the original author "must have meant", which requires a lot of expertise and can go wrong -- or it may reveal that there is some inconsistency in the original claim itself.
The controversy around Mochizuki and the "abc Conjecture" proof is a contrary example.
> “Programming is understanding. If you don't understand what you are doing, you are not programming. You are generating text.”
Perhaps a proof without understanding is just generating numbers.
in medicine they use all kinds of drugs which they don't really understand how they work. anesthetics is a great example
Priests were interpreting the oracles (at least at a place like Delphi) according to the context of the people asking the questions aka participating in politics of that ancient times.
Subjectivity was a feature and I’m not sure that fits to mathematics though.
I wonder if mathematics as a science field moves more into engineering or if a different branch will emerge that is closer to that because to the point of the article, science is about understanding not just results.
This is a decidedly anti-enlightenment statement.
Would some one with tokens to burn mind checking that statement out and post back. Be sure to use long dashes too.
i.e. You have to be a good engineer to understand the well generated LLM code and a program
If mathematics is human understanding of logical consequences, understanding is the priority. But if AI proves something we can't understand but can utilize, that is a different sort of useful.
We are getting awfully close to "the answer of the universe is 42" and having it not be a joke...
I couldn’t build an internal combustion engine or even a plastic box, so maybe there’s nothing wrong with this approach.
The quality of the mathematics is a function of who has authored it?
It would be great if someone could explain to me how AI improves this situation. Even if AI thinks it’s solved a problem, unless the proof is incredibly efficient and well explained, it will be difficult to verify the correctness. One hallucination in 300 steps of logic is enough to destroy the entire proof.
If you have the LLM generate Lean code, and it compiles, then the proof is correct and you don't need to bother checking its working. (You still need to check that it is proving the theorem you asked it to prove).
It's main utility is in the search step, not the verification step. The search is the bulk of the work and creativity. Separately, as the sibling commenter pointed out, it will likely get better at the verification step as well, with integrations of tools like Lean.
> One hallucination in 300 steps of logic is enough to destroy the entire proof.
The situation with human mathematicians is not much different. Eg, Wiles original proof of Fermat's Last Theorem contained errors found by reviewers, which he later repaired.
In fact, it was Wiles himself who realized there was an error.
> Some worry about the accessibility of AI tools. Traditionally, mathematicians have required little more than intuition, training, and a pen and paper to advance their field. If this slow, deliberative process is no longer valued by society, and particularly by research funders, then mathematics could become an elitist activity, only practiced by select organizations that can afford to work with proprietary AI models.
This can be true of anything LLMs do better than existing options. Because the best LLMs require enormous resources to develop, access to them can be very limited. Right now they are priced for market share. What happens when your small law firm attorney, or self-representation, goes up against a large firm with LLM expertise? Can the kid from the poor high school compete with the kid from the rich school with premium LLM access, in mathematics for example?
the poor kid always had disadvantages, had to help the family, while the rich kid could focus on the math, and maybe get into a good math place with family help
You aren't having the AI check the proof, you interactively work on the same LEAN proof, handing off between the AI assistant and having LEAN check it and provide feedback for both of you when there's a mistake.
(edit: lol didn't realize the sibling comment below is essentially my comment)
It can definitely be a good research assistant though