https://news.ycombinator.com/item?id=47903126
I wondered how far I could get by just choosing a random open problem and throwing it at LLMs.
Disclosure: I have no idea what the problem even refers to, let alone whether the output is even remotely correct. My interest is purely in testing the capabilities of various models, curiosity, and entertainment.
Problem: https://www.erdosproblems.com/691
Given \( A\subseteq \mathbb{N} \), let \( M_A=\{ n \geq 1 : a\mid n\textrm{ for some }a\in A\} \) be the set of multiples of \( A \).
Find a necessary and sufficient condition on \( A \) for \( M_A \) to have density 1.
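For concreteness (my own example, not from the problem page): if \( A=\{2\} \), then \( M_A \) is the even numbers, with density \( 1/2 \); if \( A \) is the set of all primes, then \( M_A=\{n : n\geq 2\} \), which has density 1.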
My approach:
I used DeepSeek in Expert mode, with the same prompt as in the linked HN submission. It thought for a very long time; I was doing other things in the background so I didn't time it precisely, but I pressed "Continue" twice over the space of maybe 60 minutes, and the output says it thought for about 46 minutes.

Once it generated a proof, I asked Opus 4.7 to review it, then fed the review back into DeepSeek, which made edits, corrections and refinements. This back-and-forth continued until Opus 4.7 was reasonably happy. At that point I brought in Gemini 3.1 Pro Preview, which raised issues that Opus had missed. Opus acknowledged the feedback, and I passed it to DeepSeek for a final round.

Essentially, what Opus says DeepSeek generated is a "clean exposition of a D[avenport]-E[rdos] corollary", not a new result. In all likelihood the result is already known (DeepSeek was not allowed to use the internet for this phase), or possibly wrong.
In "simple" terms:
The argument actually proves a stronger fact for every set \( A \) of natural numbers:
The upper density of \( M_A \) equals the largest density you can get from \( M_{A'} \) for finite subsets \( A' \subseteq A \), and that also equals the lower density of \( M_A \).
When the upper density is 1, it forces the lower density to also be 1, so the natural (ordinary) density exists and equals 1 automatically, without needing any extra conditions.
The only non-basic part of the proof is the Davenport–Erdős theorem; everything else is simple.
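To make the "finite subsets" part concrete, here is a small Python sketch (mine, not from any of the chats) that computes the exact density of \( M_{A'} \) for a finite set \( A' \) by inclusion–exclusion over least common multiples:

```python
from itertools import combinations
from math import lcm

def density_of_multiples(A):
    """Exact asymptotic density of the set of multiples M_A
    for a *finite* A, via inclusion-exclusion: the common
    multiples of a subset S are exactly the multiples of lcm(S),
    which have density 1/lcm(S)."""
    A = list(A)
    total = 0.0
    for k in range(1, len(A) + 1):
        for S in combinations(A, k):
            total += (-1) ** (k + 1) / lcm(*S)
    return total

# Densities for growing finite truncations of the primes:
# each new prime pushes the density of the multiple set up,
# and the supremum over such truncations is the quantity the
# claimed proof compares against the densities of M_A.
primes = [2, 3, 5, 7, 11, 13]
for k in range(1, len(primes) + 1):
    print(primes[:k], round(density_of_multiples(primes[:k]), 4))
```

For pairwise coprime elements this reduces to \( 1-\prod_{a\in A'}(1-1/a) \), which is why the prime truncations above creep toward 1.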
In any case, these were my takeaways:
- These new models seem to be surprisingly capable, especially when used in conjunction with each other, even with fairly simple prompts
- I am quite impressed by DeepSeek. I'm going to review its coding ability next, and may even switch completely from Anthropic
- This was a genuinely interesting exercise, even if I have no idea whether any of it is correct or useful
Some other observations:
- Opus was really fast at reviewing DeepSeek's output: literally seconds
- Gemini had trouble figuring out what "Erdos 691" referred to
- The free version of ChatGPT generated mostly useless output, so I didn't include it.
Chat links below:
https://chat.deepseek.com/share/hpguvrhcxn226bi3hn
https://claude.ai/share/4f3ccad1-d862-4e37-8333-8a1ebd84b38f
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...