FilterHN

Ask HN: What explains the recent surge in LLM coding capabilities?

6 points | 13 hours ago | 2 comments

It seems like we are in the midst of another AI hype cycle. Many people are calling the current coding models an "inflection point", where capabilities are now so high that future model growth will be explosive. I have heard serious people, like economics writer Noah Smith, make this argument [0].

But it's not just the commentariat. I have seen very serious people in software engineering and tech talk about the ways in which their coding habits have changed drastically.

Benchmarks [1] alone don't seem to capture everything, although there have been jumps in the agentic sections, so maybe they actually do.

My question is: what explains these big jumps in capabilities that many serious people seem to be noticing all at once? Is it simply that we have thrown enough data and compute at the models, or are labs perhaps fine-tuning models to get really good at tool calls, which leads to this new, surprising behavior?

When I explain agents to people, I usually walk them through a manual task one might go through when debugging code. You copy some code into ChatGPT, it asks you for more context, you copy some more code in, it suggests an edit, you edit and run, there is an error, so you paste that in, and so on. An agent is just an LLM in that loop which can use tools to do those things automatically. It would not be shocking to me if taking a weaker model like Claude Opus 4.0 and making it 10x better at tool calls produced a much stronger and more impressive model. But is that all that is happening, or am I missing something big?
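To make the loop described above concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `fake_model` is a hard-coded stand-in for an LLM, the tool names (`read_file`, `edit`, `run`) are hypothetical, and the "file system" is just a dictionary. It is not how any particular product implements agents, only the copy/paste/run/fix cycle with the human taken out.

```python
import contextlib
import io

# Hypothetical in-memory "project": one buggy file (x is undefined).
files = {"main.py": "print(x)"}

def read_file(name):
    return files[name]

def edit(new_source):
    files["main.py"] = new_source
    return "ok"

def run(name):
    """Execute the file and return its stdout, or the error message."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(files[name], {})
    except Exception as e:
        return f"{type(e).__name__}: {e}"
    return buf.getvalue()

TOOLS = {"read_file": read_file, "edit": edit, "run": run}

def fake_model(transcript):
    """Stand-in for an LLM: picks the next tool call from the last message."""
    last = transcript[-1]
    if last.startswith("task:"):
        return {"tool": "read_file", "arg": "main.py"}
    if last.startswith("read_file:"):
        return {"tool": "run", "arg": "main.py"}
    if "NameError" in last:
        return {"tool": "edit", "arg": "x = 1\nprint(x)"}
    if last.startswith("edit:"):
        return {"tool": "run", "arg": "main.py"}
    return {"tool": "done", "arg": ""}

def agent_loop(task, model, tools, max_steps=10):
    """Ask the model for an action, run the tool, feed the result back."""
    transcript = [f"task: {task}"]
    for _ in range(max_steps):
        action = model(transcript)
        if action["tool"] == "done":
            break
        result = tools[action["tool"]](action["arg"])
        transcript.append(f"{action['tool']}: {result}")
    return transcript

if __name__ == "__main__":
    for line in agent_loop("fix main.py", fake_model, TOOLS):
        print(repr(line))
```

In this toy version the loop itself is trivial; all of the quality lives in how well the model chooses the next tool call, which is the part the question is asking about.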

[0] https://substack.com/@noahpinion/p-187818379

[1] https://www.anthropic.com/news/claude-opus-4-6

softwaredoug

43 minutes ago

Codex/Claude gather telemetry by default. That’s why they are subsidized. You’re giving them training data.

If you start with everything on GitHub, plus maybe some manually annotated prompts for fine-tuning, you get a decent base model of “if you see this code, then this other code follows”, but you’ll only go so far.

If you can track how thousands of people actually use prompts, and which tool usage patterns lead to success, then you can fine-tune on even more data (and train to avoid the unsuccessful patterns). Now you’re training on much more data about how people actually use the product, not theoretical scenarios.
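As a rough sketch of the idea above, in Python: the `Session` record, the `succeeded` flag, and both helper functions are hypothetical names invented for illustration. How a success signal would actually be defined (tests passing, the user keeping the edit, and so on) is an assumption, and nothing here reflects a known pipeline at any lab.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Session:
    prompt: str
    tool_calls: tuple   # ordered tool names used in the session
    succeeded: bool     # hypothetical success signal

def split_sessions(sessions):
    """Separate sessions to learn from and sessions to train away from."""
    keep = [s for s in sessions if s.succeeded]
    avoid = [s for s in sessions if not s.succeeded]
    return keep, avoid

def common_patterns(sessions, top_n=3):
    """Count the most frequent tool-call sequences in a set of sessions."""
    return Counter(s.tool_calls for s in sessions).most_common(top_n)

if __name__ == "__main__":
    logged = [
        Session("fix the failing test", ("read_file", "run", "edit", "run"), True),
        Session("rename this function", ("read_file", "run", "edit", "run"), True),
        Session("add a cache", ("edit",), False),
    ]
    keep, avoid = split_sessions(logged)
    print(common_patterns(keep))
    print(len(avoid), "session(s) to avoid")
```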

In ML it always boils down to the training data.

coder4rover

12 hours ago

Quantum computing, such that permutations of code to prompt are possible as it tries to arrive at some kind of statistically probable solution.