A few random notes from Claude coding quite a bit last few weeks
253 points | 1 day ago | 47 comments | twitter.com
https://xcancel.com/karpathy/status/2015883857489522876
einrealist
5 hours ago
[-]
> It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.

Somewhere, there are GPUs/NPUs running hot. You send all the necessary data, including information that you would never otherwise share. And you most likely do not pay the actual costs. It might become cheaper or it might not, because reasoning is a sticking plaster on the accuracy problem. You and your business become dependent on this major gatekeeper. It may seem like a good trade-off today. However, the personal, professional, political and societal issues will become increasingly difficult to overlook.

reply
cyode
1 hour ago
[-]
This quote stuck out to me as well, for a slightly different reason.

The “tenacity” referenced here has been, in my opinion, the key ingredient in the secret sauce of a successful career in tech, at least in these past 20 years. Every industry job has its intricacies, but for every engineer who earned their pay with novel work on a new protocol, framework, or paradigm, there were 10 or more providing value by putting the myriad pieces together, muddling through the ever-waxing complexity, and crucially never saying die.

We all saw others weeded out along the way for lacking the tenacity. Think the boot camp dropouts or undergrads who changed majors when first grappling with recursion (or emacs). The sole trait of stubbornness to “keep going” outweighs analytical ability, leetcode prowess, soft skills like corporate political tact, and everything else.

I can’t tell what this means for the job market. Tenacity may not be enough on its own. But it’s the most valuable quality in an employee in my mind, and Claude has it.

reply
BeetleB
56 minutes ago
[-]
This is a major concern for junior programmers. For many senior ones, after 20 (or even 10) years of tenacious work, they realize that such work will always be there, and they long ago stopped growing on that front (i.e. they had already peaked). For those folks, LLMs are a life saver.

At a company I worked for, lots of senior engineers became managers because they no longer wanted to obsess over whether their algorithm has an off-by-one error. I think fewer will go the management route.

(There was always the senior tech lead path, but there are far more roles for management than tech lead).

reply
rishabhaiover
45 minutes ago
[-]
That's just sad. Right when I found love in what I do, my work has no value anymore.
reply
test6554
35 minutes ago
[-]
Imagine a senior dev who just approves PRs, approves production releases, and prioritizes bug reports and feature requests. An LLM watches for errors ceaselessly and reports an issue. The senior dev reviews the issue and assigns a severity to it. Another LLM has a backlog of features and errors to go solve; it makes a fix and submits a PR after running tests and verifying things work on its end.
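
A minimal sketch of what that loop might look like, in Python; the watcher/fixer/triage names are hypothetical placeholders for the idea, not real APIs:

    from dataclasses import dataclass

    @dataclass
    class Issue:
        description: str
        severity: str = "unassigned"  # set by the human reviewer
        fixed: bool = False

    def watcher_llm_scan(logs):
        # Placeholder for an LLM that watches error logs and files issues.
        return [Issue(description=line) for line in logs if "ERROR" in line]

    def human_assign_severity(issue):
        # The senior dev's remaining job: triage, not implementation.
        issue.severity = "high" if "payment" in issue.description.lower() else "low"

    def fixer_llm_submit_pr(issue):
        # Placeholder for a second LLM that writes a fix, runs the tests,
        # and opens a PR for the senior dev to approve.
        return True

    backlog = watcher_llm_scan(["ERROR payment timeout", "INFO cache warm"])
    for issue in backlog:
        human_assign_severity(issue)
        issue.fixed = fixer_llm_submit_pr(issue)
        print(issue)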
reply
daxfohl
4 hours ago
[-]
I still find in these instances there's at least a 50% chance it has taken a shortcut somewhere: created a new, bigger bug in something that just happened not to have a unit test covering it, or broke an "implicit" requirement that was so obvious to any reasonable human that nobody thought to document it. These can be subtle because you're not looking for them, because no human would ever think to do such a thing.

Then even if you do catch it, AI: "ah, now I see exactly the problem. just insert a few more coins and I'll fix it for real this time, I promise!"

reply
gtowey
3 hours ago
[-]
The value extortion plan writes itself. How long before someone pitches the idea that the models explicitly almost keep solving your problem to get you to keep spending? Would you even know?
reply
password4321
17 minutes ago
[-]
First time I've seen this idea; I have a tingling feeling it might become reality sooner rather than later.
reply
sailfast
1 hour ago
[-]
That’s far-fetched. It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise. High value to user + lower compute costs = better pricing power and better margins overall.
reply
d0mine
1 hour ago
[-]
> far-fetched

Remember Google?

Once it was far-fetched that they would make the search worse just to show you more ads. Now, it is a reality.

With tokens, it is even more direct. The more tokens users spend, the more money for providers.

reply
xienze
1 hour ago
[-]
> It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise.

Unless you’re paying by the token.

reply
fragmede
2 hours ago
[-]
The free market proposition is that competition (especially with Chinese labs and Grok) means that Anthropic is welcome to do that. They're even welcome to illegally collude with OpenAI such that ChatGPT is similarly gimped. But switching costs are pretty low. If it turns out I can one-shot an issue with Qwen or DeepSeek or Kimi thinking, Anthropic loses not just my monthly subscription, but everyone else's I show that to. So no, I think that's some grade A conspiracy theory nonsense you've got there.
reply
nozzlegear
11 minutes ago
[-]
And then your government bans Chinese AI for dubious national security reasons, or, hell, just to protect their own AI companies. The free market doesn't exist in a vacuum.
reply
coffeefirst
2 hours ago
[-]
It’s not that crazy. It could even happen by accident in pursuit of another unrelated goal. And if it did, a decent chunk of the tech industry would call it “revealed preference” because usage went up.
reply
hnuser123456
1 hour ago
[-]
LLMs became sycophantic and effusive because those responses were rated higher during RLHF, until it became newsworthy how obviously eager-to-please they got, so yes, being highly factually correct and "intelligent" was already not the only priority.
reply
daxfohl
1 hour ago
[-]
To be clear, I don't think that's what they're doing intentionally. Especially on a subscription basis, they'd rather I maximize my value per token, or just not use them. Lulling users into using tokens unproductively is the worst possible option.

The way agents work right now though just sometimes feels that way; they don't have a good way of saying "You're probably going to have to figure this one out yourself".

reply
jrflowers
2 hours ago
[-]
This is a good point. For example if you have access to a bunch of slot machines, one of them is guaranteed to hit the jackpot. Since switching from one slot machine to another is easy, it is trivial to go from machine to machine until you hit the big bucks. That is why casinos have such large selections of them (for our benefit).
reply
krupan
1 hour ago
[-]
"for our benefit" lol! This is the best description of how we are all interacting with LLMs now. It's not working? Fire up more "agents" ala gas town or whatever
reply
thunderfork
2 hours ago
[-]
As a rational consumer, how would you distinguish between some intentional "keep pulling the slot machine" failure rate and the intrinsic failure rate?

I feel like saying "the market will fix the incentives" handwaves away the lack of information on internals. After all, look at the market response to Google making their search less reliable - sure, an invested nerd might try Kagi, but Google's still the market leader by a long shot.

In a market for lemons, good luck finding a lime.

reply
krupan
1 hour ago
[-]
FWIW, kagi is better than Google
reply
wvenable
2 hours ago
[-]
> These can be subtle because you're not looking for them

After any agent run, I'm always looking at the git comparison between the new version and the previous one. This helps catch things that you might otherwise not notice.
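
A minimal sketch of one way to make that a habit, assuming git is installed and that you marked the pre-run commit with a tag (the name pre-agent here is hypothetical):

    import subprocess

    # Summarize, then show in full, everything the agent changed since the
    # last commit you reviewed yourself ("pre-agent" is a hypothetical tag).
    subprocess.run(["git", "diff", "--stat", "pre-agent", "HEAD"], check=True)
    subprocess.run(["git", "diff", "pre-agent", "HEAD"], check=True)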

reply
charcircuit
2 hours ago
[-]
You are using it wrong, or are using a weak model if your failure rate is over 50%. My experience is nothing like this. It very consistently works for me. Maybe there is a <5% chance it takes the wrong approach, but you can quickly steer it in the right direction.
reply
testaccount28
2 hours ago
[-]
you are using it on easy questions. some of us are not.
reply
mikkupikku
2 hours ago
[-]
I think a lot of it comes down to how well the user understands the problem, because that determines the quality of instructions and feedback given to the LLM.

For instance, I know some people have had success with getting claude to do game development. I have never bothered to learn much of anything about game development, but have been trying to get claude to do the work for me. Unsuccessful. It works for people who understand the problem domain, but not for those who don't. That's my theory.

reply
samrus
1 hour ago
[-]
It works for hard problems when the person has already solved it and just needs the grunt work done

It also works for problems that have been solved a thousand times before, which impresses people and makes them think it is actually solving those problems

reply
daxfohl
39 minutes ago
[-]
Which matches what they are. They're first and foremost pattern recognition engines extraordinaire. If they can identify some pattern that's out of whack in your code compared to something in the training data, or a bug that is similar to others that have been fixed in their training set, they can usually thwack those patterns over to your latent space and clean up the residuals. If comparing pattern matching alone, they are superhuman, significantly.

"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. Their ability to pattern match makes reasoning seem more powerful than it actually is. If your bug is within some reasonable distance of a pattern it has seen in training, reasoning can get it over the final hump. But if your problem is too far removed from what it has seen in its latent space, it's not likely to figure it out by reasoning alone.

reply
charcircuit
26 minutes ago
[-]
>"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape.

What do you mean by this? Especially for tasks like coding, where there is a deterministic correct-or-incorrect signal, it should be possible to train.

reply
baq
2 hours ago
[-]
Don’t use it for hard questions like this then; you wouldn’t use a hammer to cut a plank, you’d try to make a saw instead
reply
fooker
3 hours ago
[-]
> It might become cheaper or it might not

If it does not, this is going to be the first technology in the history of mankind that has not become cheaper.

(But anyway, it already costs half compared to last year)

reply
ctoth
3 hours ago
[-]
> But anyway, it already costs half compared to last year

You could not have bought Claude Opus 4.5 at any price one year ago I'm quite certain. The things that were available cost half of what they did then, and there are new things available. These are both true.

I'm agreeing with you, to be clear.

There are two pieces I expect to continue: inference for existing models will continue to get cheaper. Models will continue to get better.

Three things, actually.

The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0].

[0]: https://blog.irvingwb.com/blog/2018/09/a-critical-appraisal-...

reply
simianwords
2 hours ago
[-]
interesting post. i wonder if these people go back and introspect on how incorrect they have been? do they feel the need to address it?
reply
fooker
2 hours ago
[-]
No, people do not do that.

This is harmless when it comes to tech opinions but causes real damage in politics and activism.

People get really attached to ideals and ideas, and keep sticking to those after they fail to work again and again.

reply
simianwords
2 hours ago
[-]
i don't think it is harmless; otherwise we are incentivising people to just say whatever they want without any care for truth. people's reputations should be attached to their predictions.
reply
cogogo
2 hours ago
[-]
Some people definitely do, but how do they go and address it? A fresh example, in that it concerns pure misinformation: I just screwed up and told some neighbors garbage collection was delayed for a day because of almost 2ft of snow. Turns out it was just food waste, and I was distracted checking the app and read the notification poorly.

I went back to tell them (I don't know them at all, just everyone is chattier digging out of a storm) and they were not there. Feel terrible and no real viable remedy. Hope they check themselves and realize I am an idiot. Even harder on the internet.

reply
bsder
54 minutes ago
[-]
> The "hitting a wall" / "plateau" people will continue to be loud and wrong. Just as they have been since 2018[0].

Everybody who bet against Moore's Law was wrong ... until they weren't.

And AI is the reaction to Moore's Law having broken. Nobody gave one iota of a damn about trying to make programming easier until the chips couldn't double in speed anymore.

reply
twoodfin
28 minutes ago
[-]
This is exactly backwards: Dennard scaling stopped. Moore’s Law has continued and it’s what made training and running inference on these models practical at interactive timescales.
reply
peaseagee
3 hours ago
[-]
That's not true. Many technologies get more expensive over time, as labor gets more expensive or as certain skills fall by the wayside; not everything is mass market. Have you tried getting a grandfather clock repaired lately?
reply
willio58
2 hours ago
[-]
Repairing grandfather clocks isn't more expensive now because it's gotten any harder; it's because the popularity of grandfather clocks is basically nonexistent compared to any other way of telling time.
reply
esafak
2 hours ago
[-]
Instead of advancing tenuous examples, you could suggest a realistic mechanism by which costs could rise, such as a Chinese advance on Taiwan affecting TSMC, etc.
reply
simianwords
3 hours ago
[-]
"repairing a unique clock" getting costlier doesn't mean technology hasn't gotten cheaper.

check out whether clocks have gotten cheaper in general. the answer is that it has.

there is no economy of scale here in repairing a single clock. its not relevant to bring it up here.

reply
ipaddr
2 hours ago
[-]
Clock prices have gone up since 2020. Unless a cheaper, better way to make clocks has emerged, inflation causes prices to grow.
reply
fooker
1 hour ago
[-]
Luxury watches have gone up; 'clocks' as a technology is cheaper than ever.

You can buy one for 90 cents on Temu.

reply
ipaddr
38 minutes ago
[-]
The landed cost for that 90-cent watch has gone way up. Shipping and, to some degree, taxes have pushed the price higher.
reply
simianwords
1 hour ago
[-]
not true, clocks have gone down after accounting for inflation. verified using ChatGPT.
reply
ipaddr
42 minutes ago
[-]
You can't account for inflation because the price increase is inflation.
reply
simianwords
33 minutes ago
[-]
this is not true
reply
emtel
1 hour ago
[-]
Time-keeping is vastly cheaper. People don't want grandfather clocks. They want to tell time. And they can, more accurately, more easily, and much cheaper than their ancestors.
reply
groby_b
2 hours ago
[-]
No. You don't get to make "technology gets more expensive over time" statements for deprecated technologies.

Getting a bespoke flint axe is also pretty expensive, and also has absolutely no relevance to modern life.

These discussions must, if they are to be useful, center on the population's experience, not on unique personal moments.

reply
ipaddr
2 hours ago
[-]
I purchased a 5TB drive in 2019 and the price is higher now, despite newer, better drives coming onto the market since.

Not much has gone down in price over the last few years.

reply
groby_b
19 minutes ago
[-]
Price volatility exists.

Meanwhile the overall price of storage has been going down consistently: https://ourworldindata.org/grapher/historical-cost-of-comput...

reply
solomonb
2 hours ago
[-]
okay how about the Francis Scott Key Bridge?

https://marylandmatters.org/2025/11/17/key-bridge-replacemen...

reply
groby_b
5 minutes ago
[-]
You will get a different bridge. With very different technology. Same as "I can't repair my grandfather clock cheaply".

In general, there are several things that are true for bridges that aren't true for most technology:

* Technology has massively improved, but most people are not realizing that. (E.g. the Bay Bridge cost significantly more than the previous version, but that's because we'd like to not fall down again in the next earthquake)

* We still have little idea how to reason about the cost of bridges in general. (Seriously. It's an active research topic)

* It's a tiny market, with the major vendors forming an oligopoly

* It's infrastructure, not a standard good

* The buy side is almost exclusively governments.

All of these mean expensive goods that are completely non-repeatable. You can't build the same bridge again. And on top of that, in a distorted market.

But sure, the cost of "one bridge, please" has gone up over time.

reply
fooker
2 minutes ago
[-]
> But sure, the cost of "one bridge, please" has gone up over time.

Even if you adjust for inflation?

reply
arthurbrown
2 hours ago
[-]
Bought any RAM lately? Phone? GPU in the last decade?
reply
ipaddr
1 hour ago
[-]
The latest iPhone has gone down in price? It's double. I guess the marketing is working.
reply
xnyan
9 minutes ago
[-]
"Pens are not cheaper, look at this Montblanc" is not a good faith response.

'84 Motorola DynaTAC - ~$12k AfI (adjusted for inflation)

'89 MicroTAC ~$8k AfI

'96 StarTAC ~$2k AfI

'07 iPhone ~$673 AfI

The current average smartphone sells for around $280. Phones are getting cheaper.

reply
root_axis
1 hour ago
[-]
Not true. Bitcoin has continued to rise in cost since its introduction (as in the aggregate cost incurred to run the network).

LLMs will face their own challenges with respect to reducing costs, since self-attention grows quadratically. These are still early days, so there remains a lot of low hanging fruit in terms of optimizations, but all of that becomes negligible in the face of quadratic attention.

reply
twoodfin
27 minutes ago
[-]
For Bitcoin that’s by design!
reply
krupan
1 hour ago
[-]
There are plenty of technologies that have not become cheaper, or at least not cheap enough, to go big and change the world. You probably haven't heard of them because obviously they didn't succeed.
reply
InsideOutSanta
3 hours ago
[-]
Sure, running an LLM is cheaper, but the way we use LLMs now requires way more tokens than last year.
reply
fooker
2 hours ago
[-]
10x more tokens today cost less than half of X tokens from ~mid 2024.
reply
simianwords
3 hours ago
[-]
ok but the capabilities are also rising. what point are you trying to make?
reply
oytis
3 hours ago
[-]
That it's not getting cheaper?
reply
jstummbillig
3 hours ago
[-]
But it is, capability adjusted, which is the only way it makes sense. You can definitely produce last years capability at a huge discount.
reply
simianwords
3 hours ago
[-]
you are wrong. https://epoch.ai/data-insights/llm-inference-price-trends

this is accounting for the fact that more tokens are used.

reply
techpression
2 hours ago
[-]
The chart shows that they’re right though. Newer models cost more than older models. Sure they’re better but that’s moot if older models are not available or can’t solve the problem they’re tasked with.
reply
simianwords
2 hours ago
[-]
this is incorrect. the cost to achieve the same task by old models is way higher than by new models.

> Newer models cost more than older models

where did you see this?

reply
techpression
2 hours ago
[-]
On the link you shared: 4o vs 3.5 turbo, price per 1M tokens.

There's no such thing as "same task by old model"; you might get comparable results or you might not (and this is why the comparison fails, it's not a comparison). The reason you pick the newer models is to increase the chances of getting a good result.

reply
simianwords
2 hours ago
[-]
> The dataset for this insight combines data on large language model (LLM) API prices and benchmark scores from Artificial Analysis and Epoch AI. We used this dataset to identify the lowest-priced LLMs that match or exceed a given score on a benchmark. We then fit a log-linear regression model to the prices of these LLMs over time, to measure the rate of decrease in price. We applied the same method to several benchmarks (e.g. MMLU, HumanEval) and performance thresholds (e.g. GPT-3.5 level, GPT-4o level) to determine the variation across performance metrics

This should answer it. In your case, GPT-3.5 definitely is cheaper per token than 4o but much, much less capable. So for the analysis they used a model that is cheaper than GPT-3.5 and achieved better performance.

reply
fooker
2 hours ago
[-]
OpenAI has always priced newer models lower than older ones.
reply
simianwords
1 hour ago
[-]
not true! 4o was costlier than 3.5 turbo
reply
techpression
2 hours ago
[-]
https://platform.openai.com/docs/pricing

Not according to their pricing table. Then again I’m not sure what OpenAI model versions even mean anymore, but I would assume 5.2 is in the same family as 5 and 5.2-pro as 5-pro

reply
fooker
1 hour ago
[-]
Check GPT 5.2 vs its predecessor, the 'o' series of reasoning models.
reply
asadotzler
1 hour ago
[-]
cheaper doesn't mean cheap enough to be viable after the bills come due
reply
ak_111
1 hour ago
[-]
Concorde?
reply
redox99
1 hour ago
[-]
> And you most likely do not pay the actual costs.

This is one of the weakest anti AI postures. "It's a bubble and when free VC money stops you'll be left with nothing". Like it's some kind of mystery how expensive these models are to run.

You have open weight models right now like Kimi K2.5 and GLM 4.7. These are very strong models, only months behind the top labs. And they are not very expensive to run at scale. You can do the math. In fact there are third parties serving these models for profit.

The money pit is training these models (and not that much if you are efficient, like the Chinese models). Once they are trained, they are served with large profit margins compared to the inference cost.

OpenAI and Anthropic are without a doubt selling their API for a lot more than the cost of running the model.

reply
mikeocool
1 hour ago
[-]
To me this tenacity is often like watching someone trying to get a screw into a board using a hammer.

There's often a better, faster way to do it, and while it might get to the short-term goal eventually, it's often created some long-term problems along the way.

reply
crazygringo
35 minutes ago
[-]
> Somewhere, there are GPUs/NPUs running hot.

Running at their designed temperature.

> You send all the necessary data, including information that you would never otherwise share.

I've never sent data that isn't already either stored by GitHub or a cloud provider, so no difference there.

> And you most likely do not pay the actual costs.

So? Even if costs double once investor subsidies stop, that doesn't change much of anything. And the entire history of computing is that things tend to get cheaper.

> You and your business become dependent on this major gatekeeper.

Not really. Switching between Claude and Gemini or whatever new competition shows up is pretty easy. I'm no more dependent on it than I am on any of another hundred business services or providers that similarly mostly also have competitors.

reply
hahahahhaah
1 hour ago
[-]
It is also amazing seeing Linux kernel work, scheduling threads, proving interrupts and API calls all without breaking a sweat or injuring its ACL.
reply
YetAnotherNick
2 hours ago
[-]
With optimizations and new hardware, power is almost a negligible cost. You can get 5.5M tokens/s/MW[1] for Kimi K2 (= 20M tokens/kWh = 181M tokens/$), which is 400x cheaper than current pricing. It's just Nvidia/TSMC/other manufacturers eating up the profit now because they can. My bet is that China will match current Nvidia within 5 years.

[1]: https://developer-blogs.nvidia.com/wp-content/uploads/2026/0...
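
Rough arithmetic behind those figures, as a sketch; the ~$0.11/kWh electricity price is my assumption, not a number from the link:

    tokens_per_sec_per_mw = 5.5e6  # claimed Kimi K2 throughput per MW of power
    tokens_per_kwh = tokens_per_sec_per_mw * 3600 / 1000  # 1 MW for 1 hour = 1000 kWh
    price_per_kwh = 0.11  # assumed electricity price in $/kWh

    print(round(tokens_per_kwh / 1e6), "M tokens/kWh")  # ~20
    print(round(tokens_per_kwh / price_per_kwh / 1e6), "M tokens/$")  # ~180, near the quoted 181M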

reply
storystarling
1 hour ago
[-]
Electricity is negligible but the dominant cost is the hardware depreciation itself. Also inference is typically memory bandwidth bound so you are limited by how fast you can move weights rather than raw compute efficiency.
reply
daxfohl
5 hours ago
[-]
I worry about the "brain atrophy" part, as I've felt this too. And not just atrophy, but even moreso I think it's evolving into "complacency".

Like there have been multiple times now where I wanted the code to look a certain way, but it kept pulling back to the way it wanted to do things. Like if I had stated certain design goals recently it would adhere to them, but after a few iterations it would forget again and go back to its original approach, or mix the two, or whatever. Eventually it was easier just to quit fighting it and let it do things the way it wanted.

What I've seen is that after the initial dopamine rush of being able to do things that would have taken much longer manually, a few iterations of this kind of interaction has slowly led to a disillusionment of the whole project, as AI keeps pushing it in a direction I didn't want.

I think this is especially true if you're trying to experiment with new approaches to things. LLMs are, by definition, biased by what was in their training data. You can shock them out of it momentarily, which is awesome for a few rounds, but over time the gravitational pull of what's already in their latent space becomes inescapable. (I picture it as working like a giant Sierpinski triangle).

I want to say the end result is very akin to doom scrolling. Doom tabbing? It's like, yeah I could be more creative with just a tad more effort, but the AI is already running and the bar to seeing what the AI will do next is so low, so....

reply
nemothekid
58 minutes ago
[-]
I think I should write more about this, but I have been feeling very similar. I've recently been exploring using Claude Code/Codex as the "default", so I've decided to implement a side project.

My gripe with AI tools in the past is that the kind of work I do is large and complex and with previous models it just wasn't efficient to either provide enough context or deal with context rot when working on a large application - especially when that application doesn't have a million examples online.

I've been trying to implement a multiplayer game with server authoritative networking in Rust with Bevy. I specifically chose Bevy as the latest version was after Claude's cut off, it had a number of breaking changes, and there aren't a lot of deep examples online.

Overall it's going well, but one downside is that I don't really understand the code "in my bones". If you told me tomorrow that I had to optimize latency, or if there was a 1 in 100 edge case, not only would I not know where to look, I don't think I could tell you how the game engine works.

In the past, I could not have ever gotten this far without really understanding my tools. Today, I have a semi-functional game and, truth be told, I don't even know what an ECS is and what advantages it provides. I really consider this a huge problem: if I had to maintain this in production, if there was a SEV0 bug, am I confident enough I could fix it? Or am I confident the model could figure it out? Or is the model good enough that it could scan the entire code base and intuit a solution? One of these three questions has to be answered, or else brain atrophy is a real risk.

reply
striking
2 hours ago
[-]
It's not just brain atrophy, I think. I think part of it is that we're actively making a tradeoff to focus on learning how to use the model rather than learning how to use our own brains and work with each other.

This would be fine if not for one thing: the meta-skill of learning to use the LLM depreciates too. Today's LLM is gonna go away someday, the way you have to use it will change. You will be on a forever treadmill, always learning the vagaries of using the new shiny model (and paying for the privilege!)

I'm not going to make myself dependent, let myself atrophy, run on a treadmill forever, for something I happen to rent and can't keep. If I wanted a cheap high that I didn't mind being dependent on, there's more fun ones out there.

reply
daxfohl
2 hours ago
[-]
Businesses too. For two years it's been "throw everything into AI." But now that shit is getting real, are they really feeling so coy about letting AI run ahead of their engineering team's ability to manage it? How long will it be until we start seeing outages that just don't get resolved because the engineers have lost the plot?
reply
zamalek
16 minutes ago
[-]
> I worry about the "brain atrophy" part, as I've felt this too. And not just atrophy, but even moreso I think it's evolving into "complacency".

Not trusting the ML's output is step one here; that keeps you intellectually involved - but it's still a far cry from solving the majority of problems yourself (instead you only solve problems the ML did a poor job at).

Step two: I delineate interesting and uninteresting work, and Claude becomes a pair programmer without keyboard access for the latter - I bounce ideas off of it etc. making it an intelligent rubber duck. [Edit to clarify, a caveat is that] I do not bore myself with trivialities such as retrieving a customer from the DB in a REST call (but again, I do verify the output).

reply
polytely
14 minutes ago
[-]
I feel like I'm still a couple steps behind my lead in skill level and am trying to gain more experience, so I do wonder if I am shooting myself in the foot if I rely too much on AI at this stage. The senior engineer I'm trying to learn from can use AI very effectively because he has very good judgement of code quality; I feel like if I use AI too much I might lose out on the chance to improve my own judgement. It's a hard dilemma.
reply
gritspants
2 hours ago
[-]
My disillusionment comes from the feeling I am just cosplaying my job. There is nothing to distinguish one cosplayer from another. I am just doordashing software, at this point, and I'm not in control.
reply
krupan
1 hour ago
[-]
I've been thinking along these lines. LLMs seem to have arrived right when we were all getting addicted to reels/TikToks/whatever. For some reason we love to swipe, swipe, swipe, until we get something funny/interesting/shocking that gives us a short-lasting dopamine hit (or whatever chemical it is) that feels good for about 1 second, and we want MORE, so we keep swiping.

Using an LLM is almost exactly the same. You get the occasional "wow! I've never seen it do that before!" moments (whether the thing it just did was even useful or not), get a short hit of feel-goods, and then we keep using it, trying to get another hit. It keeps providing them at just the right intervals to keep people going, just like TikTok does.

reply
epolanski
41 minutes ago
[-]
> Like if I had stated certain design goals recently it would adhere to them, but after a few iterations it would forget again and go back to its original approach, or mix the two, or whatever.

Context management, proper prompting and clear instructions, proper documentation are still relevant.

reply
freediver
1 hour ago
[-]
My experience is the opposite - I haven't used my brain more in a while.. Typing characters was never what developers were valued for anyway. The joy of building is back too.
reply
swader999
1 hour ago
[-]
Same. I feel I need to be way more into the domain and what the user is trying to do than ever before.
reply
Imustaskforhelp
4 hours ago
[-]
> I want to say it's very akin to doom scrolling. Doom tabbing? It's like, yeah I could be more creative with just a tad more effort, but the AI is already running and the bar to seeing what the AI will do next is so low, so....

Yea exactly. Like we are just waiting for it to get completed, and after it gets completed, then what? We ask it to do new things again.

Just as, when we are doom scrolling, we watch something for a minute, then scroll down and watch something new again.

The whole notion of progress feels completely fake with this. Somehow I guess I was in a bubble of time where I had always ended up using AI in the web browser (ever since ChatGPT 3 came out) and my workflow didn't change because it was free, but I recently changed it when some new free services dropped.

"Doom-tabbing", or completely out-of-the-loop AI agentic programming, just feels really weird to me, sucking out the joy, and I wouldn't even consider myself a guy particularly interested in writing code, as I had been using AI to write code for a long time.

I think the problem for me was that I always considered myself a computer tinkerer before a coder. So when AI came for coding, my tinkering skills were given a boost (I could make projects of curiosity I couldn't earlier), but now with AI agents working in this autonomous-esque way, it has come for my tinkering, and I do feel replaced, or just feel like my ability to tinker, my interests, my knowledge and my experience are just not taken into account if an AI agent will write the whole code in a multi-file structure, run commands and then deploy it straight to a website.

I mean, my point is tinkering was an active hobby; now it's becoming a passive hobby. Doom-tinkering? I feel like I caught on to the feeling a bit earlier, just going by the vibe from my heart, but is it just me who feels this?

What could be a name for what I feel?

reply
stuaxo
2 hours ago
[-]
LLMs have some terrible patterns. Don't know what to do? Just chuck a class named Service in.

Have to really look out for the crap.

reply
atonse
21 hours ago
[-]
> LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

I’ve always said I’m a builder even though I’ve also enjoyed programming (but for an outcome, never for the sake of the code)

This perfectly sums up what I’ve been observing between people like me (builders) who are ecstatic about this new world and programmers who talk about the craft of programming, sometimes butting heads.

One viewpoint isn’t necessarily more valid, just a difference of wiring.

reply
ryandrake
5 hours ago
[-]
I noticed the same thing, but wasn't able to put it into words before reading that. Been experimenting with LLM-based coding just so I can understand it and talk intelligently about it (instead of just being that grouchy curmudgeon), and the thought in the back of my mind while using Claude Code is always:

"I got into programming because I like programming, not whatever this is..."

Yes, I'm building stupid things faster, but I didn't get into programming because I wanted to build tons of things. I got into it for the thrill of defining a problem in terms of data structures and instructions a computer could understand, entering those instructions into the computer, and then watching victoriously while those instructions were executed.

If I was intellectually excited about telling something to do this for me, I'd have gotten into management.

reply
nunez
36 minutes ago
[-]
Same same. Writing the actual code is always a huge motivator behind my side projects. Yes, producing the outcome is important, but the journey taken to get there is a lot of fun for me.

I used Claude Code to implement an OpenAI 4o-vision-powered receipt scanning feature in an expense tracking tool I wrote by hand four years ago. It did it in two or three shots while taking my codebase into account.

It was very neat, and it works great [^0], but I can't latch onto the idea of writing code this way. Powering through bugs while implementing a new library or learning how to optimize my test suite in a new language is thrilling.

Unfortunately (for me), it's not hard at all to see how the "builders" that see code as a means to an end would LOVE this, and businesses want builders, not crafters.

In effect, knowing the fundamentals is getting devalued at a rate I've never seen before.

[^0] Before I used Claude to implement this feature, my workflow for processing receipts looked like this: Tap an iOS Shortcut, enter the amount, snap a pic of the receipt, type up the merchant, amount and description for the expense, then have the shortcut POST that to my expenses tracking toolkit, which then POSTs that into a Google Sheet. This feature eliminated the need for me to enter the merchant and amount. Unfortunately, it often took more time to confirm that the merchant, amount and date details OpenAI provided were correct (and correct them when details were wrong, which was most of the time) than it did to type out those details manually, so I just went back to my manual workflow. However, the temptation to just glance at the details and tap "This looks correct" was extremely high, even if the info it generated was completely wrong! It's the perfect analogue to what I've been witnessing throughout the rise of the LLMs.

reply
polishdude20
3 hours ago
[-]
What I have enjoyed about programming is being able to get the computer to do exactly what I want. The possibilities are bounded by only what I can conceive in my mind. I feel like with AI that can happen faster.
reply
testaccount28
2 hours ago
[-]
> get the computer to do exactly what I want.

> with AI that can happen faster.

well, not exactly that.

reply
viccis
1 hour ago
[-]
Same. This kind of coding feels like it got rid of the building aspect of programming that always felt nice, and it replaced it entirely with business logic concerns, product requirements, code reviews, etc. All the stuff I can generally take or leave. It's like I'm always in a meeting.

>If I was intellectually excited about telling something to do this for me, I'd have gotten into management.

Exactly this. This is the simplest and tersest way of explaining it yet.

reply
atonse
3 hours ago
[-]
Funny you say that. Because I have never enjoyed management as much as being hands on and directly solving problems.

So maybe our common ground is that we are direct problem solvers. :-)

reply
coffeeaddict1
4 hours ago
[-]
But how can you be a responsible builder if you don't have trust in the LLMs doing the "right thing"? Suppose you're the head of a software team where you've picked the best candidates for a given project; in that scenario I can see how one is able to trust the team members to orchestrate the implementation of your ideas and intentions, with you not being intimately familiar with the details. Can we place the same trust in LLM agents? I'm not sure. Even if one could somehow prove that LLMs are very reliable, the fact that AI agents aren't accountable beings renders the whole situation vastly different from the human equivalent.
reply
inerte
4 hours ago
[-]
You don't simply put a body in a seat and get software. There are entire systems enabling this trust: college, resume, samples, referral, interviews, tests and CI, monitoring, mentoring, and performance feedback.

And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable than one that used PICO?

reply
coffeeaddict1
3 hours ago
[-]
> And accountability can still exist? Is the engineer that created or reviewed a Pull Request using Claude Code less accountable then one that used PICO?

The point is that in the human scenario, you can hold the human agents accountable. You cannot do that with AI. Of course, you as the orchestrator of agents will be accountable to someone, but you won't have the benefit of holding your "subordinates" accountable, which is what you do in a human team. IMO, this renders the whole situation vastly different (whether good or bad I'm not sure).

reply
polishdude20
2 hours ago
[-]
You can switch to another LLM provider or stop using them altogether. It's even easier than firing a developer.
reply
ipaddr
1 hour ago
[-]
It is as easy as getting rid of Microsoft Teams at your org.
reply
addisonj
5 hours ago
[-]
IMO, this isn't entirely a "new world" either, it is just a new domain where the conversation amplifies the opinions even more (weird how that is happening in a lot of places)

What I mean by that: you had compiled vs interpreted languages, you had typed vs untyped, testing strategies; all that, at least in some part, was a conversation about the tradeoffs between moving fast/shipping and maintainability.

But it isn't just tech, it is also in methodologies and the words we use, from "build fast and break things" and "yagni" to "design patterns" and "abstractions"

As you say, it is a different viewpoint... but my biggest concern with where we are as an industry is that these are not just "equally valid" viewpoints of how to build software... they are quite literally different stages of software that, AFAICT, pretty much all successful software has to go through.

Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)

reply
dpflan
3 hours ago
[-]
“””

Much of my career has been spent in teams at companies with products that are undergoing the transition from "hip app built by scrappy team" to "profitable, reliable software" and it is painful. Going from something where you have 5 people who know all the ins and outs and can fix serious bugs or ship features in a few days to something that has easy clean boundaries to scale to 100 engineers of a wide range of familiarities with the tech, the problem domain, skill levels, and opinions is just really hard. I am not convinced yet that AI will solve the problem, and I am also unsure it doesn't risk making it worse (at least in the short term)

“””

This perspective is crucial. Scale is the great equalizer / demoralizer, scale of the org and scale of the systems. Systems become complex quickly, and verifiability of correctness and function becomes harder. For companies that built from day one with AI and have AI influencing them as they scale, where does complexity begin to run up against the limitations of AI and cause regression? Or, if all goes well, amplification?

reply
senderista
2 hours ago
[-]
Maybe there's an intermediate category: people who like designing software? I personally find system design more engaging than coding (even though I enjoy coding as well). That's different from just producing an opaque artifact that seems to solve my problem.
reply
mkozlows
5 hours ago
[-]
I think he's really getting at something there. I've been thinking about this a lot (in the context of trying to understand the persistent-on-HN skepticism about LLMs), and the framing I came up with[1] is top-down vs. bottom-up dev styles, aka architecting code and then filling in implementations, vs. writing code and having architecture evolve.

[1] https://www.klio.org/theory-of-llm-dev-skepticism/

reply
verdverm
5 hours ago
[-]
I think the division is more likely tied to writing. You have to fundamentally change how you do your job, from one of writing a formal language for a compiler to one of writing natural language for a junior-goldfish-memory-allstar-developer, closer to management than to contributor.

This distinction to me separates the two primary camps

reply
slaymaker1907
5 hours ago
[-]
I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.

I deliberately avoid full vibe coding since I think doing so will rust my skills as a programmer. It also really doesn’t save much time in my experience. Once I have a design in mind, implementation is not the hard part.

reply
jimbokun
5 hours ago
[-]
The new LLM centered workflow is really just a management job now.

Managers and project managers are valuable roles and have important skill sets. But there's really very little connection with the role of software development that used to exist.

It's a bit odd to me to include both of these roles under a single label of "builders", as they have so little in common.

EDIT: this goes into more detail about how coding (and soon other kinds of knowledge work) is just a management task now: https://www.oneusefulthing.org/p/management-as-ai-superpower...

reply
simianwords
2 hours ago
[-]
i don't disagree. at some point LLM's might become good enough that we wouldn't need exact technical expertise.
reply
Imustaskforhelp
4 hours ago
[-]
> I enjoy both and have ended up using AI a lot differently than vibe coders. I rarely use it for generating implementations, but I use it extensively for helping me understand docs/apis and more importantly, for debugging. AI saves me so much time trying to figure out why things aren’t working and in code review.

I had felt like this and still do, but man, at some point the management churn feels real and I just feel like I'm suffering from a new problem.

Suppose I actually end up having services literally deployed from a single prompt, nothing else. Earlier I used to have AI write code but I was interested in the deployment and everything around it; now there are services which do that really neatly for you (I also really didn't give into the agent hype and mostly used browser LLMs).

Like, on one hand you feel more free to build projects, but the whole joy of the project gets completely reduced.

I mean, I guess I am one of the junior dev's so to me AI writing code on topics I didn't know/prototyping felt awesome.

I mean I was still involved in say copy pasting or looking at the code it generates. Seeing the errors and sometimes trying things out myself. If AI is doing all that too, idk

For some reason, recently I have been disinterested in AI. I have used it quite a lot for prototyping but I feel like this complete out of the loop programming just very off to me with recent services.

I also feel like there is this sense that if I pay for some AI thing, I have to maximally extract "value" out of it.

I guess the issue could be that I can give vague terms or a very small text file as input (like "just do an X alternative in Y lang") and I am now unable to understand the architectural decisions, and feel overwhelmed by it.

It's probably gonna take either spec-driven development, where I clearly define the architecture, or development like something I saw primagen do recently, which is that the AI will only manipulate the code of a particular function (I am imagining it for a file as well). Somehow I feel like that's something I could enjoy more, because right now it feels like I don't know what I have built at times.

When I prototype with single file projects using say browser for funsies/any idea. I get some idea of what the code kind of uses with its dependencies and functions names from start/end even if I didn't look at the middle

A bit of a ramble I guess, but the thing which kind of makes me feel this is that I was talking to somebody and showcasing them some service where AI + server is there, and they asked for something in a prompt and I wrote it. Then I let it do its job, but I was also thinking how I would architect it (it was some "detect food and then find BMR" thing, and I was thinking first to use some API, but then I thought that meh, it might be hard, why not use AI vision models, okay what's the best, Gemini seems good/cheap)

and I went to the coding thing to see what it did, and it actually went even beyond by using the free tier of Gemini (which I guess didn't end up working, could be some rate limit on my own key, but honestly it would've been the thing I would've tried too)

So like, I used to pride myself on the architectural decisions I make even if AI could write code faster but now that is taken away as well.

I really don't want to read AI code, so much so that honestly at this point I might as well write the code myself and learn hands-on, but I have a problem with the build-fast-in-public-like attitude that I have and am just not finding it fun.

I feel like I should do a more active job in my projects & I am really just figuring out what's the perfect way to use AI in such contexts & when to use how much.

Thoughts?

reply
pron
16 minutes ago
[-]
People who just let the agent code for them, how big of a codebase are you working on? How complex (i.e. is it a codebase that junior programmers could write and maintain)?
reply
aixpert
4 minutes ago
[-]
The Rust compiler and the Redox operating system, with a modified QEMU for a Mac Vulkan/Metal pipeline; probably not junior stuff.
reply
0xbadcafebee
2 hours ago
[-]
> What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.

I was thinking about this the other day as relates to the DevOps movement.

The DevOps movement started as a way to accelerate and improve the results of dev<->ops team dynamics. By changing practices and methods, you get acceleration and improvement. That creates "high-performing teams", which is the team form of a 10x engineer. Whether or not you believe in '10x engineers', a high-performing team is real. You really can make your team deploy faster, with fewer bugs. You have to change how you all work to accomplish it, though.

To get good at using AI for coding, you have to do the same thing: continuous improvement, changing workflows, different designs, development of trust through automation and validation. Just like DevOps, this requires learning brand new concepts, and changing how a whole team works. This didn't get adopted widely with DevOps because nobody wanted to learn new things or change how they work. So it's possible people won't adapt to the "better" way of using AI for coding, even if it would produce a 10x result.

If we want this new way of working to stick, it's going to require education, and a change of engineering culture.

reply
oxag3n
35 minutes ago
[-]
> Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually...

> Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

Until you struggle to review it as well. A simple exercise to prove it: ask an LLM to write a function in a familiar programming language, but in an area you didn't invest in learning and coding yourself. Try reviewing some code involving embedding/SIMD/FPGA without learning it first.

reply
sleazebreeze
31 minutes ago
[-]
People would struggle to review code in a completely unfamiliar domain or part of the stack even before LLMs.
reply
piskov
4 minutes ago
[-]
That’s why you need to write code to learn it.

No-one has ever learned a skill just by reading/observing

reply
chrisjj
10 minutes ago
[-]
No, because they wouldn't be so foolish as to try it.
reply
porise
5 hours ago
[-]
I wish the people who wrote this would let us know what kind of codebases they are working on. They seem mostly useless in a sufficiently large codebase, especially when it's messy and interactions aren't always obvious. I don't know how much better Claude is than ChatGPT, but I can't get ChatGPT to do much useful with an existing large codebase.
reply
CameronBanga
5 hours ago
[-]
This is an antidotal example, but I released this last week after 3 months of work on it as a "nights and weekends" project: https://apps.apple.com/us/app/skyscraper-for-bluesky/id67541...

I've been working in the mobile space since 2009, though primarily as a designer and then product manager. I work in kinda a hybrid engineering/PM job now, and have never been a particularly strong programmer. I definitely wouldn't have thought I could make something with that polish, let alone in 3 months.

That code base is ~98% Claude code.

reply
bee_rider
5 hours ago
[-]
I don’t know if “antidotal example” is a pun or a typo but I quite like it.
reply
CameronBanga
5 hours ago
[-]
Lol typing on my phone during lunch and meant anecdotal. But let's leave it anyways. :)
reply
oasisbob
5 hours ago
[-]
That is fun.

Not sure if it's an American pronunciation thing, but I had to stare at that long and hard to see the problem and even after seeing it couldn't think of how you could possibly spell the correct word otherwise.

reply
bsder
43 minutes ago
[-]
> Not sure if it's an American pronunciation thing

It's a bad American pronunciation thing like "Febuwary" and "nuculer".

If you pronounce the syllables correctly, "an-ec-dote", "Feb-ru-ar-y", "nu-cle-ar", the spellings follow.

English has its fair share of spelling stupidities, but if people don't even pronounce the words correctly there is no hope.

reply
BeetleB
47 minutes ago
[-]
> They seem mostly useless in a sufficiently large codebase especially when they are messy and interactions aren't always obvious.

What type of documents do you have explaining the codebase and its messy interactions, and have you provided that to the LLM?

Also, have you tried giving someone brand new to the team the exact same task and information you gave to the LLM, and how effective were they compared to the LLM?

> I don't know how much better Claude is than ChatGPT, but I can't get ChatGPT to do much useful with an existing large codebase.

As others have pointed out, from your comment, it doesn't sound like you've used a tool dedicated for AI coding.

(But even if you had, it would still fail if you expect LLMs to do stuff without sufficient context).

reply
keerthiko
5 hours ago
[-]
Almost always, notes like these are going to be about greenfield projects.

Trying to incorporate it in existing codebases (esp when the end user is a support interaction or more away) is still folly, except for closely reviewed and/or non-business-logic modifications.

That said, it is quite impressive to set up a simple architecture, or just list the filenames, and tell some agents to go crazy to implement what you want the application to do. But once it crosses a certain complexity, I find you need to prompt closer and closer to the weeds to see real results. I imagine a non-technical prompter cannot proceed past a certain prototype fidelity threshold, let alone make meaningful contributions to a mature codebase via LLM without a human engineer to guide and review.

reply
reubenmorais
4 hours ago
[-]
I'm using it on a large set of existing codebases full of extremely ugly legacy code, weird build systems, tons of business logic, shipping directly to prod at breakneck growth over the last two years, and it's delivering the same type of value that Karpathy writes about.
reply
jjfoooo4
4 hours ago
[-]
That was true for me, but is no longer.

It's been especially helpful in explaining and understanding arcane bits of legacy code behavior my users ask about. I trigger Claude to examine the code and figure out how the feature works, then tell it to update the documentation accordingly.

reply
1123581321
4 hours ago
[-]
These models do well changing brownfield applications that have tests because the constraints on a successful implementation are tight. Their solutions can be automatically augmented by research and documentation.
reply
gwd
1 hour ago
[-]
For me, in just the golang server instance and the core functional package, `cloc` reports over 40k lines of code, not counting other supporting packages. I spent the last week having Claude rip out the external auth system and replace it with a home-grown one (and having GPT-codex review its changes). If anything, Claude makes it easier on me as a solo founder with a large codebase. Rather than having to re-familiarize myself with code I wrote a year ago, I describe it at a high level, point Claude to a couple of key files, and then tell it to figure out what it needs to do. It can use grep, language server, and other tools to poke around and see what's going on. I then have it write an "epic" in markdown containing all the key files, so that future sessions already know the key files to read.

I really enjoyed the process. As TFA says, you have to keep a close eye on it. But the whole process was a lot less effort, and I ended up doing more than I would otherwise have done.

reply
danielvaughn
5 hours ago
[-]
It's important to understand that he's talking about a specific set of models that were released around November/December, and that we've hit a kind of inflection point in model capabilities. Specifically, Anthropic's Opus 4.5 model.

I never paid any attention to different models, because they all felt roughly equal to me. But Opus 4.5 is really and truly different. It's not a qualitative difference, it's more like it just finally hit that quantitative edge that allows me to lean much more heavily on it for routine work.

I highly suggest trying it out, alongside a well-built coding agent like Claude Code, Cursor, or OpenCode. I'm using it on a fairly complex monorepo and my impressions are much the same as Karpathy's.

reply
TaupeRanger
5 hours ago
[-]
Claude Code and Codex are CLI tools you use to give the LLM context about the project on your local machine or dev environment. The fact that you're using the name "ChatGPT" instead of Codex leads me to believe you're talking about using the web-based ChatGPT interface to work on a large codebase, which is completely beside the point of the entire discussion. That's not the tool anyone is talking about here.
reply
ph4te
5 hours ago
[-]
I don't know how big a "sufficiently large" codebase is, but we have a 1M LOC Java application that is ~10 years old and runs POS systems, and Claude Code has no issues with it. We have done full analyses with output detailing each module, and also used it to pinpoint specific issues when described. Vibe coding is not used here, just analysis.
reply
epolanski
30 minutes ago
[-]
1. Write good documentation: architecture, how things work, code styling, etc.

2. Put the source code of your important dependencies in the same directory. E.g. put a `_vendor` directory in the project, and in it put the codebases of postgres, redis, vue, or whatever you depend on, at the same tag you're using (see the sketch after this list).

3. Write good plans and requirements: acceptance criteria, context, user stories, etc. Save them in markdown files. Review those multiple times with LLMs, trying to find weaknesses. Then move to implementation files: make it write a detailed plan of what it's going to change and why, and what it will produce.

4. Write very good prompts. LLMs follow instructions well if they are clear: "you should proactively do X" is a weak instruction if you mean "you must do X".

5. LLMs are far from perfect and full of limits. Karpathy sums up their cons very well in his long list. If you don't know their limits you'll mismanage expectations, fail to use them where they're a huge boost, and waste time on things they don't cope well with. On top of that, all LLMs differ in their "personality": how they adhere to instructions, how creative they are, etc.
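A rough sketch of that kind of layout, expressed as a tiny scaffolding script (the file and directory names besides `_vendor` are just examples, not a prescription):

  # Python: create the skeleton described above if it doesn't already exist.
  from pathlib import Path

  LAYOUT = {
      "CLAUDE.md": "# Project instructions\n\nCode style, build commands, review rules.\n",
      "docs/architecture.md": "# Architecture\n\nHow the pieces fit together.\n",
      "docs/plans/.gitkeep": "",  # one markdown plan/requirements file per feature goes here
      "_vendor/README.md": "# Vendored dependency sources, pinned to the tags we actually use.\n",
  }

  for rel, body in LAYOUT.items():
      path = Path(rel)
      path.parent.mkdir(parents=True, exist_ok=True)  # creates docs/, docs/plans/, _vendor/
      if not path.exists():
          path.write_text(body)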

reply
smusamashah
1 hour ago
[-]
The codebase I work on at $dayjob$ is legacy; it has a few files with 20k lines each and a few more with around 10k lines each. It's hard to find things and connect the dots in the codebase. I don't think LLMs are able to navigate and understand codebases of that size yet. But I have seen lots of seemingly large projects shown here lately that involve thousands of files and millions of lines of code.
reply
jumploops
1 hour ago
[-]
I’ve found that LLMs seem to work better on LLM-generated codebases.

Commercial codebases, especially private internal ones, are often messy. It seems this is mostly due to the iterative nature of development in response to customer demands.

As a product gets larger, and addresses a wider audience, there’s an ever increasing chance of divergence from the initial assumptions and the new requirements.

We call this tech debt.

Combine this with a revolving door of developers, and you start to see Conway’s law in action, where the system resembles the organization of the developers rather than the “pure” product spec.

With this in mind, I’ve found success in using LLMs to refactor existing codebases to better match the current requirements (i.e. splitting out helpers, modularizing, renaming, etc.).

Once the legacy codebase is “LLMified”, the coding agents seem to perform more predictably.

YMMV here, as it’s hard to do large refactors without tests for correctness.

(Note: I’ve dabbled with a test-first refactor approach, but haven’t gone far enough to claim it works, though I believe it could.)

reply
olig15
22 minutes ago
[-]
Surely because LLM-generated code is part of the training data for the model, so the code/patterns it has to work with are closer to its training data.
reply
tunesmith
5 hours ago
[-]
If you have a ChatGPT account, there's nothing stopping you from installing codex cli and using your chatgpt account with it. I haven't coded with ChatGPT for weeks. Maybe a month ago I got utility out of coding with codex and then having ChatGPT look at my open IDE page to give comments, but since 5.2 came out, it's been 100% codex.
reply
redox99
59 minutes ago
[-]
What do you even mean by "ChatGPT"? Copy pasting code into chatgpt.com?

AI-assisted coding has never been like that, which would be atrocious. The typical workflow was using Cursor with some model of your choice (almost always an Anthropic model like Sonnet, before Opus 4.5 released). Nowadays (in addition to IDEs) it's often a CLI tool like Claude Code with Opus, or Codex CLI with GPT Codex 5.2 high/xhigh.

reply
bluGill
2 hours ago
[-]
I've been trying Claude on my large codebase today. When I give it the requirements I'd give an engineer and say "do it", it just writes garbage that doesn't make sense and doesn't seem to even meet the requirements (if it does, I can't follow how - though I'll admit to giving up before I understood what it did, and I didn't try it on a real system). When I forced it to step back and do tiny steps - in TDD, write one test of the full feature - it did much better - but then I spent the next 5 hours adjusting the code it wrote to meet our coding standards. At least I understand the code, but I'm not sure it is any faster (though it is a lot easier to spot what's wrong in existing code than to come up with greenfield code).

Which is to say you have to learn to use the tools. I've only just started, and cannot claim to be an expert. I'll keep using them - in part because everyone is demanding I do - but to use them you clearly need to know how to do it yourself.

reply
simonw
2 hours ago
[-]
Have you tried showing it a copy of your coding standards?

I also find pointing it to an existing folder full of code that conforms to certain standards can work really well.

reply
bflesch
2 hours ago
[-]
Yeah let's share all your IP for the vague promise that it will somehow work ;)
reply
simonw
1 hour ago
[-]
You just gave me a revelation as to why some people report being unable to get decent results out of coding agents!
reply
rob
1 hour ago
[-]
I've been playing around with the "Superpowers" [0] plugin in Claude Code on a new small project and really like it. Simple enough to understand quickly by reading the GitHub repo and seems to improve the output quality of my projects.

There's basically a "brainstorm" /slash command that you go back and forth with, and it places what you came up with in docs/plans/YYYY-MM-DD-<topic>-design.md.

Then you can run a "write-plan" /slash command on the docs/plans/YYYY-MM-DD-<topic>-design.md file, and it'll give you a docs/plans/YYYY-MM-DD-<topic>-implementation.md file that you can then feed to the "execute-plan" /slash command, where it breaks everything down into batches, tasks, etc, and actually implements everything (so three /slash commands total.)

There's also "GET SHIT DONE" (GSD) [1] that I want to look at, but at first glance it seems to be a bit more involved than Superpowers with more commands. Maybe it'd be better for larger projects.

[0] https://github.com/obra/superpowers

[1] https://github.com/glittercowboy/get-shit-done

reply
Okkef
5 hours ago
[-]
Try Claude code. It’s different.

After you tried it, come back.

reply
Imustaskforhelp
4 hours ago
[-]
I think it's not Claude Code per se but rather the model (Opus 4.5?), or something about the agentic workflow.

I tried a website which offered the Opus model in their agentic workflow & I felt something different too, I guess.

Currently trying out Kimi Code (using their recent Kimi 2.5), the first time I've bought any AI product, because I got it for like $1.49 per month. It does feel a bit less powerful than Claude Code, but I feel like monetarily it's worth it.

Y'know, you have to kind of bargain with an AI model to reduce its pricing, which I just felt really curious about. The psychology behind it feels fascinating, because even as a frugal person I already felt invested enough in the model that it became my sunk cost fallacy.

Shame for me personally, because they use it as a hook to get people using their tool and then charge $19 the next month (still cheaper than Claude Code for the most part, but a big jump compared to $1.49).

reply
languid-photic
5 hours ago
[-]
They build Claude Code fully with Claude Code.
reply
Macha
4 hours ago
[-]
Which is equal parts praise and damnation. Claude Code does a lot of nice things that people writing TUIs usually don't bother with given the time cost/reward, and which they've probably only done because they're using AI heavily. But equally it has a lot of underbaked edges (like accidentally shadowing the user's shell configuration when it tries to install terminal bindings for shift-enter, even though the terminal it's configuring already sends a distinct shift-enter result), and bugs (have you ever noticed it just stop, unfinished?).
reply
simianwords
2 hours ago
[-]
i haven't used Claude Code but come on.. it is a production level quality application used seriously by millions.
reply
gsk22
1 hour ago
[-]
If you haven't used it, how can you judge its quality level?
reply
xyzsparetimexyz
51 minutes ago
[-]
Look up the flickering issue. The program was created by dunces.
reply
vindex10
1 hour ago
[-]
Ah, now I understand why @autocomplete suddenly got broken between versions and is still not fixed )
reply
maxdo
5 hours ago
[-]
ChatGPT is not made to write code. Get out of the stone age :)
reply
spaceman_2020
5 hours ago
[-]
I'm afraid that we're entering a time when the performance difference between the really cutting edge and even the three-month-old tools is vast

If you're using plain vanilla chatgpt, you're woefully, woefully out of touch. Heck, even plain claude code is now outdated

reply
shj2105
4 hours ago
[-]
Why is plain Claude Code outdated? I thought that's what most AI-forward people are using right now. Are Ralph loops the new thing now?
reply
spaceman_2020
1 hour ago
[-]
Plain Claude Code doesn’t have enough scaffolding to handle large projects

At a base level, people are “upgrading” their Claude Code with custom skills and subagents - all text files saved in .claude/agents|skills.

You can also use their new tasks primitive to basically run a Ralph-like loop

But at the edges, people are using multiple instances, each handling different aspects in parallel - stuff like Gas Town

Tbf you can still get a lot of mileage out of vanilla Claude Code. But I’ve found that even adding a simple frontend design skill improves the output substantially

reply
twa927
2 hours ago
[-]
I don't see the AI capability jump in recent months at all. For me it's more the opposite: CC works worse than a few months ago. It keeps forgetting the rules from CLAUDE.md, hallucinates function calls, generates tons of over-verbose plans, and generates overengineered code. Where I find it a clear net-positive is pure frontend code (HTML + Tailwind), it's spaghetti but since it's just visualization, it's OK.
reply
ValentineC
2 hours ago
[-]
> Where I find it a clear net-positive is pure frontend code (HTML + Tailwind), it's spaghetti but since it's just visualization, it's OK.

This makes it sound like we're back in the days of FrontPage/Dreamweaver WYSIWYG. Goodness.

reply
twa927
2 hours ago
[-]
Hmm, your comment gave me the idea that maybe we should invent "What You Describe Is What You Get", to replace HTML+Tailwind spaghetti with the prompts generating it.
reply
epolanski
42 minutes ago
[-]
> What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.

No doubt that good engineers will know when and how to leverage the tool, both for coding and for improving processes (design-to-code, requirements collection, task tracking, basic code review, etc.), improving their own productivity and that of those around them.

Motivated individuals will also leverage these tools to learn more and faster.

And yes, of course it's not the only tool one should use, and of course there's still value in talking with proper human experts to learn from, etc. But 90% of the time you're looking for info, the LLM will dig it up for you by reading the source code of e.g. Postgres and its tests, rather than you asking on chats/Stack Overflow.

This is a transformative technology that will make great engineers even stronger, but it will weed out those who were merely valued for their very basic capability of churning something out but never cared about either engineering or coding, which is 90% of our industry.

reply
neuralkoi
2 hours ago
[-]
> The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking.

If current LLMs are ever deployed in systems harboring the big red button, they WILL most definitely somehow press that button.

reply
arthurcolle
2 hours ago
[-]
US MIC are already planning on integrating fucking Grok into military systems. No comment.
reply
groby_b
2 hours ago
[-]
fwiw, the same is true for humans. Which is why there's a whole lot of process and red tape around that button. We know how to manage risk. We can choose to do that for LLM usage, too.

If instead we believe in fantasies of a single all-knowing machine god that is 100% correct at all times, then... we really just have ourselves to blame. Might as well just have spammed that button by hand.

reply
jimbokun
5 hours ago
[-]
I'm pretty happy with Copilot in VS Code. I type whatever change I want Claude to make in the Copilot panel, and then use the VS Code in-context diffs to accept or reject the proposed changes, while still being able to make other small changes on my own.

So I think this tracks with Karpathy's defense of IDEs still being necessary?

Has anyone found it practical to forgo IDEs almost entirely?

reply
simonw
3 hours ago
[-]
Are you letting it run your tests and run little snippets of code to try them out (like "python -c 'import module; print(module.something())'") or are you just using it to propose diffs for you to accept or reject?

This stuff gets a whole lot more interesting when you let it start making changes and testing them by itself.

reply
vmbm
5 hours ago
[-]
I have been assigning issues to copilot in Github. It will then create a pull request and work on and report back on the issue in the PR. I will pull the code and make small changes locally using VSCode when needed.

But what I like about this setup is that I have almost all the context I need to review the work in a single PR. And I can go back and revisit the PR if I ever run into issues down the line. Plus you can run sessions in parallel if needed, although I don't do that too much.

reply
maxdo
5 hours ago
[-]
Copilot is not on par with CC or Cursor even
reply
jimbokun
5 hours ago
[-]
I use it to access Claude. So what's the difference?
reply
nsingh2
3 hours ago
[-]
This stuff is a little messy and opaque, but the performance of the same model in different harnesses depends a lot on how context is managed. The last time I tried Copilot, it performed markedly worse for similar tasks compared to Claude Code. I suspect that Copilot was being very aggressive in compressing context to save on token cost, but I'm not 100% certain about this.

Also note that with Claude models, Copilot might allocate a different number of thinking tokens compared to Claude Code.

Things may have changed now compared to when I tried it out, these tools are in constant flux. In general I've found that harnesses created by the model providers (OpenAI/Codex CLI, Anthropic/Claude Code, Google/Gemini CLI) tend to be better than generalist harnesses (cheaper too, since you're not paying a middleman).

reply
walthamstow
2 hours ago
[-]
Different harnesses and agentic environments produce different results from the same model. Claude Code and Cursor are the best IME and Copilot is by far the worst.
reply
WA
5 hours ago
[-]
Why not? You can select Opus 4.5, Gemini 3 Pro, and others.
reply
spaceman_2020
5 hours ago
[-]
Claude Code is a CLI tool which means it can do complete projects in a single command. Also has fantastic tools for scaffolding and harnessing the code. You can define everything from your coding style to specific instructions for designing frontpages, integrating payments, etc.

It's not about the model. It's about the harness

reply
binarycrusader
1 hour ago
[-]
> Claude Code is a CLI tool which means it can do complete projects in a single command

https://github.com/features/copilot/cli/

reply
piker
2 hours ago
[-]
This would make some sense if VS Code didn't have a terminal built into it. The LLMs have the same bash capabilities in either form.
reply
maxdo
5 hours ago
[-]
it's not a model limit anymore, it's tools, skills, background agents, etc. It's an entire agentic environment.
reply
illnewsthat
5 hours ago
[-]
Github copilot has support for this stuff as well. Agent skills, background/subagents, etc.
reply
nsb1
2 hours ago
[-]
The best thing I ever told Claude to do was "Swear profusely when discussing code and code changes". Probably says more about me than Claude, but it makes me snicker.
reply
thomassmith65
32 minutes ago
[-]

  Slopacolypse. I am bracing for 2026 as the year of the slopacolypse across all of github, substack, arxiv, X/instagram, and generally all digital media.
Did he coin the term "slopacolypse"? It's a useful one.
reply
chrisjj
4 minutes ago
[-]
[delayed]
reply
Macha
4 hours ago
[-]
> - What does LLM coding feel like in the future? Is it like playing StarCraft? Playing Factorio? Playing music?

Starcraft and Factorio are exactly what it is not. Starcraft has a loooot of micro involved at any level beyond mid level play, despite all the "pro macros and beats gold league with mass queens" meme videos. I guess it could be like Factorio if you're playing it by plugging together blueprint books from other people but I don't think that's how most people play.

At that level of abstraction, it's more like grand strategy if you're to compare it to any video game? You're controlling high level pushes and then the units "do stuff" and then you react to the results.

reply
zetazzed
1 hour ago
[-]
It's like the Victoria 3 combat system. You just send an army and a general to a given front and let them get to work with no micro. Easy! But of course some percentage of the time they do something crazy like deciding to redeploy from your existential Franco-Prussian war front to a minor colonial uprising...
reply
siliconc0w
1 hour ago
[-]
Not sure how he is measuring, I'm still closer to about a 60% success rate. It's more like 20% is an acceptable one-shot, this goes to 60% acceptable with some iteration, but 40% either needs manual intervention to succeed or such significant iteration that manual is likely faster.

I can supervise maybe three agents in parallel before a task requiring significant hand-holding means I'm likely blocking an agent.

And the time an agent is 'restlessly working' on something is usually inversely correlated with the likelihood of success. Usually, if it's going down a rabbit hole, the correct thing to do is to intervene and reorient it.

reply
tomlockwood
4 minutes ago
[-]
Oh wow! Guy whose current project depends on AI being good is talking about AI being good.

Interesting.

reply
rileymichael
5 hours ago
[-]
> LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building

as the former, i've never felt _more ahead_ than now due to all of the latter succumbing to the llm hype

reply
alexose
1 hour ago
[-]
It's refreshing to see one of the top minds in AI converge on the same set of thoughts and frustrations as me.

For as fast as this is all moving, it's good to remember that most of us are actually a lot closer to the tip of the spear than we think.

reply
onetimeusename
5 hours ago
[-]
> the ratio of productivity between the mean and the max engineer? It's quite possible that this grows *a lot*

I have a professor who has researched auto generated code for decades and about six months ago he told me he didn't think AI would make humans obsolete but that it was like other incremental tools over the years and it would just make good coders even better than other coders. He also said it would probably come with its share of disappointments and never be fully autonomous. Some of what he said was a critique of AI and some of it was just pointing out that it's very difficult to have perfect code/specs.

reply
slfreference
5 hours ago
[-]
I can sense two classes of coders emerging.

Billionaire coder: a person who has "written" a billion lines.

Ordinary coders: people with only a couple of thousand lines to their git blame.

reply
ositowang
47 minutes ago
[-]
It’s a great and insightful review—not over-hyping the coding agent, and not underestimating it either. It acknowledges both its usefulness and its limitations. Embracing it and growing with it is how I see it too.
reply
strogonoff
5 hours ago
[-]
LLM coding splits up engineers based on those who primarily like building and those who primarily like code reviews and quality assessment. I definitely don’t love the latter (especially when reviewing decisions not made by a human with whom I can build long-term personal rapport).

After a certain experience threshold of making things from scratch, "coding" (never particularly liked that term) has always been 99% building, or architecture. I struggle to see how often a well-architected solution today, with modern high-level abstractions, requires so much code that you'd save significant time and effort by not having to just type, possibly with basic deterministic autocomplete, exactly what you mean (especially considering you would also have to spend time and effort reviewing whatever was typed for you if you used a non-deterministic autocomplete).

reply
OkayPhysicist
5 hours ago
[-]
See, I don't take it to that extreme: LLMs make fantastic, never-before-seen quality autocompletes. I hacked together a Neovim plugin that prompts an LLM to "finish this function" on command, and it's a big time saver for the menial plumbing-type operations. Think things like "this API I use expects JSON that encodes some subset of SQL; I want all the dogs with Ls in their name that were born on a Tuesday". Given an example of such an API (or if the documentation ended up in its training data), LLMs will consistently one-shot stuff like that.
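To make that concrete, here is a minimal sketch of the kind of payload I mean (the field names and API shape here are hypothetical, not a real API):

  import json

  # Hypothetical "JSON that encodes a subset of SQL": dogs with an L in
  # their name that were born on a Tuesday.
  query = {
      "from": "dogs",
      "select": ["id", "name", "birth_date"],
      "where": {"and": [
          {"op": "like", "field": "name", "value": "%L%"},
          {"op": "eq", "field": "day_of_week(birth_date)", "value": "Tuesday"},
      ]},
  }

  print(json.dumps(query, indent=2))  # body you'd POST to the hypothetical endpoint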

Asking it to do entire projects? Dumb. You end up with spaghetti, unless you hand-hold it to the point that you might as well be using my autocomplete method.

reply
appstorelottery
42 minutes ago
[-]
> Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually.

I've been increasingly using LLM's to code for nearly two years now - and I can definitely notice my brain atrophy. It bothers me. Actually over the last few weeks I've been looking at a major update to a product in production & considered doing the edits manually - at least typing the code from the LLM & also being much more granular with my instructions (i.e. focus on one function at a time). I feel in some ways like my brain is turning into slop & I've been coding for at least 35 years... I feel validated by Karpathy.

reply
epolanski
36 minutes ago
[-]
Don't be too worried about it.

1. Manual coding may be less relevant in the future (albeit the ability to read code, interpret it and understand it will be more relevant). Likely already is.

2. Any skill you don't practice becomes "weaker". Let me give you an example: I've played chess since childhood, but sometimes I go months, even years, without playing. When I get back I start losing Elo fast; if I was in the top 10% on chess.com, I drop to the top 30% in the weeks after. But after a few months I'm back in the top 10%. Takeaway: your relative ability is more or less the same compared to other practitioners; you're simply rusty.

reply
appstorelottery
20 minutes ago
[-]
Thanks for your comment, it set me at ease. I know from experience that you're right on point 2. As for point one, I also tend to agree. AI is such a paradigm shift & rapid/massive change doesn't come without stress. I just need to stay cool about it all ;-)
reply
toephu2
2 hours ago
[-]
I think in less than a year writing code manually will be akin to doing arithmetic problems by hand. Sure you can still code manually, but it's going to be a lot faster to use an LLM (calculator).
reply
adamddev1
1 hour ago
[-]
People keep using these analogies but I think these are fundamentally different things.

1. hand arithmetic -> using a calculator

2. assembly -> using a high level language

3. writing code -> making an LLM write code

Number 3 does not belong. Number 3 is a fundamentally different leap because it's not based on deterministic logic. You can't depend on an LLM like you can depend on a calculator or a compiler. LLMs are totally different.

reply
kypro
1 hour ago
[-]
I agree, but writing code is so different to calculations that long-term benefits are less clear.

It doesn't matter how good you are at calculations; the answer to 2 + 2 is always 4. There are no methods of solving 2 + 2 which could result in you accidentally giving everyone who reads the result of your calculation write access to your entire DB. But there are different ways to code a system even if the UI is the same, and some of these may neglect to consider permissions.

I think a good parallel here would be to imagine that tomorrow we had access to humanoid robots who could do construction work. Would we want them to just go build skyscrapers and bridges and view all construction businesses which didn't embrace the humanoid robots as akin to doing arithmetic by hand?

You could of course argue that there's no problem here so long as trained construction workers are supervising the robots to make sure they're getting tolerances right and doing good welds, but then what happens 10 years down the road when humans haven't built a building in years? If people are not writing code any more then how can people be expected to review AI generated code?

I think the optimistic picture here is that humans just won't be needed in the future. In theory when models are good enough we should be able to trust the AI systems more than humans. But the less optimistic side of me questions a future in which humans no longer do, or even know how to do such fundamental things.

reply
nsainsbury
2 hours ago
[-]
Touching on the atrophy point, I actually wrote a few thoughts about this yesterday: https://www.neilwithdata.com/outsourced-thinking

I actually disagree with Andrej here re: "Generation (writing code) and discrimination (reading code) are different capabilities in the brain." I would argue that the only reason he can read code fluently, find issues, etc. is because he has spent years in a non-AI-assisted world writing code. As time goes on, he will become substantially worse.

This also bodes incredibly poorly for the next generation, who will now mostly avoid writing code in their formative years and thus fail to even develop an idea of what good code is, how it works/why it works, why you make certain decisions and not others, etc., and ultimately you will see them become utterly dependent on AI, unable to make progress without it.

IMO outsourcing thinking is going to have incredibly negative consequences for the world at large.

reply
thoughtpeddler
1 hour ago
[-]
Read your blog post and agree with some of it. Largely I agree with the premise that the 2nd and 3rd order effects of this technology will be more impactful than the 1st order “I was able to code this app I wouldn’t have otherwise even attempted to”. But they are so hard to predict!
reply
gwd
1 hour ago
[-]
Is coding like piloting, where pilots need a certain number of hours of "flight time" to gain skills, and then a certain number of additional hours each year to maintain their skills? Do developers need to schedule in a certain number of "manually written lines of code" every year?
reply
all2well
2 hours ago
[-]
What particular setups are getting folks these sorts of results? If there’s a way I could avoid all the babysitting I have to do with AI tools that would be welcome
reply
geraneum
2 hours ago
[-]
> If there’s a way I could avoid all the babysitting I have to do with AI tools that would be welcome

OP mentions that they are actually doing the “babysitting”

reply
spongebobstoes
2 hours ago
[-]
i use codex cli. work on giving it useful skills. work on the other instruction files. take Karpathy tips around testing and declarativeness

use many simultaneously, and bounce between them to unblock them as needed

build good tools and tests. you will soon learn all the things you did manually -- script them all

reply
TheGRS
4 hours ago
[-]
I do feel a big mood shift after late November. I switched to using Cursor and Gemini primarily and it was a big change in my ability to get my ideas into code effectively. The Cursor interface, for one, got to a place that I really like and enjoy using, but it's probably more that the results from the agents themselves are less frustrating. I can deal with the output more now.

I'm still a little iffy on the agent swarm idea. I think I will need to see it in action in an interface that works for me. To me it feels like we are anthropomorphizing agents too much, and that results in this idea that we can put agents into roles and then combine them into useful teams. I can't help seeing all agents as the same automatons, and I have trouble understanding why giving an agent different guidelines to follow, and then having it follow along with another agent, would give me better results than just fixing the context in the first place. Either that or just working more on the code pipeline to spot issues early on - all the stuff we already test for.

reply
daxfohl
4 hours ago
[-]
I'm curious to see what effect this change has on leadership. For the last two years it's been "put everything you can into AI coding, or else!" with quotas and firings and whatever else. Now that AI is at the stage where it can actually output whole features with minimal handholding, is there going to be a Frankenstein moment where leadership realizes they now have a product whose codebase is running away from their engineering team's ability to support it? Does it change the calculus of what it means to be underinvested vs overinvested in AI, and what are the implications?
reply
fishtoaster
5 hours ago
[-]
> if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side.

This is about where I'm at. I love pure claude code for code I don't care about, but for anything I'm working on with other people I need to audit the results - which I much prefer to do in an IDE.

reply
forrestthewoods
2 hours ago
[-]
HN should ban any discussion of "things I learned playing with AI" that doesn't include direct artifacts of the thing built.

We’re about a year deep into “AI is changing everything” and I don’t see 10x software quality or output.

Now don’t get me wrong I’m a big fan of AI tooling and think it does meaningfully increase value. But I’m damn tired of all the talk with literally nothing to show for it or back it up.

reply
vibeprofessor
12 hours ago
[-]
The AGI vibes with Claude Code are real, but the micromanagement tax is heavy. I spend most of my time babysitting agents.

I expect interviews will evolve into "build project X with an LLM while we watch" plus an audit of agent specs.

reply
maxdo
5 hours ago
[-]
I've been doing vibe code interviews for nearly a year now. Most people are surprisingly bad with AI tools. We specifically ask them to bring their preferred tool, yet 20–30% still just copy-paste code from ChatGPT.

Fun stat: the correlation is real; people who were good at vibe coding also had offer(s) from other companies that didn't run vibe-code interviews.

reply
xyzsparetimexyz
47 minutes ago
[-]
Copy pasting from chatgpt is the most secure option.
reply
bflesch
2 hours ago
[-]
Interesting you say that, feels like when people were too stupid to google things and "googling something" was a skill that some had and others didn't.
reply
thefourthchime
5 hours ago
[-]
From what I've heard, in what few interviews there are for software engineers these days, they do have you use models and see how quickly you can build things.
reply
iwontberude
5 hours ago
[-]
The interviews I’ve given have asked about how to control for AI slop without hurting your colleagues’ feelings. Anyone can prompt and build; the harder part, as usual in business, is knowing how and when to say ‘no.’
reply
0xy
12 hours ago
[-]
Sounds great to me. Leetcode is outdated and heavily abused by people who share the questions ahead of time in various forums and chats.
reply
maximedupre
5 hours ago
[-]
> It hurts the ego a bit but the power to operate over software in large "code actions" is just too net useful

It does hurt, that's why all programmers now need an entrepreneurial mindset... you become if you use your skills + new AI power to build a business.

reply
xyzsparetimexyz
46 minutes ago
[-]
What about the people who don't want to be entrepreneurs?
reply
maximedupre
30 minutes ago
[-]
They have to pivot to something else
reply
maximedupre
29 minutes ago
[-]
Or stay ahead of the curve as long as possible, e.g. work on the loop/ralphing
reply
randoglando
52 minutes ago
[-]
Senpai has taken the words out of my mouth and put them on the page.
reply
philipwhiuk
3 hours ago
[-]
> It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day. It's a "feel the AGI" moment to watch it struggle with something for a long time just to come out victorious 30 minutes later.

The bits left unsaid:

1. Burning tokens, which we charge you for

2. My CPU does this when I tell it to do bogosort on a million 32-bit integers, it doesn't mean it's a good thing

reply
jopsen
1 day ago
[-]
> - How much of society is bottlenecked by digital knowledge work?

Any qualified guesses?

I'm not convinced more traders on wall street will allocate capital more effectively leading to economic growth.

Will more programmers grow the economy? Or should we get real jobs ;)

reply
iwontberude
5 hours ago
[-]
Most of this country's challenges are strictly political. The pittance of work software can contribute is most likely negligible or destructive (e.g. software buttons in cars, or Palantir). In other words, we've picked all the low-hanging fruit and all that is left is to hang ourselves.
reply
js8
4 hours ago
[-]
I actually disagree. Having software (AI) that can cut through the technological stuff faster will make people more aware of political problems.
reply
iwontberude
3 hours ago
[-]
edit: country's* all that is left*
reply
tintor
2 hours ago
[-]
"you can review code just fine even if you struggle to write it."

Well, merely approving code takes no skill at all.

reply
roblh
1 hour ago
[-]
Seriously, that’s a completely nonsense line.
reply
superze
2 hours ago
[-]
I don't know about you guys, but most of the time it's spitting out nonsense SQLAlchemy models and I have to constantly correct it, to the point where I am back to writing the code myself. The bugs are just astonishing, and I lose control of the codebase after some time, to the point where reviewing the whole thing just takes a lot of time.

On the contrary, if it was for a job in the public sector I would just let the LLM spit out some output and play stupid, since the salary is very low.

reply
rschick
1 day ago
[-]
Great point about expansion vs speedup. I now have time to build custom tools, implement more features, try out different API designs, get 100% test coverage.. I can deliver more quickly, but can also deliver more overall.
reply
hollowturtle
5 hours ago
[-]
> Coding workflow. Given the latest lift in LLM coding capability, like many others I rapidly went from about 80% manual+autocomplete coding and 20% agents in November to 80% agent coding and 20% edits+touchups in December

Anyone else wondering what exactly he is actually building? What? Where?

> The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do.

I would LOVE to have just syntax errors produced by LLMs. "Subtle conceptual errors that a slightly sloppy, hasty junior dev might do" are neither subtle nor slightly sloppy; they are actually serious and harmful, and junior devs have no experience to fix them.

> They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?"

Why not just hand-write 100 LOC, with the help of an LLM for tests, documentation and some autocomplete, instead of making it write 1000 LOC and then cleaning it up? That's also very difficult to do; 1000 lines is a lot.

> Tenacity. It's so interesting to watch an agent relentlessly work at something. They never get tired, they never get demoralized, they just keep going and trying things where a person would have given up long ago to fight another day.

It's a computer program running in the cloud; what exactly did he expect?

> Speedups. It's not clear how to measure the "speedup" of LLM assistance.

See above

> 2) I can approach code that I couldn't work on before because of knowledge/skill issue. So certainly it's speedup, but it's possibly a lot more an expansion.

Hmm, not sure. If you don't have domain knowledge you could have an initial stab at the problem, but what about when you need to iterate on it? You can't, if you don't have the domain knowledge yourself.

> Fun. I didn't anticipate that with agents programming feels more fun because a lot of the fill in the blanks drudgery is removed and what remains is the creative part.

No, it's not fun; e.g. LLMs produce uninteresting UIs, mostly bloated with React/HTML.

> Atrophy. I've already noticed that I am slowly starting to atrophy my ability to write code manually.

My bet is that sooner or later he will get back to coding by hand for periods of time to avoid that, like many others; the damage that over-reliance on these tools brings is serious.

> Largely due to all the little mostly syntactic details involved in programming, you can review code just fine even if you struggle to write it.

No, programming is not "syntactic details"; the practice of programming is everything but "syntactic details". One should learn how to program, not language X or Y.

> What happens to the "10X engineer" - the ratio of productivity between the mean and the max engineer? It's quite possible that this grows a lot.

Yet there are no measurable economic effects so far.

> Armed with LLMs, do generalists increasingly outperform specialists? LLMs are a lot better at fill in the blanks (the micro) than grand strategy (the macro).

Did people with smartphones outperform photographers?

reply
TaupeRanger
5 hours ago
[-]
Lots of very scared, angry developers in these comment sections recently...
reply
hollowturtle
5 hours ago
[-]
Neither angry nor scared; I value my hard skills a lot. I'm just wondering why people religiously believe everything AI-related. Maybe I'm a bit sick of the excessive hype.
reply
hollowturtle
5 hours ago
[-]
Also note that I'm a heavy LLM user, not anti-AI for sure.
reply
thr59182617
3 hours ago
[-]
I see way more hype that is boosted by the moderators. The scared ones are the nepo babies who founded a vaporware AI company that will be bought by daddy or friends through a VC.

They have to maintain the hype until a somewhat credible exit appears and therefore lash out with boomer memes, FOMO, and the usual insane talking points like "there are builders and coders".

reply
simianwords
2 hours ago
[-]
i'm not sure what kind of conspiracy you are hallucinating. do you think people have to "maintain the hype"? it is doing quite well organically.
reply
hollowturtle
2 hours ago
[-]
So well that they're losing billions and OpenAI may go bankrupt this year
reply
simianwords
2 hours ago
[-]
what if it doesn't?
reply
hollowturtle
2 hours ago
[-]
better for them! the heck i care about it
reply
simianwords
2 hours ago
[-]
This is a low quality curmudgeonly comment
reply
hollowturtle
2 hours ago
[-]
Now that you contributed zero net to the discussion and learned a new word you can go out and play with toys! Good job
reply
potatogun
2 hours ago
[-]
You learned a new adjective? If people move beyond "nice", "mean" and "curmudgeonly" they might even read Shakespeare instead of having an LLM produce a summary.
reply
simianwords
2 hours ago
[-]
cool.

> Anyone else wondering what exactly he is actually building? What? Where?

this is trivially answerable. it seems like they did not do even the slightest bit of research before asking question after question to seem smart and detailed.

reply
hollowturtle
2 hours ago
[-]
I asked many questions and you focused on only one. BTW, yes, I did my research, and I know him because I've followed almost every tutorial he has on YouTube, and he never clearly mentions what weekend project he worked on that led him to such conclusions. I had very high respect for him, except that at some point he started acting like the Jesus Christ of LLMs.
reply
simianwords
2 hours ago
[-]
It's not clear why you asked that question if you knew the answer to it.
reply
nadis
1 day ago
[-]
The section on IDEs/agent swarms/fallibility resonated a lot for me; I haven't gone quite as far as Karpathy in terms of power usage of Claude Code, but some of the shifts in mistakes (and reality vs. hype) analysis he shared seems spot on in my (caveat: more limited) experience.

> "IDEs/agent swarms/fallability. Both the "no need for IDE anymore" hype and the "agent swarm" hype is imo too much for right now. The models definitely still make mistakes and if you have any code you actually care about I would watch them like a hawk, in a nice large IDE on the side. The mistakes have changed a lot - they are not simple syntax errors anymore, they are subtle conceptual errors that a slightly sloppy, hasty junior dev might do. The most common category is that the models make wrong assumptions on your behalf and just run along with them without checking. They also don't manage their confusion, they don't seek clarifications, they don't surface inconsistencies, they don't present tradeoffs, they don't push back when they should, and they are still a little too sycophantic. Things get better in plan mode, but there is some need for a lightweight inline plan mode. They also really like to overcomplicate code and APIs, they bloat abstractions, they don't clean up dead code after themselves, etc. They will implement an inefficient, bloated, brittle construction over 1000 lines of code and it's up to you to be like "umm couldn't you just do this instead?" and they will be like "of course!" and immediately cut it down to 100 lines. They still sometimes change/remove comments and code they don't like or don't sufficiently understand as side effects, even if it is orthogonal to the task at hand. All of this happens despite a few simple attempts to fix it via instructions in CLAUDE . md. Despite all these issues, it is still a net huge improvement and it's very difficult to imagine going back to manual coding. TLDR everyone has their developing flow, my current is a small few CC sessions on the left in ghostty windows/tabs and an IDE on the right for viewing the code + manual edits."

reply
shawabawa3
1 day ago
[-]
It's been a bit like the boiling frog analogy for me

I started by copy pasting more and more stuff in chatgpt. Then using more and more in-IDE prompting, then more and more agent tools (Claude etc). And suddenly I realise I barely hand code anymore

For sure there's still a place for manual coding, especially schemas/queries or other fiddly things where a tiny mistake gets amplified, but the vast majority of "basic work" is now just prompting, and honestly the code quality is _better_ than it was before; all kinds of refactors I didn't think about or couldn't be bothered with have happened almost automatically.

And people still call them stochastic parrots

reply
Macha
3 hours ago
[-]
I've had the opposite experience, it's been a long time listening to people going "It's really good now" before it developed to a permutation that was actually worth the time to use it.

ChatGPT 3.5/4 (2023-2024): The chat interface was verbose and clunky and it was just... wrong... like 70+% of the time. Not worth using.

CoPilot autocomplete and Gitlab Duo and Junie (late 2024-early 2025): Wayyy too aggressive at guessing exactly what I wasn't doing and hijacked my tab complete when pre-LLM type-tetris autocomplete was just more reliable.

Copilot Edit/early Cursor (early 2025): Ok, I can sort of see uses here, but god, picking the right files all the time is such a pain, since it means I need to have already figured out what I want to do in such detail that, well, what was even the point? Also the models at that time just quickly descended into incoherency after like three prompts; if it went off track, good luck ever correcting it.

Copilot Agent mode / Cursor (late 2025): Ok, great, if the scope is narrowly scoped, and I'm either going to write the tests for it or it's refactoring existing code it could do something. Like something mechanical like the library has a migration where we need to replace the use of methods A/B/C and replace them with a different combination of X/Y/Z. great, it can do that. Or like CRUD controller #341. I mean, sure, if my boss is going to pay for it, but not life changing.

Zed Agent mode / Cursor agent mode / Claude code (early 2026): Finally something where I can like describe the architecture and requirements of a feature, let it code, review that code, give it written instructions on how to clean it up / refactor / missing tests, and iterate.

But that was like 2 years of "really it's better and revolutionary now" before it actually got there. Now maybe in some languages or problem domains, it was useful for people earlier but I can understand people who don't care about "but it works now" when they're hearing it for the sixth time.

And I mean, what one hand gives the other takes away. I have a decent amount of new work dealing with MRs from my coworkers where they just grabbed the requirements from a stakeholder, shoved them into Claude or Cursor, it passed the existing tests, and it shipped without much understanding. When they wrote the code themselves, they tested it more and were more prepared to support it in production...

reply
ed_mercer
5 hours ago
[-]
I find that even for small work, telling CC to fix it for me is better, as it usually belongs to a thread of work and then it understands the big picture better.
reply
phailhaus
1 day ago
[-]
> And people still call them stochastic parrots

Both can be true. You're tapping into every line of code publicly available, and your day-to-day really isn't that unique. They're really good at this kind of work.

reply
uejfiweun
5 hours ago
[-]
Honestly, how long do you guys think we have left as SWEs with high pay? Like the SWE job will still exist, but with a much lower technical barrier of entry, it strikes me that the pay is going to decrease a lot. Obviously BigCo codebases are extremely complex, more than Claude Code can handle right now, but I'd say there's definitely a timer running here. The big question for my life personally is whether I can reach certain financial milestones before my earnings potential permanently decreases.
reply
jerf
5 hours ago
[-]
It's counterintuitive but something becoming easier doesn't necessarily mean it becomes cheap. Programming has arguably been the easiest engineering discipline to break into by sheer force of will for the past 20+ years, and the pay scales you see are adapted to that reality already.

Empowering people to do 10 times as much as they could before means they hit 100 times the roadblocks. Again, in a lot of ways we've already lived in that reality for the past many years. On a task-by-task basis programming today is already a lot easier than it was 20 years ago, and we just grew our desires and the amount of controls and process we apply. Problems arise faster than solutions. Growing our velocity means we're going to hit a lot more problems.

I'm not saying you're wrong, so much as saying, it's not the whole story and the only possibility. A lot of people today are kept out of programming just because they don't want to do that much on a computer all day, for instance. That isn't going to change. There's still going to be skills involved in being better than other people at getting the computers to do what you want.

Also on a long term basis we may find that while we can produce entry-level coders that are basically just proxies to the AI by the bucketful that it may become very difficult to advance in skills beyond that, and those who are already over the hurdle of having been forced to learn the hard way may end up with a very difficult to overcome moat around their skills, especially if the AIs plateau for any period of time. I am concerned that we are pulling up the ladder in a way the ladder has never been pulled up before.

reply
spaceman_2020
5 hours ago
[-]
I think the senior devs will be fine. They're like lawyers at this point - everyone is too scared they'll screw up and will keep them around

The juniors though will radically have to upskill. The standard junior dev portfolio can be replicated by claude code in like three prompts

The game has changed and I don't think all the players are ready to handle it

reply
daxfohl
4 hours ago
[-]
Supply and demand. There will continue to be a need for engineers to manage these systems and get them to do the thing you actually want, to understand implications of design tradeoffs and help stakeholders weigh the pros and cons. Some people will be better at it than others. Companies will continue to pay high premiums for such people if their business depends on quality software.
reply
tietjens
4 hours ago
[-]
I think to give yourself more context you should ask about the patterns that led to SWEs having such high pay in the last 10-15 years and why it is you expected it to stay that way.

I personally think the barrier is going to get higher, not lower. And we will be back expected to do more.

reply
q3k
1 hour ago
[-]
I think the pay is going to skyrocket for senior devs within a few years, as training juniors that can graduate past pure LLM usage becomes more and more difficult.

Day after day the global quality of software and learning resources will degrade as LLM grey goo consumes every single nook and cranny of the Internet. We will soon see the first signs of pure cargo cult design patterns, conventions and schemes that LLMs made up and then regurgitated. Only people who learned before LLMs became popular will know that they are not to be followed.

People who aren't learning to program without LLMs today are getting left behind.

reply
riku_iki
3 hours ago
[-]
> like the SWE job will still exist, but with a much lower technical barrier of entry

It's the opposite: now, in addition to all the other skills, you need the skill of handling giant codebases of vibe-coded mess using AI.

reply
DeathArrow
5 hours ago
[-]
>LLM coding will split up engineers based on those who primarily liked coding and those who primarily liked building.

Quite insightful.

reply
cyanydeez
1 day ago
[-]
So I'm curious, what's the actual quality control?

Like, do these guys actually dogfood the real user experience, or are they all admins with a fast lane to the real model while everyone outside the org has to go through 10 layers of model shedding, caching and other means and methods of saving money?

We all know these models are expensive as fuck to run and these companies are degrading service, A/B testing, and the rest. Do they actually ponder these things directly?

It just always seems like people are on drugs when they talk about the capabilities, and like, the drugs could be pure shit (good) or ditch weed, and we all just act like the pipeline for drugs is a consistent thing, but it's really not, not at this stage where they're all burning cash through infrastructure. Definitely, like drug dealers, you know they're cutting the good stuff with low-cost cached gibberish.

reply
quinnjh
5 hours ago
[-]
> Definitely, like drug dealers, you know they're cutting the good stuff with low cost cached gibberish.

Can confirm. My partner's ChatGPT wouldn't return anything useful for her on a specific query involving web use, while I got the desired result sitting side by side. She contacted support and they said there's nothing they can do about it; her account is in an A/B test group with some features removed. I imagine this saves them considerable resources despite still billing customers for them.

How much this is occurring is anyone's guess.

reply
bigwheels
23 hours ago
[-]
If you access a model through an OpenRouter provider it might be quantized (akin to being "cut with trash"), but when you go directly to Anthropic or OpenAI you are getting access to the same APIs as everyone else. Even top-brass folks within Microsoft use Anthropic and OpenAI proper (not worth the red-tape trouble to go directly through Azure). Also, the creator and maintainer of Claude Code, Boris Cherny, was a bit of an oddball but one of the comparatively nicer people at Anthropic, and he indicated he primarily uses the same Anthropic APIs as everyone else (which makes sense from a product development perspective).

The underlying models are all actually really undifferentiated under the covers except for the post-training and base prompts. If you eliminate the base prompts the models behave near identically.

A conspiracy would be a helluva lot more interesting and fun, but I've spoken to these folks firsthand and it seems they already have enough challenges keeping the beast running.

reply
Madmallard
1 day ago
[-]
Are game developers vibe coding with agents?

It's such a visual and experiential thing that writing true success criteria it can iterate on ahead of time seems borderline impossible.

reply
20260126032624
5 hours ago
[-]
I don't "vibe code" but when I use an LLM with a game I usually branch out into several experiments which I don't have to commit to. Thus, it just makes that iteration process go faster.

Or slower, when the LLM doesn't understand what I want, which is a bigger issue when you spawn experiments from scratch (and have given limited context around what you are about to do).

reply
TheGRS
4 hours ago
[-]
I'm trying it out with Godot for my little side projects. It can handle writing the GUI files for nodes and settings. The workflow is asking Cursor to change something, reviewing the code changes, then loading up the game in Godot to check out the changes. Works pretty well. I'm curious if any Unity or Unreal devs are using it, since I'm sure it's a similar experience.
reply
redox99
1 day ago
[-]
Vibe coding in Unreal Engine is of limited use. It obviously helps with C++, but so much of your time is doing things that are not C++. It hurts a lot that UE relies heavily on blueprints, if they were code you could just vibecode a lot of that.
reply
spaceman_2020
5 hours ago
[-]
Once again, 80% of the comments here are from boomers.

HN used to be a proper place for people actually curious about technology

reply
vardalab
3 hours ago
[-]
I'm almost a boomer and I agree. This dichotomy is weird. I am a retired EE and I love the ability to just have AI do whatever I want for me. I have it manage a 10-node Proxmox cluster in my basement via Ansible and Terraform. I can finally do stuff I always wanted to but had no time for. I got sick of editing my kids' sports videos for highlights in DaVinci Resolve, so I just asked Claude to write a simple app for me and then use all the random video cards in my boxes to render clips in parallel, and so on. Tech is finally fun again when I do not have to dedicate days to understanding some new framework. It does feel a little like late-1990s computing when everyone was making GeoCities webpages, but those days were more fun. Now with local LLMs getting strong as well, and speaking to my PC instead of typing, it feels like sci-fi. So yeah, I do not get this Hacker News hand-wringing about code craft.
reply
kejaed
3 hours ago
[-]
So what is your workflow now with this app for kids sports highlights?
reply
zennit
1 hour ago
[-]
Also interested
reply
weirdmantis69
5 hours ago
[-]
Ya it's so weird lol
reply
themafia
2 hours ago
[-]
Instead of a 17 paragraph twitter post with a baffling TLDR at the end why not just record your screen and _demonstrate_ all of what you're describing?

Otherwise, I think you're incidentally right, your "ego" /is/ bruised, and you're looking for a way out by trying to prognosticate on the future of the technology. You're failing in two different ways.

reply
wkh129857
5 hours ago
[-]
It is pretty sad how much attention people give to someone who has never written any production software and leaves Tesla once video FSD becomes difficult.

This is just a rambling tweet that has all the hallmarks of an AI addict.

reply
soganess
5 hours ago
[-]
"addict"

Great idea! Let's pathologize another thing! I love quickly othering whole concepts and putting them in my brain's "bad" box so I can feel superior.

reply
reducesuffering
4 hours ago
[-]
https://github.com/karpathy/nanochat

https://github.com/karpathy/llm.c

The proof is in the pudding. Let's see your code

reply
jackling
2 hours ago
[-]
I don't agree with the parent commenter's characterization of Karpathy, but these projects are just simple toy projects. They're educational material, not production-level software.
reply