> ...
> Writing good code remains significantly more expensive
I think this is a bad argument. Code was expensive because you were trying to write the expensive good code in the first place.
When you drop your standards, then writing generated code is quick, easy and cheap. Unless you're willing to change your standard, getting it back to "good code" is still an equivalent effort.
There are alternative ways to make the argument for agentic coding; this is just a really, really bad argument to kick it off.
Last month I did the majority of my work through an agent, and while I did review its work, I’m now finding edge cases and bugs of the kind that I’d never have expected a human to introduce. Obviously it’s on me to better review its output, but the perceived gains of just throwing a quick bug ticket at the AI quickly disappear when you want to have a scalable project.
I chose these words because I don't think good code is nearly as expensive with coding agents as it was without them.
You still have to actively work to get good code, but it takes so much less time when you have a coding agent who can do the fine-grained edits on your behalf.
I firmly believe that agentic engineering should produce better code. If you are moving faster but getting worse results it's worth stopping and examining if there are processes you could fix.
I’m using a combination of 100s of megabytes of Ghidra-decompiled Delphi DLLs and millions of lines of decompiled C# code to do this reverse engineering. I can’t imagine even trying such a large project before LLMs, so while a good implementation is still taking a lot of time, it’s definitely a lot cheaper than before.
[1] I saw your red/green TDD article/book chapter and I don’t think you go far enough. Since we have agents, you can generalize red/green development to a lot of things that would be impractical to implement in tests. For example I have agents analyze binary diffs of the file format to figure out where my implementation is incorrect without being bogged down by irrelevant details like the order or encoding of parameters. This guides the agent loop instead of tests.
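A minimal sketch of the mechanical half of that kind of loop, assuming the reference file and the implementation's output are both available as bytes. The names `diff_regions` and `report` are hypothetical, and this only locates raw byte differences; deciding which differences matter (parameter order, encoding) is the part the agent would still have to do.

```python
from typing import List, Tuple


def diff_regions(expected: bytes, actual: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges, end exclusive, where the buffers differ.

    A length mismatch is reported as one trailing region covering the extra bytes.
    """
    regions: List[Tuple[int, int]] = []
    common = min(len(expected), len(actual))
    start = None
    for i in range(common):
        if expected[i] != actual[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        regions.append((start, common))
    longest = max(len(expected), len(actual))
    if longest > common:
        regions.append((common, longest))
    return regions


def report(expected: bytes, actual: bytes) -> str:
    """Format the differing regions as short hex lines an agent can read."""
    lines = []
    for start, end in diff_regions(expected, actual):
        lines.append(
            f"0x{start:04x}-0x{end:04x}: "
            f"expected={expected[start:end].hex()} actual={actual[start:end].hex()}"
        )
    return "\n".join(lines) or "identical"
```

Feeding the `report` text back into the agent after each attempt would play the role a failing test plays in red/green: the loop ends when the report says `identical`, or when the remaining differences are judged irrelevant.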
Which is nuance that will get overlooked or waved away by upper management who see the cost of hiring developers, know that developers "write code", and can compare the developer salary with a Claude/Codex/whatever subscription. If the correction comes, it will be late and at the expense of rank and file, as usual. (And don't be naive: if an LLM subscription can let you employ fewer developers, that subscription plus offshore developers will enable even more cost saving. The name of the game is cost saving, and has been for a long time.)
The reason you pay attention to details is because complexity compounds and the cheapest cleanup is when you write something, not when it breaks.
This last part is still not fully fleshed out.
For now. Is there any reason to not expect things to improve further?
Regardless, a lot of code is cheap now and building products is fun, but I doubt this will translate into more than very short-term benefits. When you lower the bar you get 10x more stuff, 10x more noise, etc. Lower it more and you get 100x, and so on.
With python I can write a simple debugging UI server with a few lines.
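For illustration, a sketch of what such a server could look like using only the standard library. `DEBUG_STATE`, `render_body`, and `make_server` are made-up names, and the port is an arbitrary choice.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical state to expose; swap in whatever you are debugging.
DEBUG_STATE = {"jobs_processed": 0, "last_error": None}


def render_body(state: dict) -> bytes:
    """Serialize the state as pretty-printed JSON bytes."""
    return json.dumps(state, indent=2, default=str).encode("utf-8")


class DebugHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every GET returns the current snapshot of DEBUG_STATE.
        body = render_body(DEBUG_STATE)
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Bind to localhost only, since this is a debugging aid."""
    return HTTPServer(("127.0.0.1", port), DebugHandler)
```

Calling `make_server().serve_forever()` and opening http://127.0.0.1:8000/ in a browser should show the state as JSON.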
There are frameworks that allow me to complete certain tasks in hours.
You do not need to program everything from scratch.
The more code, the faster everything gets, since the job is mostly done.
We are accelerating, but we still work 9 to 5 jobs.
The former: 1) understand the problem, 2) solve the problem.
The latter: 1) understand the problem, 2) solve the problem, 3) understand how somebody or something else understood & solved the problem, 4) diff those two, 5) plan a transition from that solution to this solution, 6) implement that transition (ideally without unplanned downtime and/or catastrophic loss of data).
This is also why I’m not a fan of code reviews. Code review is basically steps 1–4 from the second approach, plus having to verbally explain the diff, every time.
That's specious reasoning. Code reviews are a safeguard against cowboy coding, and a tool to enforce shared code ownership. You might believe you know better than most of your team members, but odds are a fresh pair of eyes can easily catch issues you snuck in your code that you couldn't catch due to things like PR tunnel vision.
And if your PR is sound, you certainly don't have a problem explaining what you did and why you did it.
[0] Reviews are OK if I enjoy working with the person whose work I’m reviewing and I feel like I’m helping them grow.
Do current LLM-based agents generate code which is easy to change? My gut feeling is a no at the moment. Until they do, I'd argue code generated by agents is only good for prototypes. Once you can ask your agent to change a feature and be 100% sure it won't break other features, then you don't care what the code looks like.
But LLMs are both really good at writing code _and_ reading code. However, they're not great at knowing when to stop - either finishing early and leaving stuff broken, over-engineering and adding in stuff that's not needed or deciding it's too hard and just removing stuff it deems unimportant.
I've found a TDD approach (with not just unit tests but high-level end-to-end behaviour-driven tests) works really well with them. I give them a high-level feature specification (remember Gherkin specifications?) and tell it to make that pass (with unit tests for any intermediate code it writes), make sure it hasn't broken anything (by running the other high-level tests) then, finally, refactor. I've also just started telling it to generate screenshots for each step in the feature, so I can quickly evaluate the UI flow (inspired by Simon Willison's Rodney tool).
Now I don't actually need to care if the code is easy to read or easy to change - because the LLM handles the details. I just need to make sure that when it says "I have implemented Feature X" that the steps it has written for that feature actually do what is expected and the UI fits the user's needs.
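The gating described above can be sketched as a small function. This is an assumption-laden illustration, not the commenter's actual setup: `red_green_cycle`, `CycleResult`, and the callables are hypothetical stand-ins for a high-level feature test, the existing test suite, and the agent's edits.

```python
from dataclasses import dataclass
from typing import Callable, List

Check = Callable[[], bool]  # returns True when the check passes


@dataclass
class CycleResult:
    accepted: bool
    reason: str


def red_green_cycle(
    feature_check: Check,
    regression_checks: List[Check],
    implement: Callable[[], None],
) -> CycleResult:
    """Accept a change only if the feature check goes red to green and nothing regresses."""
    if feature_check():
        return CycleResult(False, "feature check already passes; it is not testing new behaviour")
    implement()  # stand-in for the agent making its edits
    if not feature_check():
        return CycleResult(False, "feature check still fails after implementation")
    for index, check in enumerate(regression_checks):
        if not check():
            return CycleResult(False, f"regression check {index} failed")
    return CycleResult(True, "feature is green and the existing checks still pass")
```

The first branch is the "red" step: a feature check that passes before any work is done is not specifying anything new, so the cycle is rejected early.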
They do. I am no longer writing code, everything I commit is 100% generated using an agent.
And it produces code depending on the code already in my code-base and based on my instructions, which tell it about clean-code, good-practices.
If you don't get maintainable code from an LLM it's for this reason: Garbage in, garbage out.
That bar is unreasonably high.
Right now, if I ask a senior engineer to change a feature in a mature codebase, I only have perhaps 70% certainty they won't break other features. Tests help, but only so far.
With auto generated code which almost no one will check or debug by hand, you want at least compiler level exactitude. Then changing "the code" is as easy as asking your code generator for new things. If people have to debug its output, then it does not help in making maintainable software unless it also generates "good" code.
I'd also argue that we should be pushing towards tracer bullets as a development concept and less so prototypes that are nice but meant to be thrown away and people might not do that.
The clean room auto porting, after a messy exploratory prototyping session would be a nice pattern, nonetheless.
Every human can string words together, but there's a world of difference between words that raise $100M and words that get you slapped in the face.
The raw material was always cheap. The skill is turning it into something useful. Agentic engineering is just the latest version of that. The new skill is mastering the craft of directing cheap inputs toward valuable outcomes.
Strongly agree with this. It took me a while to realize that "agentic engineering" wasn't about writing software; it was about being able to very quickly iterate on bespoke tools for solving a very specific problem you have.
However, as soon as you start unblocking yourself from the real problem you want to solve, the agentic engineering part is no longer interesting. It's great to be solving a problem and then realize you could improve it very quickly with a quick request to an agent, but you should largely be focused on solving the problem.
Yet I see so many people talking about running multiple agents and just building something without much effort spent using that thing, as though the agentic code itself is where the value lies. I suspect this is a hangover from decades where software was valuable (we still have plenty of highly valued, unprofitable software companies as a testament to this).
I'm reminded a bit of Alan Watts' famous quote in regards to psychedelics:
> If you get the message, hang up the phone.
If you're really leveraging AI to do something unique and potentially quite disruptive, very quickly the "AI" part should become fairly uninteresting and not the focus of your attention.
It seems to depend a lot on the industry and niche you're in. Working at an agency, I get experience across many different projects and industries, and sometimes you are just at the edge of the AI's training and it can get very unhelpful. Given that many if not most companies are working on proprietary code for domain-specific problems, that isn't all that surprising either.
As an example: One of my most promising projects I was discussing with a friend and we realized together we could potentially use these tools to build a two person agency with no need to hire anyone ever. If this were to work, could theoretically make nice revenue and it shouldn't show up in any metric anywhere.
Additionally, I've heard of countless teams cancelling their contracts with outsourced engineers because cheap but bad coders in India are worse than an LLM and still cost more. I'm not sure if there's a number around this activity, but again, these types of changes don't show up in the usual places.
My current belief is not that AI will replace traditional software engineering, but that it will replace a good chunk of the entire model of software.
You're not following your last line to its logical conclusion regarding your own prospects: no one is going to buy the vibeslop your two person agency is selling because they'd rather create and maintain their own vibeslop instead of dealing with yours.
If you follow some of your thoughts to their logical conclusion you'll realize the parent is right: there will be limited productivity that ends up fueling the economy when nobody is buying each other's vibeslop.
I absolutely agree that it's not logical to think "oh we'll sell our AI stuff", that's the old model (which is just a variation on SaaS). I suspect a lot of HNers can't imagine a "product" that isn't code, but that's not at all what I'm describing.
The products that most people on HN have traditionally built are used by other companies to make money by allowing those processes to be scaled. AI, in many new cases, eliminates the need for a 'software' middle man. The case I'm describing is "I know how to make money doing X if only I could scale it up without hiring people" and my offering is "I can scale it up without hiring people".
This is increasingly where I think the future of work is headed, and it's more than fine if you aren't convinced.
Faster than what? You will be faster than your previous self, just like all of your competitors. Where’s the net gain here? Even if you somehow managed to capture more value for yourself, you’ve stopped providing value to 5-10x that many employees who are no longer employed.
When costs approach zero on a large scale, margins do not increase. Low costs = you’re not paying anyone = your competitors aren’t paying anyone = your customers no longer have money = your revenue follows your costs straight to zero.
Companies that provide physical services can’t scale without hiring. A one-man “crew” isn’t putting a roof on a data center.
I want to be wrong. Tell me why you think any of this is wrong.
Except production GDP, the standard measure of economic activity.
What are the 48 other people doing now? Presumably some other economic activity.
potentially...if this were to work...theoretically
shouldn't show up? I would worry that something with so many variables wouldn't show up.
That's because the threat is now not other businesses, but your own users who decide to vibe-code their own "Claw" product instead of using your company's vibeslop, so there are no buyers for your single-week product. All these new harness developers are engaging in resume-driven development to save their own asses. The only ones that are not naked when the tide recedes are the ones that are able to jump to the next layer of abstraction on the infinite staircase, until the next tide comes five seconds later.
Yeah, but the actual productivity gains that the internet and software tools introduced have had diminishing returns after a while.
Like, are people more productive today when they use Outlook and Slack than they were 20 years ago when using IBM Lotus Notes and IBM Sametime? I'm not. Are people more productive with the Excel of today than with Excel 2003/2007? I'm not. Is Windows 11 and MacOS Tahoe making people more productive than Windows 7 and Snow Leopard? Not me. Are IDEs of today offering so much more productivity boost than what Visual Studio, CodeWarrior and Borland Delphi did back in the day? Don't think so.
To me it seems that at least on the productivity side, we've mostly been reinventing the wheel "but in Rust/Electron" for the last 15 or so years, and the biggest productivity gains came IMHO from increased compute power due to semiconductor advancement, so that the same tasks finished faster today than 20 years ago, but not that the SW or the internet got so much more capable since then.
At least, in my own experience.
So then the question is, is there anything other than feels to say productivity has or has not gone up? What would we accept as actual evidence one way or another? Commits-per-day is similarly not a good measure either. Jira tickets and T-shirt sizes? We don't have a good measure, so while Show HNs per weekend is equally dumb, it's also equally good in the bag of lies, damn lies, and statistics.
Another commenter said it makes the easy part easy, and the hard part harder, which I resonate with at the moment.
I am pretty excited by being able to jump deep into real problems without code being the biggest bottleneck. I love coding but I love solving problems more, and coding for fun is very different to coding for outcomes.
There's also the question of the true cost of all the hardware, electricity, and potential output that's being tossed onto the pyres. We aren't getting the real Cortana from the books / games; we're getting GIR trained on the corpus of fallible human code, prompted by fallible humans.
As a specialization? Sure. But the ditch diggers moved since to machine operators, handymen and the like.
In the past there were sysadmins. Do we have less software engineers since sysadmins ceased to be a thing?
All of them? What if they liked digging ditches?
> In the past there were sysadmins. Do we have less software engineers since sysadmins ceased to be a thing?
Software Engineers were never sysadmins in the past, you’re thinking DevOps maybe?
Bootcamp grads are basically obsolete now. The real skill has always been the ability to make good design decisions and that's still the case in the LLM era.
For now maybe yes but the goal is totally removing the human from the decision loop regarding technical stuff.
I beg to differ. I know for a fact that some companies started hiring people with LLM experience, whose only expertise is spending all the Copilot enterprise account tokens in their first week on the job and then whining that the lack of tokens was stifling their creativity.
Say what you may about boot camps, but at least the people getting hired could do things and understand what they are doing.
TLDR: a raise is not a robust signal in this regard.
[1] https://news.ycombinator.com/item?id=7260137
[2] https://www.linkedin.com/posts/peterjameswalker_most-venture...
[3] https://en.wikipedia.org/wiki/There%27s_a_sucker_born_every_...
Also, I do want to note that these little "Here is how I see the world of SWE given current model capabilities and tooling" posts are MUCH appreciated, given how much you follow the landscape. When a major hype wave is happening and I feel like I am getting drowned on twitter, I tend to wonder "What would Simon say about this?"
That is my observation as well. Churning code is easy, but making sure the code is not total crap is a completely new challenge and concern.
It's not like code reviews didn't require work prior to LLMs. Far from it. It's just that the code is now generated in a completely different way, and in some cases with barely any oversight from vibecoders who are trying to punch way above their weight. So they generate these massive volumes of changes that fail in obvious and subtle ways, and the flow is relentless.
You can remove those comments afterwards if you feel they are too much, but they help a lot with reviewing.
More a trick than a silver bullet but it's nice.
It sounds harsh, but over the lifetime of a project, 10-lines/person/day is often a high estimate of the number of lines produced. It’s not because humans type so slow - it is because after a while, it’s all about changing previously written lines in ways that don’t break things.
LLMs are much better at that than humans, if the constraints and tests are reasonably well specified.
if they are, then why would a human be so slow? You're not comparing the same situation.
The difference comes in confidence that the solution works and can be maintained in the future, but in terms of purely making the decisions and applying the changes, an LLM is faster when it has all the required information available.
In that time the LLM has made a change, ran tests, committed, pushed, checked that the CI build failed, looked at the CI logs, fixed the issue and the PR is now passing.
I think it's a very fun space, finally being able to empower many people who in the past would've been bottlenecked unless they were using very simple tools for their domain and had upskilled enough. Those 2 things will still be true, but exploration and other layers have seen a significant speedup.
Other problems like entropy/slop, security, system testing, lack of automation fundamentals arise but it's a good problem to start tackling.
I'm very focused on evals [1] because they are what allows me not to be the bottleneck for the economists I want to empower to code end to end, and I'd like that mental shift to happen for anyone becoming a builder, so that non-traditional developers and developers by trade have a common language for product building [2]. Speaking to different audiences, and combating hype that promises to do everything for you versus the intent that's actually needed, is hard, but trying gets you to advance quite a lot.
[1] https://alexhans.github.io/posts/series/evals/measure-first-...
LLMs lower the cost of copy/pasting code around, or troubleshooting issues using standard error messages.
Instead of going through Stack Overflow to find how to use a framework to do some specific thing, you prompt a model. You don't even need to know a thing about the language you are using to leverage a feedback loop.
LLMs lower the cost of a multitude of drudge work in developing software, such as having to read the docs to learn how a framework should be used to achieve a goal. You still need to know what you are doing, but you don't need to reinvent the wheel.
> The challenge is to develop new personal and organizational habits that respond to the affordances and opportunities of agentic engineering.
I don't think it's the habits that need to change, it's everything. From how accountability works, to how code needs to be structured, to how languages should work. If we want to keep shipping at this speed, no stone can be left unturned.
[1]: https://lucumr.pocoo.org/2026/2/13/the-final-bottleneck/
I'm very focused on their minimalistic building experience as a way to keep me and other traditional developers from being the bottleneck, and to empower them end to end.
I think AI evals [1] are a big part of that route and hope that different disciplines can finally have probable product design stories [2] instead of there being big gaps of understanding between them.
[1] https://alexhans.github.io/posts/series/evals/measure-first-...
If agentic AI is a good idea and if it increases productivity we should expect to see some startup blowing everyone out of the water. I think we should be seeing it now if it makes you say ten times more productive. A lot of startups have had a year of agentic AI now to help them beat their competitors.
Imo the wave of top-down 'AI mandates' from incumbent companies is a direct result of the competitive pressure, although it probably won't work as well as the execs think it will.
that being said even Dario claims a 5-20% speedup from coding agents, 10x productivity only exists in microcosm prototypes, or if someone was so unskilled oneshotting a localhost web app is a 10x for them
Could you give us a few examples?
Which gets back to the outsourcing argument: it’s always been cheap to make buggy code. If we were able to solve this, outsourcing would have been ubiquitous. Maybe LLMs change the calculus here too?
But coding assistance tools must themselves be evaluated by what they produce. We won't see significant economic growth through using AI tools to build other AI tools recursively unless there are companies using these tools to make enough money to justify the whole stack.
I believe there are teams out there producing software that people are willing to pay for faster than they did before. But if we were on the verge of rapid economic growth, I would expect HN commenters to be able to rattle these off by the dozen.
ant 10xing ARR, oai
harvey legora sierra decagon 11labs glean(ish) base10(infra) modal(infra) gamma mercor(ish) parloa cognition
regulated industries giving these companies 7/8-fig contracts less than 2 years from incorporation
Not sure how long it’ll last though. With the time I spend on reviews I could have done it myself, so if they don’t start learning…
Then? Your job is still to review their code. If they are your coworker, you can not fire them.
(Whether you think OpenClaw is good software is kind of beside the point.)
I don’t think anyone is arguing against code agents being good at prototypes, which is a great feat, but most SWE work is built on maintaining code over time.
It's very much imperfect, but the only consistently agreed upon and useful definition of "value" we have in the West is monetary value, and in that sense, we have at least a few major examples of AI generating value rapidly.
In any case, I agree with the grandparent post about the distinction between being successful and good.
I don't see a bunch of small agents in the future, instead just one per device or user. Maybe there will be a fleeting moment for GUI/local apps to tie into some local, OS LLM library (or some kind of WebLLM spec) to leverage this local agent in your app.
sort of how the hammer is the most useful tool ever and all we have to do is to make every thing that needs doing look like a nail.
Will we stop using web browsers as we understand them today in the next few decades in favor of only interacting with agents? Maybe.
These are valid points; taken to the extreme, we will have apps that cannot be supported.
In the short term, we already have SQL/reports being automated. Lovable etc. are experimenting with generating user interfaces from prompts, and soon we will have complete working apps from a prompt. Why not have one core that you can expand via a prompt?
I am currently studying and depending heavily on Anki, and it's been amazing to use Claude Code to add new functionality on the fly. It's a holy mess of inconsistent/broken UX, but it so clearly gives me value over the core version. Sometimes it breaks, but CC can usually fix it within a prompt or two.
Me too, and I see this as _incredibly_ wasteful.
Why? Why do we need to "write code so much faster and quicker" to the point we saturate systems downstream? I understand that we can, but just because we can doesn't mean we should.
But that's point of TFA, no? Now that writing code is no longer the bottleneck, the upstream and downstream processes have become the new bottlenecks, and we need to figure out how to widen them.
As I see it, the end goal for all of this is generating software at the speed of thought, or at least at the speed of speech. I want the digital butler to whom I could just say - "I'm not happy with the way things happened today, please change it so that from here on, it'll be like x" - and it'll just respond with "As you wish", and I'll have confidence that it knows me well enough and is capable enough to have actually implemented the best possible interpretation of what I asked for, and that the few miscommunications that do occur would be easy to fix.
We're obviously not close to that yet, but why shouldn't we build towards it?
I think it’s contestable that writing the code was ever the main bottleneck.
> As I see it, the end goal for all of this is generating software at the speed of thought, or at least at the speed of speech.
The question is what distinguishes that from having AGI, and if the answer is “nothing”, then that will change the whole game entirely again.
Using AI to ship more and more code faster, instead of to make code more mature, will make this worse.
With coding agent projects I find that investing in DRY doesn't really help very much. Needing to apply the same fix in two places is a waste of time as a human. An agent will spot both places with grep and update them almost as fast as if there was just one.
It's another case where my existing programming instincts appear to not hold as well as I would expect them to.
Is the goal basically a codebase where your interactions are mediated through an LLM?
I'm not ready to write about how radically though because I don't know myself!
Do we? Spewing features like explosive diarrhea is not something I want.
The thing I'd add from running agents in actual production (not demos, but workflows executing unattended for weeks): the hard part isn't code volume or token cost. It's state continuity.
Agents hallucinate their own history. Past ~50-60 turns in a long-running loop, even with large context windows, they start underweighting earlier information and re-solving already-solved problems. File-based memory with explicit retrieval ends up being more reliable than in-context stuffing - less elegant but more predictable across longer runs.
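A rough sketch of what "file-based memory with explicit retrieval" could mean in its simplest form, assuming append-only JSON Lines notes and plain keyword matching. `FileMemory` and its methods are hypothetical; a real setup might use embeddings or a database instead.

```python
import json
from pathlib import Path
from typing import List


class FileMemory:
    """Append-only JSON Lines notes with keyword retrieval (hypothetical helper)."""

    def __init__(self, path: Path):
        self.path = path

    def remember(self, turn: int, note: str) -> None:
        """Append one note; earlier notes are never rewritten."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"turn": turn, "note": note}) + "\n")

    def recall(self, keyword: str, limit: int = 5) -> List[dict]:
        """Return up to `limit` of the most recent notes containing `keyword`."""
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            notes = [json.loads(line) for line in f if line.strip()]
        matches = [n for n in notes if keyword.lower() in n["note"].lower()]
        return matches[-limit:]
```

The point of the design is that the agent has to ask for history explicitly (for example, `recall("login")` before touching login code) rather than relying on whatever happens to still be weighted in its context window.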
Second hard part: failure isolation. If an agent workflow errors at step 7 of 12, you want to resume from step 6, not restart from zero. Most frameworks treat this as an afterthought. Checkpoint-and-resume with idempotent steps is dramatically more operationally stable.
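Checkpoint-and-resume can be sketched in a few lines, assuming the workflow is a fixed, ordered list of steps and that a small JSON file is enough to record progress. `run_with_checkpoint` and the checkpoint format are illustrative assumptions, not any particular framework's API.

```python
import json
from pathlib import Path
from typing import Callable, List

Step = Callable[[], None]


def run_with_checkpoint(steps: List[Step], checkpoint: Path) -> int:
    """Run steps in order, skipping those already recorded as complete.

    Returns the number of steps executed in this call. If a step raises, the
    checkpoint still holds the count of completed steps, so the next call
    resumes at the failed step. Steps should be idempotent, since a step that
    crashes partway through will be run again.
    """
    completed = 0
    if checkpoint.exists():
        completed = json.loads(checkpoint.read_text(encoding="utf-8"))["completed"]
    executed = 0
    for index in range(completed, len(steps)):
        steps[index]()
        # Write to a temp file and rename so a crash cannot leave a half-written checkpoint.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps({"completed": index + 1}), encoding="utf-8")
        tmp.replace(checkpoint)
        executed += 1
    return executed
```

With 12 steps and a failure at step 7, the checkpoint would record 6 completed steps, and the next call would start again at step 7 instead of step 1.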
Agree it's not just habits - the infrastructure mental model has to change too. You're not writing programs so much as engineering reliability scaffolding around code that gets regenerated anyway.
Tokens are expensive. We don't know what the actual cost is yet. We have startups, who aren't turning a profit, buying up all the capacity of the supply chain. There are so many impacts here that we don't have the data on.
Code is still liability but it's undeniable that going from thought to running code is very cheap today.
To recap, the author disagrees that writing code is cheap, because we've collectively invested trillions of dollars and redirected entire supply chains into automating code generation. The externalities will be paid for generations to come by all of humanity; it's just not reflected in your Claude subscription.
The cat is out of the bag: compute will keep getting cheaper, as it has for 60 years or so.
It's always been maintenance that's been the killer and GP is totally right about that.
And if we look at a company like Cloudflare who basically didn't have any serious outage for five years then had five serious outages in six months since they drank the AI kool-aid, we kinda have a first data point on how amazing AI is from a maintenance point of view.
We all know we're generating more lines of underperforming, insecure, probably buggy, code than ever before.
We're in for a wild ride.
We kind of do? Local models (though not state of the art) set a floor on this.
Even if prices are subsidized now (they are) that doesn't mean they will be more expensive later. e.g. if there's some bubble deflation then hardware, electricity, and talent could all get cheaper.
The same applies to a small software project - you need to choose what features you can fit. And while the cost of building is part of the consideration, I'd say most of it is about the cost of maintaining features, not only in code, but also in product coherence and other incidental 'costs' like documentation and user support.
Be careful of building too many features and ending up overwhelmed by the maintenance, or worse, diluting the product's value to a point where you lose users.
Writing good software is still expensive.
It's going to take everybody a while to figure that out (just like with outsourcing)
Which is a shame, cause I think LLMs have a lot more use for software dev than writing code. And that’s really what’s going to shift the industry - not just the part willing to cut on quality.
The real cost was never the code itself. It was the decision-making around what to build. That hasn't gotten cheaper at all.
Empowering people to work Tracer bullet style after they've selected their prototype of choice and thrown it away might be a powerful pattern that actually gets us into a nice collaborative space.
It seems to me that in order to obtain the ability to build things that other people like, you need to go through the process of creating things they won't. Like a painter needs to paint a bunch of crappy paintings to learn how to create a good painting. If you have the LLM create these throwaway prototypes, how will you even know when you come across a good idea and how will you be able to build it.
Okay, granted. What does that have to do with how the code is written? Do people generally care if a web app is running from nicely formatted JS or minified JS? Is a product manager not getting better at building things people like because they're not iterating on the code themselves?
Without agreeing or disagreeing with the premise, I think a relevant metaphor* here is that the painter can practice and iterate and go from creating crappy paintings to creating good paintings, without needing to make their own paint and canvas and brushes. If they're particular, they can have their assistant go to the supply shop and get just the right things they want, with increasing specificity as needed, but they don't need to manufacture them by hand.
* Like most metaphors, it's not perfect; please try to understand the intent.
The cost of iterating (with software) dropped by a few orders of magnitude in the last few months.
One huge barrier is fighting entropy. You should be wary of prototypes which create false expectations and don't help product evolution whereas tracer bullets [2] might be better if you want to quickly show something and adjust.
Testing and testability are concepts that aren't intuitive or easy until you develop a feel for them so we should be preaching feeling that pain and moving slowly and with intent and working minimally [3] when you actually want to share or maintain your coding artifact. There should be no difference between judicious human and computer code. Don't suddenly start putting What instead of why in comments or repeating everything.
Helping non tech people become builders or sharers is a challenge beyond "vibe coding" and the agent skills [4] space is fascinating for that. Like most things AI (LLM), UX matters more than almost anything else.
[2] concept from the Pragmatic Programmer, https://www.aihero.dev/tracer-bullets
[3] https://alexhans.github.io/posts/series/evals/measure-first-...
[4] https://alexhans.github.io/posts/series/evals/building-agent...
Yeah, coding is cheaper now, but knowing what to code has always been the more expensive piece. I think AI will be able to help there eventually, but it's not as far along on that vector yet.
AIs so far seem to prefer addition by addition, not addition by subtraction or addition by saying "are you sure?".
This doesn't mean that "code is cheap" is bad. Rather, it means that soon our primary role will be to guide AIs to produce a high proportion of "code that was cheap", while being able to quickly distinguish, prevent, and reject "cheap code".
1. The time spent to think and iteratively understand what you want to build
2. The time spent to spell out how you want to build it
The cost for #2 is nearly zero now. The cost for #1 too is slashed substantially because instead of thinking in abstract terms or writing tests you can build a version of the thing and then ground your reasoning in that implementation and iterate until you attain the right functionality.
However, once that thing is complex enough you still need to burn time on identifying the boundaries of the various components and their interplay. There is no gain from building "a browser" and then iterating on the whole thing until it becomes "the browser". You'll be up against combinatorial complexity. You can perhaps deal with that complexity if you have a way to validate every tiny detail, which some are doing very well in porting software for example.
> [...]
> - It’s simple and minimal - it does only what’s needed, in a way that both humans and machines can understand now and maintain in the future.
But do the humans need to actually understand the code? A "yes" means the bottleneck is understanding (code review, code inspection). A "no" means you can go faster, but at some risk.
> The resulting code does not always match human stylistic preferences, and that’s okay. As long as the output is correct, maintainable, and legible *to future agent runs*, it meets the bar.
I always thought of things like code reviews as semi pseudo-science in most cases. I've sat through meetings where developers obviously understood the code that they were reviewing, but where they didn't understand anything about the system as a whole. If your perfect function pulls in 800 external dependencies that you trust only because it's too much of a hassle to go through them, I'd argue that you don't understand your code at all. I don't think it matters and I certainly don't think I'm better than anyone else in this regard. I only know how things work when it matters.
If anything, I think AI will increase human understanding without the need to write computer unfriendly code like "Clean Code", "DRY" and so on.
How?
You might not get gcc/llvm level optimization from a newly built compiler - but if you had a home-built one, which took a $15,000/month engineer to support (for years!), you can now get a new one for $20,000 every 3 months, for a cost saving of over 50%, each time changing your requirements (which you couldn’t do before).
Code used to be a liability, like a car or an apartment for the average person. Now it’s a liability, like a car or apartment for Bill Gates.
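To make the arithmetic in the compiler example explicit, here is a small sketch. It uses only the two figures quoted above ($15,000/month for ongoing support, $20,000 per 3-month rebuild) and ignores everything else, such as review time or quality differences. The function name is made up for illustration.

```python
def rebuild_saving(support_per_month: int, rebuild_cost: int, months: int) -> float:
    """Fraction saved by paying for one rebuild instead of `months` of support."""
    support_cost = support_per_month * months
    return (support_cost - rebuild_cost) / support_cost


# 3 months of support is $45,000, versus $20,000 for one rebuild.
saving = rebuild_saving(support_per_month=15_000, rebuild_cost=20_000, months=3)
print(f"{saving:.1%}")
```

On those numbers alone the saving comes out at roughly 56% per 3-month cycle.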
Next to that, eventually you run into the same issue that we humans run into: no more context windows.
But we as software engineers have learned to abstract away components, to reduce the cognitive load when writing code. E.g., when you write a file you don't deal with syscalls anymore.
This is different with AI. It doesn't abstract away things, which means you requesting a change might make the AI make a LOT of changes to the same pattern, but this can cause behavior to change in ways you haven't anticipated, haven't tested, or haven't seen yet.
And because it's so much code to review, it doesn't get the same scrutiny.
Then "AI" code is even more of a liability.
But please correct me if I'm wrong.
Even if I understand all my code, when I go to make changes, if it's 100k lines of code vs 2k lines of code, it's going to take more time and be more error prone.
Even if I understand all my code, the intern I hired last week won't and I'll have to teach it to them.
Even if I understand all my code, I don't remember everything all the time and I can forget about an edge case handled in thousands of lines of code.
Even if I understand all my code, I don't understand my co-workers' code, and they don't understand mine.
Even if I understand all my code, I might not want to work for this company the rest of my life.
Not an employee market, that's for sure.
This is the thing I don't really get. I enjoy tinkering with AI and seeing what it comes up with to solve problems. But when I need to write working code that does anything beyond simple CRUD, it's faster for me to write the code than it is to (1) describe the problem in English with sufficient detail and working theory, then (2) check the AI's work, understand what it's written, de-duplicate and dry it out.
I guess if I skipped step 2, it might save time, but it would be completely irresponsible to put it into production, so that's not an option in any world where I maintain code quality and the trust of my clients.
Plus, having AI code mixed into my projects also leaves me with an uneasy sense of being less able to diagnose future bugs. Yes, I still know where everything is, but I don't know it as well as if I'd written it myself. So I find myself going back and re-reviewing AI-written code, re-familiarizing myself with it, in order to be sure I still have a full handle on everything.
To the extent that it may save me time as an engineer, I don't mind using it. But the degree to which the evangelists can peddle it to the management of a company as a replacement for human coders seems highly correlated with whether that company's management understood the value of safe code in the first place. If they didn't, then their infrastructure may have already been garbage, but it will now become increasingly unusable garbage. At some point, I think there will be a backlash when the results in reality can no longer be denied, and engineers who can come in and clean up the mess will be in high demand. But maybe that's just wishful thinking.
Perhaps you have to be a certain type of person, or work in a peculiar company, where the second step (review) can be ignored as long as the AI says it works. Hardcore YOLO life.
saw an article recently where every sector is seeing a reduction in IT/devs except for tech and ai companies
if your company is in a sector where eng is a cost-center and the product is not directly tied to your engineers / your company is pushing for efficiency it's an employer's market
That's like saying that photography killed painting because it saved you from having to draw things. Drawing is basically free now, I just take the photo. But the number of painters (and by that I mean, artists who paint) is dramatically higher today than in 1800. Artists didn't die because of mechanical reproduction, they flourished, because that wasn't the problem they were solving.
As such, they can often be improved as easily as one can prompt, which is much faster and easier than before. Notably in the FOSS world where one had to ask the maintainer, get ghosted for a year and have them go back with a "close: wontfix (too tedious)".
Compare it to visual arts. With guidance from an artist, AI tools can help create wonderful pictures. Without such guidance, or at least expert prompting, a typical one-shot image from Gemini is... well, at best recognizable as such.
Owning code is getting more and more expensive.
SWEs sacrificed their jobs so that SREs could have unlimited job security.
> At the macro level we spend a great deal of time designing, estimating and planning out projects, to ensure that our expensive coding time is spent as efficiently as possible. Product feature ideas are evaluated in terms of how much value they can provide in exchange for that time - a feature needs to earn its development costs many times over to be worthwhile!
Maybe I am spending my life working at the wrong corporations (not FAANG/direct tech related), but that doesn't match my experience at all. The `design` phase was reduced to something more akin to a sketch in order to iterate on products faster. Obviously, now that you create and debate more iterations, the time spent writing code has increased (as you build more stuff that is discarded). What is that discarded time used for? Well, it's the way new people learn the system/business domain. It's how we build the knowledge to support the product in production. It's how the business learns what the limits/features are, why they are there, what they can offer, what they must ask the regulators, etc.
Realistically, if you only count the time required to develop the feature as described, it is basically nothing. Most of the time is spent on edge cases that are not written anywhere. You start coding something and 15 minutes in you discover 5-10 cases not handled in any way. You ask business people, they ask other people. You start checking regulation docs/examples, etc. Maybe there are no docs available, so you just push a version and test if your assumptions are correct (most likely not... so go again and again). At the end of this process everyone gains a better understanding of how the business works, why, and what you can further improve.
Can AI speedrun this? Sure, but then how will all the people around gain the knowledge required to advance things? We learn through trial and error. Previously this was a shared experience for everyone in the business, now it becomes more and more a solitary experience of just speaking with AI.
Despite the explosion of AI art, the amount of meaningful art in the world has increased only by a tiny amount.
Would some people prefer no art/illustration to AI generated art? Sure. But even more would prefer no art to my doodles.
Thus, "Code" is a liability; Producing excess liabilities 'cheaply' is still a loss.
You only ever want to have just enough code to accomplish the task at hand.
LLMs may help you get to just enough faster, but you'll only know that you are there after doing the second 90%.
the downstream bottleneck is real though. built a video production pipeline recently - generating the python glue code took maybe 10% of total project time. the other 90% was testing edge cases, tuning ffmpeg parameters, and figuring out why API responses were subtly different between providers. cheap code just means you hit the hard problems faster.
This. All LLM code I saw so far was lots of abstraction to the point that it’s hard to maintain.
It is testable for sure, but the complications cost is so high.
Something else that is not addressed in the article is working within an enterprise environment, where new technologies are adopted at a much slower pace than in startups. LLMs come with strange and complicated patterns to solve these problems, which is understandable, as I would imagine all training and tuning followed structured frameworks.
When it’s trained on enough APL/K code, you’ll get minimal abstraction.
Turned it into a Stripe revenue dashboard and notifier.
Even bought a couple more, flashed them, and gave to my cofounders, complete with AI written (personally tested, though) setup instructions!
[0]: https://idiallo.com/blog/writing-code-is-easy-reading-is-har...
> Delivering new code has dropped in price to almost free... but delivering good code remains significantly more expensive than that.
Writing code was always cheap to start with. Just outsource it to the lowest bidder. Writing good code remains as expensive.
The same when programmers from different languages are considered. How many Scala/Haskell engineers can I find compared to Java is not the question. It is about how many good engineers you can hire. With Haskell that pool is definitely denser.
The second chapter is more of a classic pattern, it describes how saying "Use red/green TDD" is a shortcut for kicking the coding agent into test-first development mode which tends to get really good results: https://simonwillison.net/guides/agentic-engineering-pattern...
I also see that the tests generated by ChatGPT are far too few for the code features implemented. They cannot be the result of actual red/green TDD where the test comes before the feature is added.
For example, 1) the code allows "~~~" but only tests for "```", 2) there are no tests when len(fence) < fence_len nor when len(fence) > fence_len, and 3) there are no tests for leading spaces.
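The three gaps listed above could be covered by tests along these lines. This is only a sketch: `opening_fence` and `is_closing_fence` are hypothetical stand-ins written for this example, not the generated code under discussion, and the rules they encode (tilde or backtick fences, up to three leading spaces, a closer at least as long as the opener) are my reading of CommonMark-style fenced blocks.

```python
import re

# Hypothetical stand-ins for the parser being discussed (NOT the generated code).
_OPEN = re.compile(r"^ {0,3}(`{3,}|~{3,})")
_CLOSE = re.compile(r"^ {0,3}(`+|~+)[ \t]*$")


def opening_fence(line: str):
    """Return (fence_char, fence_len) if the line opens a fenced block, else None."""
    m = _OPEN.match(line)
    if m is None:
        return None
    fence = m.group(1)
    return fence[0], len(fence)


def is_closing_fence(line: str, fence_char: str, fence_len: int) -> bool:
    """A closer must use the opener's character and be at least as long."""
    m = _CLOSE.match(line)
    if m is None:
        return False
    fence = m.group(1)
    return fence[0] == fence_char and len(fence) >= fence_len


def run_fence_checks() -> None:
    # 1) tilde fences, not only backticks
    assert opening_fence("~~~") == ("~", 3)
    assert not is_closing_fence("~~~", "`", 3)
    # 2) closer shorter and longer than the opener
    assert not is_closing_fence("```", "`", 4)
    assert is_closing_fence("`````", "`", 4)
    # 3) leading spaces: up to three are allowed, four are not
    assert opening_fence("   ````") == ("`", 4)
    assert opening_fence("    ```") is None


run_fence_checks()
```

In a red/green flow each of these assertions would be written, and seen to fail, before the branch that satisfies it exists.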
There's also duplicate code. The function _strip_closing_hashes is used once, in the line:
    text = _strip_closing_hashes(m.group("text")).strip()
The function is:
    def _strip_closing_hashes(s: str) -> str:
        s = s.rstrip()
        # remove trailing " ###" style closers
        s = re.sub(r"[ \t]+#+\s*$", "", s).rstrip()
        return s
The ".rstrip()" is unneeded as the ".strip()" does both lstrip and rstrip. I think that rstrip() should be replaced with a strip(), the function renamed to "_get_inline_content", and used as "text = _get_inline_content(m.group("text"))".
Also, the Google spec says "A sequence of # characters with anything but spaces following it is not a closing sequence, but counts as part of the contents of the heading:" so is it really correct to use "\s*" in that regex, instead of "[ ]*"? And does it matter, since the input was rstrip'ped already?
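A quick way to probe that last question is to run both patterns with and without the preceding rstrip(). The sketch below uses the regex quoted above and the "[ ]*" variant; the helper name and the sample strings are made up for illustration.

```python
import re

# Pattern quoted in the thread, and the stricter variant being asked about.
WITH_S = re.compile(r"[ \t]+#+\s*$")
WITH_SPACE = re.compile(r"[ \t]+#+[ ]*$")


def strip_closer(pattern: re.Pattern, s: str, pre_strip: bool) -> str:
    """Remove a trailing ' ###' style closer, optionally calling rstrip() first."""
    if pre_strip:
        s = s.rstrip()
    return pattern.sub("", s)


# Illustrative inputs only: plain closer, trailing tab, trailing space plus
# newline, text after the hashes, and a hash with no space before it.
SAMPLES = ["Title ##", "Title ##\t", "Title ## \n", "Title ## x", "Title#"]

agree_after_rstrip = all(
    strip_closer(WITH_S, s, True) == strip_closer(WITH_SPACE, s, True)
    for s in SAMPLES
)
differ_without_rstrip = [
    s for s in SAMPLES
    if strip_closer(WITH_S, s, False) != strip_closer(WITH_SPACE, s, False)
]
```

On these samples the two patterns agree once rstrip() has run, because no trailing whitespace is left for either "\s*" or "[ ]*" to match. They only diverge on the raw inputs that end in a tab or a newline.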
So perhaps:
    def _get_inline_content(s: str) -> str:
        s = s.rstrip(" ")  # remove trailing spaces
        s = s.rstrip("#")  # remove "#" style closers
        return s.strip()  # remove leading and trailing whitespace
would be more correct, readable, and maintainable?
Currently there is this notion, which white collar workers and artists still hold, that they bring "taste" to the experience too. But eventually AI will come for that as well; it may or may not be an LLM, and I'm not sure about timelines.
Even as we speak, when I read through HN comments, I always ask: "Did an AI write this?" or "Did someone use AI to help write their response?" This goes beyond HN: with any photo, drawing, or music I come across now, I ask the same question. But eventually nobody will care, because we are climbing out of the uncanny valley very quickly.
What's worse is that these decisions are usually made on a short-term, quarterly basis. They never consider that slowing down today might save us time and money in the long term. Better code means fewer bugs and faster bug fixes. LLMs only exacerbate business leaders' worst tendencies.
We have autopilot, and I'm sure that if we tried we could automate the take-off and landing of commercial flights.
But we will keep pilots on planes long after they are needed.
But you still need the pilots because the system can only handle the happy path. As soon as there's any blockade or strong weather change, the autopilot will just turn off. And then you need the pilots.
I would say software engineering with AI is similar: The AI can handle CRUD just fine. But once things get messy, you need someone who can actually think.
The real bottleneck isn’t writing (or even reviewing) code anymore. It’s:
1. extracting knowledge from domain experts
2. building a coherent mental model of the domain
3. making product decisions under ambiguity / tradeoffs
4. turning that into clear, testable requirements and steering the loop as reality pushes back
The workflow is shifting to:
Understand domain => Draft PRD/spec (LLM helps) => Prompt agent to implement => Evaluate against intent + constraints => Refine (requirements + tests + code) => Repeat
The “typing” part used to dominate the cost structure, so we optimized around it (architecture upfront, DRY everywhere, extreme caution). Now the expensive part is clarity of intent and orchestrating the iteration: deciding what to build next, what to cut, what to validate, what to trust, and where to add guardrails (tests, invariants, observability).
If your requirements are fuzzy, the agent will happily generate 5k lines of very confident nonsense. If your domain model + constraints are crisp, results can be shockingly good.
So the scarce skill isn’t “can you write good code?” It’s “can you interrogate reality well enough to produce a precise model—and then continuously steer the agent against that model?”
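One way to picture "steering the agent against a model" is to hold the requirements as executable checks and evaluate each attempt against them. The sketch below is a toy under stated assumptions: the discount rule, the three attempts, and all names are invented for illustration, and the attempts are hard-coded stand-ins for what an agent would return after each round of feedback.

```python
from typing import Callable, Dict, List, Optional, Tuple

Impl = Callable[[int], int]  # order total in cents -> amount charged in cents

# Hypothetical requirements, written as named executable checks.
REQUIREMENTS: Dict[str, Callable[[Impl], bool]] = {
    "no discount below 100.00": lambda f: f(5_000) == 5_000,
    "10% off above 100.00": lambda f: f(20_000) == 18_000,
    "exactly 100.00 is discounted": lambda f: f(10_000) == 9_000,
}


def evaluate(candidate: Impl) -> List[str]:
    """Return the names of the requirements the candidate fails."""
    return [name for name, check in REQUIREMENTS.items() if not check(candidate)]


# Stand-ins for successive agent outputs (a real loop would prompt an agent).
def attempt_1(total: int) -> int:
    return total * 9 // 10


def attempt_2(total: int) -> int:
    return total * 9 // 10 if total > 10_000 else total


def attempt_3(total: int) -> int:
    return total * 9 // 10 if total >= 10_000 else total


def steer(attempts: List[Impl]) -> Tuple[Optional[int], List[List[str]]]:
    """Evaluate attempts in order; stop at the first one with no failures."""
    history: List[List[str]] = []
    for i, candidate in enumerate(attempts, start=1):
        failures = evaluate(candidate)
        history.append(failures)  # this is the feedback for the next prompt
        if not failures:
            return i, history
    return None, history


accepted, history = steer([attempt_1, attempt_2, attempt_3])
```

Here the third attempt is the first to pass, and each earlier entry in `history` names exactly which requirement was missed. If the requirements dictionary were vague or incomplete, every attempt would "pass" and the loop would offer no steering at all.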
Automated intelligence is now cheap....
It's widely accepted that you can't learn just by reading, you have to write. So only thinking and reviewing is a great way to lose all the business domain knowledge.
> the thinking part didn't get cheaper -- domain knowledge, edge cases, integration constraints -- none of that is free. what changed is you now review AI output instead of type your own, which is genuinely faster but not as different as it sounds
It's very different - you lose business domain knowledge if you're only reading.
But that doesn't mean we solved world hunger. In the same way, AIs churning out millions of lines of code doesn't mean we have solved software engineering.
Actually, I would argue that high LOCs are a liability, not an asset. We have found a very fast way of turning money into slop, which will then need maintenance and delay every future release. Unless, of course, you have an expert code reviewer who checks the AI output. But in that case, the productivity gains will be max 10%. Because thoroughly reviewing code is almost the same amount of work as writing it.
Code is cheap. Show me the talk
And LLMs aren’t half as good at maintaining code as they are at generating it in the first place. At least not yet.