1 - Surprising success when an agent can build on top of established patterns & abstractions
2 - A deep hole of "make it work" when an LLM digs a hole it can't get out of, failing to anticipate edge cases or discover hidden behavior.
The same things that make it easier for humans to contribute code make it easier for LLMs to contribute code.
I've already heard - many times - that the place that needs the LLMs isn't really inside the code. It's the requirements.
History has a ton of examples of a new technology that gets pushed, but doesn't displace the culture of the makers & shakers. Even though it is more than capable of doing so and indeed probably should.
Why do you think that the suits refuse to understand what it is?
At least in my experience, they are good at imitating a "visually" similar style, but they'll hide a lot of coupling that is easy to miss, since they don't understand the concepts they're imitating.
They think "Clean Code" means splitting into tiny functions, rather than cohesive functions. The Uncle Bob style of "Clean Code" is horrifying
They're also very trigger-happy about adding methods to interfaces (or contracts) that leak implementation details, or that exist only for testing, which means they end up testing implementation rather than behavior.
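To make that failure mode concrete, here's a hedged sketch (all names invented) of the kind of interface leakage described above, next to a behavior-only contract:

```python
from abc import ABC, abstractmethod

# Leaky version: the interface exposes how the store works internally,
# so tests written against it pin the implementation, not the behavior.
class LeakyUserStore(ABC):
    @abstractmethod
    def save(self, user_id: str, name: str) -> None: ...
    @abstractmethod
    def get(self, user_id: str) -> str: ...
    @abstractmethod
    def cache_size(self) -> int: ...           # implementation detail
    @abstractmethod
    def flush_write_buffer(self) -> None: ...  # exists only so tests can poke it

# Behavior-focused version: callers (and tests) can only observe
# what the contract actually promises.
class UserStore(ABC):
    @abstractmethod
    def save(self, user_id: str, name: str) -> None: ...
    @abstractmethod
    def get(self, user_id: str) -> str: ...

class InMemoryUserStore(UserStore):
    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, user_id: str, name: str) -> None:
        self._data[user_id] = name

    def get(self, user_id: str) -> str:
        return self._data[user_id]
```

Any test against the leaky version breaks the moment you swap the cache or buffer strategy, even though no observable behavior changed.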
You can't really run the experiment, because to do it you'd have to isolate a bunch of software engineers and carefully measure them as they go through parallel careers under test conditions. I mean, I guess you could measure it, but it's expensive, time-consuming, and likely to have massive experimental issues.
Although now you can sort of run the experiment with an LLM. Clean code vs unclean code. Let's redefine clean code to mean this other thing. Rerun everything from a blank state and then give it identical inputs. Evaluate on tokens used, time spent, propensity for unit tests to fail, and rework.
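A minimal sketch of the bookkeeping such an experiment would need (the metric names, weights, and scoring are all made up; the agent harness itself is left out):

```python
from dataclasses import dataclass

@dataclass
class RunMetrics:
    """Per-run measurements for one codebase variant, identical prompts."""
    tokens_used: int
    seconds_spent: float
    failing_tests: int
    rework_commits: int  # commits that revise the agent's earlier output

def score(m: RunMetrics) -> float:
    # Naive weighted cost, lower is better. These weights are invented;
    # tune them to what your team actually pays for.
    return (m.tokens_used / 1000
            + m.seconds_spent / 60
            + 10 * m.failing_tests
            + 5 * m.rework_commits)

def compare(clean: RunMetrics, unclean: RunMetrics) -> str:
    return "clean wins" if score(clean) < score(unclean) else "unclean wins"
```

Run the same prompt sequence against both variants from a blank state, collect a `RunMetrics` per run, and the rest is ordinary statistics.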
The history of science and technology is people coming up with simple but wrong untestable theories which topple over once someone invents a thingamajig that allows tests to be run.
I tend to prefer Feature-Centric Layout or even Vertical Slices, where related work is closer together based on what is being worked on as opposed to the type of work being done. I find it to be far more discoverable in practice while able to be simpler and easier to maintain over time... no need to add unnecessary complexity at all. In general, you don't need a lot of the patterns introduced by Clean or Onion structures as you aren't creating multiple, in production, implementations of interfaces and you don't need that type of inheritance for testing.
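As a rough illustration (file names invented), the difference between the two layouts looks like:

```
# Layer-centric: one feature's code is scattered across layers
src/controllers/invoice_controller.py
src/services/invoice_service.py
src/repositories/invoice_repository.py
src/models/invoice.py

# Feature-centric / vertical slice: everything for the feature lives together
src/invoices/api.py
src/invoices/logic.py
src/invoices/storage.py
src/invoices/models.py
```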
Just my own take... which, of course, has been swimming upstream, having done a lot of work in the .NET space.
From https://github.com/feldera/feldera/blob/main/CLAUDE.md:
- Adhere to rules in "Code Complete" by Steve McConnell.
- Adhere to rules in "The Art of Readable Code" by Dustin Boswell & Trevor Foucher.
- Adhere to rules in "Bugs in Writing: A Guide to Debugging Your Prose" by Lyn Dupre.
- Adhere to rules in "The Elements of Style, Fourth Edition" by William Strunk Jr. & E. B. White.
e.g., mentioning Elements of Style and Bugs in Writing certainly has helped our review LLM to make some great suggestions about English documentation PRs in the past.
Instead, you should approach it as if instructing the agent to write "perfect" code (whatever that means in the context of your patterns and practices, language, etc.).
How should exceptions be handled? How should parameters be named? How should telemetry and logging be added? How should new modules be added? What are the exact steps?
Do not let the agent randomly pick from your existing codebase unless it is already highly consistent; tell it exactly what "perfect" looks like.
(BTW: I was not around then, so if I'm guessing wrong here please correct me!)
Over time compilers have gotten better and we're now at the point where we trust them enough that we don't need to review the Assembly or machine code for cleanliness, optimization, etc. And in fact we've even moved at least one abstraction layer up.
Are there mission-critical inner loops in systems these days that DO need hand-written C or Assembly? Sure. Does that matter for 99% of software projects? Negative.
I'm extrapolating that AI-generated code will follow the same path.
No more than the exact order of items being placed in main memory matters now. This used to be a pretty significant consideration in software engineering until the early 1990s. This is almost completely irrelevant when we have ‘unlimited’ memory.
Similarly, generating code, refactoring, and implementing large changes are now easy enough that you can just rewrite stuff later. If you are not happy with how something is designed, a two-sentence prompt fixes it in a million-line codebase in thirty minutes.
I think complex systems will still turn into a big ball of mud, and AI agents will get just as bogged down as humans when dealing with it. And even though a rebuild from scratch is cheaper than ever, it can't possibly be done cheaply while also preserving the millions of specific behaviors that users have come to rely on.
Maybe if you pushed spec-driven development to the absolute extreme, but I don't think pushing it that far is easy or cheap. Just as the effort to go from 90% unit test coverage to 100% is hard and possibly not worth it, I expect a similar barrier around extreme spec-driven development.
Clarification: I'm advocating clean code in the generic sense, not Uncle Bob's definition.
There are fundamental truths about complex systems that go beyond "coding". Patterns can be observed in nature, where engineering principles and "prevailing wisdom" are truer than ever.
I suggest you take some time to study systems that are powering critical infrastructure. You'll see and read about grizzled veterans that keep them alive, and how they are even more religious about clean engineering principles and how "prevailing wisdom" is very much needed and always will be.
That said there are a lot of spaces where not following wisdom works temporarily. But at scale, it crashes and crumbles. Web-apps are a good example of this.
I have worked on compilers and databases that the entire world runs on; the code quality (even before AI) is absolutely garbage.
Real systems built by hundreds of engineers over twenty years do not have clean code.
Nanoseconds matter.
Clean code tends to equal simple code, which tends to equal fast code.
The order of items in memory does matter, as does cache locality. A typical L1 cache is only 32 KB.
If of course you're talking about web apps then that's just always been the Wild West.
Wat? Approximately every algorithm in CS101 has a clean and simple N^2 version, a long menu of complex N*log(N) versions, and an absolute zoo of special cases grafted onto one of the complex versions if you want the fastest code. This pattern generalizes out of the classroom to every corner of industry, but with less clean names+examples. The universal truth is that speed and simplicity are very quick to become opposing priorities. It happens in nanoseconds, one might say.
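A concrete instance of that menu, using duplicate detection as a stand-in (the quadratic version is the "clean" one; the faster one is already less obvious, and a production-grade version would keep sprouting special cases for hashing, small inputs, and type-specific tricks):

```python
def has_duplicate_quadratic(xs: list) -> bool:
    # Clean and obvious, O(n^2): compare every pair.
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            if xs[i] == xs[j]:
                return True
    return False

def has_duplicate_nlogn(xs: list) -> bool:
    # Faster O(n log n): sort a copy so duplicates become neighbors.
    # Already less self-evident than the pairwise version.
    s = sorted(xs)
    return any(a == b for a, b in zip(s, s[1:]))
```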
Cache-aware optimization in particular tends to create unholy code abominations, it's a strange example to pick for clean=simple=fast wishcasting.
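For example, a cache-blocked (tiled) matrix multiply computes exactly the same result as the naive triple loop, at the cost of nearly doubling the loop nesting. This toy sketch (pure Python, so the cache win itself won't show up here) is only meant to illustrate the structural cost of the optimization:

```python
def matmul_naive(a, b, n):
    # The "clean" version: three obvious loops.
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a, b, n, t=2):
    # Same arithmetic, restructured so each t-by-t tile of b stays hot
    # in cache. Six loops instead of three, plus edge handling via min().
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, t):
        for kk in range(0, n, t):
            for jj in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for k in range(kk, min(kk + t, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + t, n)):
                            c[i][j] += aik * b[k][j]
    return c
```

Real kernels go further still (vectorization, prefetching, layout swaps), and each step moves the code further from anything a reviewer would call simple.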
I tend to prefer feature-oriented structures as an alternative, which I do find simpler and easy enough to refactor over time as complexity is required and not before.
This fact alone suggests that the idea of having unlimited memory or unlimited CPU cycles is just wrong.
[0]: And TypeScript, technically. But I'd consider TypeScript a fork of JavaScript rather than a new language.
The LLM is forced to eat its own output. If the output is garbage, its inputs will be garbage on future passes. How the code is structured changes how the LLM implements new features.
But there will always be a spectrum of structures that are better for the LLM to code with, and coding with less optimal patterns will have negative feedback effects as the loop goes on.
I think the OP is right; the problem is context. If you have a nicely modularized codebase where the LLM can neatly process one module at a time, you're in good shape. But two million lines of spaghetti requires too much context. The AI companies may advertise million-token windows, but response quality drops off long before you hit the end.
You still need discipline. Personally I think the biggest gains in my company will not come from smarter AIs, but from getting the codebase modularized enough that LLMs can comfortably digest it. AI is helping in that effort but it's still mostly human driven - and not for lack of trying.
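One crude way to operationalize "can the LLM comfortably digest it" is a per-module token-budget check. The numbers below are made-up heuristics (roughly 4 characters per token, a budget well under advertised window sizes), not anything the vendors publish:

```python
def fits_in_context(module_sources: dict[str, str],
                    budget_tokens: int = 100_000,
                    chars_per_token: int = 4) -> bool:
    """Rough check: does one module's source fit comfortably in the window?

    The budget is deliberately far below advertised context limits,
    since response quality degrades long before the hard cap.
    """
    total_chars = sum(len(src) for src in module_sources.values())
    return total_chars / chars_per_token <= budget_tokens
```

Modules that fail the check are candidates for splitting before you point an agent at them.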
You might be pleasantly surprised if you haven’t yet.
Supporting production applications with low MTTR is what matters a lot to me. If you are relying entirely on your agent to identify and fix a production defect, I'd argue you are out at sea in a very scary place (comprehension debt and all that). It is in these cases where architecture and organization matter, so you can trace the calls and see what's broken. I get that the code is largely a black box as fewer and fewer people review the details, but you still have to review the architecture and design, and that's not going away. To me, things like SRP, SOLID, and DRY are ever more important.
Could you please create a verifiable and reproducible example of this? In my experience, agents get slower the larger a repository is. Maybe I'm just very strict with my prompts, but while initial changes in a greenfield project might take 5-10 minutes for each change, unless you deeply care about the design and architecture, you'll reach 30 minute change cycles way before you reach a million lines of code.
This is largely a solved problem now with better harnesses and 1M context windows.
This is a really funny comment to make when the entire Western economy is propped up by computers doing multiplication of extremely large matrices, which is probably the single most obvious CompSci 101 example of when the placement of data in memory is really, really important.
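The classic demonstration: traversing a matrix in the order it sits in memory versus against it. Both functions below compute the same sum; in plain Python the gap is muted, but in C or NumPy the column-order traversal of a row-major array can be dramatically slower purely from cache misses (a hedged sketch, not a benchmark):

```python
def sum_row_major(m: list[list[float]]) -> float:
    # Visits elements in the order rows are laid out: cache-friendly.
    total = 0.0
    for row in m:
        for x in row:
            total += x
    return total

def sum_col_major(m: list[list[float]]) -> float:
    # Same result, but strides across rows on every step; on a large
    # row-major array in C/NumPy this order thrashes the cache.
    total = 0.0
    for j in range(len(m[0])):
        for row in m:
            total += row[j]
    return total
```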
If it's easier for a human to read and grasp, it will end up using less context and be less error-prone for the LLM. If the entities are better isolated, then you also save context and time when making changes, since the area of effect is contained.
Clean code matters because it saves cycles and tokens.
If you're going to generate the code anyways, why not generate "pristine" code? Why would you want the agent to generate shitty code?
cf.
https://github.com/johnousterhout/aposd-vs-clean-code
and instead of cleaning your code, design it:
https://www.goodreads.com/en/book/show/39996759-a-philosophy...
Time, feature changes, bugs, emergent needs of the system all drive these sorts of changes.
No amount of "clean code" is going to eliminate these problems in the long term.
All AI is doing is speed running your code base into a legacy system (like the one you describe).
Are you implying legacy systems stop growing? Because I didn't mean to imply those companies stop growing.
But my counter argument is that the generated code can easily balloon in size and then if you ever have to manually figure out how something works, it is much harder. You'll also end up with a lot of dead or duplicated code.
A single complex change (defined as 'touching many parts') can take Claude Code a couple hours to do. I could probably do it in a couple hours, but I can have Claude do it (while I steer it) while I also think about other things.
My current guess is that LLMs are really good at web code because they've seen a shitload of it. My experience with them in arenas where there's less open source code has been less magical.
I think you're missing the cost of screwing up design-level decision-making. If you fundamentally need to rethink how you're doing data storage, have a production system with other dependent systems, have public-facing APIs, and so on and so forth, you are definitely not talking about "two sentence prompts". You are playing a dangerous game with risk if you are not paying some of it off, or at the very least, accounting for it as you go.
I am now hand coding the UI because the vibe coded method does not work.
I then looked at the db-agent I was designing and I explicitly told it to create SQL using the LLM, and it does. But the ACTUAL SQL that it persists to the project is a separate SQL generator that it wrote by hand. The LLM one that gets displayed on the screen looks perfect, then when it comes down to committing it to the database, it runs an alternative DDL generator with lots of hard coded CREATE TABLE syntax etc... It's actually a beautiful DDL generator, for something written in like 2015, but I ONLY wanted the LLM to do it.
I started screaming at the agent. I think when they do take over I might be high up on their hit list.
Just anecdata. I still think in a year or two, we'll be right about clean code not mattering, but 2026 might not be that year.
I threw in the towel last night and switched to codex, which has actually been following instructions.
I'm not a fan of Clean Code[1], but the only tip I can give is: Don't instruct the LLM to write code in the form of Clean Code by Robert Martin. Itemize all the things you view as clean code, and put that in CLAUDE.md or wherever. You'll get better luck that way.
[1] I'm also not that anti-Uncle-Bob as some are.
Couple that with a self-correcting loop (design->code->PR review->QA review in playwright MCP->back to code etc), orchestrated by a swarm coordinator agent, and the quality increases even further.
When the training is across code with varying styles, it is going to take effort to get this technology performing in a standardized way, especially when what's possible changes every 3 months.