If anything, it’s the exact opposite. It shows that you can build a crazy popular & successful product while violating all the traditional rules about “good” code.
This isn't a dig at anyone, I've certainly shipped my share of bad code as well. Deadlines, despite my wishes sometimes, continue to exist. Sometimes you have to ship a hack to make a customer or manager happy, and then replacing those hacks with better code just never happens.
For that matter, the first draft of nearly anything I write is usually not great. I might just be stupid, but I doubt I'm unique; when I've written nice, beautiful, optimized code, it's usually a second or third draft, because ultimately I don't think I fully understand the problem and the assumptions I am allowed to make until I've finished the first draft. Usually for my personal projects, my first dozen or so commits will be pretty messy, and then I'll have cleanup branches that I merge to make the code less terrible.
This isn't inherently bad, but a lot of the time I am simply not given time to do a second or third draft of the code, because, again, deadlines, so my initial "just get it working" draft is what ships into production. I don't love it, and I kind of dread the thought of some of the code with my name attached to it at BigCo ever getting leaked, but that's just how it is in the corporate world sometimes.
There are some cases where the most profitable code is also good code. We like those.
But in most (99%+) cases, the code is not going to survive contact with the market and so spending any time on making it good is wasted.
I often have a lot of time between projects, and am able to really think about things and write code that I'm happy with. Even when I do that, I'll do some more research, or work on another project, and immediately I'm picking apart sections of the code that I really took the time to "get right." Sometimes it can be worse when you are given vast amounts of time to build your solution: decisions that some form of deadline would have forced get put off indefinitely. At least that's my perspective on it; I feel like if you love writing software, you are going to keep improving nearly constantly, and look back at what you've done and be able to pick it apart.
To keep myself from getting too distressed over looking at past code now, I tend to look at the overall architecture and success of the project (in terms of performing what it was supposed to, not necessarily monetarily). If I see a piece of code that I feel could have been written far better, I look at how it fits into the rest. I tend to work on very small teams, often making architecture decisions that touch large areas of the code, so this may just be the perspective of someone who hasn't worked on a large team. I still do think that if you care about your craft, you will be harsh on yourself, more than you deserve.
I get a junior developer, or a team of developers with varying levels of experience and a lot of pressure to deliver, producing crummy code, but not the very tool that's supposed to be the state-of-the-art coder.
I don't actually think it's a solved problem, I'm saying that the fact that it generates terrible code doesn't necessarily mean that it doesn't have parity with humans.
It generates terrible code when used in a nearly open loop manner, which all coding agents are currently doing.
Now we get hyper mass production of the same quality.
People who say they don't care about the quality of code produced by agents are those who haven't been evolving non-trivial codebases with agents long enough to see just how catastrophically they implode after a while. At that point, everyone cares, and that point always comes with today's agents given enough lines and enough changes.
As a user of terrible products, I only care about code quality in as much as the product is crap (Spotify I'm looking at you), or it takes forever for it to evolve/improve.
Biz people don't care about quality, but they're notoriously short sighted. Whoever nerfed Google's search is angering millions of people as we speak.
This guy, supposedly:
Regarding code quality and tech debt, it's sensible not to care if it doesn't lead to anything observable. Do you really care about some "bad" code somewhere that hasn't changed in 5 years, keeps working fine, and has no new requirements?
On the other hand, if you work on an active codebase where fixing one bug inevitably leads to another, maybe it's worth asking whether the code quality is simply too low to deliver on the product expectations.
It's not even obvious to me in which direction coding agents move the needle. Do you want higher quality, at least at a higher (design) level, when you heavily use agents, so that you know the mess will at least be compartmentalized and easier to deal with later if needed? Or do you just assume the agent will always do the work and you won't need to dig into the code yourself? So far I've mostly done the former, but I understand that for some projects, the latter can make sense.
I wouldn't say that customers are indifferent, but it wouldn't be the first time that investor expectations are prioritized far above customer satisfaction.
Why not? It is subject to the same pressures, in fact it is subject to more time pressure than most corp code out there. Also, it's the model that's doing the coding, not the frontend tool.
I have a subscription to Claude Code and despite my skepticism, it has been pretty good at just getting a goofy PoC thing going. When I look at the code, it’s usually insane unless the prompt was so narrow and specific like about writing a function that does one thing and only one thing.
Outside of small, personal projects, I am still really uncomfortable at having agents run wild. I see the result, and then I spend a bunch of time having to gain the context of what is going on, especially if I ask it to implement features in spaces I have general knowledge, but not expertise. So, the problem remains the same. These things still need handholding by people who understand the domain, but having people become glorified PR reviewers is not an acceptable path forward.
Arguing that there is lots of bad production code kinda avoids the actual issue that is going on here. Yes, a lot of sloppy code can and has been written by people. I’ve seen it myself, but it feels like the actual thing is that, we are now enabling that at scale and calling it “abundance” when instead we are really generating an abundance of completely avoidable security holes and logic errors.
i would guess telling it to "hurry up" would produce even worse code than it already does without hand-holding, or maybe it would make an excuse again...
> often tries to change behavior
yes, and (in my experience) at the same time re-write the unit tests that are supposed to guarantee behavior doesn't change... Yeah, we even have an idiom for this: "Temporary is always permanent"
But as a great man once said: Later == Never.
Absolutely. The difference is that the amount of bad code that could be generated had an upper limit on it — how fast a human can type it out. With LLMs bad code can be shat out at warp speed.
I think the better unit to commit and work with is the prompt itself, and I think that the prompt is the thing that should be PR'd at this point, because ultimately the spec is what's important.
The fundamental problem there is the code generation step is non-deterministic. You might make a two sentence change to the prompt to fix a bug and the generation introduces two more. Generate again and everything is fine. Way too much uncertainty to have confidence in that approach.
Also, people aren't actually reading through most of the code that is generated or merged, so if there's a fear of deploying buggy code generated by AI, then I assure you that's already happening. A lot.
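One way to at least surface that non-determinism, as a sketch of my own (file names and workflow here are hypothetical, not anything the thread describes): pin a digest of the generated artifact next to the prompt, so a regeneration that silently drifts fails loudly instead of slipping into a merge unread.

```python
import hashlib
from pathlib import Path

def digest(path: Path) -> str:
    """Stable SHA-256 hex digest of a generated file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_regeneration(generated: Path, pinned_digest: str) -> bool:
    """True if freshly generated code matches the digest pinned in git.

    A mismatch doesn't mean the new code is wrong, only that the
    generation drifted and needs review (or a test run) before merge.
    """
    return digest(generated) == pinned_digest
```

This doesn't remove the uncertainty, it just makes drift visible; you'd still need the test suite to decide whether a drifted regeneration is acceptable.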
That said, if it's so trivial to do, why haven't you done it already?
Somehow, everyone has forgotten the terrible code quality that existed prior to 2020.
https://www.youtube.com/watch?v=UjZQGRATlwA
Like, come on. Software has been shit for decades. AI hasn't observably reduced the quality of software I use everyday in a way that is meaningfully separable from normal incidents in the past.
I have noticed a spike in web apps exhibiting random failures and glitchy behavior over the past few months.
For LLMs, I don't really know. I only have a couple years experience at that.
Everything depends on context. Most code written by humans is indeed, garbage.
I think that this is the problem, actually.
It's similar to writing. Most people suck at writing so badly that the LLM/AI writing is almost always better when writing is "output".
Code is similar. Most programmers suck at programming so badly that LLM/AI production IS better than 90+% (possibly 99%+). Remember, a huge number of programmers couldn't pass FizzBuzz. So, if you demand "output", Claude is probably better than most of your (especially enterprise) programming team.
The problem is that the Claude usage flood is simply identifying the fact that things that work do so because there is a competent human somewhere in the review pipeline who has been rejecting the vast majority of "output" from your programming team. And he is now overwhelmed.
a) a pristine, good codebase that follows the best coding practices, but it is built on top of bad specs, wrong data/domain model
b) a bad codebase but it correctly models and nails the domain model for your business case
Real life example, a fintech with:
a) a great codebase but stuck with a single-entry ledger
b) a bad codebase that perfectly implements a double-entry ledger
Fair, by “perfectly implements” I meant to say that it correctly implemented the core invariant of a double entry ledger (debits = credits), not that it was 100% bug free
Many super talented developers I know will say “Make it work, then make it good”. I think it’s okay to do this on a bigger scale than just the commit cycle.
Make it work, make it work right, make it work fast. In that order.
Who is to judge the "good" or "bad" anyway?
which has always been true
No accounting for taste, but part of what makes code hard for me to reason about is lots of combinatorial complexity, where the number of states that can occur makes it difficult to know all the possible good and bad states your program can be in. Combinatorial complexity is something that is objectively expensive for any form of computer, be it a human brain or silicon. If the code is written in such a way that the number of correct and incorrect states is impossible to know, then the problem becomes undecidable.
I do think there is code that is "objectively" difficult to work with.
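To make the state-explosion point concrete, a toy Python sketch (the connection example is my own, not from the thread): three independent booleans admit 2^3 = 8 combinations, most of them nonsensical, while a single enum admits exactly the states that can actually occur.

```python
from enum import Enum, auto

# With three independent flags, 2**3 = 8 combinations are expressible,
# but only a few are meaningful ("connected and also closed" is not).
class BadConnection:
    def __init__(self):
        self.connecting = False
        self.connected = False
        self.closed = False

# Collapsing them into one enum leaves exactly the 4 states that exist,
# so every reader (human or model) sees the full state space at a glance.
class ConnState(Enum):
    IDLE = auto()
    CONNECTING = auto()
    CONNECTED = auto()
    CLOSED = auto()

class Connection:
    def __init__(self):
        self.state = ConnState.IDLE

    def open(self) -> None:
        # Only a legal transition is possible; illegal ones fail fast.
        assert self.state is ConnState.IDLE
        self.state = ConnState.CONNECTING
```

The enum version is "objectively" easier to reason about in the sense the comment describes: the set of reachable states is enumerable rather than combinatorial.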
If you make sure the compiler catches most issues, AI will run it, see it doesn't build and fix what needs to be fixed.
So I agree that a lot of the things that make code good, including comments and documentation, are beneficial for AI.
I don't entirely disagree that there is code that's objectively difficult to work with, but I suspect that the Venn diagram of "code that's hard for humans" and "code that's hard for computers" has much less overlap than you're suggesting.
I'm sure that these models will get better, and I agree that the overlap will be lower at that point, but I still think what I said will be true.
I mean, it seems like that has always been true to an extent, but now it may be even more true? Once you know you're sitting on a lode of gold, it's a lot easier to know how much to invest in the mine.
And some people thought they were building "disposable" code, only to see their hacks being used for decades. I'm thinking about VB but also behemoth Excel files.
I hate self-promotion but I posted my opinions on this last night https://blog.tombert.com/Posts/Technical/2026/04-April/Stop-...
The tl;dr of this is that I don't think the code itself is what needs to be preserved; the prompt and chat are the actual important and useful things here. At some point I think it makes more sense to fine-tune the prompts to get increasingly more specific and just regenerate the code based on that spec, and store that in Git.
Generating code using a non-deterministic code generator is a bold strategy. Just gotta hope that your next pull of the code slot machine doesn’t introduce a bug or ten.
Given that, we should instead tune the prompts well enough to not leave things to chance. Write automated tests to make sure that inputs and outputs are ok, write your specs so specifically that there's no room for ambiguity. Test these things multiple times locally to make sure you're getting consistent results.
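As a sketch of what "lock the spec in tests" could look like (the slugify function and its rules are a made-up example, not from the thread): the assertions are hand-written and stable, and any regenerated implementation must pass them unchanged.

```python
import re

def slugify(title: str) -> str:
    """One possible generated implementation; it can be thrown away
    and regenerated, as long as the spec tests below still pass."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Hand-written spec: these assertions are the pinned artifact in
# version control, regardless of how the implementation is produced.
def test_slugify_spec():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces  ") == "spaces"
    assert slugify("already-a-slug") == "already-a-slug"
```

The tests, not the generated body, carry the guarantee; a regeneration that breaks them is rejected automatically.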
Write them by hand or generate them and check them in? You can’t escape the non-determinism inherent in LLMs. Eventually something has to be locked in place, be it the application code or the test code. So you can’t just have the LLM generate tests from a spec dynamically either.
> write your specs so specifically that there's no room for ambiguity
Using English prose, well known for its lack of ambiguity. Even extremely detailed RFCs have historically left lots of room for debate about meaning and intention. That’s the problem with not using actual code to “encode” how the system functions.
I get where you’re coming from but I think it’s a flawed idea. Less flawed than checking in vibe-coded feature changes, but still flawed.
Yes, written by hand. I think that ultimately you should know what valid inputs and outputs are and as such the tests should be written by a human in accordance with the spec.
> Less flawed than checking in vibe-coded feature changes, but still flawed.
This is what I'm trying to get at. I agree it's not perfect, but I'm arguing it's less evil than what is currently happening.
Observability into how a foundation-model-generated product arrived at that state is significantly more important than the underlying codebase, as it's the prompt context that is the architecture.
The solution people are coming up with now is using AI for code reviews and I have to ask "why involve Git at all then?". If AI is writing the code, testing the code, reviewing the code, and merging the code, then it seems to me that we can just remove these steps and simply PR the prompts themselves.
I made a similar point 3 weeks ago. It wasn't very well received.
https://news.ycombinator.com/item?id=47411693
You don't actually need source control to be able to roll back to any particular version that was in use. A series of tarballs will let you do that.
The entire purpose of source control is to let you reason about change sets to help you make decisions about the direction that development (including bug fixes) will take.
If people are still using git but not really using it, are they doing so simply to take advantage of free resources such as github and test runners, or are they still using it because they don't want to admit to themselves that they've completely lost control?
Git gives you the series of past snapshots if that's all you want it for, but in infrastructure you don't need to re-invent.
does it have to be free to be useful? the CD part is even more important than before, and if they still use git as their input, and everyone including the LLM is already familiar with git, what's the need to get rid of it?
there's value in git as a tool everyone knows the basics of, and as a common interface of communicating code to different systems.
passing tarballs around requires defining a bunch of new interfaces for those tarballs, which adds a cost to every integration that you'd otherwise get basically for free if you used git
I think this is the case, or at least close.
I think a lot of people are still convincing themselves that they are the ones "writing" it because they're the ones putting their names on the pull request.
It reminds me of a lot of early Java, where it would make you feel like you were being very productive because everything that would take you eight lines in any other language would take thirty lines across three files to do in Java. Even though you didn't really "do" anything (and indeed Netbeans or IntelliJ or Eclipse was likely generating a lot of that bootstrapping code anyway), people would act like they were doing a lot of work because of a high number of lines of code.
Java is considerably less terrible now, to a point where I actually sort of begrudgingly like writing it, but early Java (IMO before Java 21 and especially before 11) was very bad about unnecessary verbosity.
Also, the approach you described is what a number of AI for Code Review products are using under-the-hood, but human-in-the-loop is still recognized as critical.
It's the same way how written design docs and comments are significantly more valuable than uncommented and undocumented source.
I've noticed that they're often quite bad at refactoring, also.
Some business models will require “good” code, and some won’t. That’s how it is right now as well. But pretending that all business models will no longer require “good” code is like pretending that Michelin should’ve retired its list after the microwave was invented.
Research in academia seems less appropriate because that’s famously not really a business model, except maybe in the extractive sense
As far as good or bad, how food is made is irrelevant to the outcome if it's enjoyable.
Now whether this is still true with AI, or if vibe coding means bad code no longer has this long-term stability and velocity cost because AI is better than humans at working with bad code... We don't know yet.
2. The problem with "bad code" has nothing to do with the short-term success of the product but with the ability to evolve it successfully over time. In other words, it's about long-term success, not short-term success.
3. Perhaps most importantly, Claude Code is a fairly simple product at its core, and almost all its value comes from the model, not from its own code (and the same is true on the cost side). Claude Code is a relatively low-stakes product. This means that the problems caused by bad code matter less in this instance, and they're managed further by Claude Code not being at the extreme "vibey" end of the spectrum.
So AI aside, Claude Code is proof that if you pour years and many billions into a product, it can be a success even if the code in the narrow and small UI layer isn't great.
There's this definition of LLM generation + "no thorough review or testing"
And there's the more normative one: just LLM generation.[1][2][3]
"Not even looking at it" is very difficult as part of a definition. What if you look at it once? Or just glance at it? Is it now no longer vibe coding? What if I read a diff every ten commits? Or look at the code when something breaks?
At which point is it no longer vibe coding according to this narrower definition?
[1] https://www.collinsdictionary.com/dictionary/english/vibe-co...
[2] https://www.merriam-webster.com/dictionary/vibe%20coding
If you actually look at the code and understand it and you'd stand by it, then it's not vibecode. If you had an LLM shit it out in 20 minutes and you don't really know what's going on, it's vibecode. Which, to me, is not derogatory. I have a bunch of stuff I've vibecoded and a bunch of stuff where I've actually read the code and fixed it, either by hand or with LLM assistance. And ofc, all the code that was written by me prior to ChatGPT's launch.
But my point was that I don't think the development of Claude Code itself is unsupervised, hence it's not really "vibe coded".
When I used to contract code for some game engine stuff back in the 2000s and early 2010s, that was essentially my working standard. I essentially gave boilerplate terms for my work: I will make it work, I will get it done quickly, it will run fast, BUT you will incur a lot of technical debt in doing so. In a game engine it isn't a big deal; you are essentially just pushing pixels around a screen in a sand-boxed system. "Yeah, the decompression system does some odd things with memory addresses, but it is quick and plays nice with your streaming system. Just don't change the overall mapping too much and it should ship ok." "In this level of your game, I have this explicit exception code that prevents that streaming system from booting out too much necessary data. Don't change this unless you want a headache."
I shudder to think of what that style of code would do in a work environment with serious consequences.
I suspect we will find out a lot more of this in the decades to come.
This codebase has existed for maybe 18 months, written by THE experts on agentic coding. If it is already unintelligible, that bodes poorly for how much it is possible to "accelerate" coding without taking on substantial technical debt.
i.e., the claude code codebase doesn't need to be good right now [^1] — so i don't think the assumption that this is an exemplary product / artifact of expert agentic coding actually holds up here specifically
[^1]: the startup graveyard is full of dead startups with good code
- Good code is what enables you to be able to build very complex software without an unreasonable number of bugs.
- Good code is what enables you to be responsive to changing customer needs and times. Whether you view that as valuable is another matter though. I guess it is a business decision. There have been plenty of business that have gone bust though by neglecting that.
Good code is for your own sanity, the machine does not care.
So that sounds to me like it is evidence vibe coding doesn't work well long term.
The situation there is akin to Viaweb - Viaweb also rode hype wave and code situation was awful as well (see PG's stories about fixing bugs during customer's issue reproduction theater).
What did Viaweb's buyer do? They rewrote thing in C++.
If history rhymes, then buyer of Anthropic would do something close to "rewrite it in C++" to the current Claude Code implementation.
While there are no companies with $1.5 trillion (4*$380B) of net revenue, the difference is that Anthropic is cash-flow negative, has more than 4 people on staff (none of them hungry artists like PG), and its hardware spending, I think, is astronomical. They are cash-flow negative because of the hardware needed to train models.
There should be more than one company able to offer good purchase terms to Anthropic's owners.
I also think that Anthropic, just like OpenAI and most other LLM companies and company departments, rides "test set leakage," hoping the general public and investors do not understand. Their models do not generalize well, being unable to generate working code in Haskell [1] at the very least.
[1] https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...
PG's Viaweb had awful code as a liability. Anthropic's Claude Code has an awful implementation (code) and produces awful code, with more liability than code written by humans.
isn't that pretty much why anthropic and openai are racing to IPO?
I do M&As at my company, as a CTO. I have seen lots of successful companies' codebases, and literally none of them was elegant. Including very profitable companies with good, loved products.
The only good code I know is in the open source domain and in the demoscene. The commercial code is mostly crap - and still makes money.
Occasionally, IRL, you hear the feel-good story of how Fred Smith gambled the last $5,000 to save FedEx and so on, but most people with that mindset end up crashing out.
Vibe coding a product runs the risk of acquiring too much tech debt before the project is successful.
Product Market Fit is very hard, you need to keep enough room for pivots. Changes in direction will always accumulate debt, even when tech is well written. It is far more difficult when you accumulate debt quickly.
The counterpoint being that procrastinating, over-engineering prematurely, or building a lot of unrelated tooling and losing focus can also bring the product down quickly, or never let it start.
Being able to vibe code POCs etc is a great tool if done in a controlled limited well defined way.
Just as borrowing cash on your credit card is not always bad, it just usually is.
Perhaps the problem is getting multiple vibe-coders synced up when working on a large repo.
> Everybody can tell you how to do it, they never did it
> —jay-z
Not the front end
That’s the rub, yes - as long as your failures are nice and gradual and proportional to the changes you’re making, everything’s fine.
it's easy to see how the product (claude code) could be abstracted to spec form and then a future version built from that without inheriting previous iterations' tech debt
It has already cost many developers months of time and hundreds of dollars' worth of tokens because of a bug. There will be more.
I wouldn't recommend neglecting tactics if your strategy doesn't put you on the good side of a generational bubble though.
Each one is broken, doesn’t have working error handling, and prevents you from giving them money. They all exist to insert the same record somewhere. Lost revenue, and they seem to have no idea.
Amazon's flagship iOS app has had at least three highly visible bugs, for years. They're like thorns in my eye sockets, every time I use it. They don't care.
These companies are working with BILLIONS of dollars in engineering resources, unlimited AI resources, and with massive revenue effects for small changes.
Sometimes the world just doesn’t make sense.
AI could play a big role here. Husky (git hook) but AI. It will score lazy engineering. You lazy-implement enough times, you lose your job.
Maybe there’s a reason Netflix makes you click on the ONE user profile on the account, repeatedly, even if it feels like sheer stupidity to their users. At least it’s not costing them revenue, directly.
Amazon's iOS app not properly handling state change after checkout, for years? Probably not directly costing them millions. Only second-order disengagement.
But Walmart keeps pushing a thing you don’t want, because you looked at it once? Amazon solved this. It’s not a major fix, and it’s using a valuable slot that costs them money. Walmart just doesn’t fix it.
Meta refusing to take people’s advertising dollars because ALL of their page creation pages have unhandled breaking flows in them? That’s lost money for no reason at all. And you’re telling me they don’t realize how janky it is to try to maintain four implementations of that?
Apple App Store Connect and Ads platform? Don’t get me started.
Again, all with unlimited pools of the smartest people on earth, unlimited AI, and a billion people testing for them…
Social capital just isn't given out to people that fix things in a lot of these companies, but instead those who ship a 1.0a.
On the management/product side, the inevitable issues are problem for another quarter. On the engineering side, it's a problem for the poor shmucks who didn't get to jump to the next big thing.
Neither of those groups institutionally cares about the mess they leave in their wake, and they'd perceive such guardrails as antithetical to releasing the next broken but new, fancy feature.
The success is undeniable, but whether this vibe-coded level of quality is acceptable for more general use cases isn't something you can infer from that.
claude code, the app, is also not some radically complex concept (even if the codebase today is complicated)
but hey, that's why people do version breaking rewrites
We already knew that. This is a matter of people who didn't know that, or didn't want to acknowledge it, thinking they now have proof that it doesn't matter for creating a crazy popular & successful product, as if it's a gotcha on those who advocate for good practices. When your goal is to create something successful that you can cash out, good practices and quality are/were never a concern. This is the basis for YAGNI, move-fast-and-break-things, and worse-is-better. We've known this since at least Betamax-vs-VHS (although maybe the WiB VHS cultural knowledge is forgotten these days).
WiB doesn't mean the thing is worse, it means it does less. Claude Code interestingly does WAY more than something like Pi which is genuinely WiB.
Move Fast and Break Things comes from the assumption that if you capture a market quick enough you will then have time to fix things.
YAGNI is simply a reminder that not preparing for contingencies can result in a simpler code base since you're unlikely to use the contingencies.
The spaghetti that people are making fun of in Claude Code is none of these things except maybe Move Fast and Break Things.
Also to correct another common myth, porn was widely available on both formats and was not the cause of VHS’s success over Betamax.
But we (the dev community) are kind of spoiled, because we have a lot of great developer tools that come from people passionate about their work, skilled at what they do and take pride in what they put out. I don't count myself among one of those people but I have benefited from their work throughout my career and have gotten used to it in my tooling.
All that being said Opus is hands down the best coding model for me (and I'm actively trying all of them) and I'll tolerate it as long as I can get it to do what I need, even with the warts and annoyances.
I don't wholly disagree, but personally it's still the tool I use and it's sort of fine. Perhaps not entirely for the money that's behind it, as you said, but it could be worse.
The CLI experience is pretty okay, although the auth is kinda weird (e.g. when trying to connect to AWS Bedrock). There's a permission system and sandboxing, plan mode and TODOs, decent sub-agent support, instruction files and custom skills, tool calls and LSP support, and all the other stuff you'd expect. At least there are no weird bugs like I had with OpenCode, where trying to paste multi-line content inside a Windows Terminal session led to the tool closing and every subsequent line getting pasted into and executed in the terminal one by one. That was weird, though I will admit that using Windows feels messed up quite often nowadays even without stuff like that.
The desktop app gives you chat and cowork and code, although it almost feels like Cowork is really close to what Code does (and for some reason Cowork didn't seem to support non-OS drives?). Either way, the desktop app helps me not juggle terminal sessions and keeps a nice history in the sidebar, has a pretty plan display, easy ways of choosing permissions and worktrees, although I will admit that it can be sluggish and for some actions there just aren't progress indicators which feels oddly broken.
I wonder what they spend most of their time working on and why the basics aren't better, though to Anthropic's credit, about a month ago the desktop Code section was borderline unusable on Windows when switching between two long conversations, which now takes a few seconds (still a few seconds too long, but at least usable).
What harness would you recommend instead?
Normally some software devs should be fired for that.
The tooling can be hacky and of questionable quality yet, with such a model, things can still work out pretty well.
The moat is their training and fine-tuning for common programming languages.
It's a bit of both. Claude Code was the tool that made Anthropic's developer mindshare explode. Yes, the models are good, but before CC they were mostly just available via multiplexers like Cursor and Copilot, via the relatively expensive API.
And at first glance, none of it was about complex runtime optimizations not present in Node, it was all "standard" closure-related JS/TS memory leak debugging (which can be a nightmare).
I don't have a link at hand because threads about it were mostly on Xitter. But I'm sure there are also more accessible retros about the posts on regular websites (HN threads, too).
After some experience, it feels to me (currently primarily a JS/TS developer) like most SPAs are riddled with memory leaks and insane memory usage. And, while it doesn't run in the browser, the same thing seems to apply to the Claude CLI.
Lexical closures used in long-living abstractions, especially when leveraging reactivity and similar ideas, seem to be a recipe for memory-devouring apps, regardless of whether browser rendering is involved.
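A minimal sketch of that leak pattern (all names hypothetical): a long-lived subscription list pins every closure handed to it, and each closure captures a large string it only needed one number from.

```typescript
type Listener = () => number;

// Long-lived registry, e.g. an event bus that outlives individual requests.
const listeners: Listener[] = [];

// Leaky: the closure captures the whole multi-megabyte string just to
// report its length later, so the GC can never reclaim the string.
function subscribeLeaky(largeLlmOutput: string): void {
  listeners.push(() => largeLlmOutput.length);
}

// Fixed: copy the primitive out; the string becomes collectable.
function subscribeFixed(largeLlmOutput: string): void {
  const length = largeLlmOutput.length;
  listeners.push(() => length);
}
```

Both versions behave identically from the caller's perspective, which is exactly why this class of leak survives review.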
The problems metastasize because most apps never run into scenarios where it matters; a page reload or exit is always close enough on the horizon to deprioritize memory usage issues.
But as soon as there are large allocations, such as the strings involved in LLM agent orchestration, or in other non-trivial scenarios, the "just ship it" approach requires careful revision.
Refactoring shit that used to "just work" with memory leaks is not always easy, no matter whose shit it is.
if you have one of the top models in a disruptive new product category where everyone else is sprinting also, sure..
Code quality only matters to developers, in terms of maintainability. IMO it's a very subjective metric
Code quality = fewer bugs long term.
Code quality = faster iteration and easier maintenance.
If things are bad enough it becomes borderline impossible to add features.
Users absolutely care about these things.
How do you measure code quality?
> Users absolutely care about these things.
No, users care about you adding new features, not about your ability to add new features or how much it cost you to add them.
You don’t have to go far on this site to find someone that doesn’t like Claude code.
If you want an example of something moronic, look at the RAM usage of Claude Code. It can use gigabytes of memory to work with a few megabytes of text.
In the current market, most people using one LLM are likely going to have a positive view of it. Very little is forcing you to stick with one you dislike aside from corporate mandates.
To be fair, their complaints are about very recent changes that break their workflow, while previously they were quite content with it.
Can't wait to see how much public money they need going forward! Hopefully our progeny don't die in the subsequent climate crisis before they can unleash true shareholder value.
Anthropic et al. had better figure it out sooner rather than later, because this game they're all playing can't go on forever: they want all of us to use what are basically beta-release tools ("beta" being very generous in some cases) to discover their "real value," all while they attempt to reduce their burn with unsustainably priced subscriptions.
A lot of dollars fix a lot of mistakes.
Countless Excel solutions are popular and successful, but the macros behind them are bad.
Bad code doesn't fail to work; unfortunately, it works.
The negative emotion regex, for example, is only used for a log/telemetry metric. Sampling for "wtf?" would probably be enough. Why would you use an agent for that?
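For illustration, a frustration check for telemetry sampling can be a couple of lines. The phrase list here is made up for the sketch, not Anthropic's actual regex:

```typescript
// Hypothetical phrase list; tune to taste.
const FRUSTRATION = /\b(wtf|ugh|broken|stupid)\b/i;

// Only log a sampled fraction of matching messages to keep telemetry cheap.
function shouldLogFrustration(message: string, sampleRate = 0.1): boolean {
  return FRUSTRATION.test(message) && Math.random() < sampleRate;
}
```

The point is that a cheap lexical signal, sampled, is usually plenty for a dashboard metric; an agent call per message buys precision nobody reads.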
I don't see how a vibe-coded app is freed from the same trade-offs that apply to a fast-moving human-coded one.
Especially since a human is still driving it, thus they will take the same shortcuts they did before: instead of a formal planning phase, they'll just yolo it with the agent. Instead of cleaning up technical debt, they want to fix specific issues that are easy to review, not touch 10 files to do a refactor that's hard to review. The highest priority issues are bugs and new integrations, not tech debt, just like it always was.
This is really just a reminder of how little upside there is to coding in the open.
Claude’s source code is fine for a 1-3 person team. It’s atrocious for a flagship product from a company valued over $380 BILLION.
Like if that’s the best ai coding can do given infinite money? Yeah, the emperor has no clothes. If it’s not the best that can be done, then what kinda clowns are running the show over there?
If they DIDN'T heavily vibe-code it they might fall behind. Short-term speed of implementation might beat out the long-term maintenance and iteration benefits they'd get from quality code
They're just taking on massive tech debt
For you and me, sure - sprint as fast as we can using whatever means we can find. But when you have infinite money, hiring a solid team of traditional/acoustic/human devs is a negligible cost in money and time.
Especially if you give those devs enough agency that they can build on the product in interesting and novel ways that the ai isn’t going to suggest.
Everything is becoming slop now, and it almost always shows. I get why when you’re resource constrained. I don’t get why when you’re not.
Every dollar spent is a dollar that shareholders can't have and executives can't hope for in their bonuses
Seems like you're also under the impression that privately developed software should be immaculate if the company is worth enough billions, but you'd be wrong about that too.
Either they're massively overpaying some scrubs to underperform with the new paradigm, or they are squeezing every last drop out of vibe coding and this is the result.
It shows that you can have a garbage front end if people perceive value in your back end.
It also means that any competitor that improves on this part of the experience is going to eat your lunch.
For you, non-buggy software is important. You could also reasonably take a more business centered approach, where having some number of paying customers is an indicator of quality (you've built something people are willing to pay for!) Personally I lean towards the second camp, the bugs are annoying but there is a good sprinkling of magic in the product which overall makes it something I really enjoy using.
All that is to say, I don't think there is a straightforward definition of quality that everyone is going to agree on.
I can literally see my team's codebase becoming an unmaintainable nightmare in front of my eyes each day.
I use copilot and Claude code and I frequently have to throw away their massively verbose and ridiculously complex code and engage my withering brain to come up with the correct solution that is 80% less code.
I probably get to the solution in the same time when all is said and done.
Honestly what is going on. What are we doing here?
Well, if unmaintainable code gets in the way of the "sustained over time" part, then that is still a real problem.
They only seem to operate as "extract as much value as possible in a short amount of time and exit with your bag", these days
Obviously it does some fairly smart stuff under the hood, but it's not exactly comparable to a large software project.
But to your point, that doesn't mean you can't vibe code some poorly built product and sell it. But people have always been able to sell poorly built software projects. They can just do it a bit quicker now.
I don't know why people keep acting like harnesses are all the same but we know they aren't because people have swapped them out with the same models and receive vastly different results in code quality and token use.
This is similar to retarded builders in Turkey saying “wow, I can make the same building, sell for the same price, but spend way less” and then millions of people becoming victim when there is an earthquake.
This is not how responsible people should think about things in society
Getting money is 100% what it is about, and Claude Code is a great product.
You're not alone in thinking that, but unfortunately I think it's a minority opinion. The only thing most people and most businesses care about is money. And frankly not even longterm, sustainable money. Most companies seem happy to extract short term profits, pay out the executives with big bonuses, then rot until they collapse
It's not proof that vibe-coding doesn't work. It's proof that it produces shitty, rube-goldberg, crappy code. It doesn't mean there aren't other shitty products out there (the heavy turds Microsoft produced throughout the years do come to mind, for example).
But when you've got a project upvoted here recently complaining that people run into issues while quickly cut/pasting from Claude Code CLI to, say, Bash to test something, because of Unicode characters in Claude Code CLI's output... And when you realize that what Claude Code CLI shows you in the TUI is not the output of the model, because there's an entire headless browser rendering the model's output to a webpage, which is then converted to text (swapping ASCII chars for Unicode ones at that moment), you realize that some serious level of fucktardery is ongoing.
It's normal that at times people aren't going full agentic and will want to cut/paste what they see. I'm not the one complaining: I saw a project complaining about it, and people are affected by that terribly dumb ASCII-to-Unicode conversion of characters.
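A paste-side workaround is straightforward; this sketch maps the usual typographic substitutions back to ASCII before the text hits a shell (the mapping is illustrative, not an exhaustive list of what the CLI emits):

```typescript
// Map common "pretty" Unicode characters back to shell-safe ASCII.
const ASCII_MAP: Record<string, string> = {
  "\u2018": "'", "\u2019": "'",   // curly single quotes
  "\u201C": '"', "\u201D": '"',   // curly double quotes
  "\u2013": "-", "\u2014": "--",  // en/em dashes
  "\u00A0": " ",                  // non-breaking space
};

function toShellSafeAscii(s: string): string {
  return s.replace(/[\u2018\u2019\u201C\u201D\u2013\u2014\u00A0]/g,
                   ch => ASCII_MAP[ch] ?? ch);
}
```

Of course, the real fix belongs upstream; users shouldn't have to launder TUI output to run it.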
When you can produce turds by the kilometer, a near infinity of turd is going to be produced.
We're not saying it's not working: I pay an Anthropic subscription and use it daily... We're saying it's a piece-of-shit of a codebase.
Shit that works is still shit.
If anyone from Anthropic is reading: STOP FUCKING CHANGING THE CHARACTERS THAT YOUR MODEL OUTPUTS.
(now of course it's just one issue out of thousands, but it'd be a nice one to fix)
To me it said, clearly: nobody cares about your code quality other than your ability to ship interesting features.
It was incredibly eye-opening to me, I went in expecting different lessons honestly.
That was always the case. Landlords still want rent, the IRS still has figurative guns. Shipping shit code to please these folks and keep the company alive will always win over code quality, unless the system can be edited to financially incentivize code quality. The current loss function on society is literally "ship shit now and pay your taxes and rent".
The product is also a bit wonky and doesn't always provide the benefits it's hyped for. It often doesn't even produce any result for me, just keeps me waiting and waiting... and nothing happens, which is what I expect from a vibe coded app.
What? Your comment makes absolutely zero sense. Legal team forces people to use Claude Code?
And they don't need a massive legal team to declare that you can't use their software subscription with other people's software.
Claude Code is being produced at AI Level 7 (Human specced, bots coded), whereas the author is arguing that AI Level 6 (Bots coded, human understands somewhat) yields substantially better results. I happen to agree, but I'd like to call out that people have wildly different opinions on this; some people say that the max AI Level should be 5 (Bots coded, human understands completely), and of course some people think that you lose touch with the ground if you go above AI Level 2 (Human coded with minor assists).
Building the rendering pipeline, algorithms, maths, I've turned off even level 2. It is just more of a distraction than it's worth for that deep state of focus.
So I imagine at least some of the disconnect comes from the area people work in and its novelty or complexity.
This attribute plus a bit of human tribalism, social echo-chambering, & some motivated reasoning by people with a horse in the race, easily explains the discord I see in rhetoric around AI.
The fact is, I think the art of building well with AI (and I'm not saying it's easy) is to have a heterogeneously vibe-coded app.
For example, in the app I'm working on now, certain algorithmically novel parts are level 0 (I started at level 1, but this was a tremendously difficult problem and the AI actually introduced more confusion than it provided ideas.)
And other parts of the app (mostly the UI in this case) are level 7. And most of the middleware (state management, data model) is somewhere in between.
Identifying the appropriate level for a given part of the codebase is IMO the whole game.
I could probably get to a 7 with some additional tooling and a second max 20 account, but I care too much about the product I'm building right now. Maybe for something I cared less about.
IMO if you're going 7+, you might as well just pick a statically typed and very safe (small surface area) language anyways, since you won't be coding yourself.
This part of your post I think signals that you are either very new or haven't been paying attention; single developers were outperforming entire teams on the regular long before LLMs were a thing in software development, and they still are. This isn't because they're geniuses, but rather because you don't get any meaningful speedup out of adding team members.
I've always personally thought there is a sweet spot at about 3 programmers where you still might see development velocity increase, but that's probably wrong and I just prefer it to not feel too lonely.
In any case teams are not there to speed anything up, and anyone who thinks they are is a moron. Many, many people in management are morons.
There may be certain fields where you can't even get to 5.
Thanks for that list of levels, it's helpful to understand how these things are playing out and where I'm at in relation to other engineers utilizing LLM agents.
I can say that I feel comfortable at approximately AI level 5, with occasional forays to AI level 6 when I completely understand the interface and can test it but don't fully understand the implementation. It's not really that different from working on a team, with the agent as a team member.
> of course some people think that you lose touch with the ground if you go above AI Level 2
I really think that this framing sometimes causes a loss of granularity. As with most things in life, there is nuance in these approaches.
I find that nowadays, for my main project where I am really leaning into the 'autonomous engineering' concept, AI Level 7 is perfect, as long as it is qualified through rigorous QA processes on the output (i.e. it is not important what the code does if the output looks correct). But even in this project, where I am really leaning into the AI 'hands-off' methodology, there are a few areas that dip into Level 5 or 4 depending on how well AI does them (frontend design especially) or on the criticality of the feature (in my case E2EE).
The most important thing is recognizing when you need to move 'up' or 'down' the scale and having an understanding of the system you are building
I'm not sure I believe that Level 7 exists for most projects. It is utterly *impossible* for most non-trivial programs to have a spec that doesn't require deep, carnal knowledge of the implementation. It can not be done.
For most interesting problems the spec HAS to include implementation details, architecture, and critical data structures. At some point you're still writing code, just in a different language, and it might actually have been better to just write the damn struct declarations by hand and then let AI run with it.
The PRs that it comes up with are rarely even remotely controversial, shrink the codebase, and likely save tokens in the end when working on a real feature, because there's less to read and it's more boring. Some patterns are so common you can just write them down and throw them at different repos/sections of a monorepo. It's the equivalent of linting, but at a larger scale. Make the language hesitant enough and it won't just be a steamroller either, and will mostly fix egregious things.
But again, this is the opposite of the "vibe coding" idea, where a feature appears from thin air. Vibe Linting, I guess.
I sic Opus, GPT5.4, and Gemini on it, have them write their own hitlists, then have a warden Opus instance go and try to counterprove the findings and compose a final hitlist for me, then a fresh-context instance to go fix the hitlist.
They always find some little niggling thing, or inconsistency, or code organization improvement. They absolutely introduce more churn than is necessary into the codebase, but the things they catch are still a net positive, and I validate each item on the final hitlist, often editing things out if they're being overeager or have found a one-in-a-million bug that's just not worth the fix. (Lately, one agent keeps getting hung up on "what if the device returns invalid serial output," in which case "yeah, we crash" is a perfectly fine response.)
If you make an assertion in a blog post, I have no idea if you got the information from a respected scientific journal, or Reddit, or InfoWars, or the writing of a bathroom stall. It's hard to know if the assertion is grounded in reality or just something you made up.
The response I get to this is universally "LOL just look it up yourself man!", but that feels like a cop out. When I write blog posts, I put inline links all over the place to try and justify my assertions to show where I'm getting this information. If I sourced some bad information from a bad source, it's clear to know where I got it from and you can either notify me or disregard the assertion.
- Shills or people with a financial incentive
- Software devs that either never really liked the craft to begin with or who have become jaded over time and are kind of sick of it.
- New people that are actually experiencing real, maybe over-excitement about being able to build stuff for the first time.
Forgetting the first group as that one is obvious.
I’ve encountered a heap of group 2. They’re the ones sick of learning new things, for whatever reason. Software work has become a grind for them and vibe coding is actually a relief.
Group 3 I think are mostly the non-coders who are genuinely feeling that rush of being able to will their ideas into existence on a computer. I think AI-assisted coding could actually be a great on-ramp here and we should be careful not to shit on them for it.
I love coding. I taught myself from a book (no internet yet) when I was 10, and haven’t stopped for 30 years. Turned down becoming a manager several times. I loved it so much that I went through an existential crisis in February as I had to let go of that part of my identity. I seriously thought about quitting.
But for years, it has been so frustrating that the time it took me to imagine roughly how to build something (10-30 minutes depending on complexity) was always dwarfed by the amount of time it took to grind it out (days or sometimes weeks). That’s no longer true, and that’s incredibly freeing.
So the game now is to learn to use this stuff in a way that I enjoy, while going faster and maintaining quality where it matters. There are some gray beards out there who I trust who say it’s possible, so I’m gonna try.
Over a weekend, I used ChatGPT to set up Prometheus and Grafana and added node exporters to everything I could think of. I even told ChatGPT to create NOC-style dashboards for me, given the metrics I gave it. This is something that would have painstakingly taken several weeks if not more to figure out, and it's something I've been wanting to do, but the cognitive load and anticipatory frustration were too high for me to start. I love how it enables me to just do things.
My next step is to integrate some programs that I wrote that I still use every day to collect data and then show it on the dashboards as well.
On a side note, I don't know why Grafana hasn't more deeply integrated with AI. Having to sift through all the ridiculous metrics that different node exporters advertise, with no hint of a naming convention, makes using Grafana so much harder. I cut and pasted all the metrics and dumped them into ChatGPT and told it to make the panels I wanted (e.g. "Give me a dashboard that shows the status of all my servers," and it's able to pick and choose the correct metrics across my Windows server, Macbooks and studio, my Linux machines, etc.), but Grafana should have integrated this directly into the product themselves.
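For reference, the scraping side of a setup like that boils down to a small standard Prometheus config (hostnames here are hypothetical; 9100 and 9182 are the default node_exporter and windows_exporter ports):

```yaml
scrape_configs:
  - job_name: "node"
    static_configs:
      - targets:
          - "linuxbox:9100"    # node_exporter default port
          - "macbook:9100"
          - "winserver:9182"   # windows_exporter default port
```

The grind is rarely this file; it's deciding which of the hundreds of exported metrics belong on a dashboard, which is exactly where an LLM helps.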
I think it's easy to dismiss that group, but the truth is there was a lot of flux in our industry in the last decade before AI, and I would say almost none of it was beneficial in any way whatsoever.
If I had more time I could write an essay arguing that the 2010s in software development was the rise of the complexity for complexity's sake that didn't make solving real world problems any easier and often massively increased the cost of software development, and worse the drudgery, with little actually achieved.
The thought leaders were big companies who faced problems almost no-one else did, but everyone copied them.
Which led to an unpleasant coding environment where you felt like a hamster spinning in a wheel, constantly having to learn the new hotness or you were a dinosaur just to do what you could already do.
Right now I can throw a wireframe at an AI and poof, it's done: React, Angular, or whatever who-gives-a-flying-sock next stupid JavaScript framework, it's there. Have you switched from webpack to vite to bun? Poof, AI couldn't care less; I can use whatever stupid acronym command line tool you've decided is flavour of the month. Need to write some Lovecraftian-inspired yaml document for whatever dumbass deploy hotness is trending this week? AI has done it, and I didn't have to spend 3 months trying to debug whatever stupid format some tit at Netflix or Amazon or Google or Meta came up with because they literally had nothing better to do with their life, or bang my head against the wall when it falls over every 3 weeks while management insists that k8s is the only way to deploy things.
I say this kindly, but are you sure that _you_ aren't the one in group 2, and _they_ aren't the ones learning new things?
A lot of the discourse around ai coding reminds me of when I went to work for a 90s tech company around 2010 and all the linux guys _absolutely refused_ to learn devops or cloud stuff. It sucks when a lifetime of learned skills becomes devalued over night.
Both of these camps are the loudest voices on the internet, but there is a quiet yet extremely productive camp somewhere in the middle that has enough optimism and open-mindedness, along with years of experience as an engineer, to push Claude Code to its limit.
I read somewhere that the difference between vibe coding and "agentic engineering" is whether you are able to know what the code does. Developing a complex website with Claude Code is not very different from managing a team of offshore developers in terms of risks.
Unless you are writing software for medical devices, banking software, fighter jets, etc... you are doing a disservice to your career by actively avoiding using LLMs as a tool in developing software.
I have used around $2500 in claude code credits (measured with `bunx ccusage` ) the last 6 months, and 95% of what was written is never going to run on someone else's computer, yet I have been able to get ridiculous value out of it.
How do you quantify and measure this productivity gain?
There were articles as late as the late 1990s that suggested that investing in IT was a waste of money and had not improved productivity.
You will not see obvious productivity gains until the current generation of senior engineers retires and you have a generation of developers who have only ever coded with AI, since they were in school.
Eventually companies figured out how to use them effectively and eventually useful software was created. But, at the start of the whole thing, there was a lot of waste.
Quite a lot of people are now paying a lot for AI that makes them produce less, and at lower quality, because it feels good and novel.
As if 97% of web apps aren't just basic CRUD with some integration to another system if you are lucky.
99% of companies won't even have 50k users.
I think that citizen developers will be a thing--but not in the way you might be thinking.
More people will be enabled (and empowered) to "build" quick-and-dirty solutions to personal problems by just talking to their phone: "I need a way to track my food by telling you what I ate and then you telling me how much I have left for today. And suggest what my next meal should be."
In the current paradigm--which is rapidly disappearing--that requires a UI app that makes you type things in, select from a list, open the app to see what your totals are, etc. And it's a paid subscription. In 6 months, that type of app can be ancient history. No more subscription.
So it's not about "writing apps for SaaS subscribers." It's about not needing to subscribe to apps at all. That's the disruption that's taking place.
Crappy code, maintenance, support, etc.--no longer even a factor. If the user doesn't like performance, they just say "fix ___" and it's fixed.
What subscription apps can't be replaced in this disruption? Tell me what you think.
- Would the agent go through current app user flows OpenClaw style? Wildly insecure, error-prone, expensive.
- Tapping into some sort of third-party APIs/MCPs? Authed, metered, and documented how, and by which standard, so they can't be abused and hacked?
The unhyped truth is that LLMs are just wildly more competent autocomplete, and there is no such disruption in sight. The status quo of developers and users mostly remains.
Today I asked ChatGPT to make me a weekly calorie plan and it was perfect. But then I still use MyFitnessPal to log my calories because their food database is outstanding, and the UX of scanning food barcodes is unbeatable. They have the most niche items in my country, Spain.
How are LLMs doing any of that? An app is often much more than a CRUD interface.
Maybe I could build a custom app that scans the nutrition facts table and with voice I could explain how much I ate or something - I’m technical, but really, I have better things to do and I’d rather pay MFP 10 bucks a month.
https://world.openfoodfacts.org/
Both would make a great foundation for this sort of app. OFF is crowdsourced and does include barcode information. I have no idea how robust the dataset is for your geography though. If I were to build something like this for personal use, I'd be looking at a PWA that can leverage the camera for barcode scanning. I'd work with the existing crowd sourced database as well as provide a mechanism for "manual" entry which should just be scanning a barcode and taking a picture of the nutrition information. I've personally built systems like this before and all of these things are well within the capability of most SOTA LLM to build out.
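As a sketch of the lookup side, assuming the Open Food Facts v0 read API (where `status: 1` in the response means the barcode was found; robustness varies by region):

```typescript
// Shape of the fields we care about from an OFF product response.
interface OffProduct {
  product_name?: string;
  nutriments?: { "energy-kcal_100g"?: number };
}

// Build the public read-API URL for a scanned barcode.
function productUrl(barcode: string): string {
  return `https://world.openfoodfacts.org/api/v0/product/${barcode}.json`;
}

// Pull out a display name and calories per 100g, or null if not found.
function parseProduct(response: { status: number; product?: OffProduct }):
    { name: string; kcalPer100g: number | null } | null {
  if (response.status !== 1 || !response.product) return null;
  return {
    name: response.product.product_name ?? "(unnamed)",
    kcalPer100g: response.product.nutriments?.["energy-kcal_100g"] ?? null,
  };
}
```

The barcode scanning itself can come from the browser camera in a PWA; the parsing above is the whole "database" half of the app.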
When you move to the enterprise layer, suddenly you get the opposite problem: you have a low number of "users," but you often need a load of CPU-intensive or DB-intensive processing to happen quickly.
One company I worked for had their system built by, ummmm, not the greatest engineers and were literally running out of time in the day to run their program.
Every client was scheduled over 24 hours, and they'd got to running the program for 22 hours per day and were desperately trying to fix it before they ran out of "time". They couldn't run it in parallel because part of the selling point of the program was that it amalgamated data from all the clients.
Some sort of check point system could likely save significant IO.
What am I missing that requires you to recompute all data every day?
It's an important distinction
Disruption happens when firms are disincentivized to switch to the new thing or address the new customer because the current state of it is bad and the margins are low. Intel missed out on mobile because their existing business was so excellent and making phone chips seemed beneath them.
The funny thing is that these firms are being completely rational. Why leave behind high margins and your excellent full-featured product for this half-working new paradigm?
But then eventually, the new thing becomes good enough and overtakes the old one. Going back to the Intel example, they felt this acutely when Apple switched their desktops to ARM.
For now, Claude Code works. It's already good enough. But unless we've plateaued on AI progress, it'll surpass hand crafted equivalents on most metrics.
Consider this overly simplified process of writing logic to satisfy a requirement:
1. Write code
2. Verify
3. Fix
We, humans, know the cost of each step is high, so we come up with various ways to improve code quality and reduce cognitive burden. We make code easier to understand for when we have to revisit it.
On the other hand, LLMs can understand** a large piece of code quickly***, and, in addition, compile and run it with agentic tools like Claude Code at the cost of tokens****. Quality does not matter to vibe coders if LLMs can fill in the function logic that satisfies the requirement by iterating the aforementioned steps quickly.
I don't agree with this approach and have seen too many things broken from vibe code, but perhaps they are right as LLMs get better.
* Anecdotal
** I see an LLM as just a probabilistic function, so it doesn't "reason" like humans do. It's capable of highly advanced problem solving, yet it also fails at primitive tasks.
*** Relative to human
**** The cost of tokens, I believe, is relatively cheap compared to a full-time engineer, and it'll get cheaper over time.
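The three-step loop above can be sketched as a generic driver; the `generate` and `verify` callbacks stand in for an LLM call and a test run (this is an illustration of the idea, not Claude Code's actual implementation):

```typescript
type Verdict = { ok: boolean; feedback: string };

// Iterate write -> verify -> fix until tests pass or patience runs out.
function iterateUntilGreen(
  generate: (feedback: string) => string,
  verify: (code: string) => Verdict,
  maxAttempts = 5,
): string | null {
  let feedback = "";
  for (let i = 0; i < maxAttempts; i++) {
    const code = generate(feedback);   // step 1: write code
    const verdict = verify(code);      // step 2: verify
    if (verdict.ok) return code;
    feedback = verdict.feedback;       // step 3: feed errors back, fix
  }
  return null; // give up; a human (or a full rewrite) takes over
}
```

The vibe-coding bet is simply that each trip around this loop is cheap enough that code quality inside the loop stops mattering.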
I don't know how true this is going to be, at least in the short term. The big providers are likely running at a loss and, as models have gotten better, they've also crept up in price as well.
They/you are counting on them hitting a point where it is actually cheap for the value provided (after they take some off the top) but I don't see that as inevitable before these companies go under or pivot into much more specialized tools for big clients.
It's not clear to me that AI code is cheaper than human code (of equal functionality).
Also, to those who say "this is proof that code quality doesn't matter any more", let's have this chat 5 years from now when they're crumbling under the weight of their own technical debt :)
> AI is whatever hasn’t been done yet
From a tech discourse perspective, things have never been less productive than they are right now. I feel like we’re witnessing the implosion of an industry in real time. Thanks in no small part to venture capital and its henchmen.
Everyone seems to be drinking the proverbial kool-aid, and everyone else who is looking at the situation skeptically are labeled luddites. I expect we’ll get some clarity over the next few years on who is right. But I don’t know. It feels like the breakdown of shared epistemology. The kind of shared epistemology on which civilization was built.
A seasoned engineer using AI-assisted coding while maintaining judgment is genuinely different from someone using it as a substitute for judgment. The problem is the tools don't distinguish between these two users, and the marketing tends to speak to the latter group.
Users like the author must be the most valuable Claude asset, because AI itself isn't a product — people's feedback that shapes output is.
He’s a pretty interesting fella, I imagine his work influenced a lot of people over the years
Once you have learned enough from playing with sand castles, you can start over and build real castles with real bricks (and steel, if you want to build a skyscraper). Then it is your responsibility to make sure they don't collapse when people move in.
But that isn't the hard part. The hard part is that some people are using the tool versions and some are using the agent versions, so consolidating them one way or another will break someone's workflow, and that incurs a real actual time cost, which means this is now a ticket that needs to be prioritized and scheduled instead of being done for free.
There's nothing wrong with saying that Claude Code is written shoddily. It definitely is. But I think it should come with the recognition that Anthropic achieved all of its goals despite this. That's pretty interesting, right? I'd love to be talking about that instead.
Saying "you can use any other agent, just pay 20x more through the API!" does not demonstrate a realistic choice.
So would I and a couple of others, but HNers don't want to have those kinds of conversations anymore.
want code that isn't shit? embrace a coding paradigm and stick to it without flip-flopping and sticking your toe into every pond, use a good vcs, and embrace modularity and decomposability.
the same rules when 'writing real code'.
9/10 times when I see an out-of-control vibe coded project it sorta-kinda started as OOP before sorta-kinda trying to be functional and so on. You can literally see the trends change mid-code. That would produce shit regardless of what mechanism used such methods, human/llm/alien/otherwise.
- Brooks' No Silver Bullet: no single technology or management technique will yield a tenfold productivity improvement in software development within a decade. If we wrote a spec that detailed everything we want, we would be writing something as specific as code. Currently people seem to believe that a lot of the fundamentals are well covered by existing code, so a vague prompt along the lines of "build me XXX with YYY" can lead to amazing results, because AI successfully transfers the world-class expertise of some engineers into the generated code. Most of the complexity thus turns out to be accidental, and we only need far fewer engineers to handle the essential complexities.
- Kernighan's Law, which says debugging is twice as hard as writing the code in the first place. Now people increasingly believe that AI can debug way faster than humans (most likely because other smart people have already done similar debugging). And in the worst case, just ask AI to rewrite the code.
- Dijkstra on the foolishness of programming in natural language. Something along the lines of: a system described in natural language becomes exponentially harder to manage as its size increases, whereas a system described in formal symbols grows linearly in complexity relative to its rules. Similar to the above, people believe that the messiness of natural language is not a problem as long as we give detailed enough instructions to AI, while letting AI fill in the gaps with statistical "common sense", or expertise thereof.
- Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
- And remotely Coase's Law, which argues that firms exist because the transaction costs of using the open market are often higher than the costs of directing that same work internally through a hierarchy. People are starting to believe that the cost of managing and aligning agents is so low that one-person companies handling large numbers of transactions will appear.
Also, ultimately Jevons Paradox, as people worry that the advances in AI will strip out so much demand that the market will slash more jobs than it generates. I think this is the ultimate worry of many software engineers. The Luddites were ridiculed, but they were really skilled craftsmen who spent years mastering the art of using those giant 18-pound shears. They were the staff engineers of the 19th-century textile world. Mastering those 18-pound shears wasn't just a job but an identity, a social status, and a decade-long investment in specialized skills. Yeah, Jevons Paradox may bring new jobs eventually, but it may not reduce the blood and tears of ordinary people.
Interesting times.
I thought you were gonna go the opposite direction with this. Debugging is now 100x as hard as writing the code in the first place.
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it. Similar to above, people start to believe otherwise.
Gotta disagree with this too. I find a lot of work has to be done to be able to continue vibing, because complexity increases beyond LLM capabilities rapidly otherwise.
100x harder if a human were to debug AI-generated code. I was merely citing other people's beliefs: AI can largely, if not completely, take care of debugging. And "better", rewrite the code altogether. I don't see how that could be a better approach, but that might just be me.
I work on enterprise web apps for a few dozen people with Codex CLI and GPT-5.4, and haven't really run into those issues.
> Brooks' No Silver Bullet
Just because a person can create code or "results" much faster now, it doesn't say anything about productivity. Don't mistake dev productivity for economic productivity.
> Kernighan's Law, which says debugging is twice as hard as writing the code
Debugging is such a vague term in these matters. An AI may be decent at figuring out an error it introduced into its own code after it runs its own tests. But a production bug, i.e. one reported by a user, can be very hard for AIs due to their utter lack of context.
> Dijkstra on the foolishness of programming in natural language.
> ...
> Lehman’s Law, which states that a system's complexity increases as it evolves, unless work is done to maintain or reduce it.
No clue what the argument is here; "people believe otherwise" isn't one.
> Also, ultimately Jevons Paradox
Actually, relevant tech people confirm the paradox in the long run. Companies slash jobs now because they tend to consolidate in chaotic times.
> No Silver Bullet
As an industry, we do not know how to measure productivity. AI coding also does not increase reliability with how things are going. Same with simplicity, it's the opposite; we're adding obscene complexity, in the name of shipping features (the latter of which is not productivity).
In some areas I can see how AI doubles "productivity" (whatever that means!), but I do not see a 10x on the horizon.
> Kernighan's Law
Still holds! AI is amazing at debugging, but the vast majority of existing code is still human-written; so it'll have an easy time doing so, as indeed AI can be "twice as smart" as those human authors (in reality it's more like "twice as persistent/patient/knowledgeable/good at tool use/...").
Debugging fully AI-generated code with the same AI will fall into the same trap, subject to this law.
(As an aside, I do wonder how things will go once we're out of "use AI to understand human-generated content", to "use AI to understand AI-generated content"; it will probably work worse)
> just ask AI to rewrite the code
This is a terrible idea, unless perhaps there is an existing, exhaustive test harness. I'm sure people will go for this option, but I am convinced it will usually be the wrong approach (as it is today).
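If one does go the rewrite route, the minimum safeguard is exactly that test harness: a characterization (golden-master) suite that pins down current behavior before the AI regenerates anything. A toy sketch, with hypothetical names (`legacy_slugify` and the cases are invented for illustration):

```python
# Pin down the current observable behavior of the function about to be
# rewritten, so any AI-generated replacement can be checked against it.
def legacy_slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Golden cases captured from the existing implementation.
GOLDEN_CASES = {
    "Hello World": "hello-world",
    "  Spaced   OUT  ": "spaced-out",
}

def check_rewrite(candidate) -> bool:
    """Return True only if `candidate` reproduces the pinned behavior exactly."""
    return all(candidate(inp) == out for inp, out in GOLDEN_CASES.items())
```

Without something like this, "just rewrite it" silently trades known bugs for unknown ones.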
> Dijkstra on the foolishness of programming in natural language
So why are we not seeing repos of just natural language? Just raw prompt Markdown files? To generate computer code on-the-fly, perhaps even in any programming language we desire? And for the sake of it, assume LLMs could regenerate everything instantly at will.
For two reasons. The prompts would need to rise to a level of precision indistinguishable from a formal specification. And indeed, complexity does become "exponentially harder"; inaccuracies inherent to human languages would compound. We still need to persist results in a formal language. It remains the ultimate arbiter. We're now just (much) better at generating large amounts of it.
> Lehman’s Law
This reminds me of a recent article [0]. Let AI run loose without genuine effort to curtail complexity and (with current tools and models) the project will need to be thrown out before long. It is a self-defeating strategy.
I think of this as the Peter principle applied to AI: it will happily keep generating more and more output, until it's "promoted" past its competence. At which point an LLM + tooling can no longer make sense of its own prior outputs. Advancements such as longer context windows just inflate the numbers (more understanding, but also more generating, ...).
The question is, will the market care? If software today goes wrong in 3% of cases, and with widespread AI use it'll be, say, 7%, will people care? Or will we just keep chugging along, happy with all the new, more featureful, but more faulty software? After all, we know about the Peter principle, but it's unavoidable and we're just happy to keep on.
> Jevons Paradox
My understanding is the exact opposite. We might well see a further proliferation of information technologies, into remaining sectors which have not yet been (economically) accessible.
This is THE question. I honestly think the majority will gladly take an imperfect app over waiting for a perfect app, or perhaps having no app at all. Some devs might be able to stand out with a polished app built the traditional way, but it takes a lot longer to achieve that, and by that point the market may be different, which is a risk.
"And in the worst case just pay for it twice."
That leads to a dead end.
The ship has sailed. Vibe coding works. It will only work better in the future.
I have been programming for decades now, and I have managed teams of developers. Vibe coding is great, especially in the hands of experts who know what they are doing.
Deal with it because it is not going to stop. In the near future it will be local and 100x faster.
A pig with lipstick is still a pig.
Or, aptly, as you quoted "Don Quixote":
'Con la iglesia hemos topado'.
(indeed, Sancho): we have run up against the Church...
AI naysayers are heavily incentivized to find fault with it, but in my experience it's pretty rare to see a codebase of that size where it's not easy to pick out "bad code" examples.
Are there any relatively neutral parties who've evaluated the code and found it to be obviously junk?
I routinely write my own solutions in parallel to LLM-implemented features from varying degrees of thorough specs and the bloat has never been less than 2x my solution, and I have yet to find any bloat in there that would cover more ground in terms of reliability, robustness, and so on. The biggest bloat factor I've found so far was 6x of my implementation.
I don't know, it's hard to read your post and not feel like you're being a bit obtuse. You've been doing this enough to understand just how bad code gets when you vibecode, or even how much nonsense tends to get tacked onto a PR if someone generates from spec. Surely you can do better than an LLM when you write code yourself? If you can, I'm not sure why your question even needs to be asked.
I certainly wouldn't call Claude Code "trivial" - it's by far the most sophisticated TUI app I've ever interacted with. I can drag images onto it, it runs multiple sub-agents all updating their status rows at the same time, and even before the source code leaked I knew there was a ton of sophistication in terms of prompting under the hood because I'd intercepted the network traffic to see what it was doing.
If it was a million+ lines of code I'd be a little suspicious, but a few hundred thousand lines feels credible to me.
> Surely you can do better than an LLM when you write code yourself?
It takes me a solid day to write 100 lines of well designed, well tested code - and I'm pretty fast. Working with an LLM (and telling it what I want it to do) I can get that exact same level of quality in more like 30 minutes.
And because it's so much faster, the code I produce is better - because if I spot a small but tedious improvement I apply that improvement. Normally I would weigh that up against my other priorities and often choose not to do it.
So no, I can't do better than an LLM when I'm writing code by hand.
That said: I expect there are all sorts of crufty corners of Claude Code given the rate at which they've been shipping features and the intense competition in their space. I expect they've optimized for speed-of-shipping over quality-of-code, especially given their confidence that they can pay down technical debt fast in the future.
The fact that it works so well (I get occasional glitches but mostly I use it non-stop every day and it all works fine) tells me that the product is good quality, whether or not the lines of code underneath it are pristine.
I'll be honest, I think we just come to this from very different perspectives in that case. Agents are trivial, and I haven't seen anything in Claude Code that indicated to me that it was solving any hard problems, and certainly not solving problems in a particularly good way.
I create custom 3D engines from scratch for work and I honestly think those are pretty simple and straightforward; it's certainly not complicated, and a lot simpler than people make it out to be. But if Claude Code is "not trivial", and even "sophisticated", I don't even know what to classify 3D engines as.
This is not some "Everything that's not what I do is probably super simple" rant, by the way. I've worked with distributed systems, web backend & frontend and more, and there are many non-trivial things in those sub-industries. I'm also aware of this bias towards thinking what other people do is trivial. The Claude Code TUI (and what it does as an agent) is not such a thing.
> So no, I can't do better than an LLM when I'm writing code by hand.
Again, I just think we come at this from very different positions in software development.
In the past, which is a different country, we would throw away the prototypes.
Nowadays vibe coding just keeps adding to them.
2024 - Utter Trash
2025 - Merely hotdog water
2026 - Aaaaaaaaaactually pretty good...
Every forward-leaning platform is building out an MCP interface, I think we're past the point of "soulless fad."
"I have been screaming at my computer this past week dealing with a library that was written by overpaid meatbags with no AI help."
And here we go: The famous "humans do it, too" argument. With the gratuitous "meatbag" propaganda.
Look Bram, if you work on bitcoin bullshit startups, perhaps AI is good enough for you. No one will care.
memory created!
So I set out to build an app with CC just to see what it's like. I currently use Copilot (copilot.money) to track my expenditures, but I've become enamored with Sankey diagrams. Copilot doesn't have this charting feature, so I've been manually exporting all my transactions and massaging them into the Sankey format. It's a pain in the butt, error prone, and my Python skills are just not good enough to create a conversion script. So I had CC do it. After a few minutes of back and forth, it was working fine. I didn't care about spaghetti code at all.
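For a sense of scale, the kind of conversion described is small enough to sketch in a few lines. This assumes a CSV export with `category` and `amount` columns (both column names, and the "Budget" source node, are hypothetical, not from the actual export):

```python
import csv
from collections import defaultdict
from io import StringIO

def to_sankeymatic(csv_text: str) -> str:
    """Aggregate transactions by category and emit SankeyMATIC flow lines,
    one "Source [amount] Target" line per flow."""
    totals = defaultdict(float)
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row["category"]] += float(row["amount"])
    return "\n".join(
        f"Budget [{amount:.2f}] {category}"
        for category, amount in sorted(totals.items())
    )

sample = "category,amount\nRent,1200\nGroceries,300.50\nGroceries,49.50"
print(to_sankeymatic(sample))
```

The output pastes directly into SankeyMATIC's text input, which is presumably why the manual workflow was so tedious to repeat by hand.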
So next I thought, how about having it generate the Sankey diagrams (instead of me using SankeyMATIC's website). 30 minutes later, it had a local website running that was doing what I had been manually doing for months.
Now I was hooked. I started asking it to build a native GUI version (for macOS) and it dutifully cranked out a version using PyObjC etc. After ironing out a few bugs it was usable in less than 30 min. Feature adds consumed all my tokens for the day, and the next day I was brimming with changes. Burned through that day's tokens as well, and after 3 days (I'm on the el cheapo plan) I have an app that basically does what I want in a reasonably attractive and accurate manner.
I have no desire to look at the code. The size is relatively small, and resource usage is small as well. But it solved this one niche problem that I never had the time or skill to solve.
Is this a good thing? Will I be downvoted to oblivion? I don't know. I'm very very concerned about the long term impact of LLMs on society, technology and science. But it's very interesting to see the other side of what people are claiming.
LLM-driven development lets me have the thing built without needing to build the thing, and at the same time I get to exercise some ways-to-build I don't use as often (management, spec writing, spec editing, proactive unblocking, etc.). I have no doubt my work with LLMs has strengthened mental muscles that are also helpful in technical management contexts and senior/principal-level technical work.
Consider this, though: Your anecdote has nothing to do with software engineering (or an engineering mindset). No measurements were done, no technical aspects were taken into consideration (you readily admit that you lack the knowledge to do that), you're not expecting to maintain it or seemingly to further develop it much.
The above situation has never actually been hard; the thing you made is trivial to someone who knows the basics of a small set of things. LLMs (not Claude Code) have made this doable for someone who knows none of the things and that's very cool.
But all of this really doesn't mean anything for solutions to more complex problems where more knowledge is required, or solutions that don't even really exist yet, or something that people pay for, or things that are expected to be worked on continuously over time, perhaps by multiple people.
When people decry vibecoding as being moronic, the subtext really is (or should be) that they're not really talking to you; they're talking to people who are delivering things that people are expected to pay for, or rely on as part of their workflow, and people who otherwise act like their output/product is good when it's clearly a mess in terms of UX.
While I downplayed my job experience, I'm very in touch with developers and their workflows; the challenges they face. And I'm scared because they won't be making these decisions about LLM usage; their bosses, the guy who vibe coded a dumb app over the weekend will.
I completely agree that people are going to be forced into using things that basically do not really work for anything non-trivial without massive handholding, and they will be forced to use those things by people who are out of touch and are mostly setting up to eventually get rid of as many people as they possibly can.
There's really not much of a place for AI in my work. We're not cutting edge; we're just a large, safe business protected by a regulatory moat. We don't want to be on the cutting edge, since the bleeding is bad for profits and reputation. But the incentives our IT execs operate under are all about resume/credential building and moving on to bigger things. Our C-level officers are not even slightly technical, so they defer to the CIO. Nothing new at all in this company; it's a story told a thousand times.
So I was just very curious how it would be to approach vibe coding as if I were my VP. You don't know what you don't know, right? And the ease of creating a simple app that would be beyond 99% of the people in my company gives way too much confidence. And with misplaced confidence comes poor decision-making.
I can see where someone who currently is an Excel jockey would benefit from some of this stuff. As long as they can compare and test the outputs. But the danger from false confidence has to be an institutional risk that's being ignored.
Bad code or good code is no longer relevant. What matters is whether or not AI fulfills the contract as to how the application is supposed to work. If the code sucks, you just rerun the prompt and the next iteration will be better. But better doesn't matter, because humans aren't reading the code anymore. I haven't written a line of code since January and I've made very large scale improvements to the products I work on. I've even stopped looking at the code at all, except a cursory look out of curiosity.
Worrying about how the sausage is made is a waste of time because that's how far AI has changed the game. Code doesn't matter anymore. Whether or not code is spaghetti is irrelevant. Cutting and pasting the same code over and over again is irrelevant. If it fulfills the contract, that's all that matters. If there's a bug, you update the contract and rerun it.
It's extremely relevant inasmuch as garbage code pollutes the AI's context and misleads it into writing more crap. "How the sausage is made" still matters.
I can off the top of my head think of at least three ways in which being careless with the code powering "your personal blog" could have real consequences. Suppose it has a bug which allows unauthenticated users to manage your pages, or even worse remote code execution. Then it could be used as a jumping-off point to attack other systems, for instance by turning it into a C&C server for some malware. It could be used in a "watering hole attack" against your readers. Or someone could edit the blog articles to make it appear that you said something you didn't.
"Not reading the code" is irresponsible for any software exposed to the global network.
In case of damages, vibe coding should be an aggravating circumstance, i.e. gross negligence.
When the use of a program cannot have any nefarious consequences, obviously vibe coding is fine. However, I do not use many such applications.
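To make the first failure mode above concrete, here is a framework-agnostic sketch of the unauthenticated page-edit bug; the handler, the session shape, and the status codes are all invented for illustration:

```python
def handle_edit(session: dict, pages: dict, name: str, body: str) -> int:
    """Handle an 'edit page' request; returns an HTTP-style status code."""
    # This is the check a careless vibe-coded handler can omit entirely.
    # Without it, any anonymous visitor can rewrite the blog's pages.
    if not session.get("user"):
        return 401  # unauthenticated -> reject, page untouched
    pages[name] = body
    return 200
```

The vulnerable version is this function minus two lines, and nothing about the app's visible behavior reveals the difference until someone abuses it. That's precisely the kind of bug that reading the code (or at least its auth paths) would catch.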
creating a product in a span of mere months that millions of developers use every day is the opposite of ridiculous. we wouldn't even have known about the supposed ridiculousness of the code if it hadn't leaked.
This is painful to read. It feels like a rant from a person who does not use version control, testing, and CI.
It is cruel to force a machine into a guessing game with a toddler whose spec is "I do not like it". If you have coding standards and preferences, they should already be distilled and explained somewhere, and applied automatically (like an auto-linter in the not-so-old days). A good start is to find open-source projects you like, let Claude review them, and generate code rules. Then run it on your code base overnight, until it passes the tests and the automated code review against the new coding standards.
The way to "vibe code" is to run several agents in parallel, sometimes multiple agents on the same problem with different approaches, and just do code reviews. It is a mistake to have a synchronous conversation with a machine!
This type of work needs severe automation and parallelisation.
This can be easily automated away!
People were given faster typists with incredible search capabilities and decided quality doesn’t matter anymore.
I don’t even mean the code. The product quality is noticeably sub par with so many vibe-coded projects.
I get that people love saying LLMs are just compilers from human language to $OUTPUT_FORMAT but... they simply are not except in a stretchy metaphorical sense.
That's only true if you reduce the definition of "compiler" to a narrow `f = In -> Out`. But that is _not_ a compiler. We have a word for that: function. And in LLM's case an impure one.
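The pure-vs-impure distinction can be shown in a toy sketch; both functions are invented for illustration, not real tooling:

```python
import random

def compile_src(src: str) -> str:
    """Compiler-like: a pure, deterministic function.
    Same input always yields the same output."""
    return src.replace("PLUS", "+")

def llm_generate(prompt: str) -> str:
    """LLM-like: an impure function. Hidden state (here, just an RNG
    standing in for sampling temperature) makes the output vary."""
    return prompt + random.choice(["  # variant A", "  # variant B"])
```

A compiler you can cache, diff, and reason about by its input alone; the impure version you cannot, which is the whole objection to calling an LLM a "compiler".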
I dislike arguing semantics but I bet it's not an abstraction by most engineers' definition of the word.
A fundamentally unreliable one: even an AI system that is entirely correctly implemented as far as any human can see can yield wrong answers and nobody can tell why.
That’s not entirely the fault of the technology, as natural language just doesn’t make for reliable specs, especially in inexperienced hands. So in a sense we finally got the natural-language programming that some among our ancestors dreamed of, and it turned out to be as unreliable as some others of our ancestors said all along.
It partly is the fault of the technology, however, because while you can level all the same complaints against a human programmer, a (motivated) human will generally be much better at learning from their mistakes than the current generation of LLM-based systems.
(This even if we ignore other issues, such as the fact that it leaves everybody entirely reliant on the continued support and willingness to transact of a handful of vendors in a market with a very high barrier to entry.)
If I'm purely focused on the general outcome as written in a requirement or specification document, I'd consider everything below that as "abstracted away".
For example, this weekend I built my own MCP server for some services I'm hosting on my personal server (*arr, Jellyfin, …) to be integrated with claude.ai. I've written down all the things I want it to do and the environment it has to work in, and let Claude go.
Not once have I looked at the code. And quite frankly, I don't care. As long as it fulfills my general requirements, it can write Python one time and TypeScript the other, should I choose to regenerate from that document. It might behave slightly differently, but that is OK to a degree.
From my perspective, that is an abstraction. Deterministic? No, but it also doesn't have to be.
I agree it's not a layer of abstraction in the traditional sense though. AI isn't an abstraction of existing code, it's a new way to produce code. It's an "abstraction layer" in the same way an IDE is an abstraction layer.
Actually yes, because Humans can be held accountable for the code they produce
Holding humans accountable for code that LLMs produce would be entirely unreasonable
And no, shifting the full burden of responsibility to the human reviewing the LLM output is not reasonable either
Edit: I'm of the opinion that businesses are going to start trying to use LLMs as accountability sinks. It's no different than the driver who blames Google Maps when they drive into a river following its directions. Humans love to blame their tools.
Why? LLMs have no will nor agency of their own, they can only generate code when triggered. This means that either nature triggered them, or people did. So there isn't a need to shift burdens around, it's already on the user, or, depending on the case, whoever forced such user to use LLMs.
Producing outputs you don’t understand is novel
Set up an AI bot to analyze the code for spaghetti code parts and clean up these parts to turn it into a marvel. :-)