I'm not (just) being glib. That earlier article displays some introspection and thoughtful consideration of an old debate. The writing style is clearly personal, human.
Today's post is not so much. It has LLM fingerprints on it. It's longer and wordier, but it doesn't strike me as having the same thoughtful consideration in it. I would venture to guess that the author tried to come up with some new angles on the news of the Claude Code leak, because it's a hot topic, jotted some notes, and then let an LLM flesh them out.
Writing styles of course change over time, but looking at these two posts side by side, the difference is stark.
I made a commitment to write more this year and put my thoughts out quicker than I used to, so that’s likely the primary reason it’s not as deep of a piece of writing as the post you’re referencing. But I do want to note that this wasn’t written using AI, it just wasn’t intended to be as rich of a post.
The reason it came out longer is that I’ve honestly been thinking about these ideas for a while, and there is so much to say about this subject. I didn’t have any particular intention of hopping on a news cycle, but once I started writing the juices were flowing and I found myself coming up with five separate but interrelated thoughts around this story that I thought were worth sharing.
If you have a strategy for jotting down (or dictating) notes while walking about, I would be curious how you manage that. I spend plenty of time walking outside and tend to get ideas that, at the time, I'd like to explore further, most of which have evaporated from my mind by the time I get back home, or even before I can get my phone out to jot down the keywords that would help me recall the details later.
Cannot even imagine how someone would manage both walking and writing at the same time.
Some are just born with it.
What is interesting, and has possibly bled over from the author's heavy LLM use, is the style of simplistic bullet-point titles for the argument with filler in between. It does read like they wrote the five bullet points and then added the other text (by hand).
But who knows!
We're starting to become wary due to the abuse of AI and the proliferation of sloppy content, but also because we often have trouble distinguishing the authentic from the slop.
Another feature of this AI era that I hate.
There's even a GUI called claudia for a piecemeal extraction with a PRD.
https://github.com/kristopolous/Claudette
I've got web, Rust, and tkinter versions (for fun) being coded up right now, just to make sure this approach works.
The answer is.... Mostly...
Enjoy
Code doesn't matter IN THE EARLY DAYS.
This is similar to what I've observed over 25 years in the industry. In a startup, the code doesn't really matter; the market fit does.
But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.
In less than four years the AI coding workflow has been overhauled at least twice: from chat interfaces (ChatGPT) to editor integration (Cursor), then to CLI agent harnesses (Claude Code/Codex). It would be crazy to assume that harnesses are the end of that evolution.
Claude Code 3.0 (and other agent tools) are not expected to be mature. They'll all be obsolete in two or three years, replaced by the next generation of AI tools. Everyone knows that.
And so on and on and on.
A promise of AI was mature software
But you can use AI to improve your codebase too. Plus models are only going to get smarter from here (or stay the same).
Training models on AI-generated content leads to model collapse, so they'll hardly become smarter if more and more code is AI-written.
But now everything is, "ship as fast as is humanly possible, literally" from management, and "garbage Claude-written PRs" from devs. Trying to maintain sanity over my monorepo is impossible.
We have nearly a century of examples of somebody who only mostly understands a system making a breaking change. Now we've decided, "what the hell, this thing is called Claude, so it can wreak havoc for as long as corporate decides."
If you're dealing with functionality that is splittable into microfeatures/microservices, then anything you need right now can potentially be vibe-coded, even on the fly (and deleted afterwards). Single-use code.
>But as time goes on your codebase has to mature, or else you end up using more and more resources on maintenance rather than innovation.
Maintenance is a tremendous resource sink in enterprise software. Solving it, or even just making it avoidable (maybe Anthropic goes that way and leads the others), would be a huge revolution.
Seems like the phrase "clean room" is the new "nonplussed"... how does this make any sense?
[^1]: https://bsky.app/profile/mergesort.me/post/3mihhaliils2y
Then use Anthropic's own argument that LLM output is original work and thus not subject to copyright.
Does this still count as clean-room? Or what if the model wasn't the same exact one, but one trained the same way on the same input material, which Anthropic never owned?
This is going to be a decade of very interesting, and probably often hypocritical lawsuits.
If one person writes the spec from the implementation, and then also writes the new implementation, it is not clean-room design.
There are other details of course (is the old code in the training data?) but I'm not trying to weigh in on the argument one way or the other.
The product hasn't been around long enough to decide whether such an approach is "sustainable". It is currently in a hype state and needs more time for that hype to die down and the true value to show up, as well as to see whether it becomes the 9th circle of hell to keep in working order.
I have come to the conclusion that we just do not know yet. There is a part of me that believes there is a point somewhere on the grand scale where the code quality genuinely does not matter if the outcome is reliably and deterministically achieved. (As an image, I like to think of WALL-E literally compressing garbage into a cube shape.)
This would ignore maintenance costs (time and effort included). Those matter to an established user base (people do not love change, in my experience, even if it solves the problem better).
On the other hand, maybe software is meant to be highly personal and not widely general. For instance, I have had more fun in the past two years than the entire 15 years of coding before it, simply building small custom-fitted tools for exactly what I need. I aimed to please an audience of one. I have also done this for others. Code quality has not mattered all that much, if at all. It will be interesting to see where things go.
Non-trivial things tend to be much more sensitive to code quality in my experience, and will by necessity be kept around for longer and thus be much more sensitive to maintenance issues.
I hear this narrative being pushed quite a bit, and it makes my spidey senses tingle every time. Secure programs are a subset of correct programs, and to write and maintain correct programs you need to have a quality mindset.
A 0-day doesn't care if it's in a part of your computer you consider trivial or not.
Mind you, I'm not using LLMs for professional programming since I prefer knowing everything inside and out in the code that I work on, but I have tried a bunch of different modes of use (spec-driven + entire implementation by Opus 4.6, latest Codex and Composer 2, and entirely "vibecoded", as well as minor changes) and can say that for trivial in-house things it's actually usable.
Do I prefer to rewrite it entirely manually if I want something that I actually like? Yes. Do I think that not everything needs to be treated that way if you just want an initial version you can tinker with? Also yes.
Its creators clearly care not for the efficiency of how it is built, which translates directly into how it runs.
This blog post is effectively being apologetic about the fact that this is alright, since at least they got product market fit. Except Anthropic is never going to go back and clean up the mess once (if) they become profitable.
I doubt anyone will like how things will be in 5 years time if this trend of releasing badly engineered spaghetti continues.
Sure, the weights are where the real value lives, but if the quality is so lax that they leak their whole codebase, maybe they are just lucky they didn't leak customer data or the model weights? If that had happened, the entire business might evaporate overnight.
Seems wrong. Devs will whine, moan, and nitpick about even free software, but they can understand failure modes, navigate around bugs, and file issues on GitHub. The quality bar is 10-100x higher amongst non-techno-savvy folks and enterprise users who are paying for your software. They're far more "picky."
Wut? The value in the ecosystem is the model. Harnesses are simple. Great models work nearly identically in every harness
I tried to build my own harness once. The amount of work required is incredible: from how external memory is managed per session to the techniques for saving on the context window. For example, you do not want the LLM to read in whole files; instead you give it the capability to read chunks from offsets. But then you have to decide what should stay in context and what should be pruned.
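The offset-based reading idea can be sketched in a few lines. This is my own illustrative Python, not any particular harness's tool; the name `read_chunk` and the return shape are assumptions:

```python
# Illustrative sketch: a read tool that returns a bounded chunk of a file
# instead of the whole thing, so the harness (not the model) controls how
# much text lands in the context window.

def read_chunk(path: str, offset: int = 0, max_lines: int = 200) -> dict:
    """Return at most max_lines lines, starting at a zero-based line offset."""
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    chunk = lines[offset : offset + max_lines]
    return {
        "path": path,
        "offset": offset,
        "lines_returned": len(chunk),
        # Tell the model whether another call is needed to see the rest.
        "eof": offset + max_lines >= len(lines),
        "content": "".join(chunk),
    }
```

The `eof` flag is the important design choice: it lets the model page through a large file in repeated calls without ever being handed more than one chunk at a time.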
After that you have to start designing the think-plan-generate-evaluate pipeline. A learning moment for me here was to split out the step where the LLM evaluates the work, because the same LLM that did the work should not evaluate itself; that introduces a bias. Then you realize you need subagents too, and start wondering how their context will be handled (maybe return a summarized version to the main LLM?).
And then you have to start thinking about integration with MCP servers and how the LLM should invoke things like tools, prompts, and resources from each MCP. I learned that LLMs, especially the smaller ones, tend to hiccup and return malformed JSON.
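A harness has to defend against that malformed-JSON failure mode. Here is a minimal sketch of the kind of validate-and-reprompt step I mean; the function name, required keys, and error messages are hypothetical, not from any real framework:

```python
# Illustrative sketch: validate a model's tool-call JSON and, on failure,
# produce a corrective message to feed back to the model for a retry.
import json

REQUIRED_KEYS = {"tool", "arguments"}

def parse_tool_call(raw: str):
    """Return (call, error). On malformed output, call is None and
    error is a re-prompt string asking the model to try again."""
    text = raw.strip()
    # Common small-model hiccup: wrapping the JSON in a code fence.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    try:
        call = json.loads(text)
    except json.JSONDecodeError as e:
        return None, f"Your last message was not valid JSON ({e}). Reply with JSON only."
    missing = REQUIRED_KEYS - call.keys()
    if missing:
        return None, f"Missing keys: {sorted(missing)}. Include 'tool' and 'arguments'."
    return call, None
```

The point of returning a re-prompt string rather than raising is that the harness loop can hand the error straight back to the model, which usually fixes its own formatting on the second attempt.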
At some point I started wondering about just throwing it all out and looking at PydanticAI or LangChain or LangGraph or Microsoft AutoGen to operate everything between the LLM and the MCPs. It's quite difficult to make something like this work well, especially for long-horizon tasks.
I agree that good models have more value because a harness can't magically make a bad model good, but there's a lot that would be inordinately difficult without a proper harness.
Keeping models on rails is still important, if not essential. Great models might behave similarly in the same harness, but I suppose the value prop is that they wouldn't behave as well on the same task without a good harness.
It is not everyone’s experience that models work the same in every harness.
1. The code is garbage and this means the end of software.
Now try maintaining it.
2. Code doesn’t matter (the same point restated).
No, we shouldn’t accept garbage code that breaks e.g. login as an acceptable cost of business.
3. It’s about product market fit.
OK, but what happens after product market fit when your code is hot garbage that nobody understands?
4. Anthropic can’t defend the copyright of their leaked code.
This I agree with and they are hoist by their own petard. Would anyone want the garbage though?
5. This leak doesn’t matter.
I agree with the author but for different reasons - the value is the models, which are incredibly expensive to train, not the badly written scaffold surrounding it.
We also should not mistake current market value for use value.
Unlike the author, who seems to have fully signed up for the LLM hype train, I don’t see this as meaning code is dead. It’s an illustration of where fully relying on generative AI will take you: a garbage, unmaintainable mess which must be a nightmare to work with, for humans or LLMs.
I feel the author is just stating the obvious: code quality has very little to do with whether a product succeeds.
Seriously, if Anthropic were like OpenAI and let you use their subscription plans with any agent harness, how many users would CC instantly start bleeding? They're #39 on Terminal-Bench and get beaten by a harness that provides a single tool: tmux. You can literally get better results by giving Opus 4.6 only a tmux session and having it do everything with bash commands.
It seems premature to make sweeping claims about code quality, especially since the main reason to desire a well architected codebase is for development over the long haul.
Yes, exactly. Products.
It seems like all the engineers I've known, myself included, hold to this established dichotomy: engineers, who want to write good code and think a lot about user needs, versus project managers/executives/salespeople, who want to make the non-negative numbers on accounting documents larger.
The truth is that to write "good software," you do need to take care, review code, not single-shot vibe code, and not let LLMs run rampant. The other truth is that good software is not necessarily a good product; the converse is also true: a bad product doesn't necessarily mean bad software. There's not really a correlation, as this article points out: terrible software can be a great product! In fact, if writing terrible software lets you shit out more features, more quickly, you'll probably come out ahead in the business world over someone carefully writing good software but releasing more slowly. That's because the priorities and incentives in the business world are often in contradiction to the priorities and incentives in the human world.
I think this is hard to grasp for those of us who have been taught our whole lives that money is a good scorekeeper for quality and efficacy. In reality it's absolutely not. Money is Disney bucks recording who's doing Disney World in the most optimal way. Outside of Disney World, your optimal in-park behavior is often suboptimal for out-of-park needs. The problem is we've mistaken Disney World for all of reality, or, let Walt Disney enclose our globe within the boundaries of his park.
> The object which labor produces confronts it as something alien, as a power independent of the producer.