Naur (https://gwern.net/doc/cs/algorithm/1985-naur.pdf) called it "theory building":
> The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered. Revival of a program is the rebuilding of its theory by a new programmer team.
Lamport calls it "programming ≠ coding", where programming is "what you want to achieve and how" and coding is telling the computer how to do it.
I strongly agree with all of this. Even if your dev team skipped any kind of theory-building or modelling phase, they'd still passively absorb some of the model while typing the code into the computer. I think that it's this last resort of incidental model building that the LLM replaces.
I suspect that there is a strong correlation between programmers who don't think that there needs to be a model/theory, and those who are reporting that LLMs are speeding them up.
I have some anecdotal evidence that suggests that we can accomplish far more value-add on software projects when completely away from the computer and any related technology.
It's amazing how fast the code goes when you know exactly what you want. At this point the LLM can become very useful, because its hallucinations are instantly obvious from your perspective. If you don't know what you want, I don't see how this works.
I really never understood the rationale of staring at the technological equivalent of a blank canvas for hours a day. The LLM might shake you loose and get you going in the right direction, but I find it much more likely to draw me into a wild goose chase.
The last 10/10 difficulty problem I solved probably happened in my kitchen while I was chopping some onions.
To quote Russ Ackoff[1]:
> Improving a system requires knowing what you could do if you could do whatever you wanted to. Because if you don't know what you would do if you could do whatever you wanted to, how on earth are you going to know what you can do under constraints?
This isn't true in programming or real-world tasks where you are trying to accomplish some external objective.
If I could move my rook there, it's a win. Is there any way I can make that happen? How about if I sacrifice my knight, etc.?
Let me guess, you had tears in your eyes when you found the solution?
> "It's amazing how fast the code goes when you know exactly what you want"
Yeah, it's the same reason why demand for pen and paper still exists. It's the absolute best way for one to think and get their thoughts out. I can personally attest to this - no digital whiteboard can ever compete with just a pen and paper. My best and original ideas come from a blank paper and a pen.
Solutions can emerge from anywhere. But it's most likely to happen when the mind is focused and in a calm state - that's why walking, for instance, is great.
Goes back to Fred Brooks' Mythical Man-Month: Start with understanding the requirements; then design the architecture. Only after that, begin programming.
Were you the one who developed TOR?
And some code is solving additional complexities, not essential ones (like making it POSIX instead of using bashisms). In this case, it's just familiarity with the tools that can help you derive alternative approaches.
It's like you don't know how to ski and you're going down a really steep hill...now with AI, imagine that really steep hill is iced over.
With AI this loop is much easier. It is cheap to even build 3 parallel implementations of something and maybe another where you let the system add whatever capability it thinks would be interesting. You can compare and use that to build much stronger "theory of the program" with requirements, where the separation of concerns are, how to integrate with the larger system. Then having AI build that, with close review of the output (which takes much less time if you know roughly what should be being built) works really well.
That only works for certain types of simpler products (mostly one-man projects, things like web apps) - you're not going to be building a throw-away prototype, either by hand or using AI, of something more complex like your company's core operating systems or an industrial control system.
When it comes to general software development for customers in the everyday world (phones, computers, web), I often write once for proof, iterate as product requirements become clearer/refined, and rewrite if necessary (code smell, initial pattern was inefficient for the final outcome).
On a large project, often I’ll touch something I wrote a year ago and realize I’ve evolved the pattern or learned something new in the language/system and I’ll do a little refactor while I’m in there. Even if it’s just code organization for readability.
I do this, too. And it makes me awful at generating "preliminary LOEs", because I can't tell how long something will take until I get in there and experiment a little.
Self-created or formalized methods work, but they have to have habits or practices in place that prevent disengagement and complacency.
With LLMs there is the problem of automation bias in humans, which affects almost all human endeavors.
Unfortunately that will become more problematic as tools improve, so make sure to stay engaged and skeptical - the only successful strategy I have found, with support from fields like human factors research.
NASA and the FAA are good sources for information if you want to develop your own.
Maybe I am more of a Leet coder than I think?
The primary reason is that what you are rapidly refactoring in these early prototypes/revisions is the meta-structure and the contracts.
Before AI, the cost of putting tests in place from the beginning, or doing TDD, slowed your iteration speed dramatically.
In the early prototypes, what you are figuring out is the actual shape of the problem, the best division of responsibilities, and how to fit the pieces together to match the vision for how the code will be required to evolve.
Now with AI, you can let the AI build test harnesses at little velocity cost, but TDD is still not the general approach.
Like any framework, they all have costs, benefits, places they work, and others where they don't.
Unless you take time to figure out what your inputs and expected outputs are, I would agree with you about the schools of thought that targeted writing all the tests up front, and even implementation-detail tests.
If you can focus on writing inputs vs. outputs, especially during a spike, then I need to take prompt engineering classes from you.
Yep, and I believe that one will be harder to overcome.
Nudging an LLM into the right direction of debugging is a very different skill from debugging a problem yourself, and the better the LLMs get, the harder it will be to consciously switch between these two modes.
So I look up at the token usage, see that it cost 47 cents, and just `git reset --hard`, and try again with an improved prompt. If I had hand-written that code, it would have been much harder to do.
In my experience this is a bad workflow. "Build it crappy and fast" is how you wind up with crappy code in production, because your manager sees you have something working fast and thinks it's good enough.
The question is, will the ability of LLMs to whip out boilerplate code cause managers to be more willing to rebuild currently "working" code into something better, now that the problem is better understood than when the first pass was made? I could believe it, but it's not obvious to me that this is so.
That's insightful how you connected the "comprehension debt" of LLM-generated code with the idea of programming as theory building.
I think this goes deeper than the activity of programming, and applies in general to the process of thinking and understanding.
LLM-generated content - writing and visual art also - is equivalent to the code, it's what people see on the surface as the end result. But unless a person is engaged in the production, to build the theory of what it means and how it works, to go through the details and bring it all into a whole, there is only superficial understanding.
Even when LLMs evolve to become more sophisticated so that they can perform this "theory building" by themselves, what use is such artificial understanding without a human being in the loop? Well, it could be very useful and valuable, but eventually people may start losing the skill of understanding when it's more convenient to let the machine do the thinking.
I also strongly agree with Lamport, but I'm curious why you don't think AI can help in the "theory building" process, both for the original team and for a team taking over a project? I.e., understanding a code base, the algorithms, etc.? I agree this doesn't replace all the knowledge, but it can bridge a gap.
In other words, they can help you identify what fairly isolated pieces of code are doing. That's helpful, but it's also the single easiest part of understanding legacy code. The real challenges are things like identifying and mapping out any instances of temporal coupling, understanding implicit business rules, and inferring undocumented contracts and invariants. And LLM coding assistants are still pretty shit at those tasks.
You could paste your entire repo into Gemini and it could map your forest and also identify the "trees".
Assuming your codebase is smaller than Gemini's context window. Sometimes it makes sense to upload a package's code into Gemini and have it summarize and identify the key ideas and functions, then repeat this for every package in the repository and combine the results. It sounds tedious, but a rather small Python program does this for me.
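For what it's worth, the loop is roughly the sketch below. `call_gemini` is a placeholder for whatever Gemini client you actually use, and the prompts and the assumption that packages are top-level directories of Python files are mine, not the commenter's:

```python
# Rough sketch of the per-package summarization loop described above.
# `call_gemini` is a placeholder for whatever Gemini client/SDK you use.
import pathlib

def call_gemini(prompt: str) -> str:
    raise NotImplementedError("wire this up to your Gemini client of choice")

def package_source(pkg_dir: pathlib.Path) -> str:
    """Concatenate all source files of one package into a single blob."""
    parts = []
    for path in sorted(pkg_dir.rglob("*.py")):
        parts.append(f"# FILE: {path}\n{path.read_text()}")
    return "\n\n".join(parts)

def summarize_repo(repo_root: str) -> str:
    root = pathlib.Path(repo_root)
    package_summaries = []
    for pkg_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary = call_gemini(
            "Summarize this package: key ideas, main functions, and how it "
            "fits into the larger system.\n\n" + package_source(pkg_dir)
        )
        package_summaries.append(f"## {pkg_dir.name}\n{summary}")
    # Combine the per-package summaries into one repo-level overview.
    return call_gemini(
        "Combine these package summaries into a single architectural overview "
        "of the repository:\n\n" + "\n\n".join(package_summaries)
    )
```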
Concrete example: last week a colleague of mine used a tool like this to help with a code & architectural review of a feature whose implementation spanned four repositories with components written in four different programming languages. As I was working my way through the review, I found multiple instances where the information provided by the LLM missed important details, and that really undermined the value of the code review. I went ahead and did it the old-fashioned way, and yes it took me a few hours, but I also found four defects and failure modes we previously didn't know about.
The client loved him, for obvious reasons, but it's hard to wrap my head around such an approach to software construction.
Another time, I almost took on a gig, but when I took one look at the code I was supposed to take over, I bailed. Probably a decade would still not be sufficient for untangling and cleaning up the code.
True vibe coding is the worst thing. It may be suitable for one-off shell scripts and < 100-line utilities and such; anything more than that and you are simply asking for trouble.
The big problem is that LLMs do not *understand* the code you tell them to "explain". They just take probabilistic guesses about both function and design.
Even if "that's how humans do it too", this is only the first part of building an understanding of the code. You still need to verify the guess.
There are a few limitations to using LLMs for such first-guessing: in humans, the built-up understanding feeds back into the guessing - as you understand the codebase more, you can intuit function and design better. You start to know patterns and conventions. The LLM will always guess from zero understanding, relying only on the averaged-out training data.
A follow-on effect is the one bunderbunder points out in their reply: while LLMs are good at identifying algorithms - mere pattern recognition - they are exceptionally bad at world-modelling the surrounding environment the program was written in and the high-level goals it was meant to accomplish. Especially for any information obtained outside the code. A human can run a git-blame and ask what team the original author was on; an LLM cannot and will not.
This makes them less useful for the task, especially in any case where you intend to write new code. Sure, it's great that the LLM can give basic explanations about a programming language or framework you don't know, but if you're going to be writing code in it, you'd be better off taking the opportunity to learn it.
To clarify my question: Based on my experience (I'm a VP for a software department), LLMs can be useful to help a team build a theory. It isn't, in and of itself, enough to build that theory: that requires hands-on practice. But it seems to greatly accelerate the process.
But people still think and reason just fine, but now at a higher level that gives them greater power and leverage.
Do you feel like you're missing something when you "cook for yourself" but you didn't plant and harvest the vegetables, raise and butcher the protein, forge the oven, or generate the gas or electricity that heats it?
You also didn’t write the CPU microcode or the compiler that turns your code into machine language.
When you cook or code, you're already operating on top of a very tall stack of abstractions.
Sure, manager-types will generally be pleased when they ask AI for some vanilla app. But when it doesn't work, who will show up to make it right? When they need something more complex, will they even know how to ask for it?
It's the savages praying to Vol, the stone idol that decides everything for them, and they've forgotten their ancestors built it and it's just a machine.
Now, we're driving such things with AI; it follows that we will see better results if we do some of the work climbing down into the supporting abstractions to make their interface more suitable for AI use. To extend your cooking metaphor, it's time to figure out the manufactured food megafactory now; yes, we're still "cooking" in there, but you might not recognize the spatulas.
Things like language servers (LSPs) are a step in this direction: making it possible to interact with the language's parser/linter/etc before compile/runtime. I think we'll eventually see that some programming languages end up being more apropos to efficiently get working, logically organized code out of an AI; whether that is languages with "only one way to do things" and extremely robust and strict typing, or something more like a Lisp with infinite flexibility where you can make your own DSLs etc remains to be seen.
Frameworks will also evolve to be AI-friendly with more tooling akin to an LSP that allows an MCP-style interaction from the agent with the codebase to reason about it. And, ultimately, whatever is used the most and has the most examples for training will probably win...
I really like this definition of "life" and "death" of programs, quite elegant!
I've noticed that I struggle the most when I'm not sure what the program is supposed to do; if I understand this, the details of how it does it become more tractable.
The worry is that LLMs make it easier to just write and modify code without truly "reviving" the program... And even worse, they can create programs that are born dead.
I was once on a project where all the original developers suddenly disappeared and it was taken over by a new team. All institutional knowledge had been lost.
We spent a ridiculous amount of time trying to figure out the original design. Introduced quite a few bugs until it was better understood. But also fixed a lot of design issues after much head bashing.
By the end, it had been mostly rewritten and extended to do things not originally planned.
But the process was painful.
Point is, LLMs make this problem 1000 times worse, so it really is a ticking time bomb that's totally new. Most people, most programmers, most day-to-day work will not include some head-in-the-clouds abstract metaprogramming, but now LLMs both force programmers to "do more" and constantly destroy anyone's flow state, memory, and the 99% of the talent and skill that comes from actually writing good code for hours a day.
LLMs are amazing, but they also totally suck, because they essentially steal learning potential and focus while increasing work pressure and complexity. And this really is new, because senior programmers are also affected by this, and you really will feel it at some point after using these systems for a while.
They make you kind of demented, and no, you can't fight this with personal development and forced book reading after getting up at 4 am - just as with scrolling and the decrease in everyone's focus, even bibliophiles'.
At least at this point, LLMs are great at the "how", but are often missing context for the "what" and "why" (whether that's because it's often not written down or not as prevalent in their training data).
Additionally, and IMO critically to this discussion: it's easy for products or features to "die" not only when the engineers associated with them lose coherence on how they are implemented from a technical perspective, but also when the product people associated with them lose coherence on why they exist or who they exist for. The product can die even if one party (e.g. engineers) still maintains coherence while the other party (e.g. product/business) does not. At this point you've hit a state where the system cannot be maintained or worked on because everyone is too afraid of breaking an existing workflow.
LLMs are, like, barely 3% of the way toward solving the hardest problems I and my coworkers deal with day-to-day. But the bigger problem is that I don't yet know which 3% it is. Actually, the biggest problem is maybe that it's a different, dynamic 3% of every new problem.
Once they're gone or no longer applying pressure, the strain is relieved, and we can shift to a more natural operation, application, or programming model.
For this reason, it helps to set expectations that people are cycled through teams at slow intervals - stable enough to build rapport, expertise, and goodwill, but transient enough to avoid stalls based on shared assumptions.
What I'm finding with LLMs is that, if you follow good modularization principles and practices, then LLMs actually make it easier to start working on a codebase you don't know very well yet, because they can help you a lot in navigating, understanding "as much as you need", and making specific changes. But that's not something that LLMs do on their own, at least from my own experience - you still need a human to enforce good, consistent modularization.
Yes, I think the point is, LLMs are making it a lot worse.
And then compounding that: in 10 years, no new senior devs will have been created, so nobody will be around to fix it. Extreme of course - there will be devs, they'll just be underwater, piled on with trying to debug the LLM stuff.
So in that theory, the senior devs of those days will still be able to command large salaries if they know their stuff - specifically, how to untangle the mess of LLM code.
If and when technical debt becomes a paralyzing problem, we'll come up with solutions. Probably agents with far better refactoring skills than we currently have (most are kind of bad at refactoring right now). What's crazy to me is how tolerant the consumer has become. We barely even blink when a program crashes. A successful AAA game these days is one that only crashes every couple hours.
I could show you a Java project from 20+ years ago and you'd have no idea wtf is going on, let alone why every object has 6 interfaces. Hey, why write SQL (a declarative, somewhat functional language, which you'd think would be in fashion today!), when you could instead write reams of Hibernate XML?! We've set the bar pretty low for AI slop.
Even were I to store the prompts & model parameters, I suspect that I wouldn't get an exact duplicate of the code running the LLM again.
Also, as soon as a human is involved in implementation, it becomes less clear. You often won't be able to assume intent correctly. There will also be long lived bugs, pointer references that are off, etc.
I concede that the opacity and inconsistency of LLMs is a big (and necessary) downside though for sure.
My hope is that people keep the dialogue going because you may be right about the feeling of LLMs speeding things up. It could likely be because people are not going through the proper processes including planning and review. That will create mountains of future work; bugs, tech debt, and simply learning. All of which still could benefit from AI tools of course. AI is a very helpful tool, but it does require responsibility.
In the medium to longer term, we might be in a situation where only the most powerful next-generation AI models are able to make sense of giant vibe-coded balls of spaghetti and mud we're about to saddle ourselves with.
You can't just replace your whole coding team and think you can proceed at the same development pace. Even if the code is relatively good and the new developers relatively skilled. Especially if you lack "architecture model" level docs.
But yeah, LLMs push it to like an absurd level. What if all your coders were autistic savant toddlers who get changed out for a new team of toddlers every month?
I could ask questions about how things were done, have it theorize about why, etc.
Obviously it's not perfect, but that's fine, humans aren't perfect either.
Strongly agree with your comment. I wonder now if this "theory building" can have a grammar, and be expressed in code; be versioned, etc. Sort of like a 5th-generation language (the 4th-generation being the SQL-likes where you let the execution plan be chosen by the runtime).
The closest I can think of:
* UML
* Functional analysis (ie structured text about various stakeholders)
* Database schemas
* Diagrams
Brownfield tasks are harder for the LLM, at least in part because it's harder to retroactively explain regular structure in a way the LLM understands and can serialize into e.g. CLAUDE.md.
As in linear programming or dynamic programming.
> I suspect that there is a strong correlation between programmers who don't think that there needs to be a model/theory, and those who are reporting that LLMs are speeding them up.
This is an interesting prediction. I think you'll get a correlation regardless of the underlying cause because most programmers don't think there needs to be a model/theory and most programmers report LLMs speeding them up.
But if you control for that, there are also some reasons you might expect the opposite to be true. It could be that programmers who feel the least sped up by LLMs are the ones who feel their primary contribution is in writing code rather than having the correct model. And people who view their job as finding the right model are more sped up because the busywork of getting the code in the right order is taken off their plate.
Ah rationalism vs empiricism again
Kant up in heaven laughing his ass off
Yeah but we can ask an LLM to read the code and write documentation, if that happens.
Also, no "small" program is ever at risk of dying in the sense that Naur describes it. Worst case, you can re-read the code. The problem lies with the giant enterprise code bases of the 60s and 70s where thousands of people have worked on it over the years. Even if you did have good documentation, it would be hundreds of pages and reading it might be more work than just reading the code.
Hopefully we can iterate and get the system producing useful documents automagically, but my worry is that it will not generalise across different systems, and as a result we will have invested a huge amount of effort into creating "AI"-generated docs for our system that could have been better spent just having humans write the docs.
But sure let's just have it generate docs, that's gonna work great.
Was some thread on here the other day, where someone said they routinely give Claude many paragraphs specifying what the code should and shouldn't do. Take 20 minutes just to type it up.
I mean, even if that did work, you still gotta read the docs to roughly the same degree as you would have had to read the code, and you have to read the code to work with it anyway.
I'd see it like transcribing a piece of music where an LLM, or an uninformed human, would write down "this is a sequence of notes that follow a repetitive pattern across multiple distinct blocks. The first block has the lyrics X, Y ...", but a human would say "this is a pop song about Z, you might listen to it when you're feeling upset."
An LLM is not capable of subtext or reading between the lines or understanding intention or capability or sarcasm or other linguistic traits that apply a layer of unspoken context to what is actually spoken. Unless it matches a pattern.
It has one set of words, provided by you, and another set of words, provided by its model. You will get the bang average response every single time and mentally fill in the gaps yourself to make it work.
But "Teams that care about quality will take the time to review and understand LLM-generated code" is already failing. Sounds nice to say, but you can't review code being generated faster than you can read it. You either become a bottleneck (defeats the point) or you rubber-stamp it (creates the debt). Pick your poison.
Everyone's trying to bolt review processes onto this. That's the wrong layer. That's how you'd coach a junior dev, who learns. AI doesn't learn. You'll be arguing about the same 7 issues forever.
These things are context-hungry but most people give them nothing. "Write a function that fixes my problem" doesn't work, surprise surprise.
We need different primitives. Not "read everything the LLM wrote very carefully", but ways to feed it the why, the motivation, the discussion, and prior art. Otherwise yeah, we're building a mountain of code nobody understands.
Gemini and Claude at least seem to work well with it, but they sometimes still make mistakes (e.g. not using C++ `auto` is a recurrent thing, even though the context markdown file clearly states not to). I think as the models improve and get better at instruction handling it will get better.
Not saying this is "the solution" but it gets some of the way.
I think we need to move away from "vibe coding", to more caring about the general structure and interaction of units of code ourselves, and leave the AI to just handle filling in the raw syntax and typing the characters for us. This is still a HUGE productivity uplift, but as an engineer you are still calling the shots on a function by function, unit by unit level of detail. Feels like a happy medium.
You might know this, but telling the LLM what to do instead of what not to do generally works better, or so I heard.
Same thing with syntax - so far we've been optimizing for humans, and humans work best at a certain level of terseness and context-dependent implicitness (when things get too verbose, it's visually difficult to parse), even at the cost of some ambiguity. But for LLMs verbosity can well be a good thing to keep the model grounded, so perhaps stuff like type inference, even for locals, is a misfeature in this context. In fact, I wonder if we'd get better results if we forced the models to e.g. spell out the type of each expression in full, maybe even outright ban stuff like method chains and require each call result to be bound to some variable (thus forcing the LLM to give it a name, effectively making a note on what it thinks it's doing).
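A contrived sketch of that difference (the domain and names are made up): the second version binds every intermediate result to a named, typed variable, which is the kind of verbosity that might keep a model grounded:

```python
from dataclasses import dataclass

@dataclass
class User:
    email: str
    score: int

# Terse, human-optimized style: heavy inference, chained expressions.
def top_emails_terse(users: list[User]) -> list[str]:
    return [u.email for u in sorted(users, key=lambda u: u.score, reverse=True)[:10]]

# Verbose, "LLM-grounding" style: every intermediate result is bound to a
# named, fully typed variable, which forces the writer (human or model) to
# state what it thinks each step produces.
def top_emails_verbose(users: list[User]) -> list[str]:
    users_sorted_by_score_desc: list[User] = sorted(
        users, key=lambda u: u.score, reverse=True
    )
    top_ten_users: list[User] = users_sorted_by_score_desc[:10]
    top_ten_emails: list[str] = [user.email for user in top_ten_users]
    return top_ten_emails
```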
Literate programming also feels like it should fit in here somewhere...
So, basically, a language that would be optimized specifically for LLMs to write, and for humans to read and correct.
Going beyond the language itself, there's also a question of ecosystem stability. Things that work today should continue to work tomorrow. This includes not just the language, but all the popular libraries.
And what are we doing instead? We're having them write Python and JavaScript, of all things. One language famous for its extreme dynamism, with a poorly bolted on static type system; another also like that, but also notorious for its footguns and package churn.
I think there's more juice to squeeze there. A lot of what we're going to learn is how to pick the right altitude of engagement with AI, I think.
It's better if the bottleneck is just reviewing, instead of both coding and reviewing, right?
We've developed plenty of tools for this (linting, fuzzing, testing, etc). I think what's going on is people who are bad at architecting entire projects and quickly reading/analyzing code are having to get much better at that and they're complaining. I personally enjoy that kind of work. They'll adapt, it's not that hard.
The problem is that LLM-driven changes require this adversarial review on every line, because you don't know the intent. Human changes have a coherence to them that speeds up review.
(And if your company culture is line-by-line review of every PR, regardless of complexity... congratulations, I think? But that's wildly out of the norm.)
Not really. There's something very "generic" about LLM-generated code that makes you just want to gloss over it, no matter how hard you try not to.
It still takes a lot of thought and effort up front to put that together, and I'm not quite sure where the breakover line is between easier-to-do-it-myself and hand-off-to-the-LLM.
The correct primitives are the tests. Ensure your model is writing tests as you go, and make sure you review the tests, which should be pretty readable. Don't merge until both old and new tests pass. Invest in your test infrastructure so that your test suite doesn't get too slow, as it will be in the hot path of your model checking future work.
Legacy code is that which lacks tests. Still true in the LLM age.
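To be concrete about what "pretty readable" tests means here, a made-up example of the behavior-level kind that is quick to review even if you never read the implementation (`apply_discount` is just a stand-in for the generated code under test):

```python
# Made-up example of the kind of behavior-level test that's quick to review
# even if you never read the implementation it exercises.
def apply_discount(total_cents: int, coupon: str | None) -> int:
    """Toy implementation standing in for LLM-generated code under test."""
    if coupon == "SAVE10":
        return total_cents - total_cents // 10
    return total_cents

def test_save10_coupon_takes_ten_percent_off():
    assert apply_discount(10_000, "SAVE10") == 9_000

def test_unknown_coupon_changes_nothing():
    assert apply_discount(10_000, "BOGUS") == 10_000

def test_no_coupon_changes_nothing():
    assert apply_discount(10_000, None) == 10_000
```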
How...?
When I find code snippets on StackOverflow, I read them before pasting them into my IDE. I'm the bottleneck. Therefore there is no point in using StackOverflow...?
First, your prompts should be direct enough that the LLM doesn't wander around producing complexity for no reason.
Second, you should add rules/learning/context to always solve problems in the simplest way possible.
Lastly, after generation, you can prompt the LLM to reduce the complexity of the solution.
Coding in an object-oriented language in an enormous code base (big tech). A junior dev is making a new class and they start it off with LLM generation. The LLM adds in three separate abstract classes to the inheritance structure, for a total of seven inherited classes. Each of these inherited classes ultimately comes with several required classes that are trivial to add but end up requiring another hundred lines of code, mostly boilerplate.
Tell me how you, without knowing the code base, get the LLM to not add these classes? Our language model is already trained on our code base, and it just so happens that these are the most common classes a new class tends to inherit. Junior dev doesn't know that the classes should only be used in specific instances.
Sure, you could go line by line and say "what does this inherited class do, do I need it?" and actually, the dev did that. It cut down the inherited classes from three to two, but missed two of them because it didn't understand on a product side why they weren't needed.
Fast forward a year, these abstract classes are still inherited, no one knows why or how because there's no comprehension but we want to refactor the model.
"Well we have this starting function which clearly can solve the task at hand. Its something 99 developers would be happy with, but I can't help but see that if we just reformulate it into a do-while instead we now can omit the checks here and here, almost cutting it in half."
Now obviously it doesn't suffice as a real-world example but, when scaled up, it is a great view of what waste can accumulate at the macro level. I would say the ability to do this is tied to a survival instinct, one which, undoubtedly, will be touted as something that'll be put in the 'next iteration' of the model. It's not strictly something I think can be trained to be achievable, as in pattern matching, but it's clearly not achievable yet, as in your example from above.
Stop talking to it like a chatbot.
Draft, in your editor, the best contract-of-work you can as if you were writing one on behalf of NASA to ensure the lowest bidder makes the minimum viable product without cutting corners.
---
Goal: Do X.
Sub-goal 1: Do Y.
Sub-goal 2: Do Z.
Requirements:
1. Solve the problem at hand in a direct manner with a concrete implementation instead of an architectural one.
2. Do not emit abstract classes.
3. Stop work and explain if the aforementioned requirements cannot be met.
---
For the record: Yes, I'm serious. Outsourcing work is neither easy nor fun.
If doing those is easy, then I would assume that the software isn't that novel in the first place. Maybe get something COTS.
I've been coding for 25 years. It is easier for me to describe what I need in code than it is to do so in English. May as well just write it.
20 here, mostly in C; mixture of systems programming and embedded work.
My only experience with vibe-coding is when working under a time-crunch very far outside of my domain of expertise, e.g., building non-transformer-based LLMs in Python.
No amount of "knowing how to program" is going to give me >10 years of highly-specialized PhD-level Mathematics experience in under three months.
You tell them not to create extra abstract classes and put that in your onboarding docs.
You literally do the same thing with llms. Instead of onboarding code standards docs you make rules files or whatever the llm needs.
I started reading it and a key plot point is that there is a computer system that is thousands of years old. One of the main characters has "cold sleeped" for so long that he's the only one who knows some of the hidden backdoors. That legacy knowledge is then used to great effect.
Highly recommend it for a great fictional use of institutional knowledge on a legacy codebase (and a great story overall).
Another great example:
In Fire Upon the Deep, due to the delay in communications between star systems, everyone uses a descendant of Usenet.
The future I dream of.
Looks like it is the second in a trilogy. Can you just dive in or did you read the first book before?
However, I would recommend skipping Children of the Sky. It's not as good, and was clearly intended as the first installment of a series which Vinge was unable to complete. :(
Chronologically, DitS takes place before FotD. But there is exactly one character in common between the two books, and while he is a major character in both, none of the events of DitS are relevant to the story in FotD (which makes sense since FotD was written first).
So it's really largely a matter of preference as to which one to read first. I would say that FotD has more action and, for the lack of better term, "weirdness" in the setting; while DitS is more slow-paced, with more character development and generally more fleshed-out characters, and explores its themes deeper. But both books have plenty for your mind to chew on.
All in all I think FotD is an easier read, and DitS is a more rewarding one, but this is all very subjective.
One upside to the books being decoupled as much as they are is that whichever one you start with, you get a complete story, so even if you're a completionist you can disregard the other book if you don't like the first one.
General recommendation is to read them in order (Fire first, Deepness second) but I don't really think it matters.
The second book is just as good, but doesn't try as hard to get you addicted early on. The assumption is that you already know how good Vinge's work is.
I recommend starting with Fire Upon the Deep.
LLM is pushing that layer towards natural language and spec-driven development. The only *big* difference is that high level programming languages are still deterministic but natural language is not.
I'm guessing we've reached an irreducible point where the amount of information needed to specify the behavior of a program is nearly optimally represented in programming languages after decades of evolution. More abstraction into the natural-language realm would make it lossy. And less abstraction, down to low-level code, would make it verbose.
The previous tools (assemblers, compilers, frameworks) were built on hard-coded logic that can be checked and even mathematically verified. So you could trust what you're standing on. But with LLMs we jump off the safely-built tower into a world of uncertainty, guesses, and hallucinations.
JavaScript has a ton of behavior that is very uncertain at times and I'm sure many JS developers would agree that trusting what you're standing on is at times difficult. There is also a large percentage of developers that don't mathematically verify their code, so the verification is kind of moot in those cases, hence bugs.
The current world of LLM code generation lacks the verification you are looking for, however I am guessing that these tools will soon emerge in the market. For now, building as incrementally as possible and having good tests seems to be a decent path forward.
We call a C->asm compiler "correct" if the meaning of every valid C program turns into an assembly program with equivalent meaning.
The reason LLMs don't work like other compilers is not that they're non-deterministic, it's that the source language is ambiguous.
LLMs can never be "correct" compilers, because there's no definite meaning assigned to English. Even if English had precise meaning, LLMs will never be able to accurately turn any arbitrary English description into a C program.
Imagine how painful development would be if compilers produced incorrect assembly for 1% of all inputs.
The LLM in this loop is the equivalent of a human, which also has ambiguous source language if we’re going by your theory of English being ambiguous. So it sounds like you’re saying that if a human produces a C program, it is not verifiable and testable because the human used an ambiguous source language?
I guess for some reason people thought I meant that the compiler would be LLM > machine code, where actually I meant the compiler would still be whatever language the LLM produces down to machine code. It's just that the language the LLM produces can be checked through things like TDD or a human, etc...
I don't think you have thought about this deeply enough. Who or what would do the checking, and according to what specifications?
I understand that an input to an LLM will create a different result in many cases, making the output non-deterministic, but that doesn't mean we can't use probability to arrive at results eventually.
Verifying code produced is a much simpler task for some code because I, as a human, can look at a generated snippet and reason about it and determine if it is what I want. I can also create tests to say “does this code have this effect on some variable” and then proceed to run the test.
Most programmers that write JavaScript for a living don't really understand how to scale applications in JavaScript, which includes data structures in JavaScript. There is a very real dependence on layers of abstractions to enable features that can scale. They don't understand the primary API to the browser, the DOM, at all and many don't understand the Node API outside the browser.
For an outside observer it really begs the Office Space question: what would you say you do here? It's weird trying to explain it to people completely outside software. For the rest of us in software, we are so used to this that we take the insanity for granted as an inescapable reality.
Ironically, at least in the terms of your comment, when you confront JavaScript developers about this lack of fundamental knowledge, comparisons to assembly frequently come up. As though writing JavaScript directly is somehow equivalent to writing machine code - but for many people in that line of work, they are equally distant realities.
The introduction of LLMs makes complete sense. When nobody knows how any of this code works then there isn't a harm to letting a machine write it for you, because there isn't a difference in the underlying awareness.
Although I'm sure you are correct, I would also want to mention that most programmers that write JavaScript for a living aren't working for Meta or Alphabet or other companies that need to scale to billions, or even millions, of users. Most people writing JavaScript code are, realistically, going to have fewer than ten thousand users for their apps. Either because those apps are for internal use at their company (such as my current project, where at most the app is going to be used by 200-250 people, so although I do understand data structures I'm allowing myself to do O(N^2) business logic if it simplifies the code, because at most I need to handle 5-6 requests per minute), or else because their apps are never going to take off and get the millions of hits that they're hoping for.
If you don't need to scale, optimizing for programmer convenience is actually a good bet early on, as it tends to reduce the number of bugs. Scaling can be done later. Now, I don't mean that you should never even consider scaling: design your architecture so that it doesn't completely prevent you from scaling later on, for example. But thinking about scale should be done second. Fix bugs first, scale once you know you need to. Because a lot of the time, You Ain't Gonna Need It.
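A contrived illustration of that trade-off (the names are made up): the first version is the nested O(N*M) pass that is obviously correct at a glance; the second adds bookkeeping to scale. At a couple hundred users, either is fine.

```python
# Contrived illustration: the nested pass is O(N*M) but obvious at a glance;
# the indexed version scales better at the cost of extra bookkeeping.
def orders_per_user_simple(users: list[str], orders: list[dict]) -> dict[str, int]:
    return {
        user: sum(1 for order in orders if order["user"] == user)
        for user in users
    }

def orders_per_user_scalable(users: list[str], orders: list[dict]) -> dict[str, int]:
    counts = {user: 0 for user in users}
    for order in orders:
        if order["user"] in counts:
            counts[order["user"]] += 1
    return counts
```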
Case in point: I'm seeing much more success in LLM driven coding with Rust, because the strong type system prevents many invalid states that can occur in more loosely or untyped languages.
It takes longer, and often the LLM has to iterate through `cargo check` cycles to get to a state that compiles, but once it does the changes are very often correct.
The Rust community has the saying "if it compiles, it probably works". You can still have plenty of logic bugs of course, but the domain of possible mistakes is smaller.
What would be ideal is a very strict (logical) definition of application semantics that LLMs have to implement, and that ideally can be checked against the implementation. As in: have a very strict programming language with dependent types, littered with pre/post conditions, etc.
LLMs can still help to transform natural language descriptions into a formal specification, but that specification should be what drives the implementation.
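A minimal sketch of what "the spec drives the implementation" could look like, using plain assertions as stand-ins for real pre/post conditions or dependent types (the function and its contract are invented for illustration):

```python
# Minimal sketch: plain assertions as stand-ins for real pre/post conditions.
# The point is that the spec is checkable code the generated implementation
# must satisfy, not prose.
def allocate_seats(requested: int, available: int) -> int:
    # Preconditions (the spec the caller must meet).
    assert requested >= 0, "requested seats must be non-negative"
    assert available >= 0, "available seats must be non-negative"

    granted = min(requested, available)  # candidate implementation

    # Postconditions (the spec the implementation must meet).
    assert 0 <= granted <= requested
    assert granted <= available
    assert granted in (requested, available)
    return granted
```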
> By then Mike had voder-vocoder circuits supplementing his read-outs, print-outs, and decision-action boxes, and could understand not only classic programming but also Loglan and English, and could accept other languages and was doing technical translating—and reading endlessly. But in giving him instructions was safer to use Loglan. If you spoke English, results might be whimsical; multi-valued nature of English gave option circuits too much leeway.
For those unfamiliar with it, it's not that Lojban is perfectly unambiguous. It's that its design strives to ensure that ambiguity is always deliberate by making it explicit.
The obvious problem with all this is that Lojban is a very niche language with a fairly small corpus, so training AI on it is a challenge (although it's interesting to note that existing SOTA models can read and write it even so, better than many obscure human languages). However, Lojban has the nice property of being fully machine parseable - it has a PEG grammar. And, once you parse it, you can use dictionaries to construct a semantic tree of any Lojban snippet.
When it comes to LLMs, this property can be used in two ways. First, you can use structured output driven by the grammar to constrain the model to output only syntactically valid Lojban at any point. Second, you can parse the fully constructed text once it has been generated, add semantic annotations, and feed the tree back into the model to have it double-check that what it ended up writing means exactly what it wanted to mean.
With SOTA models, in fact, you don't even need the structured output - you can just give them the parser as a tool and have them iterate. I did that with Claude and had it produce Lojban translations that, while not perfect, were very good. So I think that it might be possible, in principle, to generate Lojban training data out of other languages, and I can't help but wonder what would happen if you trained a model primarily on that; I suspect it would reduce hallucinations and generally improve metrics, but this is just a gut feel. Unfortunately this is a hypothesis that requires a lot of $$$ to properly test...
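For what it's worth, the "parser as a tool" loop is roughly the shape below; both `generate` and `parse_lojban` are placeholders for whatever model client and Lojban PEG parser you wire in, not real APIs:

```python
# Rough shape of the "parser as a tool" loop. `generate` and `parse_lojban`
# are placeholders, not real APIs.
def generate(prompt: str) -> str:
    raise NotImplementedError("call your model of choice here")

def parse_lojban(text: str) -> tuple[bool, str]:
    """Return (ok, diagnostics) from a PEG parse of the candidate text."""
    raise NotImplementedError("call a Lojban PEG parser here")

def translate_to_lojban(english: str, max_rounds: int = 5) -> str:
    candidate = generate(f"Translate into Lojban:\n{english}")
    for _ in range(max_rounds):
        ok, diagnostics = parse_lojban(candidate)
        if ok:
            return candidate
        # Feed the parse errors (ideally plus an annotated semantic tree)
        # back to the model and ask for a revision.
        candidate = generate(
            "Your Lojban did not parse.\n"
            f"Diagnostics:\n{diagnostics}\n"
            f"Original English:\n{english}\n"
            f"Previous attempt:\n{candidate}\n"
            "Produce a corrected translation."
        )
    raise ValueError("no parseable translation after max_rounds attempts")
```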
The nature of programming might have to shift to embrace the material property of LLM. It could become a more interpretative, social, and discovery-based activity. Maybe that's what "vibe coding" would eventually become.
This sounds like an unmaintainable, tech debt nightmare outcome to me
It is, for example, possible to formally verify or do 100% exhaustive testing as you go lower down the stack. I can't imagine this would be possible between NLs and PLs.
Arguably, determinism isn't everything in programming: It's very possible to have perfectly deterministic, yet highly surprising (in terms of actual vs. implied semantics to a human reader) code.
In other words, the axis "high/low level of abstraction" is orthogonal to the "deterministic/probabilistic" one.
Without determinism, learning becomes less rewarding.
Specs are not more abstract but more ambiguous, which is not the same thing.
This is why you see so many failed startups around Slack/email/Jira efficiency. Half the time you do not know whether you missed critical information, so you need to go to the source, negating the gains you got from the information that was successfully summarized.
It took a ton of effort on his part to convince his manager that this wasn't ready to be merged.
I wonder how much vibe coded software is out there in the wild that just appears to work?
The more dangerous thing is that such idiot managers can judge you through their lens of shipping LLM garbage they never applied in reality to see the consequences, living in a fantasy due to a lack of technical knowledge. Of course it directly leads to firing people and piling more tasks/ballooning expectations onto the leftover team, who are trapped into burning out and being replaced as trash, since that makes total sense in their worldview and "evidence".
https://www.scottsmitelli.com/articles/altoids-by-the-fistfu...
I haven't seen a truly non-technical manager in over 15 years.
Even a calculator doesn't work if one doesn't use it correctly. Agentic coding works very well if used correctly, such as in the following way:
1. Define your task prompt as well as possible. Refine it via the LLM, having the LLM review it, and repeat this process ad infinitum until there are no important issues left to fix. If possible, use multiple LLMs to identify gaps in your task prompt. You now have your refined task specification. This is the most time consuming step. Sometimes it's necessary to add API docs and SDKs to the context.
2. Use a good reasoning LLM by OpenAI or Claude or Gemini or Grok to execute the spec.
3. Review the generated code line by line. Make any necessary changes, either manually or again using the LLM. With any luck there won't be anything to fix.
If used in this way, it works so well.
0. Pick a task that's not too complicated for the LLM, and use languages and frameworks that it knows about.
If you're within that zone, it all feels magical. Step outside and things fall apart really quickly.
And why should they? Most people will pay them, churn out whatever code, it will likely never be deployed or used by anyone (this is true of most code created by a real engineer too). By the time the user has figured out what they have "created" isn't real, Loveable is on to the next mark/user.
You just don't build up the necessary mental model of what the code does when vibing, and so although you saved time generating the code, you lose all that anyway when you hit a tricky bug and have to spend time building up the mental model to figure out what's wrong.
And saying "oh just do all the planning up front" just doesn't work in the real world where requirements change every minute.
And if you ever see anyone using "accepted lines" as a metric for developer productivity/hours saved, take it with a grain of salt.
Why? It was almost meant in jest, as a joke - no one seriously believes you don't need to review code. You end up in spaghetti land so quickly that I can't believe anyone who tried "vibe coding" for more than a couple of hours didn't quickly give up on something that is so obviously infeasible.
Now, reviewing whatever the LLM gives you back, carefully massaging it into the right shape, then moving on definitely helps my programming a lot - but careful review is needed to ensure the LLM had the right context, so it's actually correct. But then we're in "pair programming" territory rather than blindly accepting whatever the LLM hands you, AKA "vibe coding".
However, code that is well-designed by humans tends to be easier to understand than LLM spaghetti.
Additionally you may have institutional knowledge accessible. I can ask a human and they can explain what they did. I can ask an LLM, too and they will give me a plausible-sounding explanation of what they did.
The weights won't have that by default, true, that's not how they were built.
But if you're a developer and can program things, there is nothing stopping you from letting LLMs have access to those details, if you feel like that's missing.
I guess that's why they call LLMs "programmable weights", you can definitely add a bunch of context to the context so they can use it when needed.
LLMs can barely do 2+2, humans don't even understand the weights if they see them. LLMs can have all the access they want to their own weights and they won't be able to explain their thinking.
Modern Addendum: And if you have an LLM generate your code, you'll need one twice as smart to debug it.
In other words, debugging can be at the same "intelligence" level, but since an LLM doesn't really know what it is doing, it can make errors it won't comprehend on its own. The experience is a lot like working with a junior programmer, who may write a bunch of code but cannot figure out what they got wrong.
Just maybe, it's the difference between the "medium" and "high" suffixed thinking modes of an LLM.
Fwiw, for complicated functions that must exist, I have the LLM write a code comment explaining the intent and the approach.
The challenge of navigating rapidly changing or poorly documented code isn’t new: It’s been a constant at every company I’ve worked with. At larger organizations the sheer volume of code, often written by adjacent teams, will outpace your ability to fully understand it. Smaller companies tend to iterate so quickly (and experience so much turnover) that code written two weeks ago might already be unrecognizable, if the original author is even still around after those two weeks!
The old adage still applies: the ability to read code is more crucial than the ability to write it. LLMs just amplify that dynamic. The only real difference is that you should assume the author is gone the moment the code lands. The author is ephemeral, or they went on PTO/quit immediately afterward: Whatever makes you more comfortable.
LLMs don't "just" amplify that dynamic
They boost it to impossibly unsustainable levels
First, refactoring code. Specifically, recently I used it on a library that had solid automated testing coverage. I needed to change the calling conventions of a bunch of methods and classes in the library, but didn’t want to rewrite the 100+ unit tests by hand. Claude did this quickly and without fuss.
Second is one-time-use code. Basically, let's say you need to convert a bunch of random CSV files to a single YAML file, or convert a bunch of video files in different formats to a single standard format, or find any photos in your library that are out of focus. This works reasonably well (see the sketch below).
Bonus one is just generating sample code for well known libraries.
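For the second category (one-time-use code), the kind of throwaway script meant is something like this sketch; the directory layout, file names, and keys are made up:

```python
# The kind of throwaway, one-time-use script meant here: merge a directory
# of CSV files into a single YAML file.
import csv
import pathlib

import yaml  # PyYAML

def merge_csvs_to_yaml(csv_dir: str, out_path: str) -> None:
    merged: dict[str, list[dict]] = {}
    for csv_file in sorted(pathlib.Path(csv_dir).glob("*.csv")):
        with csv_file.open(newline="") as f:
            merged[csv_file.stem] = list(csv.DictReader(f))
    with open(out_path, "w") as f:
        yaml.safe_dump(merged, f, sort_keys=False)

if __name__ == "__main__":
    merge_csvs_to_yaml("exports/", "combined.yaml")
```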
I have been curious what would happen if I handed something like Claude a whole server and told it to manage it however it wants with relatively little instruction.
In the past the problem was about transferring a mental model from one developer to the other. This applied even when people copy-pasted poorly understood chunks of example code from StackOverflow. There was specific intent and some sort of idea of why this particular chunk of code should work.
With LLM-generated software there can be no underlying mental model of the code at all. None. There is nothing to transfer or infer.
I’ve had to give feedback to some junior devs who used quite a bit of LLM created code in a PR, but didn’t stop to question if we really wanted that code to be “ours” versus using a library. It was apparent they didn’t consider alternatives and just went with what it made.
I.e., if I need a method (or small set of methods) with clearly defined inputs and outputs, probably because they follow a well-known algorithm, AI is very useful. But in this case, wider comprehension isn't needed, because all the LLM is doing is copying and adjusting.
E.g. "extract the logic in MyFunc() in foo.cc into a standalone helper and set up all the namespaces and headers so that it can be called from MyFunc() and also in bar.cc. Add tests and make sure it all compiles and works as expected, then call it in bar.cc in the HTTP handler stub there."
It never needs to make architectural decisions. If I watch it and it looks like it is starting to go off the rails and do something odd, I interrupt it and say "Look at baz.cc and follow the coding style and pattern there" or whatever.
Seems to work well.
I feel like as an engineer I am moving away from concrete syntax, and up an abstraction level into more of abstract form where I am acting more like a TL reviewing code and making the big-brush decisions on how to structure things, making course corrects as I go. Pure vibe-coding is rare.
And while I could catch that because I wrote the code in question and know the answers to those questions, others do not have that benefit. The notion that someone new to the codebase - especially a relatively inexperienced dev - would have AI "documentation" as a starting point is honestly quite terrifying, and I don't see how it could possibly end with anything other than garbage out.
I'm not sure how or why the conversation shifted from LLMs helping you "consume" vs helping you "produce". Maybe there's not as much money in having an Algolia-on-steroids as there is in convincing execs that it will replace people's jobs?
However I think we should be thinking harder about how coding will change as LLMs change the economics of writing code:
- If the cost of delivering a feature is ~0, what's the point in spending weeks prioritizing it? Maybe Product becomes more like an iterative QA function?
- What are the risks that we currently manage through good software engineering practices, and what's the actual impact of those risks materializing? For instance, if we expose customer data that's probably pretty existential, but most companies can tolerate a little unplanned downtime (even if they don't enjoy it!). As the economics change, how sustainable is the current cost/benefit equilibrium of high-quality code?
We might not like it, but my guess is that in ≤ 5 years actual code is more akin to assembler: sure, we might jump in and optimize, but we are really just monitoring the test suites, coverage, and risks rather than tuning whether or not the same library function is being evolved in a way which gives leverage across the code base.
"High quality code"? The standard today is "barely functional", if we lower the standards any further we will find ourselves debating how many crashes a day we're willing to live with, and whether we really care about weekly data loss caused by race conditions.
Writing code and delivering a feature are not synonymous. The time spent writing code is often significantly less than the time spent clarifying requirements, designing the solution, adjusting the software architecture as necessary, testing, documenting, and releasing. That effort won't be driven to 0 even if an LLM could be trusted to write perfect code that didn't need human review.
My experiences so far boil down to: APIs, function descriptions, overall structure, and testing. In other words, ask a dev to become an architect who defines the project and lays out the structure. As long as the first three points are well settled, code-gen quality is pretty good. Many people believe the last point (testing) should be done automatically as well. While LLMs may help with unit tests or tests on macro structures, I think people need to define high-level, end-to-end testing goals from a new angle.
Just like strong typing reduces the number of tests you need (because the scope of potential errors is reduced), there is a giant increase in error scope when you can't assume the writer to be rational.
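A small illustration of the first half of that claim, using a hypothetical Fetch() API that is not from the original comment: a stronger type removes a whole class of mistakes, so no test is needed for that class.

```cpp
#include <chrono>
#include <iostream>
#include <string>

// Hypothetical API used only to illustrate the point. With a bare `int timeout`
// you would want a test for "caller passed seconds where milliseconds were
// expected"; with std::chrono that whole class of mistake fails to compile.
void Fetch(const std::string& url, std::chrono::milliseconds timeout) {
  std::cout << "GET " << url << " (timeout " << timeout.count() << " ms)\n";
}

int main() {
  // Fetch("https://example.com", 5000);                  // rejected: 5000 of what unit?
  Fetch("https://example.com", std::chrono::seconds(5));  // converts losslessly to ms
}
```

The commented-out call with a bare 5000 is exactly the kind of mistake that would otherwise need a test (or a production incident) to catch.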
Staying realistic, we can say with some confidence that within the next 6-12 months alone, there are good reasons to believe that local, open-source models will match their bigger cloud cousins in coding ability, or get very close. Within the next year or two, we will quite probably see GPT-6 and Sonnet 5.0 come out, dwarfing all the models that came before. With this, there is a high probability that any comprehension or technical debt accumulated over the past year or more will be rendered completely irrelevant.
The benefits of any development done between now and then, even sloppy development, should more than make up for the downside caused by tech debt or excess complexity. Even if I'm dead wrong and we hit a ceiling in LLMs' ability to grok huge or complex codebases, it is unlikely to appear within the next few months. Additionally, behind closed doors the progress being made is nothing short of astounding. Recent research at Stanford might quite simply change all of these naysayers' minds.
When I really need to understand what's happening with code, I generally write each step out myself.
LLMs make it much easier for me to do this step and more. I've used LLMs to quickly file PRs for new (to me) code bases.
A lot of these criticisms are valid and I recognise there's a need for people to put their own personal stake in the ground as being one of the "true craftsmen" but we're now at the point where a lot of these articles are not covering any real new ground.
At least some individual war stories from people who have tried to apply LLMs would be nice, as well as not pretending that the problem of sloppy code didn't exist before LLMs.
Certainly not remotely the same volume of sloppy code
Impossibly high volumes of bad code are a new problem
Is this something people are doing?
https://github.com/github/spec-kit
""" Spec-Driven Development flips the script on traditional software development. For decades, code has been king — specifications were just scaffolding we built and discarded once the "real work" of coding began. Spec-Driven Development changes this: specifications become executable, directly generating working implementations rather than just guiding them. """
The takeaway is that instead of vibecoding you write specs and you get the LLM to align the generated code to the specs.
An LLM-assisted engineer writes code faster than a careful person can review it.
Eventually the careful engineers get run over by the sheer amount of work to check, and code starts passing reviews when it shouldn't.
It sounds obvious that careless work is faster than careful work, but there are psychological issues in play - management's expectation of AI as a speed multiplier, personal interest in being perceived as someone who delivers fast, engineers' concern about being seen as a bottleneck for others…
In many cases, it's more than expectation. For top management especially, these are the people who have signed off on massive AI spending on the basis that it will improve productivity. Any evidence to the contrary is not just counter to their expectations - it's a giant flashing neon sign screaming "YOU FUCKED UP". So of course organizations run by those people are going to pretend that everything is fine, for as long as anything works at all.
And then the other side of this is the users. Who have already been conditioned to shrug at crappy software because we made that the norm, and because the tech market has so many market-dominant players or even outright monopolies in various niches that users often don't have a meaningful choice. Which is a perfect setup for slowly boiling the frog - even if AI is used to produce sloppy code, the frog is already used to hot water, and already convinced that there's no way out of the pot in any case, so if it gets hotter still they just rant about it but keep buying the product.
Which is to say, it is a shitshow, but it's a shitshow that can continue for longer than most engineers have the emotional capacity to sustain without breaking down. In the long term, I expect AI coding in this environment to act as a filter: it will push the people who care about quality and polish out of the industry, and reward those who treat clicking "approve" on AI slop as their real job description.
The thing I struggle with most when I use LLMs to generate entire features with limited guidance (so far only in hobby projects) is the LLM duplicating functionality or not sticking to existing abstractions. For example, if in the existing code A calls B to get some data, and now you need to do some additional work on that data (e.g. enriching or verifying it), that change could be made in A, made in B, or you could make a new B2 that is just like B but with that slight tweak. Each of those could be appropriate, and LLMs sometimes make hilariously bad calls here.
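A minimal sketch of that A/B/B2 choice, with hypothetical names since the original comment gives no code:

```cpp
#include <vector>

// Hypothetical types, only to make the A / B / B2 options concrete.
struct Order { int id = 0; double total = 0.0; };
struct Database { /* connection details elided */ };

std::vector<Order> LoadOrders(Database&) { return {}; }  // "B": existing data access

void RenderDashboard(Database& db) {                     // "A": existing caller
  std::vector<Order> orders = LoadOrders(db);
  // ... render ...
}

// New requirement: verify/enrich the orders. Three placements a model (or a
// human) might pick, each defensible in some codebases:
//  1. In A:  RenderDashboard() verifies after calling LoadOrders().
//  2. In B:  LoadOrders() verifies before returning, changing every caller.
//  3. A B2:  LoadVerifiedOrders() wraps LoadOrders(); easy to end up with two
//            near-identical loaders that drift apart because nobody remembers
//            the second one exists.
```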
Yes...? Why wouldn't you always do this LLM or not?
"Please generate unit tests for the website that exercise documented functionality" into the LLM used to generate the website should do it.
The quote that is interesting in the context of fast-paced LLM development is this:
> The Dark Matter Developer will never read this blog post because they are getting work done using tech from ten years ago and that's totally OK
[1] https://www.hanselman.com/blog/dark-matter-developers-the-un...
Well, they kept limping along with that mess for another ten years while the industry sprinted ahead. They finally released a new product recently, but I don't think anyone cares, because everyone else did it better five years ago.
And when you think about it, LLMs are pretty much, by design, machines that look for truth in mediocrity, in the etymological sense of the word.
LLMs have made content worth precisely zero. Any content can be duplicated with a prompt. That means code is also worth precisely zero. It doesn't matter if humans can understand the code; what matters is whether the LLM can understand the code and make modifications.
As long as the LLM can read the code and adjust it based on the prompt, what happens on the inside doesn't matter. Anything can be fixed with simply a new prompt.
You can have functional tests, sure, but if there's one thing that LLMs (and AI in general) are good at, it's finding unconventional ways to game metrics.
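A hypothetical (and deliberately silly) example of what gaming a functional test can look like:

```cpp
#include <cassert>
#include <string>

// Hypothetical "fix" that turns a failing functional test green without fixing
// anything: special-case the exact input the test uses.
std::string FormatPrice(long cents) {
  if (cents == 199) return "$1.99";    // pasted in to satisfy the test below
  return "$" + std::to_string(cents);  // still wrong for every other input
}

int main() {
  assert(FormatPrice(199) == "$1.99"); // passes -- the metric, not the behaviour, was fixed
}
```

The test suite reports green either way, which is exactly the problem being pointed at.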
> Teams that care about quality will take the time to review and understand (and more often than not, rework) LLM-generated code before it makes it into the repo. This slows things down, to the extent that any time saved using the LLM coding assistant is often canceled out by the downstream effort.
I recently tried a mini experiment for myself to (dis)prove similar notions. I feel more convinced we'll figure out a way to use LLMs and keep maintainable repositories.
I intentionally tried to use a language I'm not as proficient in (but I obviously have a lot of background in programming) to see if I could keep steering the LLM effectively.
https://kau.sh/blog/container-traffic-control/
and I saved a *lot* of time.
I think this might be the wrong assumption. In the same way the news happens to be wrong about topics you know, I think it's probably better to judge it on code you know rather than code you don't.
It's easy to accept whatever the output was if you don't know what you're looking at.
It'll be interesting to see what it tells experts about sloppy, private code bases (you can't use existing OSS examples, because opinions and docs would be in the LLM corpus rather than derived from the code itself).
But "I'm not proficient" != "I don't know" (i.e. I worked with JavaScript many moons ago, but I wouldn't consider myself an expert at it today).
I like to think I can still spot unmaintainable vs maintainable code, but I understand your point that maybe the thinking is to have an expert state that opinion.
The code is [oss](https://github.com/kaushikgopal/ff-container-traffic-control) btw, so I'd love to get other takes.
But nearly every engineer I've ever spoken to has over-indexed on 'tech debt bad'. Tech debt is a lot like normal debt - you can have a lot of it and still be a healthy business.
The other side of the equation is that it's easier to understand and make changes to code with LLMs. I've been able to create "Business Value" (tm) in other people's legacy code bases in languages I don't know by making CRUD apps do things differently from how they currently do things.
Before, I'd have needed to hire a developer who specialises in that language and pay them to get up to speed on the code base.
So I agree with the article that the concerns are valid, but overall I'm optimistic that it's going to balance out in the long run - we'll have more code, throw away more code, and edit code faster, and a lot of that will cancel.
If the assertion is, I want to use non-LLM methods to maintain LLM-generated code, then I agree, there is a looming problem.
The solution to making LLM-generated code maintainable involves:
1) Using good design practices before generating the code, e.g. have a design and write it down. This is a good practice regardless of maintainability issues because it is part of how you get good results getting LLMs to generate code.
2) Keeping a record of the prompts that you used to generate the code, as part of the code. Do NOT exclude CLAUDE.md from your git repo, for instance, and extract and save your prompts (a minimal sketch of one such convention follows after this list).
3) Maintain the code with LLMs, if you generated it with LLMs.
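As a sketch of point 2, one possible convention (entirely hypothetical, not part of any tool) is a provenance header in each generated file that points at the committed prompt and transcript:

```cpp
// csv_export.cc -- hypothetical generated file.
//
// Provenance (assumed convention):
//   Prompt:    prompts/csv-export.md            (checked into the repo)
//   Session:   docs/llm-sessions/csv-export.txt (full transcript, also committed)
//   Reviewed:  yes; manual edits are marked "EDIT:" and should be folded back
//              into the prompt record whenever behaviour changes.
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// EDIT: renamed from export_csv() during review to match project style.
std::string ToCsvRow(const std::vector<std::string>& fields) {
  std::ostringstream row;
  for (std::size_t i = 0; i < fields.size(); ++i) {
    if (i > 0) row << ',';
    row << fields[i];
  }
  return row.str();
}
```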
Mandatory car analogy:
Of course there was a looming maintenance problem when the automobile was introduced, because livery stables were unprepared to deal with messy, unpredictable automobiles.
They won't. In a year or two these will be articles that get linked back to similar to "Is the internet just a fad?" articles of the late 90s.
The issue is that LLMs don't "understand." They merely copy without contributing original thought or critical thinking. This is why LLMs can't handle complicated concepts in codebases.
What I think we'll see in the long run is:
(Short term) Newer programming models that target LLMs: IE, describe what you want the computer to do in plain English, and then the LLM will allow users to interact with the program in a more conversational manner. Edit: These will work in "high tolerance" situations where small amounts of error is okay. (Think analog vs digital, where analog systems tend to tolerate error more gracefully than digital systems.)
(Long term) Newer forms of AI that "understand." These will be able to handle complicated programs that LLMs can't handle today, because they have critical thinking and original thought.
“The Internet? Bah! Hype alert: Why cyberspace isn't, and will never be, nirvana” by Clifford Stoll (1995)
Excerpt: “How about electronic publishing? Try reading a book on disc. At best, it's an unpleasant chore: the myopic glow of a clunky computer replaces the friendly pages of a book. And you can't tote that laptop to the beach. Yet Nicholas Negroponte, director of the MIT Media Lab, predicts that we'll soon buy books and newspapers straight over the Internet. Uh, sure.”
https://www.nysaflt.org/workshops/colt/2010/The%20Internet.p...
“Why most economists' predictions are wrong” by Paul Krugman (1998)
Excerpt: “By 2005 or so, it will become clear that the Internet's impact on the economy has been no greater than the fax machine's.”
https://web.archive.org/web/19980610100009/http://www.redher...
> The growth of the Internet will slow drastically, as the flaw in "Metcalfe's law"--which states that the number of potential connections in a network is proportional to the square of the number of participants--becomes apparent: most people have nothing to say to each other! By 2005 or so, it will become clear that the Internet's impact on the economy has been no greater than the fax machine's.
> As the rate of technological change in computing slows, the number of jobs for IT specialists will decelerate, then actually turn down; ten years from now, the phrase information economy will sound silly.
There is certainly real market penetration with LLMs. However, there is a huge gap between fantasy and reality - as in what is being promised vs what is being delivered and the effects on the economy are yet to play out.
Less than a year ago I was generating somewhat silly and broken unit tests with Copilot. Now I'm generating entire feature sets while doing loads of laundry.
> But those of us who’ve experimented a lot with using LLMs for code generation and modification know that there will be times when the tool just won’t be able to do it.
The pace of change here (the new normal pace) has the potential to make this look outdated in mere months. Finding that the curve topped out in exactly late 2025, such that this remains the state of development for many years, seems intuitively very unlikely.
The last percentage points needed to get something just right are the hardest, so why are you so sure that the flaws in LLMs will be gone in such a short time frame?
For complex tasks, I use it just to help me plan or build a draft (and hacky) pull request, to explore options. Then I rewrite it myself, again keeping the best part for myself.
To me, LLMs have made writing code even more fun than it was before. I guess the outcome only depends on the user. At this point, it's clear that all my peers who can't have fun with it are using it the way they use ChatGPT: just throwing a prompt at it, hoping for the best, and then getting frustrated.
You can't change your stance later, it will just give you a headache.
When the former breaks, you fix it like conventional bug hunting. When the latter breaks, you fix it by either asking LLM to fix it or scrap it and ask LLM to regenerate it.
The wave’s still breaking, so I’m going to ride it out until it smooths into calm water. Maybe it never will. I don't know.
This is a pretty seriously bad difference imo
Tests prevent regressions and act as documentation. You can use them to prove any refactor is still going to have the same outcome. And you can change the production code on purpose to break the tests and thus prove that they do what they say they do.
And your AI can use them to work on the codebase too.
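A tiny example of what that looks like in practice; Slugify() and its expected outputs are hypothetical, the point is only that the asserts pin the behaviour for the next refactor, human or agent:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: the tests below pin its observable behaviour, so any
// rewrite (by a person or an agent) that changes the output fails immediately.
std::string Slugify(const std::string& title) {
  std::string out;
  for (unsigned char c : title) {
    if (std::isalnum(c)) out += static_cast<char>(std::tolower(c));
    else if (!out.empty() && out.back() != '-') out += '-';
  }
  while (!out.empty() && out.back() == '-') out.pop_back();
  return out;
}

int main() {
  assert(Slugify("Hello, World!") == "hello-world");
  assert(Slugify("  spaces  ") == "spaces");
  return 0;
}
```

Deliberately breaking Slugify() (say, dropping the lowercase step) makes both asserts fail, which is the "prove the tests do what they say they do" step from the comment above.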
I like writing code because eventually I have to fix code. The writing will help me have a sense for what's going on. Even if it will only be 1% of the time I need to fix a bug, having that context is extremely valuable.
Then reserve AI coding for when there's true boilerplate or a near copy-paste of a pattern.
This is not full blown vibe coding of a web application to be sure.
The analogy with debt breaks down when you can discard the program and start anew, probably at great cost to the company. But since that cost falls on the company rather than the developers, no developer is actually paying the debt - greenfield development is almost always more invigorating than maintaining legacy code. It's a bailout (really debt forgiveness) of technical debt by the company, who also happens to be paying the developers a good wage on the very nebulous promise that this won't happen again (spoiler: it will).
What developers need to do to get a bailout is enough reputation and soft skills to convince someone a rewrite is feasible and the best option. And leadership who is not completely convinced you should never rewrite programs from scratch.
Joel Spolsky's beliefs here are worth a revisit in the face of hastened code generation by LLMs too, as it was based completely on human-created code: https://www.joelonsoftware.com/2000/04/06/things-you-should-...
Some programs still should not be rewritten: Excel, Word, many of the more popular and large programs. However, many smaller or medium applications being maintained by developers using LLMs in this way will more easily end up with a larger fraction of LLM-generated code that is harder to understand (again, if you believe the article). Whereas before you might have rewritten a small program, you might now rewrite a medium one.
I can ask questions like, “how is this code organized” and, “where does [thing] happen?”
The market will eventually self correct once folks get more burned by that.
And this is where I stop reading. You cannot make such a descriptive statement without some sort of corroborating evidence other than your intuition/anecdotes.
Most complex production systems do not have this level of documentation and/or regression coverage, nor I suspect will any AI-generated system. The requirements you fed the AI to "specify" the system aren't even close to a 100% coverage regression test suite, even of the product features, let alone all the more detailed behaviors that customers may be used to.
It's hard to see mission-critical code (industrial control, medical instruments, etc) ever being written in this way since the cost of failure is so high.
Fix your tests, not your resulting code.
When velocity and quantity are massively incentivized over understanding, strategy, and quality, this is the result. Enshittification of not only the product, but our own professional minds.
Large organizations are increasingly made up of technical specialists who are very good at their little corner of the operation. In the past, you had employees present at firms for 20+ years who not only understand the systems in a holistic way, but can recall why certain design or engineering decisions were made.
There is also a demographic driver. The boomer generation with all the institutional memory have left. Gen-X was a smaller cohort, and was not able to fully absorb that knowledge transfer. What is left are a lot of organizations run by people under the age of 45 working on systems where they may not fully understand the plumbing or context.
At first I was frustrated but my boss said it was actually a perfect sequence, since that "crappy code" did generate a working demo that our future customers loved, which gave us the validation to re-write. And I agree!
LLMs are just another tool in the chest: a curious, lightning-fast junior developer with an IQ of 85 who can't learn and needs a memory wipe whenever they make a design mistake.
When I use it knowing its constraints, it's a great tool! But yeah, if used wrong you're going to make a mess, just like with any powerful tool.
"Comprehension debt" is a perfect description for the thing I've been the most concerned about with AI coding.
Once I got past the Dunning-Kruger phase and started really looking at what was being generated, I ran into this comprehension issue.
With a human, even a very junior one, you can sort of "get in the developer's head". You can tell which team member wrote which code and what they were thinking at the time. This leads to a narrative, or story of execution which is mostly comprehensible.
With the AI stuff, it's just stochastic-parrot stuff. It may work just fine, but there will be things like random functions that are never called, hundreds or thousands of lines of extra code to do very simple things, and references to things that don't exist and never have.
I know this stuff can exist in human code bases too - but generally I can reason about why. "Oh, this was taken out for this issue and the dev forgot to delete it".
I can track it, even if it's poor quality.
With the AI stuff, it's just randomly there. No idea why, if it is used, was ever used, makes sense, is extra fluff or brilliant.
It takes a lot of work to figure out.
The goal is to see how far I can push the LLM. How good is it... really?
I'm also not sure about your basic premise that understanding will improve. That depends on the size of the network's internal representation(s), which will start overfitting at some point.
What happened? I don't really use LLMs, so I'm not sure how people have completely lost their ability to problem-solve. They surely must remember that 6 months ago they were debugging just fine?
1. Initially euphoria both with having the tool and seeing how much can be done quickly, not having a good sense of its limits or reach. Mining too deep, and disturbing the Balrog. Basically: doing too much.
2. Not sufficiently reviewing the work it produces.
3. The tools themselves being badly designed, from a UX point of view, in ways that encourage #1 and #2.
From my perspective, there's a fundamental mis-marketing of the agentic tools, and a failure on the part of the designers of these products. What they could be producing is a tool that works with developers in a Socratic dialogue, interactively, with more of a mandatory review and discussion step that ensures a guided authoring process.
When guided and fenced with a good foundational architecture, Claude can produce good work. But the long term health of the project depends on the engineer doing the prompting to be 100% involved. And this can actually be an insanely exhausting process.
In the last 6 months, I have gone from highly skeptical and cynical about LLMs as coding agents, to euphoric and delighted, back to a more cautious approach. I use Claude Code daily and constantly. But I try to use it in a very supervised fashion.
What I'd like to see is agentic tools that are less agentic and more interactive. Claude will prompt you Yes/No diff by diff but this is the wrong level of granularity. What we need is something more akin to a pair programming process and instead of Yes/No prompts there needs to be a combination of an educational aspect (tool tells you what it's discovered, and you tell it what you've discovered) with review.
The makers of these tools need to have them slow down and stop pretending to automate us out of work, and instead take their place as tools used by skilled engineers. If they don't, we're in for a world of mess.
It might be a way to collect the paycheck if that's the only thing you care about. But for people who want to find at least some enjoyment in what they do, it's a shortcut to hell.