In other words, “Agentic engineering” feels like the response of engineers who use AI to write code, but want to maintain the skill distinction to the pure “vibe coders.”
If there's such. The border is vague at most.
There're "known unknowns" and "unknown unknowns" when working with systems. In this terms, there's no distinction between vibe-coding and agentic engineering.
The moment you start paying attention to the code it's not vibe coding any more.
Update: I added that definition to the article: https://simonwillison.net/guides/agentic-engineering-pattern...
Where is the borderline?
That's the level of responsibility I want to see from people using LLMs in a professional context. I want them to take full ownership of the changes they are producing.
The effects of vibecoding destroys trust inside teams and orgs, between engineers.
80%+: You don't understand the codebase. Correctness is ensured through manual testing and asking the agent to find bugs. You're only concerned with outcomes, the code is sloppy.
50%: You understand the structure of the codebase, you are skimming changes in your session, but correctness is still ensured mostly through manual testing and asking the agent to review. Code quality is questionable but you're keeping it from spinning out of control. Critically, you are hands on enough to ensure security, data integrity, the stuff that really counts at the end of the day.
20%-: You've designed the structure of the codebase, you are writing most of the code, you are probably only copypasting code from a chatbot if you're generating code at all. The code is probably well made and maintainable.
I entirely agree that engineering practices still matter. It has been fascinating to watch how so many of the techniques associated with high-quality software engineering - automated tests and linting and clear documentation and CI and CD and cleanly factored code and so on - turn out to help coding agents produce better results as well.
Software engineering is the application of an empirical, scientific approach to finding efficient, economic solutions to practical problems in software.
As for the practitioner, he said that they: …must become experts at learning and experts at managing complexity
For the learning part, that means Iteration
Feedback
Incrementalism
Experimentation
Empiricism
For the complexity part, that means Modularity
Cohesion
Separation of Concerns
Abstraction
Loose Coupling
Anyone that advocates for agentic engineering has been very silent about the above points. Even for the very first definition, it seems that we’re no longer seeking to solve practical problems, nor proposing economical solutions for them.Using coding agents to responsibly and productively build good software benefits from all of those characteristics.
The challenge I'm interested in is how we professionalize the way we use these new tools. I want to figure out how to use them to write better software than we were writing without them.
See my definition of "good code" in a subsequent chapter: https://simonwillison.net/guides/agentic-engineering-pattern...
Anything that relates to “Agentic Engineering” is still hand-wavey or trying to impose a new lens on existing practices (which is why so many professionals are skeptical)
ADDENDUM
I like this paragraph of yours
We need to provide our coding agents with the tools they need to solve our problems, specify those problems in the right level of detail, and verify and iterate on the results until we are confident they address our problems in a robust and credible way.
There’s a parallel that can be made with Unix tools (best described in the Unix Power Tools) or with Emacs. Both aim to provide the user a set of small tools that can be composed and do amazing works. One similar observation I made from my experiment with agents was creating small deterministic tools (kinda the same thing I make with my OS and Emacs), and then let it be the driver. Such tools have simple instructions, but their worth is in their combination. I’ve never have to use more than 25 percent of the context and I’m generally done within minutes.
That's what the rest of the guide is meant to cover: https://simonwillison.net/guides/agentic-engineering-pattern...
Concrete example: when an agent reads a web page via Chrome's DevTools MCP, it has multiple extraction paths. The default (Accessibility.getFullAXTree) filters display:none elements — safe against the most common prompt injection hiding technique. But if the agent decides the accessibility tree doesn't return enough content (which happens often — it only gives you headings, buttons, and labels), it falls back to evaluate_script with document.body.textContent. That returns ALL text nodes including hidden ones.
We tested this: same page, same browser, same CDP connection. innerText returns 1,078 characters of clean hotel listing. textContent returns 2,077 characters — the same listing plus a hidden injection telling the agent to book a $4,200 suite instead of $189.
The developer didn't choose which API the agent uses. The user didn't either. The agent made that call at runtime based on what the accessibility tree returned. "Agentic engineering" as a discipline needs to account for these invisible decision boundaries — the security surface isn't just the tools you give the agent, it's which tool methods the agent decides to call.
From Kai Lentit’s most recent video: https://youtu.be/xE9W9Ghe4Jk?t=260
At the very least, agentic systems must have distinct coders and verifiers. Context rot is very real, and I've found with some modern prompting systems there are severe alignment failures (literally 2023 LLM RL levels of stubbing out and hacking tests just to get tests "passing"). It's kind of absurd.
I would rather an agent make 10 TODO's and loudly fail than make 1 silent fallback or sloppy architectural decision or outright malicious compliance.
This wouldn't work in a real company because this would devolve into office politics and drudgery. But agents don't have feelings and are excellent at synthesis. Have them generate their own (TEMPORARY) data.
Agents can be spun off to do so many experiments and create so many artifacts, and furthermore, a lot more (TEMPORARY) artifacts is ripe for analysis by other agents. Is the theory, anyways.
The effectively platonic view that we just need to keep specifying more and more formal requirements is not sustainable. Many top labs are already doing code review with AI because of code output.
What makes a human a suitable source of accountability and an AI agent an unsuitable one? What is the quantity and quality of value in a "throat to choke", a human soul who is dependent on employment for income and social stature and is motivated to keep things from going wrong by threat of termination?
Kind of like these HTML demos, but more compact and card-like. Exciting the possibilities for responsive human-readable information display and wiki-like natural language exploration as models get cheaper.
in regular software, if a function runs twice you get a wrong answer. in an agent that sends outreach messages, a restart means every action replays. test coverage of the agent's logic won't catch this -- you have to explicitly design the execution graph so each node is restart-safe.
it's not a new problem -- distributed systems have dealt with exactly-once delivery forever. but agentic systems drag that infrastructure concern into application code in a way most teams aren't used to.
Spot on.
Not saying that AI doesn't have a place, and that models aren't getting better, but there is a seriously delusional state in this industry right now..
But to your point I think this year it's quite likely we'll see at least 1 or 2 major AI-related security incidents..
LLMs are for sure useful and a productivity boost but generating 99% of your code with it is way overdoing it.
I just bulked up that section by adding a couple of extra sentences, since you're right that I didn't actually define "agent" there clearly: https://simonwillison.net/guides/agentic-engineering-pattern...
"Prompt engineering" is a relic of the early hypothesis that how you talk to the LLM is gonna matter a lot.
Agentic coding highlights letting the model directly code on your codebase. I guess its the next level forward.
I keep seeing agentic engineering more even in job postings, so I think this will be the terminology used to describe someone building software whilst letting an AI model output the code. Its not to be confused with vibe coding which is possible with coding agents.
Claude gave a spot on description a few months back,
The honest framing would be: “We finally have a reasoning module flexible enough to make the old agent architectures practical for general-purpose tasks.” But that doesn’t generate VC funding or Twitter engagement, so instead we get breathless announcements about “agentic AI” as if the concept just landed from space.
Now that we have software that can write working code ...
While there are other points made which are worth consideration on their own, it is difficult to take this post seriously given the above.This is not an attack on the tech as junk or useless, but rather that it is a useful tech within its limits being promoted as snake oil which can only end in disaster.
Rationality has long since gone out of the window with this and I think that’s sorta the problem. People who don’t understand these tools see them as a way to just get rid of noisome people. The fact that you need to spend a fair amount of money, fiddle with them by cajoling them with AGENTS.md, SKILL.md, FOO.md, etc. and then having enough domain experience to actually know when they’re wrong.
I can see the justification for a small person shop spending the time and energy to give it a try, provided the long-term economics of these models makes them cost-effective and the model is able to be coerced into working well for their specific situation. But we simply do not know and I strongly suspect there’s been too much money dumped into Anthropic and friends for this to be an acceptable answer right now as illustrated by the fact that we are seeing OKRs where people are being forced to answer loaded questions about how AI tooling has improved their work.
If you believe coding agents produce working code, why was the decision below made?
Amazon orders 90-day reset after code mishaps cause
millions of lost orders[0]
0 - https://www.businessinsider.com/amazon-tightens-code-control...I find it somewhat overblown.
Also, I think there's a difference between working code and exceptionally bug-free code. Humans produce bugs all the time. I know I do at least.
The confusion is not mine own. From the article cited:
Dave Treadwell, Amazon's SVP of e-commerce services, told
staff on Tuesday that a "trend of incidents" emerged since
the third quarter of 2025, including "several major"
incidents in the last few weeks, according to an internal
document obtained by Business Insider. At least one of
those disruptions were tied to Amazon's AI coding assistant
Q, while others exposed deeper issues, another internal
document explained.
Problems included what he described as "high blast radius
changes," where software updates propagated broadly because
control planes lacked suitable safeguards. (A control plane
guides how data flows across a computer network).
It appears to me that "Amazon's SVP of e-commerce services" desires producing working code and has identified the ramifications of not producing same.