https://www.anthropic.com/engineering/building-effective-age...
"- Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
- Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks."
What Anthropic calls a "workflow" in the above definition is what most of the big enterprise software companies (Salesforce, ServiceNow, Workday, SAP, etc.) are building and calling AI Agents.
What Anthropic calls an "agent" in the above definition is what AI Researchers mean by the term. It's also something that mainly exists in their labs. Real world examples are fairly primitive right now, mainly stuff like Deep Research. That will change over time, but right now the hype far exceeds the reality.
The problem with all of these AI-specific workflow engines is that they are not durable: they are process-local, they suffer crashes, they cannot resume, and they lack good visibility and distribution. They often allow only limited orchestration instead of full code freedom, support only one language, and so on.
So what's the solution?
Agentic systems can be simply the LLM + prompting + tools[1]. LLMs (especially chain-of-thought models) are more than capable of breaking problems down into steps, deciding which tools to use, and then executing the steps in sequence. All of this is done with the model in the driver's seat.
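Here's a minimal sketch of that loop; `call_llm`, the tool names, and the decision format are assumptions rather than any particular SDK. The model chooses the next action each turn, and plain code just executes it.

    # LLM + prompting + tools: the model is in the driver's seat, the loop obeys.
    import json

    def search_web(query: str) -> str: ...   # stand-in tools; wire these up for real use
    def read_file(path: str) -> str: ...

    TOOLS = {"search_web": search_web, "read_file": read_file}

    def call_llm(messages: list[dict]) -> dict:
        """Placeholder for your provider's chat call. Assume it returns either
        {"tool": name, "args": {...}} or {"answer": text}."""
        raise NotImplementedError

    def run_agent(task: str, max_steps: int = 10) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            decision = call_llm(messages)             # the model decides the next step
            if "answer" in decision:                  # it decided it's finished
                return decision["answer"]
            result = TOOLS[decision["tool"]](**decision["args"])
            messages.append({"role": "tool",
                             "content": json.dumps({"tool": decision["tool"], "result": result})})
        return "stopped: step budget exhausted"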
I think the system described in the post needs a different name. It's a traditional workflow system with an agent operating on individual tasks. It's more rigid in that the workflow is set up ahead of time. Typical agentic systems are largely undefined, or defined via prompting. For some use cases this rigidity is a feature.
[1] https://docs.anthropic.com/en/docs/build-with-claude/tool-us...
Sort of, kind of. It's still a directed graph. Dynamically generated graph, but still a graph. Your prompted LLM is the decision/dispatch block. When the model decides to call a tool, that's going from the decision node to another node. The tool usually isn't another LLM call, but nothing stops it from being one.
The "traditional workflow" exists because even with best prompting, LLMs don't always stick to the expected plan. It's gotten better than it used to, so people are more willing to put the model in the driving seat. A fixed "ahead of time" workflow is still important for businesses powering products with LLMs, as they put up a facade of simplicity in front of the LLM agentic graph, and strongly prefer for it to have bounded runtime and costs.
(The other thing is that, in general, it's trickier to reason about code flow generated at runtime.)
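To make the "dynamically generated graph" point concrete, here's a rough sketch (the node names and the `decide` helper are made up for illustration): the prompted LLM is the dispatch block, every tool call is an edge to another node, and since the flow is generated at runtime, one thing you can do is record the edges as you go.

    # The LLM is the decision node; each tool is another node; each call traverses an edge.
    # One "tool" here is itself an LLM call, which nothing prevents.
    from typing import Callable

    def fetch_url(url: str) -> str: ...            # ordinary tool node
    def summarize_with_llm(text: str) -> str: ...  # a node that is itself an LLM call

    NODES: dict[str, Callable[[str], str]] = {
        "fetch_url": fetch_url,
        "summarize": summarize_with_llm,
    }

    def decide(state: str) -> tuple[str, str]:
        """The prompted LLM: returns (next_node, payload), or ("DONE", answer)."""
        raise NotImplementedError

    def run(task: str) -> tuple[str, list[tuple[str, str]]]:
        state, edges = task, []
        while True:
            node, payload = decide(state)
            if node == "DONE":
                return payload, edges        # `edges` is the graph, known only after the run
            edges.append(("decide", node))   # decision node -> tool node
            state = NODES[node](payload)
            edges.append((node, "decide"))   # result flows back to the decision node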
Corporate buzzwords have co-opted "Agent" to describe workflows with an LLM in the loop. While these can be represented as graphs, I'm not convinced "Agent" is the right term, even if they exhibit agentic behavior. The key distinction is that workflows define specific rules and processes, whereas a true agent wouldn’t rely on a predetermined graph—it would simply be given a task in natural language.
You're right that reasoning about runtime is difficult for true agents due to their non-deterministic nature, but different groups are chipping away at the problem.
When you work with agentic LLMs you have to worry about prompt chaining, parallel execution, decision points, loops, and other such complex design decisions.
People who didn't already know what's in the first article shouldn't use Pocketflow; they should go with N8N or even Zapier.
This still returns a string. You need to explicitly program the branch to the right function. For example, check out how OpenAI Agents, released a week ago, rely on a workflow: https://github.com/openai/openai-agents-python/blob/48ff99bb...
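For illustration only (these names are hypothetical, not taken from the SDK), the "explicit branch" looks something like this: the model hands back a string, and ordinary code maps that string to the right function.

    # The LLM output is just a string; the routing branch is plain code the
    # programmer writes ahead of time.
    def classify_intent(user_message: str) -> str:
        """Assumed to wrap an LLM call and return e.g. 'refund', 'billing', or 'other'."""
        raise NotImplementedError

    def handle_refund(msg: str) -> str: ...
    def handle_billing(msg: str) -> str: ...
    def handle_other(msg: str) -> str: ...

    ROUTES = {"refund": handle_refund, "billing": handle_billing}

    def route(user_message: str) -> str:
        label = classify_intent(user_message)       # a string comes back from the model
        handler = ROUTES.get(label, handle_other)   # the branch is explicit, ordinary code
        return handler(user_message)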
Not sure about Cursor, which you mentioned, since its agent is not open-sourced.
The whole workflow and the Runner class are for one agent.
Check out this line: https://github.com/openai/openai-agents-python/blob/48ff99bb...
A single `run_agent` is implemented based on the Runner class and workflow. So usually the workflow is for one agent (unless there is a handoff).
> An agent is an AI model configured with instructions, tools, guardrails, handoffs and more.
Agents can hand off to other agents, but even the hand-off is decided by the agent itself, not a pre-defined orchestration.
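A rough sketch of what that looks like (not the OpenAI SDK's actual classes, just the shape of the idea): the handoff targets are offered to the model alongside its tools, so switching agents is the model's decision, not pre-written orchestration.

    # Handoff as the agent's own choice: the loop only executes whatever the model
    # decides, including "hand this off to another agent".
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        name: str
        instructions: str
        tools: dict = field(default_factory=dict)
        handoffs: list["Agent"] = field(default_factory=list)

    def call_llm(agent: Agent, messages: list[dict]) -> dict:
        """Assumed provider call; returns {"handoff": name}, {"tool": ..., "args": ...},
        or {"answer": ...} depending on what the model decides."""
        raise NotImplementedError

    def run(agent: Agent, task: str, max_steps: int = 20) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            out = call_llm(agent, messages)
            if "answer" in out:
                return out["answer"]
            if "handoff" in out:                 # the model chose to hand off
                agent = next(a for a in agent.handoffs if a.name == out["handoff"])
                continue                         # same loop, new instructions and tools
            result = agent.tools[out["tool"]](**out["args"])
            messages.append({"role": "tool", "content": str(result)})
        return "stopped: step budget exhausted"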
OpenAI Agents: for the workflow logic: https://github.com/openai/openai-agents-python/blob/48ff99bb...
Pydantic Agents: organizes steps in a graph: https://github.com/pydantic/pydantic-ai/blob/4c0f384a0626299...
Langchain: demonstrates the loop structure: https://github.com/langchain-ai/langchain/blob/4d1d726e61ed5...
If all the hype has been confusing, this guide shows how they actually work under the hood, with simple examples. Check it out!
https://zacharyhuang.substack.com/p/llm-agent-internal-as-a-...
It would be interesting to dig deeper into the "thinking" part: how does an LLM know what it doesn't know / how to fight hallucinations in this context?
Forget about boxes and deterministic control and start thinking of error tolerance and recovery. That is what agents are all about.
To me "workflow" is just what agent means: the rules under which an automated action occurs. Without some central concept "agent" just a magic wand that does stuff that may or may not be what you want it to do. If we can't use state machines at all I'm just going to go out and say LLMs are a dead end. State machines are the bread and butter of reliable software.
> Forget about boxes and deterministic control and start thinking of error tolerance and recovery.
First you'd have to define what an error even is. Then you're just writing deterministic software again (a workflow), just with less confidence. Nice for stuff with low risk and low confidence to begin with (e.g. semantic analysis, whose errors tend to wash out in aggregate), but not for stuff acting on my behalf.
LLMs are cool bits of software, but I can't say I see much use for "agents" whose behavior is not well-defined and whose non-determinism is not formally bounded.
Your point is moot since many of these modern workflows already use LLMs as gating functions to determine the next steps.
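For example (function names here are hypothetical, not from any framework), a "gating" workflow keeps the steps and edges fixed in code and uses the model only to pick which branch to take:

    # Deterministic workflow with an LLM as the gate: the graph is fixed,
    # only one transition is chosen by a model call.
    def extract_fields(doc: str) -> dict: ...
    def auto_approve(fields: dict) -> str: ...
    def send_to_human_review(fields: dict) -> str: ...

    def llm_gate(fields: dict) -> str:
        """Assumed LLM call that answers 'approve' or 'review' for the extracted claim."""
        raise NotImplementedError

    def process_claim(doc: str) -> str:
        fields = extract_fields(doc)             # deterministic step
        decision = llm_gate(fields)              # the LLM only picks the branch
        if decision == "approve":
            return auto_approve(fields)
        return send_to_human_review(fields)      # anything else falls back to the safe branch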
It’s a different way of approaching problems, and while the future is uncertain, LLMs have moved beyond being just "cool software" to becoming genuinely useful in specific domains.
Anyway, LLMs will remain at "cool software" like other niche-specific patterns until I see something general emerge. You'd have to pitch LLMs pretty savvily to show a clear value-add. Engineers are extremely expensive, so LLMs need to have a very low error rate to be integrated into the revenue path of a product without incurring higher costs or a lower-quality service. I still see text and code generation for immediate consumption by a human (or possibly classification to be reviewed by a human) as the only viable use cases today. It's just way too easy to manipulate them with standard English.
In job orchestration systems, workflows are structured sequences of tasks that define how data moves and transforms over time. Workflows are typically defined as Directed Acyclic Graphs (DAGs) but they don't have to be. I don't believe I am referring to anything more specific than how orchestration systems generally use them. LLM-based agents shift the focus from rigidly defined transitions to adaptable problem-solving mechanisms. They don’t replace state machines entirely but introduce a layer where strict determinism isn’t always necessary or even desirable.
> Anyway, LLMs will remain at "cool software" like other niche-specific patterns until I see something general emerge. You'd have to pitch LLMs pretty savvily to show a clear value-add. Engineers are extremely expensive, so LLMs need to have a very low error rate to be integrated into the revenue path of a product without incurring higher costs or a lower-quality service. I still see text and code generation for immediate consumption by a human (or possibly classification to be reviewed by a human) as the only viable use cases today. It's just way too easy to manipulate them with standard English.
I get the skepticism, especially about error rates and reliability. But the “cool software” label underestimates where this is heading. There’s already evidence of LLMs being useful beyond text/code-gen (e.g., structured reasoning in research, RAG-enhanced search, or dynamically adapting workflows based on complex input). The real shift isn’t just about automation but about adaptive automation, where LLMs reduce the need for brittle, predefined paths.
Of course, the general-use case is still evolving, and I agree that direct, high-stakes automation remains a challenge. But dismissing LLM-driven agents as just niche tools ignores their growing role in augmenting traditional software paradigms.
> This tutorial is focusing on the low-level internals of how agents are implemented
We have very different definitions of what "low-level" means. Exact opposites, in fact. "Low-level" means in the inner workings. A low-level language is assembly (some consider C low-level, but this is debatable), whereas Python would be high-level. I don't think this tutorial is "near the metal" of LLMs, nor do I think it should be, considering it is aimed at "Dummies". Low-level would really mean getting into the inner workings of the processing, probing agents, and getting into the weeds.
Does it really matter if you can understand them? Waiting for strongly opinionated engineers to finish their pedantic spiels (...even when they're wrong or there is no obvious standard of correctness) when everyone already understands each other is one of the most miserable parts of being in this industry.
I—and I emphatically don't include the above poster in this view, as it takes continual and repeated behavior to accrue such a judgement—see this as a small tantrum, essentially, from people who never learned to regulate their emotions in professional spaces. I don't understand why this sort of bickering is considered acceptable behavior in the workplace or adjacent spaces. It's rude, arrogant, and trivially avoidable with a slight change in tone and rhetoric, and it makes you look like an asshole if you're not 100% right and don't approach it in good humor.
Why do you see this among engineers so frequently? Because it's the job of an expert to be concerned with nuance and details. The low-level, in fact. This requires high precision in communication too. The back and forth you see as bickering is also how those details end up getting communicated, since much of what's intended is implicit. The other approach is to use a lot of words, and unfortunately, when you do that you are often ignored.
Although I personally don't think the graph implementation for agents is necessarily as established or widely standardized, it's helpful to know about why such an implementation was chosen and how it works.
> the inner workings of the processing, probing agents, and getting into the weeds
These feel to me like empty words... "inner workings of the processing"? You can say that about anything.
> You can say that about anything.
That is true. But it is also true that you can approach any topic from low-level or high-level. So I'm not sure I get your point here.
> How they interpret one another and respond.
That sounds like it just falls back to "how LLMs work". It's the wrong level of abstraction in this case, because it's one level down from the topic being discussed here.
> because it's one level down
So we're in agreement? Aren't we after the "low-level"? That's this whole conversation... Yes, it is a level down; that's my whole point. Just as in my original analogy, with assembly being a level down from C. Working at the metal, as they say. In the weeds.
I honestly don't know how to respond, because I'm saying "this is too high-level" and you're arguing "you're too low-level". I'm sorry, but when you do stuff at the low level you in fact have to crouch down and put your face to the ground. The lower the better. You're trying to see something very small; we're not trying to observe mountains here.
The original purpose is to help people understand how agent frameworks are implemented internally, like these:
OpenAI Agents: https://github.com/openai/openai-agents-python/blob/48ff99bb...
Pydantic Agents: https://github.com/pydantic/pydantic-ai/blob/4c0f384a0626299...
Langchain: https://github.com/langchain-ai/langchain/blob/4d1d726e61ed5...
LangGraph: https://github.com/langchain-ai/langgraph/blob/24f7d7c4399e2...
In the case of LLMs, knowing that it does boil down to matrix multiplication is insightful and useful, because now you know what kind of hardware is best suited to executing a model.
What is actually not insightful or useful is believing LLMs are AGI or conscious.
Then again, I don't think anyone who can follow this article believed that LLMs were conscious to begin with, so I'm not sure what your point is. You're preaching on behalf of a demographic that won't read this article to begin with, and presumably the people who are can see how useless, distracting, and unproductive this reductionism is.
Pursue the hypothesis? Sure. But belief is a different beast entirely. It's not even clear AGI is a meaningful concept yet, and I'd bet my life savings everyone reading this comment in 2025 will die before it's answered. Skepticism is the barometer.
Our approach is so unrelated to any of the other hyped up stuff. We have not written a single line of ML, it has been all math & physics until now.
> this reductivism is not exactly insightful.
I really agree with this. I think it has been bad for a lot of people's understanding when they trivialize ML to "just matrix multiplications" (or GEMMs). This does not help differentiate AI/ML from... well... really any data-processing algorithm. Matrices are fairly general structures in mathematics, and you can formulate almost anything as one; in fact, this is a very common way to parallelize or speed up programs (e.g. numpy vectorization).

We wouldn't call least squares ML, even a bunch of them, nor would we call rasterization or ray tracing ML, yet fundamentally all these things are "just GEMMs" too. The framing also obscures important distinctions like linear networks vs. CNNs vs. Transformers. It brushes off a key element, the activation function, which is what lets neural nets perform non-linear transformations! And what about residual units? They are one of the most important factors in enabling deep learning, and they're "just" addition. So do we say it's all just matrix addition, since we can convert multiplication to addition?
There is such a thing as oversimplification and I worry that we have hyper-optimized (over-optimized) for this. So I agree, saying they just "boil down to matrix multiplications" is fundamentally misleading. It provides no insight and only serves to mislead people.
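A quick way to see why the activation function matters (a small numpy check, nothing more): two stacked layers that are only matrix multiplications collapse into a single matrix, while putting a nonlinearity between them does not.

    # Without a nonlinearity, "depth" buys nothing: W2 @ (W1 @ x) == (W2 @ W1) @ x.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    W1 = rng.normal(size=(4, 3))
    W2 = rng.normal(size=(2, 4))

    linear_stack = W2 @ (W1 @ x)                   # "two layers", no activation
    collapsed = (W2 @ W1) @ x                      # the equivalent single layer
    print(np.allclose(linear_stack, collapsed))    # True: the stack is one matrix

    relu = lambda v: np.maximum(v, 0.0)
    with_relu = W2 @ relu(W1 @ x)                  # cannot be rewritten as one matrix
    print(np.allclose(with_relu, collapsed))       # almost surely False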
For example, for software projects, the algorithmic level is where most people focus because that’s typically where the biggest optimizations happen. But in some critical scenarios, you have to peel back those layers—down to how the hardware or compiler works—to make the best choices (like picking the right CPU/GPU).
Likewise, with agents, you can work with high-level abstractions for most applications. But if you need to optimize or compare different approaches (tool use vs. MCP vs. prompt-based, for instance), you have to dig deeper into how they’re actually implemented.
The graph rendering is simply for illustrative purposes, mostly to cater to people who think in terms of graphs, but the underlying mechanics are not nodes and edges and a flow that goes from one to the next.
tl;dr from Anthropic:
> Workflows are systems where LLMs and tools are orchestrated through predefined code paths.
> Agents, on the other hand, are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks.
Most “agents” today fall into the workflow category.
The foundation model makers are pushing their new models to be better at the second, “pure” agent, approach.
In practice, I’m not sure how effective the “pure” approach will work for most LLM-assisted tasks.
I liken it to a fresh intern who shows up with amnesia every day.
Even if you tell them what they did yesterday, they’re still liable to take a different path for today’s work.
My hunch is that we’ll see an evolution of this terminology, and agents of the future will still have some “guiderails” (note: not necessarily _guard_rails), that makes their behavior more predictable over long horizons.
[0] https://www.anthropic.com/engineering/building-effective-age...
The workflow can vary. For example, it can involve multiple LLM calls chained together without branching or looping. It can also be built using a graph.
I know the terms "graph" and "workflow" can be a bit confusing. It’s like we have a low-level 'cache' at the CPU level and then a high-level 'cache' in software.
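A minimal sketch of the first kind, assuming a generic `call_llm` wrapper (not any particular library): several LLM calls chained in a straight line, no branching or looping.

    # Prompt chaining: each call's output is the next call's input.
    def call_llm(prompt: str) -> str:
        raise NotImplementedError  # wire this to whatever provider you use

    def write_article(topic: str) -> str:
        outline = call_llm(f"Write a bullet-point outline for an article about {topic}.")
        draft = call_llm(f"Expand this outline into a full draft:\n{outline}")
        return call_llm(f"Tighten the prose and fix any errors:\n{draft}")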
In a sense there’s still a graph of execution, but the graph isn’t known until the “agent” runs and decides what tools to use, in what order, and for how long.
There is no scaffold, just LLM + MCP (or w/e) in a loop.