And for most languages, they shouldn't even be operating on strings; they should be operating on token streams and ASTs.
Also, LLMs aren't trained on ASTs, they're trained on strings -- just like programmers.
In theory, there is a type that describes what will parse, but it’s implicit.
Even though efficient use of CLI tools might make the token burn not too bad, the models will still need to spend extra effort thinking about references in comments, readmes, and method overloading.
https://scalameta.org/metals/blog/2025/05/13/strontium/#mcp-...
A few months ago I was pleasantly surprised at how well an AI handled a task I thought was much more complicated, so I was amused at how badly it did when I asked it to refactor some code: changing variable names in a single source file to match a particular coding standard. Having earlier done work that a good junior developer might have needed a couple of days for, it failed hard at the refactoring, working more at the level of a high school freshman.
* Way way way more code in the training set.
* Code is almost always a more concise representation.
There has been work in the past training graph neural networks or transformers that get AST edge information. It seems like some sort of breakthrough (and tons of $) would be needed for those approaches to have any chance of surpassing leading LLMs.
Experimentally, having agents use ast-grep seems to work pretty well. So, still representing everything as code, but using a syntax-aware search-and-replace tool.
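Rough illustration of what that can look like as an agent tool: a thin subprocess wrapper around ast-grep's run subcommand. The flag names (--pattern / --rewrite / --lang / --update-all) are from recent ast-grep builds, so double-check against whatever version you have installed; treat this as a sketch rather than a drop-in.

```python
import subprocess

def ast_replace(pattern: str, rewrite: str, lang: str, path: str = ".") -> str:
    """Syntax-aware search/replace, e.g. pattern='print($MSG)', rewrite='log($MSG)'."""
    result = subprocess.run(
        ["ast-grep", "run",
         "--pattern", pattern,
         "--rewrite", rewrite,
         "--lang", lang,
         "--update-all",   # apply edits instead of prompting interactively
         path],
        capture_output=True, text=True,
    )
    return result.stdout or result.stderr
```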
On the topic of LLMs understanding ASTs, they are also quite good at this. I've done a bunch of applications where you tell an LLM a novel grammar it's never seen before _in the system prompt_ and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar where you don't let it produce invalid sentences and when it does you just feed it back the compiler error, and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.
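Here's roughly the loop I mean, as a minimal sketch: lark stands in for the grammar's "compiler", call_llm is a hypothetical wrapper around whatever model you're using, and the grammar itself is a toy.

```python
from lark import Lark, UnexpectedInput

GRAMMAR = r"""
    start: command
    command: "move" DIRECTION NUMBER
    DIRECTION: "north" | "south" | "east" | "west"
    NUMBER: /[0-9]+/
    %import common.WS
    %ignore WS
"""

SYSTEM_PROMPT = (
    "Translate the user's request into a single sentence of this grammar:\n"
    + GRAMMAR
    + '\nExample: "go up three squares" -> move north 3\n'
    + "Reply with the sentence only."
)

parser = Lark(GRAMMAR)

def translate(user_input: str, call_llm, max_retries: int = 3) -> str:
    """call_llm(system, user) -> str is whatever model wrapper you have."""
    prompt = user_input
    for _ in range(max_retries):
        candidate = call_llm(SYSTEM_PROMPT, prompt)
        try:
            parser.parse(candidate)   # the "compiler" check
            return candidate          # valid sentence, done
        except UnexpectedInput as err:
            # Feed the parse error back so the model can repair its own output.
            prompt = (f"{user_input}\n\nYour previous answer:\n{candidate}\n"
                      f"failed to parse:\n{err}\nTry again.")
    raise ValueError("no valid sentence produced")
```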
Why not convert the training code to AST?
Specifically, I'd love to see widespread structured output support for context free grammars. You get a few here and there - vLLM for example. Most LLMs as a service only support JSON output which is better than nothing but doesn't cover this case at all.
Something with semantic analysis (scope-informed output) would be a cherry on top, but while technically possible, I don't see it arriving anytime soon. But hey, maybe it's an opportunity for product differentiation.
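For concreteness, here's roughly what grammar-constrained output looks like against vLLM's OpenAI-compatible server, which exposes a guided_grammar extension parameter. The exact parameter name and grammar dialect depend on the vLLM version and backend you run, so take this as a sketch.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

# Lark-style EBNF for a tiny SQL subset; decoding is constrained to this CFG.
SQL_GRAMMAR = r"""
    ?start: select
    select: "SELECT " column " FROM " table
    column: "id" | "name" | "email"
    table: "users" | "orders"
"""

resp = client.chat.completions.create(
    model="my-served-model",   # whatever the vLLM server is serving
    messages=[{"role": "user", "content": "Query the names of all users."}],
    extra_body={"guided_grammar": SQL_GRAMMAR},
)
print(resp.choices[0].message.content)   # e.g. SELECT name FROM users
```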
opencode comes to mind off the top of my head
it still tends to do a lot of grep and sed though.
In my experience, it’s actually quite the opposite.
By giving an LLM a set of tools, 30 in the Playwright case from the article, you’re essentially restricting what it can do.
In this sense, MCP is more of a guardrail/sandbox for an LLM, rather than a superpower (you must choose one of these Stripe commands!).
This is good for some cases, where you want your “agent”[1] to have exactly some subset of tools, similar to a line worker or specialist.
However it’s not so great when you’re using the LLM as a companion/pair programmer for some task, where you want its output to be truly unbounded.
[0] https://modelcontextprotocol.io/docs/getting-started/intro
[1] For these cases you probably shouldn’t use MCP, but instead define tools explicitly within one context.
I even use it to troubleshoot issues with my linux laptop that in the past I would totally have done myself, but can't be bothered. Which led to the most relatable AI moment I have encountered: "This is frustrating" - Claude Code thought, after 6 tries in a row to get my bluetooth headset working.
I imagine you run into something similar with bash - while bash is a single "tool" for an agent, a similar decision still needs to be made about the many CLI tools that become available by enabling bash.
Also, there are MCP servers that allow running any command in your terminal, including apt install / brew install etc.
[1] https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
Maybe “fettered” is better?
Compared to giving the LLM full access to your machine (direct shell, Python executable as in the article), I still think it’s the right way to frame MCP.
We should view the whole LLM <> computer interface as untrusted, until proven otherwise.
MCP can theoretically provide gated access to external resources, unfortunately many of them provide direct access to your machine and/or the internet, making them ripe as an attack vector.
ask> what all tools u have?
I have access to the following tools:
1 code_search: Searches for a pattern in the codebase using ripgrep.
2 extract_code: Extracts a portion of code from a file based on a line range.
3 file_operations: Performs various file operations like ls, tree, find, diff, date, mkdir, create_file.
4 find_all_references: Finds all references to a symbol (function, class, etc.) from the AST index.
5 get_definition: Gets the definition of a symbol (function, class, etc.) from the AST index.
6 get_library_docs: Gets documentation for a library given its unique ID.
7 rename_symbol: Renames a symbol using VS Code.
8 resolve_library_id: Resolves a library name to a unique library ID.
what do i need MCP and other agents for? This is solving most of my problems already.
For your use cases, maybe you don't. Not every use case for an LLM is identical to your coding usage pattern.
MCP isn't a prompt (though prompts are a resource an MCP server can provide). An MCP client that also directly manages the LLM toolchain has to map the tool/prompt/resource definitions from MCP servers into the prompt, and it usually does so using prompt templates defined for each model, usually by the model provider. So the meaningful "really well-structured prompt" part isn't from MCP at all; it's something that already exists and that the MCP client leverages.
500b sounds like a value prop in those regards.
At some point the hope for both is that they result in a net benefit to society.
Started working on an alternative protocol that lets agents call native endpoints directly (HTTP/CLI/WebSocket) via “manuals” and “providers,” instead of spinning up a bespoke wrapper server: https://github.com/universal-tool-calling-protocol/python-ut...
It even connects to MCP servers.
If you take a look, I'd love your thoughts.
The primary differentiator is that MCP includes endpoint discovery. You tell the LLM about the general location of the MCP tool, and it can figure out what capabilities that tool offers immediately. And when the tool updates, the LLM instantly re-learns the updated capability.
The rest of it is needlessly complicated (IMO) and could just be a bog standard HTTP API. And this is what every MCP server I've encountered so far actually does, I haven't seen anyone use the various SSE functionality and whatnot.
MCP v.01 (current) is both a step in the right direction (capability discovery) and an awkward misstep on what should have been the easy part (the API structure itself)
The actual thing that's different is that an OpenAPI spec is meant to be an exhaustive list of every endpoint and every parameter you could ever use. Whereas an MCP server, as a proxy to an API, tends to offer a curated set of tools and might even compose multiple API calls into a single tool.
Everyone in this thread is aware that LLMs aren't performing our jobs.
MCP discoverability is designed to be ingested by an LLM, rather than used to implement an API client like OpenAPI specs. MCP tools describe themselves to the LLM in terms of what they can do, rather than what their API contract is.
It also removes the responsibility of having to inject the correct version of the spec into the prompt from the user, and moves it into the protocol.
Because the LLM can't "just connect" to an existing API endpoint. It can produce input parameters for an API call, but you still need to implement the calling code. Implementing calling code for every API you want to offer the LLM is at minimum very annoying and often error-prone.
MCP provides a consistent calling implementation that only needs to be written once.
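As a concrete illustration of "written once": a minimal MCP server using the official Python SDK's FastMCP helper is mostly just decorated functions. The tracker API here is made up; the point is that any MCP-speaking client can then call this tool without per-API glue code on the client side.

```python
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("issue-tracker")

@mcp.tool()
def get_issue(issue_id: int) -> str:
    """Fetch a single issue by ID from the (hypothetical) tracker API."""
    resp = httpx.get(f"https://tracker.example.com/api/issues/{issue_id}")
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    mcp.run()   # speaks MCP over stdio by default
```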
(without needing an MCP server that adds extra security vulnerabilities)
I'll spare the audience the implied XKCD link
EDIT: This has since been fixed in the linked article, so this comment is outdated.
Always consider your audience, but for most non-casual writing it’s a good default for a variety of reasons.
I can attest the abbr is also mobile-friendly, although I am for sure open to each browser doing its own UI hinting that a long-press is available for the term.
[1] https://issues.chromium.org/issues/337222647 -> https://issues.chromium.org/issues/41130053
<abbr> is not what you seem to think it is. But the "typical use cases" section of your link does explain what it's actually for.
> Spelling out the acronym or abbreviation in full the first time it is used on a page is beneficial for helping people understand it, especially if the content is technical or industry jargon.
> Only include a title if expanding the abbreviation or acronym in the text is not possible. Having a difference between the announced word or phrase and what is displayed on the screen, especially if it's technical jargon the reader may not be familiar with, can be jarring.
This is like complaining that HTTP or API isn't explained.
The balance isn’t really clear cut. On one hand, MCP isn’t ubiquitous like, say, DNS or ancient like BSD. On the other, technical audiences can be expected to look up terms that are new to them. The point of a headline is to offer a terse summary, not an explanation, and adding three full words makes it less useful. However, that summary isn’t particularly useful if readers don’t know what the hell you’re talking about, either, and using jargon nearly guarantees that.
I think it’s just one of those damned-if-you-do/don’t situations.
I have no idea what any of the abbreviations in stock market news mean and those stock market people won't know their CLIs from their APIs and LLMs, but that doesn't mean the articles are bad.
There is a link to a previous post by the same author (within the first ten words even!), which contains the context you're looking for.
https://arxiv.org/html/2506.11180v1
SCADA systems got famous because hacking them used to require Stuxnet. In the future you can just vibe hack them.
It’s pretty well known by now what MCP stands for, unless you were referring to something else…
Minecraft Coder Pack
https://minecraft.fandom.com/wiki/Tutorials/Programs_and_edi...
Also... that's some dedication. A user dedicated to a single comment.
The first case doesn't matter at all if you already know what an MCP actually is.
At least for the task of understanding the article.
https://github.com/scosman/hooks_mcp
The interactive lldb session here is super cool for deeper debugging. For security, containers seem like the solution - sketch.dev is my fav take on containerizing agents at the moment.
It’s talking about passing in Python code that would be run by a Python interpreter tool.
Even if you had guardrails set up, that seems a little chancy, but hey, this is the stage of development evolution where we’re letting AI write code anyway, so why not give other people remote code execution access, because fuck it all.
Give them an eval() with a couple of useful libraries (say, treesitter), and they are able not only to use it well, but to write their own "tools" (functions) and save massively on tokens.
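A minimal sketch of that idea (no sandboxing here, so in practice you'd want it inside a container): one persistent namespace the model keeps exec'ing into, so the helper "tools" it defines early stay available later.

```python
import io
import contextlib

SESSION: dict = {}   # persists across calls, so defined helpers stick around

def python_eval(code: str) -> str:
    """Tool exposed to the LLM: run a snippet, return captured stdout."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, SESSION)
    except Exception as exc:           # feed errors back instead of crashing
        return f"error: {exc!r}"
    return buf.getvalue() or "ok"

# e.g. the model first sends a helper definition...
python_eval("def count_defs(src): return src.count('def ')")
# ...and later calls it, instead of re-deriving the logic in tokens each time.
print(python_eval("print(count_defs('def a(): pass'))"))   # -> 1
```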
They also allow you to build "ephemeral" apps, because who wants to wait for tokens to stream and an LLM to interpret the result when you could do most tasks with a normal UI, only jumping into the LLM when fuzziness is required.
Most of my work on this is sadly private right now, but here are a few repos that are the foundation: github.com/go-go-golems/jesus and https://github.com/go-go-golems/go-go-goja
I wrote about how to do it with Guix: https://200ok.ch/posts/2025-05-23_sandboxing_ai_tools:_how_g...
Since then, I have switched to using Bubblewrap: https://github.com/munen/dotfiles/blob/master/bin/bin/bubble...
https://github.com/CharlieDigital/runjs
Lets the LLM safely generate and execute whatever code it needs. Bounded by statement count, memory limits, and runtime limits.
It has a built-in secrets manager API (so generated code can make use of remote APIs), an HTTP fetch analogue, JSONPath for JSON handling, and Polly for HTTP request resiliency.
This is because defining a formal system that can do everything MCP promises to enable is a logical impossibility.
Put GPT5 into agent mode, then give it that URL and the token 'linkedinPROMO1'. Once it loads the tools, tell it to use curl in a terminal (it's faster) and then run the random tool.
This is authenticated at the moment with that token, plus bearer tokens, but I've got the new auth system up and it's working. I still have to do the integration with all the other services (the website, auth, AHP and the crawler and OCR engine), so it will be a while before all that's done.
Would be nice if there were a way for agents to work with MCPs as code, and to preview or debug the data flowing through them. At the moment it all seems like not a mature enough solution, and I'd rather mount a Python sandbox with API keys to what it needs than connect an MCP tool on my own machine.
Although I could easily imagine the external robot(?) being a "hold my beer" to the interview cheat arms race
To extra ruin the joke, the 96GB versions seem to be going for $24,000 on eBay right now.
That's the C in the protocol.
Sure, you can add a session key to the Swagger API and expose it that way so that the LLM can continue its conversation, but it's going to be a fragile integration at best.
An MCP server tied to the conversation state abstracts all that away, for better or worse.
Sure in some cases it might be overkill and letting the assistant write & execute plain code might be best. There are plenty of silly MCP servers out there.
Two options (out of multiple):
- Have sub-agents with different subsets of tools. The main agent then delegates.
- Have dedicated tools that let the main agent activate subsets of tools as needed (sketched below).
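A sketch of the second option, with made-up toolset names and no particular framework's API: a single always-available meta-tool that swaps which tool schemas get injected into the next turn's prompt.

```python
# Toolset registry; names and contents are illustrative only.
TOOLSETS = {
    "git": ["git_status", "git_diff", "git_commit"],
    "web": ["fetch_url", "search_web"],
    "db":  ["run_query", "describe_table"],
}

ACTIVE = {"activate_toolset"}   # the meta-tool itself is always available

def activate_toolset(name: str) -> str:
    """Tool the main agent calls to pull in the subset it currently needs."""
    if name not in TOOLSETS:
        return f"unknown toolset {name!r}; options: {sorted(TOOLSETS)}"
    ACTIVE.clear()
    ACTIVE.update({"activate_toolset", *TOOLSETS[name]})
    return f"active tools: {sorted(ACTIVE)}"

def visible_tool_schemas() -> list[str]:
    """What the toolchain injects into the prompt on the next turn."""
    return sorted(ACTIVE)
```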
an LLM natively knows bash and how to run things
MCP forces a weird set of unfamiliar conventions that most of the writing on the web doesn't cover. Most of the web writes a lot about bash and getting things done.
Maybe in a few years LLMs will "natively" understand them, but I see MCP more as a buzzword right now.
Most models that it is used with natively know what tools are (they are trained with particular prompt formats for the use of arbitrary tools), and the model never sees MCP at all, it just sees tool definitions, or tool responses, in the format it expects in prompts. MCPs are a way to communicate information about tools to the toolchain running the LLM, when the LLM sees information that came via MCP it is indistinguishable from tools that might be built into the toolchain or provided by another mechanism.
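For illustration, here's roughly the shape of what the model actually sees after the toolchain ingests an MCP server's tool listing, written here as an OpenAI-style function definition (reusing the code_search tool from the list upthread). Nothing in it says "MCP"; a built-in tool would look identical.

```python
tool_definition = {
    "type": "function",
    "function": {
        "name": "code_search",
        "description": "Searches for a pattern in the codebase using ripgrep.",
        "parameters": {
            "type": "object",
            "properties": {
                "pattern": {"type": "string", "description": "Regex to search for"},
            },
            "required": ["pattern"],
        },
    },
}
```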
Or you can use fetch_record followed by a formatted code block containing the Google search you want to perform.
The LLM will likely use bash and curl because it NATIVELY knows what they are and what they're capable of, while with this other tool you have to feed it all these parameters that it is not used to.
I'm not saying go ahead and throw that into ChatGPT; I'm speaking from experience at our company using MCP vs bashable stuff: it keeps ignoring the other tools.
I'd be cautious inferring generalizations about behavior and then explanations of those generalizations from observation of a particular LLM used via a particular toolchain.
That said, that it does that in that environment is still an interesting observation.
Wow, you better be sure you have that Python environment locked down.
It fails and I've no idea why; meanwhile, the Python code works without issues, but I can't use that one as it conflicts with existing dependencies in aider. See: https://pastebin.com/TNpMRsb9 (working code after 5 failed attempts)
I am never gonna bother with this again. It could be built as a simple REST API; why do we even need this ugly protocol?
From my experience context7 just does not work, or at least does not help. I did plenty of experiments with it and that approach just does not go anywhere with the tools and models available today.