First, MCP tools are sent on every request. If you look at the notion MCP the search tool description is basically a mini tutorial. This is going right into the context window. Given that in most cases MCP tool loading is all or nothing (unless you pre-select the tools by some other means) MCP in general will bloat your context significantly. I think I counted about 20 tools in GitHub Copilot VSCode extension recently. That's a lot!
Second, MCP tools are not compossible. When I call the notion search tool I get a dump of whatever they decide to return which might be a lot. The model has no means to decide how much data to process. You normally get a JSON data dump with many token-unfriendly data-points like identifiers, urls, etc. The CLI-based approach on the other hand is scriptable. Coding assistant will typically pipe the tool in jq or tail to process the data chunk by chunk because this is how they are trained these days.
If you want to use MCP in your agent, you need to bring in the MCP model and all of its baggage which is a lot. You need to handle oauth, handle tool loading and selection, reloading, etc.
The simpler solution is to have a single MCP server handling all of the things at system level and then have a tiny CLI that can call into the tools.
In the case of mcpshim (which I posted in another comment) the CLI communicates with the sever via a very simple unix socket using simple json. In fact, it is so simple that you can create a bash client in 5 lines of code.
This method is practically universal because most AI agents these days know how to use SKILLs. So the goal is to have more CLI tools. But instead of writing CLI for every service you can simply pivot on top of their existing MCP.
This solves the context problem in a very elegant way in my opinion.
This way, agents can either choose to execute tools directly (bringing output into context), or to run them via a script (or just by piping to jq), which allows for precise arithmetic calculations and further context debloating.
Makes you wonder whats the point of MCP
MCP is a thin toolcall auth layer that has to be there so that ChatGPT and claude.ai can "connect to your Slack", etc.
After I got the MCP working my case the performance difference was dramatic
It’s a convention.
That everyone follows.
I do agree that MCP context management should be better. Amazon kiro took a stab at that with powers
> Before your agent can do anything useful, it needs to know what tools are available. MCP’s answer is to dump the entire tool catalog into the conversation as JSON Schema. Every tool, every parameter, every option.
Because this simply isn't true anymore for the best clients, like Claude Code.
Similar to how Skills were designed[1] to be searchable without dumping everything into context, MCP tools can (and does in Claude Code) work the same way.
See https://www.anthropic.com/engineering/advanced-tool-use and https://x.com/trq212/status/2011523109871108570 and https://platform.claude.com/docs/en/agents-and-tools/tool-us...
[1] https://agentskills.io/specification#progressive-disclosure
Regardless, most MCPs are dumping. I know Cloudflare MCP is amazing but other 1000 useful MCPs are not.
The CLI approach definitely has practical benefits for token reduction. Not stuffing the entire schema into the runtime context is a clear win. But my main interest lies less in "token cost" and more in "how we structure the semantic space."
MCP is fundamentally a tool-level protocol. Existing paradigms like Skills already mitigate context bloat and selection overhead pretty well via tool discovery and progressive disclosure. So framing this purely as "MCP vs CLI" feels more like shifting the execution surface rather than a fundamental architectural shift.
The direction I'm exploring is a bit different. Instead of treating tools as the primary unit, what if we normalize the semantic primitives above them (e.g., "search," "read," "create")? Services would then just provide a projection of those semantics. This lets you compress the semantic space itself, expose it lazily, and only pull in the concrete tool/CLI/MCP adapters right at execution time.
You can arguably approximate this with Skills, but the current mental model is still heavily anchored to "tool descriptions"—it doesn't treat normalized semantics as first-class citizens. So while the CLI approach is an interesting optimization, I'm still on the fence about whether it's a real structural paradigm shift beyond just saving tokens.
Ultimately, shouldn't the core question be less about "how do we expose fewer tools," and more about "how do we layer and compress the semantic space the agent has to navigate?"
Trying to dictate the abstractions that should be used is not bitter lesson pilled.
If we can abstract the tools one layer further for ai, it might reduce the attention it needs to spend navigating them and leave more context window for actual reasoning
I do understand anthropic's Tool Search helps with mcp bloat, but it's limited only to claude.
CMCP currently supports codex and claude but PRs are welcome to add more clients.
[1]https://blog.cloudflare.com/code-mode-mcp/ [2]https://github.com/assimelha/cmcp
Awesome TUIs: https://github.com/rothgar/awesome-tuis
Awesome CLIs: https://github.com/agarrharr/awesome-cli-apps
Terminal Trove: https://terminaltrove.com/
I guess this is another one shows that the CLI and Unix is coming back in 2026.
I've also launched https://mcpshim.dev (https://github.com/mcpshim/mcpshim).
The unix way is the best way.
https://github.com/philschmid/mcp-cli
Edit: Turns out was https://github.com/steipete/mcporter noted elsewhere in the thread, but mcp-cli looks like a very similar thing.
Compared both
---
TL;DR CLIHUB compiles MCP servers into portable, self-contained binaries — think of it like a compiler. Best for distribution, CI, and environments where you can't run a daemon.
mcpshim is a runtime bridge — think of it like a local proxy. Best for developers juggling many MCP servers locally, especially when paired with LLM agents that benefit from persistent connections and lightweight aliases.
---
https://cdn.zappy.app/b908e63a442179801e406b01cf412433.png (table comparison)
---
My use cases are almost all 3rd party integrations.
Have you seen any improvements converting on MCPs that require persistency into CLI?
One important aspect of mcpshim which you might want to bring into clihub is the history idea. Imagine if the model wants to know what it did couple of days ago. It will be nice to have an answer for that if you record the tool calls in a file and then allow the agent to query the file.
This way you build all your MCPs into the system prompt, save the prompt to the AI provider, then use it without overpaying API costs.
The current "tools-on-demand" workarounds should be great for infrequent tools but the future will probably bring agents with dozens of tools that need them in context to flexibly many of them in the same context window. So we just need to make the context windows longer and make this capability cheaper to use.
One thing I have read recently is that when you make a tool call it forces the model to go back to the agent. The effect of this is that the agent then has to make another request with all of the prompt (include past messages), these will be "cached" tokens, but they're still expensive. So if you can amortize the tool calls by having the model either do many at once or chaining them with something like bash you'll be better off.
I suspect this might be why cursor likes writing bash scripts so much, simple shell commands are going to be very token heavy because of the frequency of interrupts.
MCP's token cost is the price of availability. The fix isn't to replace the protocol, it's to only activate the tools that matter for the current context. Claude's Skills already work this way -> lightweight descriptions loaded upfront, full definitions fetched on demand. That's essentially the same lazy-loading pattern CLIHub describes, just built into the model's native workflow.
But yeah, a concrete example is playwright-mcp vs playwright-cli: https://testcollab.com/blog/playwright-cli
That makes sense for some of the examples the described (e.g. a QA workflow asking the agent to take a screenshot and put it into a folder).
However, this is not true for an active dev workflow when you actually do want it to see that the elements are not lining up or are overlapping or not behaving correctly. So token savings are possible...if your use case doesn't require the bytes in context (which most active dev use cases probably do)*
I was actually thinking if I should support daemons just to support playwright. Now I don't have a use case for it
For other integrations, I first try to find an official or unofficial CLI tool (a wrapper around the API), and only then do I consider using MCP
https://jannikreinhard.com/2026/02/22/why-cli-tools-are-beat...
Even the smallest models are RL trained to use shell commands perfectly. Gemini 3 flash performs better with a cli with 20 commands vs 20+ tools in my testing.
cli also works well in terms of maintaining KV cache (changing tools mid say to improve model performance suffers from kv cache vs cli —help command only showing manual for specific command in append only fashion)
Writing your tools as unix like cli also has a nice benefit of model being able to pipe multiple commands together. In the case of browser, i wrote mini-browser which frontier models use much better than explicit tools to control browser because they can compose a giant command sequence to one shot task.
LLM only know `linear` tool exists.
I ask "get me the comments in the last issue"
Next call LLM does is
`linear --help 2>&1 | grep -i -E "search|list.issue|get.issue")` then `linear list-issues --raw '{"limit": 3}' -o json 2>&1 | head -80)` then `linear list-comments --issue-id "abc1ceae-aaaa-bbbb-9aaa-6bef0325ebd0" 2>&1)`
So even the --help has filtering by default. Current models are pretty good
CLIHub
- written in go
- zero-dependency binaries
- cross-compilation built-in (works on all platforms)
- supports OAuth2 w/ PKCE, S2S, Google SA, API key, basic, bearer. Can be extended further
MCPorter
- TS
- huge dependency list
- runtime dependency on bun
- Auth supports OAuth + basic token
- Has many features like SDK, daemons (for certain MCPs), auto config discovery etc.
MCPorter is more complete tbh. Has many nice to have features for advanced use cases.
My use case is simple. Does it generate a CLI that works? Mainly oauth is the blocker since that logic needs to be custom implemented to the CLI.
ALSO... the permission boundary is clearer. You can whitelist commands, flags, working dir... it becomes manageable.
HOWEVER... packaging still matters. A “small” CLI that pulls in a giant runtime kills the benefit.
I want the discipline of small protocol plus big cache. Cheap models can summarize what they did and avoid full context in every step...
Maybe MCP can help segregate auto-approve vs ask more cleanly, but I don't actually see that being done.
But tbh there's no reason agents can't abstract this out. As long as a CLI has a --help or similar (which 99% do) with a description of how to login, then it can figure it out for you. This does take context and tool calls though so not hugely efficient.
The real killer is the input tokens on each step. If you have 100k tokens in the conversation, and the LLM calls an MCP tool, the output and the existing conversation is sent back. So now you've input 200k tokens to the LLM.
Now imagine 10 tool calls per user message - or 50. You're sending 1-5M input tokens, not because the MCP definitions or tool responses are large, but because at each step, you have to send the whole conversation again.
"what about caching" - Only 90% savings, also cache misses are surprisingly common (we see as low as 40% cache hit rate)
"MCP definitions are still large" - not compared to any normal conversation. Also these get cached
We've seen the biggest savings by batching/parallelizing tool calls. I suspect the future of LLM tool usage will have a different architecture, but CLI doesn't solve the problems either.
[0] https://ziva.sh, it's an agent specialized for Godot[1]
The article misses imo the main benefit of CLIs vs _current_ MCP implementations [1], the fact that they can be chained together with some sort of scripting by the agent.
Imagine you want to sum the total of say 150 order IDs (and the API behind the scenes only allows one ID per API calls).
With MCP the agent would have to do 150 tool calls and explode your context.
With CLIs the agent can write a for loop in whatever scripting language it needs, parse out the order value and sum, _in one tool call_. This would be maybe 500 tokens total, probably 1% of trying to do it with MCP.
[1] There is actually no reason that MCP couldn't be composed like this, the AI harnesses could provide a code execution environment with the MCPs exposed somehow. But noone does it ATM AFIAK. Sort of a MCP to "method" shim in a sandbox.
for long agent sessions, I would expect a very high cache hit rate unless you're editing the system prompt, tools, or history between turns, or some turns take longer than the cache timeout
But MCP today isn’t ideal. I think we need to have some catalogs where the agents can fetch more information about MCP services instead of filling the context with not relevant noise.
The point is push vs pull.
I can see a future where software is built with a CLI interface underneath the (optional) GUI, letting an LLM hook directly into the underlying "business" logic to drive the application. Since LLM's are basically text machines, we just need somebody to invent a text-driven interface for them to use...oh wait!
Imagine booking a flight - the LLM connects to whatever booking software, pulls a list of commands, issues commands to the software, and then displays the output to the user in some fashion. It's basically just one big language translation task, something an LLM is best at, but you still have the guardrails of the CLI tool itself instead of having the LLM generate arbitrary code.
Another benefit is that the CLI output is introspectable. You can trace everything the LLM is doing if you want, as well as validate its commands if necessary (I want to check before it uses my credit card). You don't get this if it's generating a python script to hit some API.
Even before LLM's developers have been writing GUI applications as basically a CLI + GUI for testability, separation of concerns etc. Hopefully that will become more common.
Also this article was obviously AI generated. I'm not going to share my feelings about that.
https://github.com/thellimist/thellimist.github.io/blob/mast...
https://github.com/thellimist/thellimist.github.io/blob/mast...
I dump a voice message, then blog comes out. Then I modify a bunch of things, and iterate 1-2 hours to get it right
Official MCPs are trusted. Official MCPs CLIs are trusted.
I know I saw something about the Next.js devs experimenting with just dumping an entire index of doc files into AGENTS.md and it being used significantly more by Claude than any skills/tool call stuff.
It still needs to do discovery (--help etc.), always gets the job done
The biggest difference is state, but that's also kind of easy from CLI, the tool just have to store it on disk, not in process memory.
Edit: took out because I think that was something different.
I didn't release the website yet. I'll remove the link