At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, Landlock/seccomp on Linux). It does not end up "making the agent mostly useless".
This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.
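For example (flag values here are from recent Codex CLI releases; check `codex --help` for the exact set on your version):

```shell
# Workspace-scoped writes, ask before anything riskier:
codex --sandbox workspace-write --ask-for-approval on-request "fix the failing tests"

# Read-only sandbox, never prompt for escalation:
codex --sandbox read-only --ask-for-approval never "summarize this repo"
```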
I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.
Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.
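A hypothetical setup along those lines (the user name "agent" and the agent binary are placeholders, and this needs root):

```shell
# Create a throwaway user confined to its own home directory.
sudo useradd --create-home --shell /usr/sbin/nologin agent

# Give it only the project it should touch.
sudo cp -r ./myproject /home/agent/work
sudo chown -R agent:agent /home/agent/work

# Run the agent as that user: it can't read your browser profile,
# SSH keys, or anything else outside /home/agent.
sudo -u agent -H some-agent-cli --workdir /home/agent/work
```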
- Minimal, configurable context - including system prompts [2]
- Minimal and extensible tools; for example, a todo-tasks extension [3]
- No built-in MCP support; extensions exist [4]. I'd rather use mcporter [5]
Full control over context is a high-leverage capability. If you're aware of the many ways context limits performance (in-context retrieval limits [6], context rot [7], contextual drift [8], etc.), you'd truly appreciate that Pi lets you fine-tune the WHOLE context for optimal performance.
It's clearly not for everyone, but I can see how powerful it can be.
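The idea can be sketched generically (this is not Pi's API; the character budget and newest-first trimming policy below are purely illustrative):

```python
# Generic sketch: building the *entire* prompt yourself, so nothing
# enters the context that you didn't explicitly put there.

def assemble_context(system_prompt, history, task, budget_chars=8000):
    """Keep the most recent history turns under a hard character budget."""
    remaining = budget_chars - len(system_prompt) - len(task)
    kept = []
    for turn in reversed(history):  # newest first
        if len(turn) > remaining:
            break
        kept.append(turn)
        remaining -= len(turn)
    return "\n\n".join([system_prompt, *reversed(kept), task])

history = ["user: hi", "assistant: hello", "user: refactor foo()"]
ctx = assemble_context("You are a terse coding agent.", history, "task: add tests")
```

Swapping in a real tokenizer and smarter retention policies (pin the system prompt, summarize old turns) is where the fine-tuning happens.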
---
[1] https://lucumr.pocoo.org/2026/1/31/pi/
[2] https://github.com/badlogic/pi-mono/tree/main/packages/codin...
[3] https://github.com/mitsuhiko/agent-stuff/blob/main/pi-extens...
[4] https://github.com/nicobailon/pi-mcp-adapter
[5] https://github.com/steipete/mcporter
[6] https://github.com/gkamradt/LLMTest_NeedleInAHaystack
The Claude sub is the only thing keeping me on Claude Code. It's not as janky as it used to be, but the hooks and context management support are still fairly superficial.
I only wish the author would change his stance on vendor extensions: https://github.com/badlogic/pi-mono/discussions/254
This makes it even more baffling that Anthropic went with Bun, a runtime without any sandboxing or security architecture, relying on Apple's Seatbelt alone.
If your fear is exfiltration of your browser sessions and your computer joining a botnet, or accidental deletion of your data, then a sandbox helps.
If your fear is the LLM exfiltrating code you gave it access to, then a sandbox is not enough.
I'm personally more worried about the former.
I hadn't realized that Pi is the agent harness used by OpenClaw.
The YOLO mode is also good, but having a small ‘baby settings mode’ that’s not full-blown system access would make sense for basic security. Just a sensible layer of "pls don't blow up my machine" without killing the freedom :)
I built on ADK (Agent Development Kit), which comes with many of the features discussed in the post.
Building a full, custom agent setup is surprisingly easy and a great learning experience for this transformational technology. Getting into instruction and tool crafting was where I found the most ROI.
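The core loop most harnesses (ADK, Pi, etc.) wrap is small enough to sketch framework-free. Everything below is a toy: the "model" is a stub so the example is self-contained, and the tool is fake.

```python
# Minimal agent loop: the model picks a tool, we run it, feed the
# result back, repeat until it produces an answer.

def read_file_tool(path):
    # Toy tool; a real one would hit the filesystem.
    return {"notes.txt": "TODO: ship v2"}.get(path, "<not found>")

TOOLS = {"read_file": read_file_tool}

def stub_model(messages):
    """Stand-in for an LLM call: ask for the file once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"answer": "The file says: " + messages[-1]["content"]}

def run_agent(task, model=stub_model, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exceeded")

print(run_agent("what's in notes.txt?"))  # -> The file says: TODO: ship v2
```

The instruction and tool crafting happens in what you put into `TOOLS` and the system prompt; the loop itself barely changes.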
edit: referring to Anthropic and the like
Also data, see https://news.ycombinator.com/item?id=46637328
(this is also why all the labs, including some Chinese ones, are subsidising / metoo-ing coding agents)
One thing I do find is that subagents are helpful for performance -- offloading tasks to smaller models (gpt-oss specifically for me) gets data to the bigger model quicker.
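The pattern is roughly this (both "models" below are stubs and the 80-character budget is an illustrative assumption; swap in real API calls):

```python
# Subagent offload: hand bulky material to a cheap/small model and put
# only its condensed output into the big model's context.

def small_model_summarize(text, max_chars=80):
    # Stand-in for e.g. a local gpt-oss call; here it just truncates.
    return text[:max_chars] + ("..." if len(text) > max_chars else "")

def offload(files):
    """Summarize each file with the small model; the big model sees only summaries."""
    return {name: small_model_summarize(body) for name, body in files.items()}

files = {"big.log": "ERROR at line 12: " + "x" * 500, "readme": "short file"}
summaries = offload(files)
# The big model's prompt now carries ~80 chars per file, not the full text.
```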
You can sandbox off the data.
Small and observable is excellent.
Letting your agent read traces of other sessions is an interesting method of context trimming.
Especially "always YOLO" and "no background tasks". The LLM can manage Unix processes just fine with bash (e.g. ps, lsof, kill), and if you want you can remind it to use systemd, and it will. (It even does it without rolling its eyes, which I normally do when forced to deal with systemd.)
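This is the kind of process wrangling the model handles with plain bash, sketched here with a dummy background job:

```shell
# Start a background job, find its PID, and kill it -- no task
# manager needed.
sleep 300 &
pid=$!

ps -p "$pid" -o pid=,comm=       # confirm it's running
kill "$pid"
wait "$pid" 2>/dev/null || true  # reap it; exit status reflects the kill
ps -p "$pid" >/dev/null || echo "process $pid is gone"
```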
Something he didn't mention is git: talk to your agent a commit at a time. Recently I had a colleague check in his minimal, broken PoC on a new branch with the commit message "work in progress". We pointed the agent at the branch and said, "finish the feature we started" and it nailed it in one shot. No context whatsoever other than "draw the rest of the f'ing owl" and it just.... did it. Fascinating.