FilterHN

What I learned building an opinionated and minimal coding agent

121 points

by SatvikBeri

4 hours ago

| past

| 15 comments

| mariozechner.at

| HN

▲

valleyer

2 hours ago

[-]

> If you look at the security measures in other coding agents, they're mostly security theater. As soon as your agent can write code and run code, it's pretty much game over.

At least for Codex, the agent runs commands inside an OS-provided sandbox (Seatbelt on macOS, and other stuff on other platforms). It does not end up "making the agent mostly useless".

▲

beacon294

2 hours ago

[-]

My codex just uses python to write files around the sandbox when I ask it to patch a sdk outside its path.

▲

valleyer

2 minutes ago

[-]

Is it asking you permission to run that python command? If so, then that's expected: commands you approve run without the sandbox. The point is that Codex can (by default) run commands on its own, without approval (like running `make` on the project) but they're subject to the imposed OS sandbox.

This is controlled by the `--sandbox` and `--ask-for-approval` arguments to `codex`.

▲

Sharlin

1 hour ago

[-]

It's definitely not a sandbox if you can just "use python to write files" outside of it o_O

▲

chongli

9 minutes ago

[-]

Hence the article’s security theatre remark.

I’m not sure why everyone seems to have forgotten about Unix permissions, proper sandboxing, jails, VMs etc when building agents.

Even just running the agent as a different user with minimal permissions and jailed into its home directory would be simple and easy enough.

▲

maleldil

2 hours ago

[-]

Does Codex randomly decide to disable the sandbox like Claude Code does?

▲

mustaphah

1 hour ago

[-]

I've seen a couple of power users already switching to Pi [1], and I'm considering that too. The premise is very appealing:

- Minimal, configurable context - including system prompts [2]

- Minimal and extensible tools; for example, todo tasks extension [3]

- No built-in MCP support; extensions exist [4]. I'd rather use mcporter [5]

Full control over context is a high-leverage capability. If you're aware of the many limitations of context on performance (in-context retrieval limits [6], context rot [7], contextual drift [8], etc.), you'd truly appreciate Pi lets you fine-tune the WHOLE context for optimal performance.

It's clearly not for everyone, but I can see how powerful it can be.

---

[1] https://lucumr.pocoo.org/2026/1/31/pi/

[2] https://github.com/badlogic/pi-mono/tree/main/packages/codin...

[3] https://github.com/mitsuhiko/agent-stuff/blob/main/pi-extens...

[4] https://github.com/nicobailon/pi-mcp-adapter

[5] https://github.com/steipete/mcporter

[6] https://github.com/gkamradt/LLMTest_NeedleInAHaystack

[7] https://research.trychroma.com/context-rot

[8] https://arxiv.org/html/2601.20834v1

▲

CuriouslyC

7 minutes ago

[-]

Pi is the part of moltXYZ that should have gone viral. Armin is way ahead of the curve here.

The Claude sub is the only think keeping me on Claude Code. It's not as janky as it used to be, but the hooks and context management support are still fairly superficial.

▲

zby

2 hours ago

[-]

Pi has probably the best architecture and being written in Javascript it is well positioned to use the browser sandbox architecture that I think is the future for ai agents.

I only wish the author changed his stance on vendor extensions: https://github.com/badlogic/pi-mono/discussions/254

▲

brimtown

1 hour ago

[-]

“standardize the intersection, expose the union” is a great phrase I hadn’t heard articulated before

▲

jFriedensreich

2 hours ago

[-]

I dont know how to feel about being the only one refusing to run yolo mode until the tooling is there, which is still about 6 months away for my setup. Am I years behind everyone else by then? You can get pretty far without completely giving in. Agents really dont need to execute that many arbitrary commands. linting, search, edit, web access should all be bespoke tools integrated into the permission and sandbox system. agents should not even be allowed to start and stop applications that support dev mode, they edit files, can test and get the logs what else would they need to do? especially as the amount of external dependencies that make sense goes to a handful you can without headache approve every new one. If your runtime supports sandboxing and permissions like deno or workerd this adds an initial layer of defense.

This makes it even more baffling why anthropic went with bun, a runtime without any sandboxing or security architecture and will rely in apple seatbelt alone?

▲

WhyNotHugo

2 hours ago

[-]

You use YOLO mode inside some sandbox (VM, container). Give the container only access to the necessary resources.

▲

jFriedensreich

7 minutes ago

[-]

apart from nearly no one using vms as far as i can tell, even if they were, a vm does not magically solve all the issues, its just a part of the needed tools.

▲

jdkoeck

45 minutes ago

[-]

But even then, the agent can still exfiltrate anything from the sandbox, using curl. Sandboxing is not enough when you deal agents that can run arbitrary commands.

▲

TheDong

10 minutes ago

[-]

It depends on what you're trying to prevent.

If your fear is exfiltration of your browser sessions and your computer joining a botnet, or accidental deletion of your data, then a sandbox helps.

If your fear is the llm exfiltrating code you gave it access to then a sandbox is not enough.

I'm personally more worried about the former.

▲

philipp-gayret

29 minutes ago

[-]

That depends on how you configure or implement your sandbox. If you let it have internet access as part of the sandbox, then yes, but that is your own choice.

▲

simonw

3 hours ago

[-]

Armin Ronacher wrote a good piece about why he uses Pi here: https://lucumr.pocoo.org/2026/1/31/pi/

I hadn't realized that Pi is the agent harness used by OpenClaw.

▲

yuzhun

45 minutes ago

[-]

I really like pi and have started using it to build my agent. Mario's article fully reveals some design trade-offs and complexities in the construction process of coding agents and even general agents. I have benefited a lot!

▲

xcodevn

3 hours ago

[-]

I did something similar in Python, in case people want to see a slightly different perspective (I was aiming for a minimal agent library with built-in tools, similar to the Claude Agent SDK):

https://github.com/NTT123/nano-agent

▲

v0id_user

1 hour ago

[-]

Being minimalist is real power these days as everything around us keeps shoving features in our face every week with a million tricks and gimmicks to learn. Something minimalist like this is honestly a breath of fresh air!

The YOLO mode is also good, but having a small ‘baby setting mode’ that’s not full-blown system access would make sense for basic security. Just a sensible layer of "pls don't blow my machine" without killing the freedom :)

▲

verdverm

3 hours ago

[-]

Glad to see more people doing this!

I built on ADK (Agent Development Kit), which comes with many of the features discussed in the post.

Building a full, custom agent setup is surprisingly easy and a great learning experience for this transformational technology. Getting into instruction and tool crafting was where I found the most ROI.

▲

sghiassy

3 hours ago

[-]

I always wonder what type of moat systems / business like these have

edit: referring to Anthropic and the like

▲

rglynn

2 hours ago

[-]

Capital, both social and economic.

Also data, see https://news.ycombinator.com/item?id=46637328

▲

mellosouls

2 hours ago

[-]

Its open source. Where does it say he wants to monetise it?

▲

keyle

3 hours ago

[-]

None, basically.

▲

xcodevn

3 hours ago

[-]

I do think Claude Code as a tool gave Anthropic some advantages over others. They have plan mode, todolist, askUserQuestion tools, hooks, etc., which greatly extend Opus's capabilities. Agree that others (Codex, Cursor) also quickly copy these features, but this is the nature of the race, and Anthropic has to keep innovating to maintain its edge over others

▲

NitpickLawyer

2 hours ago

[-]

The biggest advantage by far is the data they collect along the way. Data that can be bucketed to real devs and signals extracted from this can be top tier. All that data + signals + whatever else they cook can be re-added in the training corpus and the models re-trained / version++ on the new set. Rinse and repeat.

(this is also why all the labs, including some chinese ones, are subsidising / metoo-ing coding agents)

▲

bschwarz

3 hours ago

[-]

The only moat in all of this is capital.

▲

theturtletalks

1 hour ago

[-]

Can I replace Vercel’s AI SDK with Pi’s equivalent?

▲

evalstate

3 hours ago

[-]

An excellent piece of writing.

One thing I do find is that subagents are helpful for performance -- offloading tasks to smaller models (gpt-oss specifically for me) gets data to the bigger model quicker.

▲

charcircuit

3 hours ago

[-]

>The only way you could prevent exfiltration of data would be to cut off all network access for the execution environment the agent runs in

You can sandbox off the data.

▲

yosefk

2 hours ago

[-]

"Also, it [Claude Code] flickers" - it does, doesn't it? Why?.. Did it vibe code itself so badly that this is hopeless to fix?..

▲

falloutx

1 hour ago

[-]

Claude code programmers are very open that they vibe code it.

▲

jeffrallen

3 hours ago

[-]

As a user of a minimal, opinionated agent (https://exe.dev) I've observed at least 80% of this article's findings myself.

Small and observable is excellent.

Letting your agent read traces of other sessions is an interesting method of context trimming.

Especially, "always Yolo" and "no background tasks". The LLM can manage Unix processes just fine with bash (e.g. ps, lsof, kill), and if you want you can remind it to use systemd, and it will. (It even does it without rolling it's eyes, which I normally do when forced to deal with systemd.)

Something he didn't mention is git: talk to your agent a commit at a time. Recently I had a colleague check in his minimal, broken PoC on a new branch with the commit message "work in progress". We pointed the agent at the branch and said, "finish the feature we started" and it nailed it in one shot. No context whatsoever other than "draw the rest of the f'ing owl" and it just.... did it. Fascinating.