Most frameworks want a long-lived session with a massive context window doing everything at once. That's expensive, slow, and fragile. Good software is small, focused, and composable... AI agents should be too.
Axe treats LLM agents like Unix programs. Each agent is a TOML config with one focused job: a code reviewer, a log analyzer, a commit message writer. You run them from the CLI, pipe data in, and get results out. You can chain them together with pipes, or trigger them from cron, git hooks, or CI.
What Axe is:
- 12MB binary, two dependencies. No framework, no Python, no Docker (unless you want it)
- Stdin piping: `git diff | axe run reviewer` just works
- Sub-agent delegation: agents call other agents via tool use, depth-limited
- Persistent memory: if you want it, agents can remember across runs without you managing state
- MCP support: Axe can connect any MCP server to your agents
- Built-in tools: web_search and url_fetch out of the box
- Multi-provider: bring what you love to use, whether Anthropic, OpenAI, Ollama, or anything in models.dev format
- Path-sandboxed file ops. Keeps agents locked to a working directory
Written in Go. No daemon, no GUI.
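An agent definition might look something like this (a hypothetical sketch; the field names are illustrative guesses, not Axe's documented schema):

```toml
# reviewer.toml: illustrative agent config (field names are guesses, not Axe's schema)
name = "reviewer"
model = "claude-sonnet-4"
system_prompt = """
You review diffs for bugs and style issues.
Reply with a short, prioritized list of findings.
"""
```

With something like that on disk, `git diff | axe run reviewer` is the whole invocation.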
What would you automate first?
The first question that comes to mind is: how do you think about cost control? Putting a ton in a giant context window is expensive, but unintentionally fanning out 10 agents with a slightly smaller context window is even more expensive. The answer might be "well, don't do that," and that certainly maps to the UNIX analogy, where you're given powerful and possibly destructive tools, and it's up to you to construct the workflow carefully. But I'm curious how you would approach budget when using Axe.
Great question, and it's something I've not dug into yet. But I see no problem adding a way to limit agents by tokens or something similar to keep the cost within reason for the user.
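One low-tech version of such a limit (purely illustrative, not an existing Axe feature): cap the input size before it ever reaches a model, since input length is the cost driver in fan-out pipelines.

```shell
#!/usr/bin/env bash
# cap_input: refuse input whose size exceeds a rough character budget.
# Characters approximate bytes for ASCII, and roughly 4 characters per
# English token, so this is only a crude proxy for a real token budget.
cap_input() {
    local max_chars="${MAX_BYTES:-32000}"   # ~8k tokens by the 4-chars rule
    local input
    input=$(cat)
    if (( ${#input} > max_chars )); then
        echo "[cap-input] ERROR: input ${#input} chars exceeds budget $max_chars" >&2
        return 1
    fi
    printf '%s' "$input"
}
```

Dropped in front of an agent (`cap_input < big.diff | axe run reviewer`), it fails fast instead of burning tokens.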
- claude takes a -p option
- I have a bunch of tiny scripts; each script is an agent, but it only does one tiny task
- scripts can be composed in a unix pipeline
For example: $ git diff --staged | ai-commit-msg | git commit -F -
Where ai-commit-msg is a tiny agent:
#!/usr/bin/env bash
# ai-commit-msg: stdin=git diff, stdout=conventional commit message
# Usage: git diff --staged | ai-commit-msg
set -euo pipefail
source "${AGENTS_DIR:-$HOME/.agents}/lib/agent-lib.sh"
SYSTEM=$(load_skills \
core/unix-output.md \
core/be-concise.md \
domain/git.md \
output/plain-text.md)
SYSTEM+=$'\n\nTask: Given a git diff on stdin, output a single conventional commit message. One line only.'
run_agent "$SYSTEM"
And you can see, to keep the agents themselves tiny, they rely on a little lib to load the various skills and optionally apply a guard / post-exec validator. Those validators are usually a simple grep or similar to make sure there were no writes outside a given dir, but sometimes they enforce output correctness (always jq in my examples so far...). In theory the guard could be another claude -p call if I needed a semantic check.
The domain/git.md skill, for example, is just:
Context: You are working with Git repositories.
- Commit messages follow Conventional Commits: type(scope): description
- Types: feat, fix, docs, refactor, test, chore, ci, perf
- Subject line max 72 chars, imperative mood, no trailing period
- Reference issue numbers when relevant
So it produces messages like:
$ git diff HEAD~1 | bin/ai-commit-msg
fix(guards): pass input to claude and tighten verdict handling

Would love to see your tiny agents project, but I understand it might contain something sensitive and will therefore stay private.
#!/usr/bin/env bash
# agent-lib.sh: shared plumbing for all claude -p agents

AGENTS_DIR="${AGENTS_DIR:-$HOME/Code/github.com/craigjperry2/tiny-agents}"
SKILLS_DIR="$AGENTS_DIR/skills"
CLAUDE_OPTS="${CLAUDE_OPTS:-}"

# Build a system prompt by concatenating skill files
# Usage: load_skills core/unix-output.md domain/git.md output/plain-text.md
load_skills() {
    local combined=""
    for skill in "$@"; do
        local path="$SKILLS_DIR/$skill"
        if [[ -f "$path" ]]; then
            combined+=$'\n\n'"$(cat "$path")"
        else
            echo "[agent-lib] WARNING: skill not found: $skill" >&2
        fi
    done
    echo "$combined"
}

# Core invocation: reads stdin, prepends system prompt, calls claude -p
# Usage: run_agent <system_prompt> [extra claude opts...]
run_agent() {
    local system_prompt="$1"
    shift
    local stdin_content
    stdin_content=$(cat)  # buffer stdin
    if [[ -z "$stdin_content" ]]; then
        echo "[agent] ERROR: no input on stdin" >&2
        exit 1
    fi
    # Combine system prompt with stdin as user message
    printf '%s' "$stdin_content" \
        | claude -p \
            --system-prompt "$system_prompt" \
            --output-format text \
            $CLAUDE_OPTS \
            "$@"
}

# Run agent then pipe through a guard
# Usage: run_agent_guarded <guard_name> <system_prompt>
run_agent_guarded() {
    local guard="$1"
    shift
    local system_prompt="$1"
    shift
    local output
    output=$(run_agent "$system_prompt" "$@")
    local agent_exit=$?
    if [[ $agent_exit -ne 0 ]]; then
        echo "$output"
        exit $agent_exit
    fi
    # Pass through guard
    echo "$output" | "$AGENTS_DIR/guards/$guard"
    exit $?
}

# For structured output: run agent then validate with jq
run_json_agent() {
    local system_prompt="$1"
    shift
    run_agent "$system_prompt" --output-format text "$@" | guard-json-valid
}
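The guard-json-valid validator referenced at the end isn't shown in the thread; a minimal version (my sketch of it, assuming jq is installed, not the author's actual guard) could be:

```shell
#!/usr/bin/env bash
# guard_json_valid: pass stdin through unchanged only if it parses as JSON.
# Sketch of the guard-json-valid validator mentioned above; requires jq.
guard_json_valid() {
    local input
    input=$(cat)
    if jq -e . >/dev/null 2>&1 <<<"$input"; then
        printf '%s\n' "$input"
    else
        echo "[guard] ERROR: output is not valid JSON" >&2
        return 1
    fi
}
```

On valid JSON it behaves like `cat`, so it composes silently in a pipeline; on garbage it fails the whole pipe with a nonzero exit.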
Anybody looking to do interesting things should instantly ignore any project that mentions "persistent memory". It speaks of scope creep or complexity obfuscation.
If a tool wants to include "persistent memory", it needs to give the three-sentence explanation of how its scratch/notes files are piped around and what that achieves.
Not just claim "persistent memory".
I might even go so far as to say that any project using the terminology "memory" is doomed to spend too much time and tokens building scaffolding for abstractions that don't work.
The purpose of scaffolding is to create persistent memories.
>claim "persistent memory"
Just look at it as a build product.
>abstractions that don't work
Look at this as a testing problem.
I have a known workflow with steps to create an RPG character. Let's automate some of the boilerplate by having a succession of LLMs read my preferences about each step and apply their particular pieces of data to that step of the workflow, outputting their results to successive subdirectories, so I can pub/sub the entire process and edit intermediate files to tweak results as I desire.
Now that's cool!
Aside but 12 MB is ... large ... for such a thing. For reference, an entire HTTP (including crypto, TLS) stack with LLM API calls in Zig would net you a binary ~400 KB on ReleaseSmall (statically linked).
You can implement an entire language, compiler, and a VM in another 500 KB (or less!)
I don't think 12 MB is an impressive badge here?
$ ls -l axe
-rwxr-xr-x 1 root wheel 12830781 Mar 12 22:38 axe*
$ ldd axe
axe:
libthr.so.3 => /lib/libthr.so.3 (0xe2e74a1d000)
libc.so.7 => /lib/libc.so.7 (0xe2e74c27000)
libsys.so.7 => /lib/libsys.so.7 (0xe2e75de6000)
[vdso] (0xe2e7366b000)
1: https://lobste.rs/s/tzyslr/reducing_size_go_binaries_by_up_7...
To your point, why even advertise the number? If that particular number is completely irrelevant in practical usage, why mention it? It seems like the point is to impress, hence my response.
Worth comparing architectures: Axe is a stateless CLI (pipe in, get output); crabtalk.ai is a daemon plus commands (a persistent process manages state, and the commands are standalone binaries on PATH).
Both bet on small binaries (crabtalk is 8MB) and Unix philosophy, but the daemon gives you hot-swap, process isolation, and persistent state across invocations.
I'm a bit skeptical of this approach, at least for building general purpose coding agents. If the agents were humans, it would be absolutely insane to assign such fine-grained responsibilities to multiple people and ask them to collaborate.
Gonna be honest, it has taken away from the message both times I've seen it. It feels a bit like you're LARPing your favorite humans vs robots tv show.
I get that it can seem childish but when you compare that to the indolent people who are demanding AI, it cancels out.
Disclaimer: I haven't dug into axe enough yet, just going on first impressions.
>No daemon, no GUI.
I love the world we developers live in right now. ;)
>What would you automate first?
In a sense, I have wanted to be able to just add AI to a repo and treat it like the junior developer it is. It's okay if the junior developer will do literally any stupid thing I tell it to, because I won't tell it to do stupid things.
So, exactly: refactor this code, implement a shim, produce docs for <blah>, construct a build harness, write unit tests, produce a build, diff these codebases, implement this API, do all this on your own branch, and build and test things so that I can review the PR over coffee.
Essentially, three word commands which will encourage the AI to produce better software. Through my repo, so I can just review through the repo.
Okay, that's how I hope things work, now off to actually dig in to axe and give it a try on a few things, thanks very much again ..
One thing I’ve noticed when experimenting with agent pipelines is that the “single-purpose agent” model tends to make both cost control and reasoning easier. Each agent only gets the context it actually needs, which keeps prompts small and behavior easier to predict.
Where it gets interesting is when the pipeline starts producing artifacts instead of just text — reports, logs, generated files, etc. At that point the workflow starts looking less like a chat session and more like a series of composable steps producing intermediate outputs.
That’s where the Unix analogy feels particularly strong: small tools, small contexts, and explicit data flowing between steps.
Curious if you’ve experimented with workflows where agents produce artifacts (files, reports, etc.) rather than just returning text.
Yes! I run a ghost blog (a blog that does not use my name) and have axe produce artifacts. The flow: I send the first agent a text file of my brain dump (normally spoken), and it searches my note system for related notes and saves them to a file, then passes everything to agent 2, which turns that dump into a blog draft and saves it to a file; agent 3 then takes that draft, cleans it up to my taste, and saves it. From there I read it, make my own edits, and publish.
One thing I’ve noticed when experimenting with similar workflows is that once artifacts start accumulating (drafts, logs, intermediate reports, etc.), you start running into small infrastructure questions pretty quickly:
- where intermediate artifacts live
- how later agents reference them
- how long they should persist
- whether they're part of the workflow state or just temporary outputs
For small pipelines the filesystem works great, but as the number of steps grows it starts to look more like a little dataflow system than just a sequence of prompts.
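The filesystem version of that can be sketched as follows (agent names are hypothetical, and `cat` stands in for the real agent invocations):

```shell
#!/usr/bin/env bash
# One run's artifacts live under their own directory, one numbered
# subdirectory per step, so intermediates can be inspected or hand-edited
# before the next step consumes them.
run=$(mktemp -d)
mkdir -p "$run"/{01-notes,02-draft,03-final}
echo "brain dump" > "$run/braindump.txt"
cat "$run/braindump.txt"       > "$run/01-notes/related.md"   # e.g. axe run note-finder
cat "$run/01-notes/related.md" > "$run/02-draft/draft.md"     # e.g. axe run draft-writer
cat "$run/02-draft/draft.md"   > "$run/03-final/post.md"      # e.g. axe run cleanup
cat "$run/03-final/post.md"
```

The numbered directories are the "dataflow system": each step's input and output address is explicit, so re-running from the middle is just re-running the later commands.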
Do you usually just keep everything as local files, or have you experimented with something like object storage or a shared artifact layer between agents?
For example, let's say I want to add commit message generation (which I don't think is a great use of LLMs, but it is a practical example) to a repo. I would add the appropriate hook under .git/hooks, but I would also want the agent with its instructions to live inside the repo (perhaps in an `axe` or `agents` directory).
Can Axe load agents from the current folder? Or can that be added?
1. I have a flow where I pass in a youtube video and the first agent calls an api to get the transcript, the second converts that transcript into a blog-like post, and the third uploads that blog-like post to instapaper.
2. Blog post drafting: I talk into my phone's notes app, which gets synced via Syncthing. The first agent takes that text and looks through my note system for related information, then passes my raw text and notes to the next agent to draft a blog post; a third agent takes out all the em dashes because I'm tired of removing them. Once that's done, I read and edit it to be exactly what I want.
Also, I had to do several refactorings of my agents' constructs and found that one of them was reinventing things, producing a plethora of duplicated functions: e.g. DB connection pools (I had at least four of them simultaneously).
Would AXE require shared state between chained agents? Could it do it if required?
OP, what have you used this on in practice, with success?
How is it supposed to work if an agent can simply run the `cat` command instead of using the skill for file read/write/etc.?
chroot is not a security tool and never has been
A proper self-contained, self-improving AI@home with the AI as the OS is my end goal. I have a nice high-spec but older laptop I'm currently using as a sacrificial pawn to experiment with this, but there's a big gap in my knowledge and I'm still working through GPT-2 level stuff; resources are also tight when you're retired. I guess someone will get there this year the way things are going, but I'm happy to have fun until then.
I currently use Claude web with an MCP component for my workflows but axe looks like it could be a nicer and quicker way to work with the tools I have.
@jrswab, do you think it would be feasible to limit outgoing connections to a whitelist of domains, URLs, or IP addresses?
I’d like to automate some of my email, calendar, or timesheet tasks, but I’m concerned that a prompt injection could end up exfiltrating or deleting data. In fact, that’s the main reason why I’m not using Openclaw or similar projects with real data yet.
One idea I'm thinking of: after an agent has been in use for a while and built up an understanding of the task, ask it something like, "Write a Python script to replace this agent."
I could imagine this would work with agents that are processing log files or other semi-structured data for example.
I could see using this once the plan is defined and switching back to chat while iterating on post-implementation cleanup and refactoring.
It looks like Axe works the same way: fire off a request and later look at the results.
Does it do anything CPU-bound on its own, such that it benefits significantly from being a compiled (Go) executable? I actually like having things like this done in Python, since there's more potential to hack around with them.
I just don't see this in the readme… It is not in the Features section at least.
Anyway, i have MCP server that can post inline comments into Gitlab MR. Would like to try to hook it up to the code reviewer.
However, this does not help if a person gives access to something like Google Calendar and a prompt tells the LLM to be destructive against that account.
how would you say this compares to similar tools like google’s dotprompt? https://google.github.io/dotprompt/getting-started/
Dotprompt is a prompt template format that lives inside app code to standardize how prompts are written.
Axe is an execution runtime you run from the shell. There's no code to write (unless you want the LLM to run a script). You define the agent in TOML, run it with `axe run <agent name>`, and pipe data into it.
[0]https://inchbyinch.de/wp-content/uploads/2017/08/0400-axe-ty...
[1]https://i.pinimg.com/originals/da/14/80/da148078cc1478ec6b25...
Tiny note: there's a typo in your repo description.