task.md:

```
#!/usr/bin/env claude-run
Analyze this codebase and summarize the architecture.
```

Then:

```
chmod +x task.md
./task.md
```
These aren't just prompts. Claude Code has tool use, so a markdown file can run shell commands, write scripts, read files, make API calls. The prompt orchestrates everything.

A script that runs your tests and reports results (`run_tests.md`):

```
#!/usr/bin/env claude-run --permission-mode bypassPermissions
Run ./test/run_tests.sh and summarize what passed and failed.
```
Because stdin/stdout work like in any Unix program, you can chain them:

```
cat data.json | ./analyze.md > results.txt
git log -10 | ./summarize.md
./generate.md | ./review.md > final.txt
```
Or mix them with traditional shell scripts:

```
for f in logs/*.txt; do
  cat "$f" | ./analyze.md >> summary.txt
done
```
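Because the files behave like ordinary executables, they can also be scheduled. A hypothetical crontab entry (paths illustrative):

```
# run a daily log summary at 06:00 (hypothetical paths)
0 6 * * * cd /home/me/project && ./summarize-logs.md >> /var/log/daily-summary.log 2>&1
```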
This replaced a lot of Python glue code for us. Tasks that needed LLM orchestration libraries are now markdown files composed with standard Unix tools: composable as building blocks, runnable as cron jobs, etc.

One thing we didn't expect is that these are more auditable (and shareable) than shell scripts. Install scripts like `curl -fsSL https://bun.com/install | bash` could become:
`curl -fsSL https://bun.com/install.md | claude-run`
Where install.md says something like "Detect my OS and architecture, download the right binary from GitHub releases, extract to ~/.local/bin, update my shell config." A normal human can actually read and verify that.

The (really cool) executable markdown idea and auditability examples are from Pete Koomen (@koomen on X). As Pete says: "Markdown feels increasingly important in a way I'm not sure most people have wrapped their heads around yet."
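Spelled out as a file, a hypothetical install.md along those lines (names and steps illustrative, not Bun's actual installer) might read:

```
#!/usr/bin/env claude-run --permission-mode bypassPermissions
Detect my OS and architecture.
Download the matching binary from the project's GitHub releases.
Extract it to ~/.local/bin and make it executable.
Update my shell config to include ~/.local/bin in PATH if needed.
Report exactly what you changed.
```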
We implemented it and added Unix pipe semantics. It currently works with Claude Code; we hope to support other AI coding tools too. You can also route scripts through different cloud providers (AWS Bedrock, etc.) if you want separate billing for automated jobs.
GitHub: https://github.com/andisearch/claude-switcher
What workflows would you use this for?
You could, for example, put a C program on line 2 and onward and expect (or hope, or pray) that Claude will interpret, or compile and run, that program. Adding a comment like "run the following program; download and compile an interpreter or compiler if needed first" as an instruction to Claude would improve your chances.
You could also pass commented code/scripts straight into Claude Code without changing how they normally execute: the prompt instructions can go at the top of a valid file (say, Python or TypeScript) as comments, e.g.
`claude-run --azure --opus my_script.py`
```
#!/usr/bin/env claude-run
<instructions>
Analyze this codebase.
</instructions>
```

Then: `chmod +x task.xml && ./task.xml`

Here is the template I start with:
```
#!/usr/bin/env gpt-agent
input: <target/context (e.g., cwd)>
task: |
  <one clear objective>
output: |
  <deliverables + required format>
require:
  - <must be true before start; otherwise stop + report>
invariant:
  - <must stay true while working (scope + safety)>
ensure:
  - <must be true at the end (definition of done)>
rescue: |
  <what to do if any requirement/invariant/ensure cannot be met>
```

`#!/usr/bin/env claude-run --permission-mode bypassPermissions`
Or use the .ag files you have unmodified:
`claude-run --opus --vercel task.ag`
Not only would it avoid any confusion (Markdown wasn't meant to be executable, after all), but it would also allow future extensions in a domain that is moving fast.
The recent incident (https://news.ycombinator.com/item?id=46532075) regarding Claude Code's changelog shows that pure Markdown can break things if it is consumed raw.
Also, regarding: "Detect my OS and architecture, download the right binary from GitHub releases, extract to ~/.local/bin, update my shell config."
I have a hard time seeing how this is "more auditable" than a shell script with hardcoded URLs/paths.
"the right binary" is something that would make me reject an issue from a PM, asking for clarifications because it's way too vague.
But maybe that's why I'll soon get the sack?
I think the reasons Markdown is appealing include:
- It's just a text file.
- LLMs like Claude have high comprehension of the format, so Claude Code does very well with it.
- You can mix structured and unstructured text, and code with plain language: YAML frontmatter, outlines/headings, code blocks, tables, links, images, etc.
I used a heavily condensed version of the example prompt that Pete Koomen posted, cutting it back to the simplest form of the concept.
In real use it would be detailed, verbose and specific, and include the actual code blocks and external shell script references to retrieve and execute. So this really is just a proof of concept to give an idea of the sort of thing people could create in future.
I know lots of us developers joke about getting the sack and losing out to AI. But for what it's worth, the sorts of points you raise are exactly why I think skilled developers become even more valuable than ever with AI.
Programming will change massively this next decade. But it has many times even in my life. So I'm definitely in the camp that thinks this is a new programming abstraction level, and Claude Code and Codex and others are useful tools that improve the productivity of skilled coders. Especially when they are used carefully and thoughtfully.
- file types exist for a reason
- this is just prompt engineering which is already easy to do
I can see there might be valid arguments for enforcing file type associations for execution at the OS level. These are just text files, and Unix-like environments support making text files executable with a shebang as a universal convention.
I am a fan of that unix-like philosophy generally: tools that try to do a single thing well, can be chained together, and allow users to flexibly create automations using plain text. So I tried to stick with that approach for these scripts.
I'm a bear of little brain, and prompt engineering makes my head hurt. So part of the motivation was to be able to save prompts and collections of prompts once I've got them working, and then execute on demand. I think the high readability of markdown as scripts is helpful for creating assets that can be saved, shared and re-used, as they are self-documenting.
You can try this out and you’ll see what I mean if you run a few simple examples. This approach was based on experimentation and trying to be consistent with Claude’s own philosophy here.
This could be dinosaur mindset from 2022, but would it not make sense to prompt the LLM to create a bash script based on these instructions, so the result is more deterministic? Claude Code is pretty reliable, but this is probably only one and a half nines at best.
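One way to get that determinism (file names hypothetical) is a two-step flow: use a markdown script once to have the LLM write a plain bash script, review it, and from then on run only the generated script:

```
./write-backup-script.md > backup.sh   # LLM writes deterministic bash, once
less backup.sh                         # human review before trusting it
chmod +x backup.sh && ./backup.sh      # repeat runs are fully deterministic
```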
As for safety, running this in a devcontainer[1][2] or as part of a CI system should be completely fine.
1. (conventional usage) https://code.visualstudio.com/docs/devcontainers/containers
2. (actual spec) https://containers.dev/
As you say, Claude is actually very good at writing shell scripts and using tools on-the-fly. But I know there is an AI-confidence factor involved for developers making the choice to leverage that.
For simple tasks (in practice) I already find you can often prompt the whole thing.
For tasks where you already have the other traditional scripts or building blocks, or where it is complex, then you might break it up.
Interestingly, you can intermix these approaches.
You can have runnable markdown that writes and runs scripts on the fly, mixed with running command line tools, and chained along with traditional tools in a bash script, and then call that script from a runnable markdown that passes in test results, or analyzes the code base and passes recommendations in.
The composability and ability to combine and embed code blocks and tool use within plain language is quite powerful. I’m still learning how to use this.
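As a sketch of that intermixing (all script names hypothetical), a bash driver can chain deterministic tools with markdown steps and feed results between them:

```
#!/usr/bin/env bash
# deterministic step
./test/run_tests.sh > results.txt
# LLM steps (hypothetical markdown scripts)
cat results.txt | ./summarize-failures.md > summary.txt
cat summary.txt | ./recommend-fixes.md > recommendations.txt
```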
I’m glad it is already useful and thank you.
More seriously, I agree that setting permissions to the minimum needed for the task and using sandboxed containers is sensible.
claude-run is just a bunch of little convenience scripts, but for it to work effectively with code execution, the handling needs to do a little more than just `cat` the file output, for example stripping shebang lines, supporting flags and permissions and a few other things. But all very simple if you see the repo.
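For example, the shebang-handling step amounts to dropping the first line so only the prose reaches the model as a prompt; a minimal sketch (not the actual claude-run implementation):

```shell
#!/usr/bin/env bash
# Sketch only: mimic the shebang-stripping step before handing the body to the model.
tmp=$(mktemp)
printf '#!/usr/bin/env claude-run\nSummarize this repo.\n' > "$tmp"
prompt=$(tail -n +2 "$tmp")   # drop line 1 (the shebang), keep the prose
echo "$prompt"                # prints: Summarize this repo.
rm -f "$tmp"
```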
Adding support for session isolation and support for different cloud providers and API keys to keep things separate from one's personal Claude subscription took a little work. But that is optional.
"Executable runbooks" is the name given to the concept there.
Like, once upon a time maybe you gave your junior programmer a list of things to do, and depending on their skill, familiarity with the CLI, hangover status, spelling abilities, etc., you'd get different results. So you write a deterministic shell script.
There are some tasks that are challenging to achieve with traditional code, but where modern LLMs perform strongly.
Examples include summarization, complex content formatting and restructuring, natural language classification, and evaluation judgements.
I’ve found that it is useful to be able to easily incorporate these along with traditional Shell scripts and command line tools as part of workflow pipelines. And I hope it can be useful for other people too.
Executable markdown provides a method of building these tasks into traditional pipelines as small, single-task-focused, composable modules. They also have the advantage that they can be easily shared and re-used.
The scripts are all pretty simple, but they also:
- Handle script-context-relevant flags and control code-execution permissions
- Provide convenience flags for directing scripts to run across cloud providers rather than a personal Claude subscription
- Support session isolation, especially between your regular interactive `claude` command and runs that use API keys
This means that your runnable script use can be kept isolated from your regular personal Claude environment that you use for interactive development.
Putting a request to use a seed in the prompt means that when Claude writes the code, it could use that seed in the randomization functions it writes. But sadly it wouldn't affect the determinism of Claude's own text generation.
There is active interest on GitHub in supporting this, but the most recent issue about it I could find was closed in July as "not planned".
I was excited by the possibly extravagant implementation idea, and... when I read enough to realize it's based on yet another LLM... Sorry, no, never. You do you.
That’s entirely what Claude Code does.
Sorry, I have literally no interest in anything that makes you dependent on it, atrophies the mind, degrades research and social skills, and negates self-confidence with respect to other authors, their work, and attributions. Nor do any of my colleagues in the military, or those I know better in person.
Constant research, plus general IDEs like JetBrains's, IDA Pro, Sublime Text, VS Code, etc., backed by forums, chats, and communities, is absolutely enough for accountable and fun work in our teams, who manage to keep to adequate deadlines.
I just disable it everywhere possible, and will do all my life. The close case to my environment was VS Code, and hopefully there's no reason to build it from source since they still leave built-in options to disable it: https://stackoverflow.com/a/79534407/5113030 (How can I disable GitHub Copilot in VS Code?...)
Isn't it just inadequate not to think and develop your own mind, let alone to pass control of your environment to yet another model or "advanced T9" of unknown source and unknown iteration?
In pentesting, random black-box IO, experimental unverified medical intel, or log data approximation, why not? But in environment control, education, art or programming, fine art... No, never ^^
Related: https://www.tomshardware.com/tech-industry/artificial-intell...
The default permissions are to not allow execution. Which means that you can use the eval and text-generation capabilities of LLMs to perform assessments and evaluations of piped-in content without ever executing code themselves.
The script shebang has to explicitly add the permissions to run code, which you control. It supports the full Claude Code flag model for this.
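The contrast shows up directly in the shebang. An evaluate-only script just omits the flag (the one-line prompt here is illustrative):

```
#!/usr/bin/env claude-run
Rate the clarity of the piped-in text from 1 to 5, and output only the number.
```

A script that needs to run commands has to opt in explicitly, e.g. with `#!/usr/bin/env claude-run --permission-mode bypassPermissions`.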
Having said that, there are ad hoc automation tasks that I've traditionally used Jupyter notebooks to do that I'm finding are easier to get running using markdown files and Claude Code. It's early days and I still am getting a feel for this myself.
There are some comments from earlier in the thread discussing other literal-program tools.
The balance between readability and determinism for auditability partly comes down to developer philosophy. Tech is famous for religious arguments. I have friends who hate AI coding and want to avoid nondeterministic tools at all costs, and other friends whose productivity has increased significantly and who see the future of programming as natural language.
The quality of AI models and tools like Claude Code is improving fast, and there are many developers who find value in them, myself included. I built this to make life easier for developers who want to use AI tools for automation.
I find it much faster to parse and understand plain language than many code scripts I've seen. It was one of Python's great insights that people spend more time reading code than writing it. And there is a tradeoff in auditability between determinism and the ability to quickly read and understand what systems do.
There are clearly many people who find AI useful, and who are becoming skilled in its use as a tool. This is just a little tool that I put together for myself and other people who fall in that basket.
Learning where to use AI tools appropriately - how to constrain the dangers while maximizing the value - is part of the challenge. From using this particular tool for real work, it fits some use cases well, and can make things easier both to understand and share, as well as to write.
I hope it's useful for some other people wanting to use AI for scripting and automation.
I think that quickly understandable instructions are part of auditability. Not the whole thing, and their use needs to be balanced with safety and security. But an important part of it.
I accept there are plenty of folks who don't see AI tools that way. We're sharing this for people who see the value in this new approach, even though it is a fast-moving field and there are a lot of imperfections.
Any reasonably competent Claude Code user who is careful about setting permissions boundaries is no more going to delete their hard drive than a competent command line user would. There will be things that go wrong with AI, as before it.
In years of tech support, I've personally had to help people who neutered their Windows install or deleted files they needed. Those things happen and I'd argue they come down to skill issues, with AI or without. New tools have a learning curve.
I get that you think that's bizarre to see readability with AI-based tools as more auditable, and I really do understand that perspective.
In short, isn't that like giving a voice-controlled scalpel to a random guy on the street, telling them "just tell it to do neurosurgery", and hoping it accidentally does the right procedure?
It is intended to be a useful complement to traditional Shell scripting, Python scripting etc. for people who want to add composable AI tooling to their automation pipelines.
I also find that it helps improve the reliability of AI in workflows when you can break prompts down into re-usable, single-task-focused modules that leverage LLMs for tasks they are good at (format.md, summarize-logs.md, etc.). These can then be chained with traditional shell scripts and command line tools.
Examples are summarizing reports, formatting content. These become composable building blocks.
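In practice (script names hypothetical) that just looks like ordinary pipes, with the LLM steps dropped in where they fit:

```
# deterministic collection, LLM summarization and formatting (hypothetical scripts)
./collect-metrics.sh | ./summarize-logs.md | ./format.md > weekly-report.txt
```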
So I hope that is something that has practical utility even for users like yourself who don’t see a role for plain language prompting in automation per se.
In practice this is a way to add composable AI-based tooling into scripts.
Many people are concerned about (or outright opposed to) the use of AI coding tools. I get that this will not be useful for them. Many folks like myself find tools like Claude helpful, and this just makes it easier to use them in automation pipelines.
That kind of failure mode is fundamentally different from traditional scripting: it passes tests, builds trust, and then fails catastrophically once the implicit interpretation shifts.
In short: I believe it's nice this works for the engineer who knows exactly what (s)he is doing - but those folks usually don't need LLMs, they just write the code. People who this appeals to - and who may not begin to think about side-effects of innocent-sounding prompts - are being given a foot machine gun, which may act like a genie with hilarious unintended consequences.
- Lets you make regular Markdown files directly executable using a shebang line.
- Keeps the Markdown itself clean and standard rather than using variable placeholders or any kind of special syntax.
- Includes support for session isolation.
- Allows you to keep script use separate from your regular Claude Code subscription, by letting you specify the cloud provider / model in scripts, or switch them on the fly.
Another commenter suggested a custom format for executable llm scripts, which looks like the direction mdflow takes.
Using claude-switcher you can also use multiple clouds/keys for billing and failover, and to keep your subscription tokens for interactive or personal use, which I think is also useful.
…could possibly go wrong?
One of the key things we realized when starting to use it is that the approach lets you mix deterministic and non-deterministic tools together as part of a composable chain.
So you can, for example, use LLMs for their evaluation capabilities with a natural-language script as part of a broader chain that wraps it in deterministic code, and that chain can also include and run deterministic code nested within the plain-language script.
So it allows us to create pipelines that combine the best of both approaches as appropriate based on the sub-task at hand.
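As a sketch of that shape (file names hypothetical): a plain bash wrapper keeps the loop and branching deterministic, and only the classification inside it is an LLM step:

```
#!/usr/bin/env bash
# deterministic control flow; only classify.md (hypothetical) is an LLM call
for f in inbox/*.txt; do
  label=$(cat "$f" | ./classify.md)    # e.g. prints "urgent" or "normal"
  case "$label" in
    urgent) mv "$f" triage/urgent/ ;;
    *)      mv "$f" triage/normal/ ;;
  esac
done
```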
Which means your entire pipeline is tainted.
If your process is fine with that, whatever, but don't pretend that the result can be controlled.
I agree that this is a choice by each person using tools like this, and that it is up to each of us as developers whether a tool like this suits the use case at hand.
My own view is that the world is rapidly moving to more human language programming tools, and that system automation and shell scripting will be part of this. There is a wide array of sensible potential use cases I can see between the two polarized views of "never use an LLM' and "let's vibe code system automation".
If your output is in any way nondeterministic, then so is the code executed. Fin. No nuance to mathematically be had.
Randomness is nothing new. Various algorithms have always been non-deterministic. Randomness is in most standard libraries.
My problem here is not with what you're doing - but that you're presenting as if you do not understand what you're doing.
I guess these so-called "developers" these days never thought about why this is needed. Ever.
The "senior/staff" engineers of 2025 are now at the same knowledge level as juniors in 2015, or were never "senior" to begin with, given ideas like this.
Making a markdown `.md` text file executable with Claude Code is effective in practice because Claude Code can easily understand the content.
```
#!/usr/bin/env claude-run --permission-mode bypassPermissions
```
The claude-run helper supports passing in those flags supported by Claude Code itself that are relevant to a shell-scripting-like context.
It also adds a couple of convenience flags (`--aws`, `--azure`, `--vercel`, `--vertex` for cloud API key use).