It takes the stress about needing to monitor all the agents all the time too, which is great and creates incentives to learn how to build longer tasks for CC with more feedback loops.
I'm on Ubuntu 22.04 and it was surprisingly pleasant to create a layered sandbox approach in rust with bubblewrap and Landlock LSM: Landlock for filesystem restrictions (read-only system paths, blocked ~/.ssh/~/.aws, workdir write access) and TCP port control (only 443/22/MCP ports), bubblewrap for mount namespace isolation (/tmp per-project, hiding secrets), and dnsmasq for DNS whitelisting (only anthropic.com, github.com, pypi.org resolve - everything else gets NXDOMAIN).
I see the power and am considering Max but 5x cost is difficult to swallow. Just doing this for a lark, not professionally.
> a malicious AI trying to escape the VM (VM escape vulnerabilities exist, but they’re rare and require deliberate exploitation)
No VM escape vulns necessary. A malicious AI could just add arbitrary code to your Vagrantfile and get host access the first time you run a vagrant command.
If you're only worried about mistakes, Claude could decide to fix/improve something by adding a commit hook. If that contains a mistake, the mistake gets executed on your host the first time you git commit/push.
(Yes, it's unpleasantly difficult to truly isolate dev environments without inconveniencing yourself.)
I basically do something like "take snapshot -> run tiny vm -> let agent do what it does -> take snapshot -> look at diff" for each change, restarting if it doesn't give me what I wanted, or I misdirected it somehow. But there is no automatic sync of files, that'd defeat the entire point of putting it into a VM in the first place, wouldn't it?
> A malicious AI could just add arbitrary code to your Vagrantfile
> [...]
> Claude could decide to fix/improve something by adding a commit hook.
You can fix this by confining Claude to a subdirectory (with Docker volume mounts, for example): repository/
├── sandbox <--- Claude lives in here
│ └── main.py <--- Claude can edit this
└── .git <--- Claude can not touch thisShannot[0] captures intent before execution. Scripts run in a PyPy sandbox that intercepts all system calls - commands and file writes get logged but don't happen. You review in a TUI, approve what's safe, then it actually executes.
The trade-off vs VMs: VMs let Claude do anything in isolation, Shannot lets Claude propose changes to your real system with human approval. Different use cases - VMs for agentic coding, whereas this is for "fix my server" tasks where you want the changes applied but reviewed first.
There's MCP integration for Claude, remote execution via SSH, checkpoint/rollback for undoing mistakes.
Feedback greatly appreciated!
The problem with this approach (unless I'm misunderstanding - entirely possible!) is that it still blocks the agent on the first need for approval.
What I think most folks actually want (or at least what I want) is to allow the agent to explore a space, including exploring possible dead ends that require permissions/access, without stopping until the task is finished.
So if the agent is trying to "fix a server" it might suggest installing or removing a package. That suggestion blocks future progress.
Until a human comes in and says "yes - do it" or "no - try X instead" it will sit there doing nothing.
If instead it can just proceed, observe that the package doesn't resolve the issue, and continue exploring other solutions immediately, you save a whole lot of time.
So the agent can freely explore, check logs, list files, inspect service status. It only blocks when it wants to change something (install a package, write a config, restart a service).
Also worth noting: Shannot operates on entire scripts, not individual commands. The agent writes a complete program, the sandbox captures everything it wants to do during a dry run, then you review the whole batch at once. Claude Code's built-in controls interrupt at each command whereas Shannot interrupts once per script with a full picture of intent.
That said, you're pointing at a real limitation: if the fix genuinely requires a write to test a hypothesis, you're back to blocking. The agent can't speculatively install a package, observe it didn't help, and roll back autonomously.
For that use case, the OP's VM approach is probably better. Shannot is more suited to cases where you want changes applied to the real system but reviewed first.
Definitely food for thought though. A combined approach might be the right answer. VM/scratch space where the agent can freely test hypotheses, then human-in-the-loop to apply those conclusions to production systems.
- Spin up a vm with an image of the real target device.
- Let the agent act freely in the vm until the task is resolved, but capture and record all dangerous actions
- Review & replay those actions on the real machine
My issue is that for any real task, an agent without feedback mechanisms is essentially worthless. You have to have some sort of structured "this is what success looks like, here's how you check" target for it. A human in the loop can act as that feedback, which is in line with how claude code works by default (you define success by approving actions and giving feedback on status), but requiring a human in the loop also slows it down a bunch - you can end up ping-ponging between terminals trying to approve actions and review the current status.
This what claude already does out of the box.
I started running Claude Code in a devcontainer with limited file access (repo only) and limited outbound network access (allowlist only) for that reason.
This weekend, I generalized this to work with docker compose. Next up is support for additional agents (Codex, OpenCode, etc). After that, I'd like to force all network access through a proxy running on the host for greater control and logging (currently it uses iptables rules).
This workflow has been working well for me so far.
Still fresh, so may be rough around the edges, but check it out: https://github.com/mattolson/agent-sandbox
You can also use Lima, a lightweight VM control plane, as it natively works with qemu and Virtualization.Framework. (I think Vagrant does too; it's been a minute since I've tried.) This has traditionally been used for running container engines, but it's great for narrowly-scoped use cases like this.
Just need to be careful about how the directory Claude is working with is shared. I copy my Git repo to a container volume to use with Claude (DinD is an issue unless you do something like what Kind did) and rsync my changes back and verify before pushing. This way, I don't have to worry if Claude decides to rewind the reflog or something.
https://github.com/EstebanForge/construct-cli
For Linux, WSL also of course, and macOS.
Any coding agent (from the supported ones, our you can install your own).
Podman, Docker or even Apple's container.
In case anyone is interested.
> So in that sense it seems that AI is actually more aligned with my goals than a potential employee.
It may seem like that but I recommend you reading up on different kinda of misalignment in AI safety.
(GPT recently changed its attitude on this subject too which is very interesting.)
The most interesting part is that you will be given the option to downgrade the conversation to an older model. Implying that there was a step change in capability on this front in recent months.
sandbox-run npx @anthropic-ai/claude-code
This runs npx (...) transparently inside a Bubblewrap sandbox, exposing only the $PWD. Contrary to many other solutions, it is a few lines of pure POSIX shell.What other dev OSs are there?
> once privileges are dropped [...] it doesn't appear to be possible to reinstate them
I don't understand. If unprivileged code could easily re-elevate itself, privilege dropping would be meaningless ... If you need to communicate with the outside, you can do so via sockets (such as the bind-mounted X11 socket in one of the readme Examples).
Consider one wanted to replicate the human-approval workflow that most agent harnesses offer. It's not obvious to me how that could be accomplished by dropping privileges without an escape hatch.
And I think that what CC's /sandbox uses on a Mac
There's a load of ways that a repository owner can get an LLM agent to execute code on user's machines so not a good plan to let them run on your main laptop/desktop.
Personally my approach has been put all my agents in a dedicated VM and then provide them a scratch test server with nothing on it, when they need to do something that requires bare metal.
I currently apply the same strategy we use in case of the senior developer or CTO going off the deep end. Snapshots of VMs, PITR for databases and file shares, locked down master branches, etc.
I wouldn't spend a bunch of energy inventing an entirely new kind of prison for these agents. I would focus on the same mitigation strategies that could address a malicious human developer. Virtual box on a sensitive host another human is using is not how you'd go about it. Giving the developer a cheap cloud VM or physical host they can completely own is more typical. Locking down at the network is one of the simplest and most effective methods.
The only access the container has are the folders that are bind mounted from the host’s filesystem. The container gets network access from a transparent proxy.
https://github.com/dogestreet/dev-container
Much more usable than setting up a VM and you can share the same desktop environment as the host.
I ended up getting a mini-PC solely dedicated toward running agents in dangerous mode, it's refreshing to not have to think too much about sandboxing.
I needed a way to run Claude marketplace agents via Discord. Problem: agents can execute code, hit APIs, touch the filesystem—the dangerous stuff. Can't do that in a Worker's 30s timeout.
Solution: Worker handles Discord protocol (signature verification, deferred response) and queues the task. Cloudflare Sandbox picks it up with a 15min timeout and runs claude --agent plugin:agent in an isolated container. Discord threads store history, so everything stays stateless. Hono for routing.
This was surprisingly little glue. And the Cloudflare MCP made it a breeze do debug (instead of headbanging against the dashboard). Still working on getting E2E latency down.
https://code.claude.com/docs/en/sandboxing#sandboxing
> Claude Code includes an intentional escape hatch mechanism that allows commands to run outside the sandbox when necessary. When a command fails due to sandbox restrictions (such as network connectivity issues or incompatible tools), Claude is prompted to analyze the failure and may retry the command with the dangerouslyDisableSandbox parameter.
The ability for the agent itself to decide to disable the sandbox seems like a flaw. But do I understand correctly that this would cause a pause to ask for the user's approval?
[0] https://github.com/anthropics/claude-code/issues/14268
Side note: I wish Anthropic would open source claude code. filing an issue is like tossing toilet paper into the wind.
> So now you need Docker-in-Docker, which means --privileged mode, which defeats the entire purpose of sandboxing.
> That means trading “Claude might mess up my filesystem” for “Claude has root-level access to my container runtime.”
A Vagrant VM is exactly the same thing, just without Docker. The benefit of Docker is you've got an entire ecosystem of tooling and customized containers to benefit from, easier to maintain than a Vagrantfile, and no waiting for "initialization" on first booting a Vagrant box.On both Linux and MacOS, use this:
# Build 'claude' VM and Docker context
$ colima start --profile claude --vm-type=qemu
$ docker context create claude --docker "host=unix://$HOME/.colima/claude/docker.sock"
$ docker context use claude
# Start DinD, pass through ports 8080 and 8443, and mount one host directory (for a Git repo)
$ docker run -d --name dind-lab --privileged -e DOCKER_TLS_CERTDIR= -v dind-lab-data:/var/lib/docker \
-p 8080:8080 -p 8443:8443 -v /home/MYUSER/GITDIR:/mnt/host/home/MYUSER/GITDIR \
docker:27-dind
$ docker run --rm -it -e DOCKER_HOST=tcp://127.0.0.1:2375 \
-p 8080:8080 -p 8443:8443 -v /mnt/host/home/MYUSER/GITDIR:/home/MYUSER/GITDIR \
ubuntu:24.04 bash
# Or if you don't want to pass-through ports w/ DinD twice, use its network namespace directly
# ( docker run --rm -it -e DOCKER_HOST=tcp://127.0.0.1:2375 --network container:dind-lab .... )
Your normal default Docker context remains safe for normal use, and the "dangerous" context of claude euns in a different VM. If Claude destroys its container's VM, just delete it (colima stop claude; colima delete claude) and remake it.You could do rootless Docker/Podman, but there's a lot of broken stuff to deal with that will just distract the AI.
You can't assume that.
Attackers with LLMs have enough capabilities to engineer them to build exploits for kernel vulnerabilities [0] or to bypass sandboxes to exfiltrate data [0] in covert ways.
It is completely possible to craft a chained attack for an agent to bypass sandboxes even with or without a kernel exploit.
From [0] and [1]
[0] https://sean.heelan.io/2026/01/18/on-the-coming-industrialis...
[1] https://www.promptarmor.com/resources/claude-cowork-exfiltra...
Our next version of Docker Sandboxes will have MicroVM isolation and a Docker instance within for this exact reason. It'll let you use Claude Code + Containers without Docker-in-Docker.
I'm working on targeting both the curl|bash pattern and coding agents with this (via smart out of the box profiles). Early stages but functional. Feedback and bug reports would be appreciated.
I’ve found the sprites just work for claude. Pull how a repo (or repos) and run dangerously.
One frustrating thing about these solutions is that they’re great to prevent Claude from breaking a machine, but there’s no pervasive sandbox for third party services
[1]: https://github.com/nikvdp/cco [2]: https://code.claude.com/docs/en/sandboxing
So I just run agents as the agent user.
I don't need it to have root though. It just installs everything locally.
If I did need root I'd probably just buy a used NUC for $100, and let Claude have the whole box.
I did something similar by just renting a $3 VPS, and getting Claude root there. It sounds bad but I couldn't see any downside. If it blows it up, I can just reset it. And it's really nice having "my own sysadmin." :)
Claude gets all the packages it needs through Guix.
industrially-making-exploits.. : https://news.ycombinator.com/item?id=46676081
This seems like a very hard problem with coding specifically as you want unsafe content (web searches) to be able to impact sensitive things (code).
I'd love to find people to talk to about this stuff.
But a simple vm and some automation to install developer tools using ansible, nix or whatever you prefer isn't that hard to (vibe) code together. I like Lima but it feels slightly sub-optimal for the job currently.
Some useful things to consider:
- Ssh agent forwarding for authenticating against e.g. git is useful. But maybe don't use the same key that authenticates to your production machines as well ...
- How do you authenticate without a browser? Most AI tools have ways to deal with that but it's slightly tedious to automate during provisioning.
- Making sure all your development tools are there; I use things like sdkman, nvm, bun, etc. And I have my shell preferences and some other tools I like to have around.
- Minimizing time provisioning these vms over and over again. This gets tedious really quickly.
- Keeping the VMs fast is important too. In my projects, build tool performance adds up and AI tools like to call them a lot. So assign enough memory and CPU.
- It would be nice to switch between local and remote/cloud based vms easily.
- Software flexibility; developers are picky about their tools. There is no one size fits all here. Even just deciding on the base image to use for your vm is likely to escalate. I picked debian for what it is worth.
In short, I think there's enough out there that you can pull something together but it still involves quite a bit of DIY. It would be nice if this got easier. And AI tools asking for permission for everything is not a good security model. Because people just turn that off. Sandboxing those things is the way to go. But AI tools need to be able to do enough to work with your software.
IMO, if you are not running in the dangerous mode then you are really missing out on one of the best aspects of claude code- its ability to iterate. If you have to confirm each iteration then it's just not practical.
Please inform me if my thinking is wrong.
If Claude is writing a program to go that low level I'd pay money to watch that.
Also, is overwriting the same a deleting? Maybe it will just clobber your files with echo >file and mv them out of the way.
Maybe it realizes you have Time Machine backups enabled, so deleting your entire directory is permitted since it's not actually deleted. ;)
So it's basically adding "don't delete my files pretty please" to the prompt?
EDIT: I misread, the natural language description of the rule is just a shortcut to generate the actual rule which is based on regexp patterns.
Still, it only protects you against very specific commands. Won't help you if the LLM decides to fill your disk with `cat /dev/urandom > foo` for example.
I don't know anyone that inspects every binary yet we apparently we should not trust shell scripts?
So there's that
It all integrates nicely with VS Code. It has a firewall script and you spin up your database within the docker compose file so it has full access to a postgres instance. I can share my full setup if anyone needs it.
Devcontainers look perfect but also like a bit of a burden to entry with regards to setup.
Windows is the best (sandboxed) linux
sudo chmod $UID /mnt/<project_path>
...done?
Does anybody have experience using microVMs (Firecracker, Kata Containers, etc.) for this use case? Would love to hear your thoughts.
The idea is to simply use the runtime flag (after kata install):
docker run -d --runtime=kata -p 8080:8080 codercom/code-server:latest
Hope this works, with this I could keep my existing docker setup.
check it out: https://shellbox.dev
There was this HN post[0] last week on a tool for automatically shutting down the codespace container when idle.
This allows you to use Claude Code from your mobile device, in a safe environment (restricted Kubernetes pod)
I do agree with the security / cautionary comments and wouldn't leverage this setup outside a hacked together homelab.
There's not a tonne of tooling for that use case now, although it's not too hard to put together I vibe-coded something that works for my use case fairly quickly (CC + Opus 4.5 seemed to understand what's needed)
Syncthing works well for getting a local copy of a directory from the VM.
Even with npm/pip, these may not be available on a base linux box.
Even then, some complex projects may need other tools that are not part of a base system (command line tools, redis, ...).
With these powers there's a lot less back-and-forth with me running commands, copying the output, pasting it to Claude, etc.
I'm sure you've had the case where you had to instruct someone to do something (e.g. playing tech support with family, helping another engineer, etc). While it helps the other person learn, it feels soooo slow vs just doing it yourself :) And since I don't have to teach the agent, I think this approach makes sense.
just give it its own machine and let it check out any code
I PXE boot it from a known image when I feel the need
Could do the same thing on EC2 of course.
There is definitely a real world risk. You should browse the ai coding subreddits. The regularity of `rm -rf` disasters is, sadly, a great source of entertainment for me.
I once was playing around, having Claude Code (Agent A) control another instance of Claude Code (Agent B) within a tmux session using tmux's scripting. Within that session, I messed around with Agent B to make it output text that made Agent A think Agent B rm -rf'd entire codebase. It was such a stupid "prank", but seeing Agent A's frantic and worried reaction to Agent B's mistake was the loudest and only time I've laughed because of an LLM.
https://web.archive.org/web/20250622161053/https://supabase....
Now, there are some actual warnings. https://supabase.com/docs/guides/getting-started/mcp#securit...
https://old.reddit.com/r/ClaudeAI/comments/1pgxckk/claude_cl...
as
"Bash(az resource:)",
is much more permissive than
"Bash(az resource show:)",
It mostly gets it right but I instantly fix the file with the "readonly" version when it gets it too open.
And setup an .env for the project with user/password to access only a dev database.
Missing FreeBSD jails in 2026 is kind of weird (hello 1999)...
Why don't Claude Code & other AI agents offer an option to make a sound or trigger a system notification whenever they prompt for approval? I've looked into setting this up, and it seems like I'd have to wire up a script that scrapes terminal output for an approval request. Codex has had a feature request open for a while: https://github.com/openai/codex/issues/3052
I have such a love/hate relationship with VirtualBox. It's so useful but so buggy. My current installation has a bug that causes high network latency, but I'm afraid to upgrade in case it introduces new, worse bugs.
VMware is a million times better, but it is also Proprietary™
I do believe in the whole RMS "respects the user's freedoms" spiel, so all things being equal I prefer FOSS, even if it's worse - but there are limits.
There's a bug in that it can't output smart quotes “like this”
Sonnet, Opus et al think they output it but something in the pipeline is rewriting it
https://github.com/firasd/vibesbench/blob/main/docs/2026/A/t...
Try it in Claude Code and you'll see what I mean! Very weird
(Maybe I should be asking Claude this)
Edit: someone already built this: https://github.com/neko-kai/claude-code-sandbox
Or you can just mount the socket and call docker from within docker.
> Mounting the Docker socket grants the agent full access to your Docker daemon, which has root-level privileges on your system. The agent can start or stop any container, access volumes, and potentially escape the sandbox. Only use this option when you fully trust the code the agent is working with.
https://docs.docker.com/ai/sandboxes/advanced-config/#giving...
We have an updated version of Sandboxes coming out soon that uses MicroVM isolation to solve this exact problem. This next version will let your agent access a Docker instance within the MicroVM, therefore allowing you to do this securely.
It just got added to Homebrew:
brew install sandvault
Or clodpod [1] for a VM-based solutionVersion control ain’t a match for a good backup
But if you need something more strict, 'config.vm.synced_folder' also supports 'type rsync', which will copy the source folder at startup to the VM, but then it's on you to sync it back or whatever.
Thanks
docker sandbox run claude
You can have the local environment completely isolated with vagrant. But if you’re not careful with auth tokens it can (and eventually will when it gets confused)go wipe the shared dev database or the GitHub repo. The author kinda acknowledges this, but it’s glossing over a big chunk of the problem. If it can pus to GitHub, unless you’ve set up your tokens carefully it can delete things too. Having a local isolated test database separate from the shared infrastructure is a matter of a mature dev environment, which is a completely separate thing from how you run Claude. Two of the three examples cited as “no, no, no” are not protected by vagrant or docker or even EC2. It’s what tokens the agent has and needs.
- There's a cloned 'my-project' git repo on the base OS
- The 'Vagrantfile' is added to the project
- 'vagrant up', 'vagrant ssh' and claude login is run inside the VM
At this stage, besides the source code and the Claude Code token (after logging in), there are no other credentials on the VM: no SSH keys, no DB credentials, no API tokens, nothing.
There is also no need to add:
- SSH keys or GitHub tokens: because git push/pull is handled outside the VM
- DB credentials: because Claude can just install a DB inside the VM and run the project migrations against that isolated instance, not any shared/production database
API tokens can definitely be a problem if you need external service integration. But that's an explicit opt-in decision, you'd have to deliberately add those credentials to the Vagrantfile or sync them in. At that point, yes, you need proper token scoping and permissions.