CLAUDE_CODE_ADDITIONAL_DIRECTORIES_CLAUDE_MD=1 means claude finds and loads all the CLAUDE.md of all the mounted repos overtime (and by settings). As such, working on multiple unrelated repos at the same time isn’t a pleasant experience out of the box.
A few other interesting VM ENVs: CLAUDE_CODE_IS_COWORK=1 CLAUDE_CODE_BRIEF=1 CLAUDE_CODE_BRIEF_UPLOAD=1 CLAUDE_CODE_DISABLE_AUTO_MEMORY=1 CLAUDE_CODE_DISABLE_BACKGROUND_TASKS=1 CLAUDE_CODE_DISABLE_CRON=1 CLAUDE_CODE_ENTRYPOINT=local-agent CLAUDE_CODE_EXECPATH=/usr/local/bin/claude CLAUDE_CODE_HOST_HTTP_PROXY_PORT=36543 CLAUDE_CODE_HOST_PLATFORM=darwin CLAUDE_CODE_HOST_SOCKS_PROXY_PORT=46673 USE_STAGING_OAUTH= _=/usr/bin/env all_proxy=socks5h://localhost:1080 ftp_proxy=socks5h://localhost:1080 grpc_proxy=socks5h://localhost:1080 http_proxy=http://localhost:3128 https_proxy=http://localhost:3128 no_proxy=localhost,127.0.0.1,::1,.local,.local,169.254.0.0/16,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
I was thinking you can't make the chance of catastrophic failure zero (we still hear about "Claude deleted my home folder"), but you can definitely limit the blast radius.
You can't get the risk to zero. But the opportunity cost of not playing the game is rising. So you accept some level of risk.
My personal take here is "why screw around with containers and virtualization when a used ThinkPad is $50". Just give it its own machine. Then it can blow it up all it wants. (Or a $3 VPS, as the case may be :)
[0] The lethal trifecta for AI agents: private data, untrusted content, and external communication - https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
Just make sure it doesn’t have ssh access to any other machines!
Technically a merchant could require meeting in person to exchange a OTP to avoid this and make it 0 but it is not worth it and you will get out competed by other businesses willing to take on a marginally higher amount of risk to unlock a lot of utility for the user.
No, not at all.
Commercial aviation risk is the classic example. As the number of flights and passengers increased dramatically over the last 75 years, it was understood that the public wouldn't tolerate a proportional increase in crashes and deaths. So government and industry specifically collaborated to reduce risk ahead of growth, so that greater reward could be had for no greater harm.
It's exactly the opposite of what Anthropic shows in the illustration.
Although, testing again, it might be fixed now.
And side channels based on timing/ordering allowed network accesses, e.g. https://allowed.site/0 and https://allowed.site/1.
There's essentially no prevention against exfiltration prompt injections without a full classified data processing system that prevents interactions between different classification levels except through strict controls including provable redaction that excludes side-channels (e.g. information theoretic proof that side effects are limited to pre-defined finite outcomes).
It's also incredibly difficult to prevent prompt injection; attackers have the huge asymmetric advantage of being able to test prompts against all known security measures and trying multiple parallel attempts, including obfuscating them. Injections can be in dependencies, externally generated data, bug reports (which often contain externally-generated data), documentation, and many other useful places that we want agents to have access to.
My prediction: we'll continue to essentially YOLO it.
As I contemplate handing it more and more of the keys to my life, I grow increasingly concerned about what is, to me, the primary risk of this. Not data destruction (automated backups are trivial), but data exfiltration. Specifically, via prompt injection.
My solution to the problem, which I am implementing as a Hermes plugin + custom iOS / macOS app, is simple: an airlock architecture. One Hermes profile runs with local FS access and no internet access, inside an Apple container, and one Hermes profile runs with internet access and no FS access, inside an Apple container. They never share data directly or in any automated fashion.
If the user (i.e., my wife) wants to do some internet research, she can start a conversation with the remote-access profile. This is analogous to Claude and ChatGPT apps in their current state. However, at any point, she can flip the conversation over to local mode, which copies and pastes the conversation's transcript into the local-only profile (which has zero egress, enforced at the VM level) and seamlessly switches over to a new conversation in that profile.
After that, there's no way to re-enable internet attachment. Should she want to spawn a new conversation with information derived from the local file system, she starts a new conversation with a local agent, asks it to write up a research plan, and then – this is the airlock – manually begins a new conversation with only this plan in context.
The advantage this grants is that it's no longer necessary to worry about poisonous inputs flowing in – she only needs to worry about making sure any generated plan, the only artifact which could conceivably enter into the egress-enabled agent, does not contain information we'd rather not share with the internet at large.
I think this is bulletproof, but very much welcome input. Is it possible I am overengineering this out of paranoia? Yes. Will I share a lot more of my personal data with the agent as a result of its perceived security? Also yes. Is that dumb? Maybe.
Otherwise you have the right idea; exfiltration requires three things; input of a prompt injection, LLM processing the prompt injection along with private data, and finally some interaction with the outside world that contains the LLM output (or an externally-visible decision based on the output).
It’s a bit convoluted, but the way it looks is: 1. Your internet facing one is prompt injected. 2. It stores a prompt injection in the transcript that will be passed to the sealed one. 3. Sealed one reads it and ends up following suggestions to recommend some action you or your wife takes that compromises you.
“Oh, I recommend you visit this hotel based on these results. Book with your phone!” shows QR code that exfiltrates secrets