Observed Agent Sandbox Bypasses
21 points | 3 days ago | 3 comments | voratiq.com | HN
joshribakoff
1 hour ago
Some of these don’t really seem to have bypassed any kind of sandbox, like hallucinating an npm package. You acknowledge that the install will fail if someone tries to reinstall from the lock file. Are you not doing that in CI? Same with curl: you’ve explained how the agent saw a hallucinated error code, but not how a network request would have bypassed the sandbox. These just sound like examples of friction introduced by the sandbox.
themafia
52 minutes ago
> These just sound like examples of friction introduced by the sandbox.

The whole idea of putting "agentic" LLMs inside a sandbox sounds like rubbing two pieces of sandpaper together in the hopes a house will magically build itself.

jazzyjackson
25 minutes ago
Trouble is, it occasionally works.
formerly_proven
29 minutes ago
That’s some good house-building sandpaper then.
ashishb
49 minutes ago
> The swap bypassed our policy because the deny rule was bound to a specific file path, not the file itself or the workspace root.

This policy is stupid. I mount the directory read-only inside the container, which makes the swap impossible (barring a security hole in the container itself).
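
For anyone who wants to try the read-only mount approach, here is a minimal Python sketch that builds the relevant `docker run` arguments. The function name `readonly_mount_args`, the `/workspace` target, and the `agent-image` image name are made up for illustration; only the `--mount type=bind,...,readonly` syntax comes from Docker itself.

```python
from pathlib import Path


def readonly_mount_args(host_dir: str, container_dir: str = "/workspace") -> list[str]:
    """Build `docker run` arguments for a read-only bind mount.

    `container_dir` defaults to a hypothetical /workspace target.
    """
    source = Path(host_dir).expanduser().resolve()
    # `readonly` applies to the whole mount, so writes, renames, and deletes
    # under the target are rejected no matter which path they name.
    return ["--mount", f"type=bind,source={source},target={container_dir},readonly"]


if __name__ == "__main__":
    # "agent-image" is a placeholder image name, not a real image.
    argv = ["docker", "run", "--rm", *readonly_mount_args("."), "agent-image"]
    print(" ".join(argv))
```

Because the restriction covers the whole mount rather than one path, renaming or replacing a file under it should fail the same way a direct write does.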

kaffekaka
1 hour ago
I am testing running agents in Docker containers, with a script that manages different images for different use cases, and came across this: https://docs.docker.com/ai/sandboxes/

Has anyone given it a try?

ianlevesque
17 minutes ago
Yes, but it’s barely usable. I ended up writing my own Dockerfile and a bash script that just runs ‘docker run’ on my setup, and as a bonus you don’t need Docker Desktop. I might open source it at some point, but honestly it’s pretty trivial to append a couple of volume mount flags and env vars to your docker run and get exactly what you want included.
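
To show roughly what "a couple of volume mount flags and env vars" amounts to, here is a small Python sketch that assembles such a command. It is not the script described above; `build_docker_run`, the image name, and the example paths and variable are all hypothetical. The only Docker features it relies on are the standard `-v`, `-e`, `--rm`, and `-it` flags of `docker run`.

```python
import shlex


def build_docker_run(image, command, volumes=None, env=None):
    """Return the argv list for a `docker run` with extra mounts and env vars.

    volumes: mapping of host path -> container path
    env:     mapping of variable name -> value
    """
    argv = ["docker", "run", "--rm", "-it"]
    for host_path, container_path in (volumes or {}).items():
        argv += ["-v", f"{host_path}:{container_path}"]
    for name, value in (env or {}).items():
        argv += ["-e", f"{name}={value}"]
    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # All values below are placeholders for illustration.
    argv = build_docker_run(
        "agent-image",
        ["bash"],
        volumes={"/home/me/project": "/workspace"},
        env={"EXAMPLE_TOKEN": "changeme"},
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Returning a list rather than a string means the result can be passed straight to `subprocess.run` without shell quoting problems.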
ashishb
52 minutes ago
> Has anyone given it a try?

Yes. I don't think it persists caches and configs outside of the current directory, for example the global npm/yarn/uv/cargo caches or even the Claude/Codex/Gemini config.

I ended up writing my own wrapper around Docker to do this. If you're interested, the link is in my previous comments. I don't want to post the same link again and again.
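
As a rough idea of how a wrapper could persist those caches (this is not the wrapper mentioned above), the Python sketch below mounts host cache directories into the container. The directory list reflects what I understand to be common Linux defaults and may not match every setup, and `cache_mount_args` and the `/root` container home are assumptions.

```python
from pathlib import Path

# Assumed default cache locations on Linux, relative to the home directory.
# Adjust for your own tools and platform.
CACHE_DIRS = [".npm", ".cache/yarn", ".cache/uv", ".cargo/registry"]


def cache_mount_args(host_home, container_home="/root"):
    """Return `-v` flags for every cache directory that exists on the host.

    container_home assumes the container runs as root with HOME=/root.
    """
    args = []
    for rel in CACHE_DIRS:
        host_dir = Path(host_home) / rel
        # Skip caches that are absent so Docker does not create empty,
        # root-owned directories on the host.
        if host_dir.is_dir():
            args += ["-v", f"{host_dir}:{container_home}/{rel}"]
    return args


if __name__ == "__main__":
    print(cache_mount_args(Path.home()))
```

The resulting flags can be appended to an existing `docker run` invocation, so packages downloaded in one container run are available in the next.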

cbsmith
40 minutes ago
I've been using container-use to do something like that: https://container-use.com/introduction
sureglymop
1 hour ago
I would test it, but it requires Docker Desktop. Immediate no; there's no reason to use that.