FilterHN

we're working on a way for you to expose creds safely into our sandbox. But for now, it's limited to mocks API calls, clicks around the UI, and unit tests.

▲

deet

1 hour ago

[-]

We've found that methods like this substantially increase the quality and reliability of coding agent output. The ability to run code in a sandbox, drive an interactive session using a browser or API calls or other apps, and visually confirm output via vision models all adds up to plugging a big hole in the feedback loop for agent modifying a complex codebase.

We've had agents go as far as interactively testing how our product responds in video calls by launching our full stack in a set of docker containers (app, api, db, queues, etc.), all inside a larger sandbox, populating test data, connecting the mock system to a real video call solution like Google meet, and injecting audio and video to test the response. End-to-end, like a real user flow.

It's not perfect yet, but if you are a skeptic on the ability for AI agents to productively modify a complex product, I'd highly encourage you to play with a setup like this before ossifying your conclusions.

▲

Elzair

2 hours ago

[-]

I wonder how long it will take for someone to pwn this?