Launch HN: Cua (YC X25) – Open-Source Docker Container for Computer-Use Agents
136 points | 18 hours ago | 21 comments | github.com | HN
Hey HN, we’re Francesco and Alessandro, the creators of c/ua (https://www.trycua.com), a Docker‑style container runtime that lets AI agents drive full operating systems in lightweight, isolated VMs. Our entire framework is open‑source (https://github.com/trycua/cua), and today we’re thrilled to have our Launch HN!

Check out our demo to see it in action: https://www.youtube.com/watch?v=Ee9qf-13gho, and for more examples - including Tableau, Photoshop, CAD workflows - see the demos in our repo: https://github.com/trycua/cua.

For Computer-Use AI agents to be genuinely useful, they must interact with your system's native applications. But giving full access to your host device is risky. What if the agent's process gets compromised, or the LLM hallucinates and leaks your data? And practically speaking, do you really want to give up control of your entire machine just so the agent can do its job?

The idea behind c/ua is simple: let agents operate in a mirror of the user’s system - isolated, secure, and disposable - so users can fire-and-forget complex tasks without needing to dedicate their entire system to the agent. By running in a virtualized environment, agents can carry out their work without interrupting your workflow or risking the integrity of your system.

While exploring this idea, I discovered Apple’s Virtualization.framework and realized it offered fast and lightweight virtualization on Apple silicon. This led us to build a high-performance virtualization layer and, eventually, a computer-use interface that allows agents to interact with apps just like a human would - without taking over the entire system.

As we built this, we decided to open-source the virtualization core as a standalone CLI tool called Lume (Show HN here: https://news.ycombinator.com/item?id=42908061). c/ua builds on top of Lume, providing a full framework for running agent workflows inside secure macOS or Linux VMs, so your system stays free for you to use while the agent works its magic in the background.

With Cua you can build an AI agent within a virtual environment to:
- navigate and interact with any application's interface;
- read screen content and perform keyboard/mouse actions;
- switch between applications and self-debug when needed;
- operate in a secure sandbox with controlled file access.

All of this occurs in a fully isolated environment, ensuring your host system, files, and sensitive data remain completely secure, while you continue using your device without interruption.

People are using c/ua to:
- Bypass CryptoJS-based encryption and anti-bot measures to interact with modern web apps reliably;
- Automate Tableau dashboards and export insights via Claude Desktop;
- Drive Photoshop for batch image editing by prompt;
- Modify 3D models in Fusion 360 with a CAD Copilot;
- Extract data from legacy ERP apps without brittle screen‑scraping scripts.

We’re currently working on multi‑VM orchestration for parallel agentic workflows, Windows and Linux VM support, and episodic and long-term memory for CUA Agents.

On the open‑source side, c/ua is 100% free under the MIT license - run it locally with any LLM you like. We’re also gearing up a hosted orchestration service for teams who want zero‑ops setup (early access sign‑ups opening soon).

We’d love to hear from you. What desktop or legacy apps do you wish you could automate? Any thoughts, feedback, or horror stories from fragile AI automations are more than welcome!

gavinbains
8 hours ago
Legendary. This is going to be very helpful, and the TAM is getting bigger. Thank you guys for this, and for all the learnings in-batch -- I'm excited for the future!

I reckon I could run this for buying fashion drops, is this a use case y'all have seen?

frabonacci
2 hours ago
Appreciate that a lot! Yep - buying fashion drops, limited releases, ticketing, etc. are all great fits. Cua can also bypass CryptoJS-based encryption and other anti-bot measures, so it plays nicely with modern web apps out of the box.

badmonster
4 hours ago
Congrats on the launch! Love this idea. How does the LLM interact with the VM: screen+metadata as JSON, or higher-level planning?

frabonacci
2 hours ago
Thanks, really appreciate it!

The LLM interacts with the VM through a structured virtual computer interface (cua-computer and cua-agent). It’s a high-level abstraction that lets the agent act (e.g., “open Terminal”, “type a command”, “focus an app”) and observe (e.g., current window, file system, OCR of the screen, active processes) in a way that feels a lot more like using a real computer than parsing raw data.

So under the hood, yes, screen+metadata are used (especially with the Omni loop and visual grounding), but what the model sees is a clean interface designed for agentic workflows - closer to how a human would think about using a computer.

If you're curious, the agent loops (OpenAI, Anthropic, Omni, UI-Tars) offer different ways of reasoning and grounding actions, depending on whether you're using cloud or local models.

https://github.com/trycua/cua/tree/main/libs/agent#agent-loo...
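
To make the act/observe split concrete, here is a minimal Python sketch of that kind of loop. Everything in it (`FakeComputer`, `Observation`, `scripted_policy`, the action names) is hypothetical and for illustration only; it is not the cua-computer or cua-agent API, and a real loop would call a model where the scripted policy sits.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """What the agent sees at each step (hypothetical shape)."""
    active_window: str
    screen_text: str  # stand-in for OCR output


@dataclass
class FakeComputer:
    """In-memory stand-in for a VM; not a real Cua class."""
    active_window: str = "Desktop"
    typed: list = field(default_factory=list)

    def observe(self) -> Observation:
        return Observation(self.active_window, " ".join(self.typed))

    def act(self, action: dict) -> None:
        if action["type"] == "open_app":
            self.active_window = action["name"]
        elif action["type"] == "type_text":
            self.typed.append(action["text"])
        else:
            raise ValueError(f"unknown action: {action['type']}")


def scripted_policy(obs: Observation):
    """Placeholder for the model: picks the next action from the observation."""
    if obs.active_window != "Terminal":
        return {"type": "open_app", "name": "Terminal"}
    if "ls" not in obs.screen_text:
        return {"type": "type_text", "text": "ls"}
    return None  # nothing left to do


def run_agent(computer, policy, max_steps: int = 10) -> int:
    """Observe, decide, act until the policy is done; returns actions taken."""
    for step in range(max_steps):
        action = policy(computer.observe())
        if action is None:
            return step
        computer.act(action)
    return max_steps
```

The `max_steps` cap is a design choice worth keeping in any real loop: it bounds the damage when a model keeps proposing actions that never change the observation.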

brap
16 hours ago
Congrats on the launch!

I don’t know if this is a problem you’ve faced, but I’m curious: how do LLM tool devs handle authn/authz? Do host apps normally forward a token or something? Is there a standard commonly used? What if the tool needs some permissions to act on the user’s behalf?

alexchantavy
13 hours ago
There are companies like https://www.keycard.sh/ taking this on. There are other competitors too but I can't think of them atm

frabonacci
12 hours ago
Good question! Specifically around computer-use agents (CUAs), I haven't seen much exploration yet - and I think it’s an area worth exploring for vertical products. For example, how do you securely handshake between a CUA agent and an API-based agent without exposing credentials? If everything stays within a local cluster, it's manageable, but once you start scaling out, authn/authz becomes a real headache.

I'm also working on a blog post that touches on this - particularly in the context of giving agents long-term and episodic memory. Should be out next week!

dhruv3006
5 hours ago
One-shot VMs would be nice: an ephemeral VM spins up, the agent runs its task, and the VM is deleted. Perfect for CI pipelines.

frabonacci
2 hours ago
100% - ephemeral VMs are on the roadmap. Perfect for CI: spin up, run the agent, nuke it.
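
A minimal sketch of that spin-up, run, nuke pattern in Python, using a context manager so the VM is removed even if the agent fails. `FakeVMBackend` and its `create`/`delete` methods are hypothetical stand-ins, not the Lume or Cua API; since ephemeral VMs are still on the roadmap, treat this as an illustration of the shape only.

```python
import uuid
from contextlib import contextmanager


class FakeVMBackend:
    """In-memory stand-in for a VM runtime; create/delete are hypothetical."""

    def __init__(self):
        self.running = set()

    def create(self, image: str) -> str:
        vm_id = f"{image}-{uuid.uuid4().hex[:8]}"
        self.running.add(vm_id)
        return vm_id

    def delete(self, vm_id: str) -> None:
        self.running.discard(vm_id)


@contextmanager
def ephemeral_vm(backend, image: str):
    """Create a VM, hand its id to the caller, and always delete it afterwards."""
    vm_id = backend.create(image)
    try:
        yield vm_id
    finally:
        # Runs on success and on failure, so a crashed agent never leaks a VM.
        backend.delete(vm_id)
```

In a CI job this would be used as `with ephemeral_vm(backend, "some-image") as vm: ...`, with the agent run inside the `with` body.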

sagarpatil
6 hours ago
Love your accent!

frabonacci
2 hours ago
Thank you!!

orliesaurus
12 hours ago
Bravi! The future is the Agent OS - how robust is the UI element detection and interaction across different apps and when navigating complex menus? Is it resistant to UI changes? That's often where these automations get brittle.

thank you e forza Cua

frabonacci
2 hours ago
UI detection’s a big focus - we use visual grounding + structured observations (like icons, OCR, app metadata, window state), so the agent can reason more like a user would. It’s surprisingly robust even with layout shifts or new themes.

winwang
16 hours ago
Congrats! How do you guys deal with SOC2/HIPAA/etc.? Or are those separate concerns?

frabonacci
12 hours ago
Thanks! Great question - those are definitely relevant, but they depend a lot on the deployment model. Since CUAs often run locally or in controlled environments (e.g. a user’s own VM or cluster), we can sidestep a lot of traditional SOC2/HIPAA concerns around centralized data handling. That said, if you're running agents across org boundaries or processing sensitive data via cloud APIs, then yeah - those frameworks absolutely come into play.

We're designing with that in mind: think fine-grained permissioning, auditability, and minimizing surface area. But it’s still early, and a lot of it depends on how teams end up using CUAs in practice.

taikon
12 hours ago
How's it different from e2b computer use?

frabonacci
12 hours ago
We’re still figuring things out in public, but a few key differences:

- Open-source from the start. Cua’s built under an MIT license with the goal of making Computer-Use agents easy and accessible to build. Cua's Lume CLI was our first step - we needed fast, reproducible VMs with near-native performance to even make this possible.

- Native macOS support. As far as we know, we’re the only ones offering macOS VMs out of the box, built specifically for Computer-Use workflows. And you can control them with a PyAutoGUI-compatible SDK (cua-computer) - so things like click, type, scroll just work, without needing to deal with any inter-process communication.

- Not just the computer/sandbox, but the agent too. We’re also shipping an Agent SDK (cua-agent) that helps you build and run these workflows without having to stitch everything together yourself. It works out of the box with OpenAI and Anthropic models, UI-Tars, and basically any VLM if you’re using the OmniParser agent loop.

- Not limited to Linux. The hosted version we’re working on won’t be Linux-only - we’re going to support macOS and Windows too.
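
To illustrate what "PyAutoGUI-compatible" can mean in practice, here is a rough Python sketch of a facade that mirrors PyAutoGUI's `click`, `write`, and `scroll` names and forwards each call as a JSON message. The `RemoteDesktop` class, the message format, and the `send` callback are assumptions made up for this example, not the actual cua-computer implementation.

```python
import json


class RemoteDesktop:
    """Hypothetical facade exposing PyAutoGUI-style calls for a remote VM.

    Each call is serialized to a JSON message and handed to `send`, which a
    real client would replace with a network transport.
    """

    def __init__(self, send):
        self._send = send

    def _emit(self, action: str, **params) -> None:
        self._send(json.dumps({"action": action, **params}, sort_keys=True))

    def click(self, x: int, y: int, button: str = "left") -> None:
        self._emit("click", x=x, y=y, button=button)

    def write(self, message: str) -> None:
        self._emit("write", message=message)

    def scroll(self, clicks: int) -> None:
        self._emit("scroll", clicks=clicks)
```

Passing `print` or a list's `append` as `send` is enough to see the messages a script would produce, which is handy for dry runs before pointing it at a real VM.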

orliesaurus
12 hours ago
Active development of CUA, according to GitHub

tomatohs
17 hours ago
Would love to use this for TestDriver, but needs to support Windows :*(

frabonacci
16 hours ago
Windows host support is on our roadmap - we're currently exploring virtualization options with KVM/QEMU. Please join the discussion on our Discord: https://discord.com/invite/mVnXXpdE85

brene
17 hours ago
will this also be available as a hosted service? Or do you have instructions on how to manage a fleet of these manually while you're building the orchestration workflows?

frabonacci
16 hours ago
Yes, we’re currently running pilots with select customers for a hosted service of Cua supporting macOS and Windows cloud instances. Feel free to reach out with your use case at founders@trycua.com

rahimnathwani
17 hours ago
I tried this three times. Twice a few days ago and once just now.

First time: it opened a macOS VM and started to do stuff, but it got ahead of itself and started typing things in the wrong place. So now that VM has a Finder window open, with a recent file that's called

  plt.ylabel('Price(USD)').sh

The second and third times, it launched the VM but failed to do anything, showing these errors:

  INFO:cua:VM run response: None
  INFO:cua:Waiting for VM to be ready...
  INFO:cua:Waiting for VM macos-sequoia-cua_latest to be ready (timeout: 600s)...
  INFO:cua:VM status changed to: stopped (after 0.0s)
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  DEBUG:cua:Waiting for VM IP address... Current IP: None, Status: stopped
  INFO:cua:VM status changed to: running (after 12.4s)
  INFO:cua:VM macos-sequoia-cua_latest got IP address: 192.168.64.2 (after 12.4s)
  INFO:cua:VM is ready with IP: 192.168.64.2
  INFO:cua:Initializing interface for macos at 192.168.64.2
  INFO:cua.interface:Logger set to INFO level
  INFO:cua.interface.macos:Logger set to INFO level
  INFO:cua:Connecting to WebSocket interface...
  INFO:cua.interface.macos:Waiting for Computer API Server to be ready (timeout: 60s)...
  INFO:cua.interface.macos:Attempting WebSocket connection to ws://192.168.64.2:8000/ws
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 10.0s, attempts: 11)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 20.0s, attempts: 21)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 30.0s, attempts: 31)
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 40.0s, attempts: 41)
  INFO:cua.interface.macos:Still waiting for Computer API Server... (elapsed: 50.1s, attempts: 51)
  ERROR:cua.interface.macos:Could not connect to 192.168.64.2 after 60 seconds
  ERROR:cua:Failed to connect to WebSocket interface
  DEBUG:cua:Computer initialization took 76856.09ms
  ERROR:agent.core.agent:Error in agent run method: Could not connect to WebSocket interface at 192.168.64.2:8000/ws: Could not connect to 192.168.64.2 after 60 seconds
  WARNING:cua.interface.macos:Computer API Server connection lost. Will retry automatically.

This was using the Gradio interface, with the agent loop provider as OMNI and the model as gemma3:4b-it-q4_K_M

These versions:

  cua-agent==0.1.29
  cua-computer==0.1.23
  cua-core==0.1.5
  cua-som==0.1.3
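
One way to narrow down a timeout like the one in the log is to check whether the port accepts TCP connections at all, independently of cua. Below is a minimal sketch using only the Python standard library; `wait_for_port` is a helper name made up here, and a successful TCP connect only shows that something is listening, not that the WebSocket endpoint itself is healthy.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError on refusal, unreachable host, or timeout.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

With the address from the log above, that would be `wait_for_port("192.168.64.2", 8000)`. If it returns False, the problem is more likely VM networking or the server not starting than anything on the model side.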

frabonacci
16 hours ago
Thanks for trying out c/ua! We still recommend pairing the Omni loop configuration with a more capable VLM, such as Qwen2.5-VL 32B, or using a cloud LLM provider like Sonnet 3.7 or OpenAI GPT-4.1. While we believe that in the coming months we'll see better-performing quantized models that require less memory for local inference, truth is we're not quite there yet.

Stay tuned - we're also releasing support for UI-Tars-1.5 7B this week! It offers excellent speed and accuracy, and best of all, it doesn't require bounding box detection (Omni) since it's a pixel-native model.

rahimnathwani
16 hours ago
Thanks. I'll try that, but right now it's not working at all, i.e. cua can't interact with the VM at all. That's not a model issue.

frabonacci
16 hours ago
If you're running Cua from VS Code or Cursor, have you checked out this issue? https://github.com/trycua/cua/issues/61

Feel free to ping me on Discord (I'm francesco there) - happy to hop on a quick call to help debug: https://discord.com/invite/mVnXXpdE85

xdotli
16 hours ago
THIS IS FIRE been wanting this for ages

frabonacci
12 hours ago
Thank you for your support!

jameskuj
17 hours ago
A superfan of this product!

frabonacci
17 hours ago
Thank you - your support means a lot to us!

3s
13 hours ago
this is really cool! congrats on the launch

frabonacci
12 hours ago
Thank you - we appreciate your support!

throw03172019
14 hours ago
This is precisely what I am looking for but for Windows. We need to automate some Windows native apps.

In the meantime, I’ll give this a shot on macOS tonight. Congrats!

frabonacci
12 hours ago
Yes - pig.dev is a great product! You should definitely check it out.

Also, let us know on Discord once you’ve tried out c/ua locally on macOS: https://discord.com/invite/mVnXXpdE85

shykes
13 hours ago
Check out pig: https://pig.dev

(I am not affiliated)

throw03172019
12 hours ago
I do recall looking at it before but was concerned about HIPAA if they are storing data on their servers as well.

Also, is the project still active? No commits for 2 months is odd for a YC startup in current batch :)

farazmsiddiqi
16 hours ago
i love this — isolation and permissioning for computer use agents. why can’t i use regular docker containers to deploy my computer use agent?

frabonacci
2 hours ago
Glad you love it! Right now, we’re relying more on the Lume CLI and its API server rather than a full Docker setup. However, we’ll soon be shipping a Docker interface that’ll handle VNC and model hosting (through docker model runner). Stay tuned for that!

ekarabeg
17 hours ago
Congrats on the launch! Awesome product!

frabonacci
16 hours ago
Thanks — we really appreciate your support!

zwenbo
17 hours ago
Amazing product! Congrats on the launch!

frabonacci
17 hours ago
Thank you so much - we truly appreciate your support!

mountainriver
17 hours ago
This is cool! We built a similar thing with AgentDesk https://github.com/agentsea/agentdesk

Would love to chat sometime!

abshkbh
4 hours ago
https://github.com/abshkbh/arrakis Also building in this space using MicroVMs. Currently working on a Mac port. Would love to connect - abshkbh AT gmail.com

frabonacci
16 hours ago
I love AgentDesk’s take on Kubernetes - it’s something we had considered as well, but it didn’t make much sense for macOS since you can only spin up two macOS VMs at a time due to Apple’s licensing restrictions.

Feel free to join our Discord so we can chat more: https://discord.com/invite/mVnXXpdE85
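
For anyone scheduling work around the two-VM limit mentioned above, one simple option is to gate VM-backed tasks behind a semaphore instead of a full orchestrator. A minimal asyncio sketch follows; `run_task_in_vm` and the `stats` bookkeeping are hypothetical placeholders, and the `asyncio.sleep` call stands in for booting a VM and running an agent.

```python
import asyncio

MAX_MACOS_VMS = 2  # per-host limit described in the comment above


async def run_task_in_vm(name: str, slots: asyncio.Semaphore, stats: dict) -> str:
    """Hypothetical task runner: holds one VM slot for the task's duration."""
    async with slots:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # stand-in for booting a VM and running an agent
        stats["active"] -= 1
    return f"{name}: done"


async def run_all(task_names):
    """Run every task, never holding more than MAX_MACOS_VMS slots at once."""
    slots = asyncio.Semaphore(MAX_MACOS_VMS)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(run_task_in_vm(name, slots, stats) for name in task_names)
    )
    return results, stats["peak"]
```

Calling `asyncio.run(run_all(["a", "b", "c"]))` queues the third task until one of the first two releases its slot.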

mountainriver
15 hours ago
That's a fantastic way to get your IP banned :)

reindent
16 hours ago
That's great.

Also built something on top of Browser Use (Nanobrowser) and Docker.

https://github.com/reindent/nanomachine

Just finished planning and shell capabilities

Let's chat @reindentai (X)

frabonacci
16 hours ago
Sure - just followed you back!

swanYC
14 hours ago
Love this!

frabonacci
12 hours ago
Thank you - we appreciate it!

SkylerJi
16 hours ago
This is insane y'all

frabonacci
12 hours ago
Thank you - we appreciate it!