Show HN: Live VNC for web agents – debugging native captcha on Cloud Run

10 points

by quarkcarbon279

1 day ago

| rtrvr.ai

Hi HN, Bhavani here (rtrvr.ai).

We build DOM-native web agents (no screenshot-based vision, no CDP/Playwright debugger-port control). We handle captchas natively, including Google reCAPTCHA image challenges, by traversing cross-origin iframes and shadow DOM. Latency on the image challenges is currently high.

The problem: when debugging image-selection captchas ("select all images with traffic lights"), logs don't tell you why the agent clicked the wrong tiles. I found myself staring at execution logs thinking "did it even see the grid correctly?" and realized I just wanted to watch it work.

So we built a live VNC view with takeover for serverless Chrome workers on Cloud Run.

Key learnings:

1. Session affinity is best-effort; "attach later" can hit a different instance

2. A separate relay service that pairs viewer↔runner by short-lived tokens makes attach deterministic

3. Runner stays clean: concurrency=1, one browser per container, no mixed traffic
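
To make the pairing idea in point 2 concrete, here is a minimal in-memory sketch of the token bookkeeping such a relay might do. It is not our actual relay: the names (`PairingRelay`, `register_runner`, `attach_viewer`), the 60-second TTL, and the single-use rule are all assumptions for illustration, and a real relay would also hold the two network connections and pipe VNC bytes between them.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class PendingRunner:
    runner_id: str      # hypothetical identifier for one Chrome worker
    expires_at: float   # clock value after which the token is rejected


class PairingRelay:
    """Hypothetical relay state: maps short-lived tokens to one runner each."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds          # assumed TTL, not from the post
        self._clock = clock              # injectable so expiry can be tested
        self._pending: Dict[str, PendingRunner] = {}

    def register_runner(self, runner_id: str) -> str:
        """Called by a runner; returns a token to hand to the viewer."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = PendingRunner(runner_id, self._clock() + self._ttl)
        return token

    def attach_viewer(self, token: str) -> Optional[str]:
        """Called by a viewer; returns the paired runner id, or None."""
        # pop() makes the token single-use (an assumption of this sketch).
        entry = self._pending.pop(token, None)
        if entry is None or self._clock() > entry.expires_at:
            return None
        return entry.runner_id
```

Because the viewer is matched to a runner by token lookup rather than by load-balancer routing, attach no longer depends on session affinity landing the request on the right instance.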

Would love feedback from folks who've shipped similar:

1. What replaced VNC for you (WebRTC etc) and why?

2. Best approach for recording/replay without huge storage?

3. How do you handle "attach later" safely in serverless?