Genuinely trying to understand if this is painful enough to pay to avoid or just annoying for a few weeks and then fine.
(I have a landing page but no product yet, posting to validate before building!)
1. Cold start latency killed iteration loops. Spinning up a GPU VM to test a ~10-minute sim run added nearly as much waiting as the sim itself: you'd wait 3-5 min for the instance to boot, run for ~8 min, then tear down. That per-iteration overhead crushes exploration.
2. Idle billing. If you're grid-searching over reward functions, you want to fire off 20 parallel runs, collect results, tune, and repeat. But most providers bill by the hour, so even a 12-minute run costs you a full hour per instance.
3. Physics sim + CUDA dependencies. Custom CUDA kernels (warp sim, etc.) often need specific driver versions. Docker helps but image build/push overhead adds another 5-10 min to the loop.
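A quick back-of-the-envelope on points 1 and 2, using the numbers cited above (the $1/hr GPU price is a made-up placeholder, not any provider's real rate):

```python
# Figures from the list above; PRICE_PER_HOUR is a hypothetical placeholder.
SPINUP_MIN = 4          # midpoint of the 3-5 min instance wait
RUN_MIN = 8             # actual sim time per iteration
PRICE_PER_HOUR = 1.0    # made-up $/hr for one GPU instance

# Point 1: fraction of each iteration's wall-clock time lost to spin-up
overhead = SPINUP_MIN / (SPINUP_MIN + RUN_MIN)
print(f"{overhead:.0%} of each iteration is spin-up")  # -> 33%

# Point 2: 20 parallel 12-minute runs, hourly vs per-second billing
runs, run_min = 20, 12
hourly_cost = runs * PRICE_PER_HOUR                    # each run rounds up to a full hour
per_second_cost = runs * (run_min / 60) * PRICE_PER_HOUR
print(f"hourly: ${hourly_cost:.2f}, per-second: ${per_second_cost:.2f}")
# -> hourly: $20.00, per-second: $4.00 (5x difference for the same compute)
```

So even at a modest hourly rate, per-run rounding dominates the bill once you parallelize.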
The "CI for sims" framing (push code → run on GPU automatically) directly addresses #1 and #3. Worth building.
On the infrastructure layer: we built GhostNexus (https://ghostnexus.net) to address #1 and #2: per-second billing, <30s cold starts on RTX 4090 hardware, and a Python SDK that submits a job in 3 lines. Might be worth using it as the GPU backend if you don't want to manage the infra layer yourself. (Disclaimer: I'm the founder.)
I think that if you can write a blog post that links to a repo with a demo of your product and a few images/videos of the test, it will get more upvotes from the general public.