Async/Await on the GPU
59 points | 1 hour ago | 5 comments | vectorware.com
zozbot234
59 minutes ago
I'm not quite seeing the real benefit of this. Is the idea that warps will now be able to do work-stealing and continuation-stealing when running heterogeneous parallel workloads? But that requires keeping the async function's state in GPU-wide shared memory, which is generally a scarce resource.
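
For a rough sense of what that per-future state amounts to: the value an `async fn` returns has an ordinary compile-time size, and that size is the footprint an executor has to park somewhere. A quick CPU-side sketch (the function and buffer size here are made up purely for illustration):

```rust
// Illustrative only: a made-up async fn that takes a 64-byte buffer argument.
async fn example(buf: [u32; 16]) -> u32 {
    let partial: u32 = buf.iter().sum();
    // An await point: anything that must survive it becomes a field of the
    // compiler-generated state machine.
    std::future::ready(partial).await
}

fn main() {
    let fut = example([0u32; 16]);
    // The future is a plain value; this is the state that has to live
    // wherever the executor keeps it.
    println!("state size: {} bytes", std::mem::size_of_val(&fut));
}
```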
LegNeato
51 minutes ago
Yes, that's the idea.

GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and placed in a faster address space.
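
As a rough sketch of that shape in plain CPU-side Rust (not rust-gpu, and not any actual executor of ours; the names are illustrative): a busy-polling `block_on` that drives a `!Send` future, the way a block-local executor could keep a future and its state in whatever memory it owns.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc; // Rc is !Send, so any future capturing it is !Send too
use std::task::{Context, Poll, Waker};

// Drive a (possibly !Send) future to completion by busy-polling on the
// current "lane". A block-local GPU executor could take the same shape,
// with the future's state held in a fast, block-local address space.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let mut cx = Context::from_waker(Waker::noop());
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => std::hint::spin_loop(),
        }
    }
}

fn main() {
    // This future is !Send, so it can never be stolen by another executor;
    // it stays with the executor (and the memory) it was created on.
    let local = Rc::new(41u32);
    let answer = block_on(async move { *local + 1 });
    assert_eq!(answer, 42);
}
```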

Arch485
10 minutes ago
Very cool!

Is the goal with this project (generally, not specifically async) to have an equivalent to e.g. CUDA, but in Rust? Or is there another intended use-case that I'm missing?

shayonj
1 hour ago
Very cool to see this. It's something I have been curious about myself while exploring the space as well. I'd be curious what the parallels and differences are between this and NVIDIA's stdexec (outside of it being in Rust and using Future, which is also cool).
textlapse
46 minutes ago
What's the performance like? What would the benefits be of converting a streaming multiprocessor programming model to this?
LegNeato
38 minutes ago
We aren't focused on performance yet (it is often workload- and executor-dependent, and as the post says we currently do some inefficient polling), but Rust futures compile down to state machines, so they are a zero-cost abstraction.
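
To make the state machine point concrete, here is roughly the shape an `async fn` lowers to, written by hand (the real compiler-generated type is anonymous and handles pinning and suspension more carefully; this is just a sketch for illustration):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, Waker};

// An async fn like this...
async fn add_later(a: u32, b: u32) -> u32 {
    a + b
}

// ...lowers to roughly this: an enum whose variants hold the variables that
// are live at each suspension point, with `poll` advancing the state. The
// future itself needs no heap allocation or dynamic dispatch.
enum AddLater {
    Start { a: u32, b: u32 },
    Done,
}

impl Future for AddLater {
    type Output = u32;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        // No self-references here, so the type is Unpin and a plain &mut is fine.
        let this = self.get_mut();
        match *this {
            AddLater::Start { a, b } => {
                *this = AddLater::Done;
                Poll::Ready(a + b)
            }
            AddLater::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    // Poll the hand-written version once; it completes immediately because
    // there are no real suspension points.
    let mut fut = std::pin::pin!(AddLater::Start { a: 40, b: 2 });
    let mut cx = Context::from_waker(Waker::noop());
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(42));
    // `add_later(40, 2)` produces the compiler-generated equivalent.
    let _ = add_later(40, 2);
}
```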

The anticipated benefits are similar to those of async/await on the CPU: better ergonomics for developers writing concurrent code, better utilization of shared/limited resources, and fewer concurrency bugs.

firefly2000
42 minutes ago
Is this Nvidia-only or does it work on other architectures?
LegNeato
38 minutes ago
Currently NVIDIA-only. We're cooking up some Vulkan stuff in rust-gpu, though.
monster_truck
16 minutes ago
I don't have anything to offer but my encouragement, but there are _dozens_ of ROCm enjoyers out there.

In years prior I wouldn't have even bothered, but it's 2026 and AMD's drivers actually come with a recent version of torch that 'just works' on Windows. Anything is possible :)
