The idea: the GPU is the computer, the CPU is the BIOS.
You boot a VM, program a dispatch chain of kernel instances, submit once with vkQueueSubmit, and everything — layer execution, inter-layer communication, self-regulation, compression, database queries — happens on the GPU without CPU round-trips. The CPU just provides I/O.
let vm = vm_boot()
let prog = vm_program(vm, kernels, 4)
vm_write_register(vm, 0, 0, input)
vm_execute(prog)
let result = vm_read_register(vm, 3, 30)
4 VM instances, one submit, no CPU involvement between stages.

The memory model is five SSBOs:
- Registers: per-VM working memory
- Metrics: regulator signals
- Globals: shared mutable state (KV cache, DB tables)
- Control: indirect dispatch parameters
- Heap: immutable bulk data (quantized weights)
What makes it interesting:
- Homeostasis regulator: each VM instance has a kernel that monitors activation norms, memory pressure, throughput. The GPU self-regulates without asking the CPU.
- GPU self-programming: a kernel writes workgroup counts to the Control buffer, the next vkCmdDispatchIndirect reads them. The GPU decides its own workload.
- Compression as computation: Q4_K dequantization, delta encoding, dictionary lookup — these are just kernels in the dispatch chain, not a special subsystem. Adding a new codec = writing an emitter. No Rust changes.
- CPU polling: Metrics and Control are HOST_VISIBLE. CPU can poll GPU state and activate dormant VMs without rebuilding the command buffer. The GPU broadcasts needs, the CPU fulfills them.
The VM is workload-agnostic: the same architecture handles LLM inference, database queries, physics sims, graph neural networks, DSP pipelines, and game AI. I've validated all six. The dispatch chain is the universal primitive.
What's new in v1.0.0 beyond the GPU VM:
- 247 stdlib modules (up from 51)
- Native media codecs (PNG, JPEG, GIF, MP4/H.264; no ffmpeg)
- GUI toolkit with 15+ widgets
- Terminal graphics (Kitty/Sixel)
- 1,169 tests passing
- Still 2.3 MB, still zero external dependencies
The zero-dep thing is real — zero Rust crates. The binary links against vulkan-1 and system libs, nothing else. cargo audit has nothing to audit.
Landing page: https://octoflow-lang.github.io/octoflow/
GPU VM details: https://octoflow-lang.github.io/octoflow/gpu-vm.html
GitHub: https://github.com/octoflow-lang/octoflow
Download: https://github.com/octoflow-lang/octoflow/releases/latest
I'm one developer. This is early. The GPU VM works and tests pass bit-exact, but there's a lot of road ahead — real LLM inference at scale, multi-agent orchestration, the full database engine. I'd love feedback from anyone who works with GPU compute, Vulkan, or language design.