GPU programming seems to be both super low level, but also high level, because textures and descriptors need these ultra-specific data formats, and the way you construct and upload those formats is very complicated and changes all the time.
Is there really no way to simplify this?
Regular vertex data was supposed to be strictly pre-formatted in the pipeline too, until suddenly it wasn't, and now we can just give the shader a `device_address` extension memory pointer and construct the data from that.
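To illustrate what "construct the data from a pointer" means, here's a CPU-side Python sketch of vertex pulling: all vertex data lives in one raw byte buffer, and the "shader" computes an address from a base plus index times stride and decodes the fields itself, instead of relying on a fixed vertex-input layout. The layout (three float32 for position, two float32 for UV) and the names `pack_vertices`, `fetch_vertex` and `STRIDE` are assumptions for illustration; in a real shader this would be a buffer-reference pointer in GLSL/Slang.

```python
import struct

# Hypothetical vertex layout: position (3 x float32) followed by UV (2 x float32).
VERTEX_FORMAT = "<3f2f"
STRIDE = struct.calcsize(VERTEX_FORMAT)  # 20 bytes per vertex


def pack_vertices(vertices):
    """Pack (position, uv) tuples into one contiguous byte buffer, like a GPU buffer."""
    buf = bytearray(STRIDE * len(vertices))
    for i, (pos, uv) in enumerate(vertices):
        struct.pack_into(VERTEX_FORMAT, buf, i * STRIDE, *pos, *uv)
    return bytes(buf)


def fetch_vertex(memory, base_address, vertex_index):
    """What a vertex-pulling shader does: compute an address and decode the fields itself."""
    offset = base_address + vertex_index * STRIDE
    x, y, z, u, v = struct.unpack_from(VERTEX_FORMAT, memory, offset)
    return (x, y, z), (u, v)


if __name__ == "__main__":
    triangle = [((0.0, 0.0, 0.0), (0.0, 0.0)),
                ((1.0, 0.0, 0.0), (1.0, 0.0)),
                ((0.0, 1.0, 0.0), (0.0, 1.0))]
    memory = pack_vertices(triangle)
    print(fetch_vertex(memory, 0, 2))  # ((0.0, 1.0, 0.0), (0.0, 1.0))
```

The pipeline never has to know the vertex format here; only the code that reads the buffer does.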
I've brought it up several times when talking with folks who work down at the chip level on optimizing these operations, and all I can say is that there are a lot of unforeseen complications to what we're suggesting.
It's not that we can't have a GPU that does these things; it's apparently more that a combination of previous and current architectural decisions works against it. For instance, an Nvidia GPU is focused on providing the hardware optimizations necessary for either LLM compute or graphics acceleration, both essentially proprietary technologies.
The proprietariness isn't why it's obtuse, though. You can make a chip go super-duper fast for specific tasks, or make it more general for all kinds of tasks. Somewhere, folks are making a tradeoff between backwards compatibility and supporting new hardware-accelerated tasks.
Neither of these is a "general purpose compute and data flow" focus. As a result, you get a GPU that's only sorta configurable for what you want to do, which in my opinion explains your "GPU programming seems to be both super low level, but also high level" comment.
That's been my experience. I still think what you're suggesting is a great idea and would make GPUs a more open compute platform for a wider variety of tasks, while also simplifying things a lot.
You're free to do what you're asking after by simply performing all operations manually in a compute shader. You can manually clip, transform, rasterize, and even sample textures. But you'll lose the implicit use of various fixed function hardware that you currently benefit from.
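To give a rough idea of what "rasterize manually in a compute shader" involves, here's the core of a half-space (edge function) rasterizer in plain Python over a tiny framebuffer. Each inner-loop iteration corresponds to what one compute thread would do for one pixel. It's a sketch assuming counter-clockwise winding in a y-up coordinate system, and it deliberately ignores clipping, depth, fill-rule tie-breaking and attribute interpolation, all of which the fixed-function hardware handles for you.

```python
def edge(ax, ay, bx, by, px, py):
    """Signed area test: > 0 if P lies to the left of edge A->B (CCW interior)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centers lie inside a CCW triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    covered = set()
    # A compute dispatch would run one thread per pixel; here it's a plain loop.
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                covered.add((x, y))
    return covered


if __name__ == "__main__":
    print(len(rasterize(((0, 0), (4, 0), (0, 4)), 4, 4)))  # 10
```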
Even on modern hardware there are still a lot of architectural differences to reconcile at the API level.
There’s an old concurrency pattern where a producer and consumer tag-team on two sets of buffers to speed up throughput. The producer fills a buffer, transfers ownership to the consumer, and is given the previous buffer in return.
It is structurally similar to double buffered video, but for any sort of data.
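A minimal Python sketch of that handoff, assuming two pre-allocated buffers and two queues: the producer fills the buffer it currently owns, sends it to the consumer, and gets the previously consumed buffer back through a return queue. Only two buffers ever exist, so nothing is allocated in the steady state. The function name `run_double_buffered` and the use of `None` as an end-of-stream marker are illustrative choices, not a standard API.

```python
import queue
import threading


def run_double_buffered(chunks, buffer_size):
    """Producer/consumer ping-pong over exactly two buffers.

    Returns the data seen by the consumer and the ids of the buffers used.
    """
    full = queue.Queue()    # producer -> consumer: filled buffers
    empty = queue.Queue()   # consumer -> producer: buffers handed back
    for _ in range(2):      # the two buffers that get swapped back and forth
        empty.put(bytearray(buffer_size))

    received = []
    buffer_ids = set()

    def producer():
        for chunk in chunks:
            assert len(chunk) <= buffer_size
            buf = empty.get()              # wait until we own a buffer
            buf[:len(chunk)] = chunk       # fill it
            full.put((buf, len(chunk)))    # transfer ownership to the consumer
        full.put(None)                     # end of stream

    def consumer():
        while True:
            item = full.get()
            if item is None:
                break
            buf, n = item
            buffer_ids.add(id(buf))
            received.append(bytes(buf[:n]))  # "process" (copy out) the data
            empty.put(buf)                   # hand the buffer back to the producer

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, buffer_ids
```

The queues do the ownership transfer here; a library version would more likely use channels or a lock-free swap, but the invariant is the same: at any moment each buffer has exactly one owner.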
It seems like Rust would be good for proving the soundness. And it should be a library by now rather than a roll-your-own.
This is why I try to encourage new Linux users away from Ubuntu: it lags behind on often important functionality. It is now an enterprise OS (where durability is more important than functionality), and it's not really suitable for a power user (like someone who would use Zed).
Mixing and matching the kernel and userspace Mesa components is subject to limitations. However, it will transparently fall back to software rendering, so you might not notice if you aren't doing anything intensive.
Relatedly, because it's a container, Flatpak has no choice but to ship the Mesa userspace component. If it didn't, nothing would work.
Some of the shader compilers require LLVM, which is a giant dependency, to say the least. But with Valve's ACO for RADV, I think that could technically be omitted.
You run into the same problem on other platforms too, of course (e.g. Android).
Ubuntu has never ever been the most stable or useful distro. What it did have was apt and more up-to-date stuff than Debian.
I would never willingly choose Ubuntu if allowed other options (Fedora, Debian, maybe CoreOS, etc)
But I mostly agree with you. Once you get out of that phase, I don't really see much value in Ubuntu. I'd pick pretty much anything else for everything I do these days. Debian/Fedora/Alpine on the server. Arch on the desktop.
You really want enterprise standards support for your graphics API.
Bleeding edge... is not nice in graphics. The more complex the systems get, the more edge cases there are.
I mean in general. If you are writing a high-end game engine, don't listen to me; you know better. But if you are a mid-tier graphics wonk like myself, 20-year-old concepts are usually quite Pareto-optimal for _lots_ of stuff and should be robustly covered by most APIs.
If I could give one piece of advice to myself 20 years ago:
For anything practical, focus on the platform-native graphics API. Windows - DirectX. Mac - OpenGL (20 years ago! Predates Metal! Today it would of course be Metal).
Linux - good luck. Find the API that has the best support for your card & driver combo, meaning likely the most stabilized one with the most users.
I don't think that advice would be much different today (apart from Metal) IF you don't know what to do and just want to start doing graphics. For senior peeps who know the field, do whatever's right for you, of course.
If you want something extremely reliable, more modern, but may require some learning to tweak: Silverblue or Kinoite.
I love Debian, it's a great distro. It's NOT the distro I'd pick to drive things like my laptop or personal development machine. At least not if you have even a passing interest in:
- Using team communication apps (slack/teams/discord)
- Using software built for windows (Wine/Proton)
- Gaming (of any form)
- Wayland support (or any other large project delivering new features relatively quickly)
- Hardware support (modern linux kernels)
I'd recommend it immediately as a replacement for Ubuntu as a server, but I won't run it as a daily driver.
Again - Arch (or its derivatives) is basically the best you can get in that space.
The stable/testing/etc distinction doesn't really help, either, because it's an alien concept to those outside of technical spheres.
I strongly believe that the Fedora model is the best fit for the broadest spread of users. Arch is nice for those capable of keeping it wrangled but that's a much smaller group of people.
I'll add - I think the complexity is somewhat "over-stated" for Arch at this point. There was absolutely a period where just reading the entire install guide (much less actually completing it) was enough to turn a large number of even fairly technical people off the distro. Archinstall removed a lot of that headache.
And once it's up, it's generally just fine. I moved both my spouse and my children to Arch instead of Windows 11, and they don't seem particularly bothered. They install most of their own software using Flatpaks through the store GUI in GNOME, or through Steam, and the browser does most of the heavy lifting these days anyway.
I basically just grab their machine and run `pacman -Syu` on it once in a while, and help install something more complicated once in a blue moon.
Still requires someone who doesn't mind dropping into a terminal, but it's definitely not what I'd consider "all that challenging".
Even as someone who uses the terminal daily it's more involved than I really care for.
Stable is stable as in "must not be broken at all costs" kind of stable.
Basically everything works just fine. There's occasionally a rare crash or GNOME reset where you need to log in again, but other than that, not many problems.
It's not that Debian is a bad release, but it's the difference between a game on Steam being completely unavailable for a few hours (Arch) and 10 days (Debian testing) due to an upstream issue.
I swapped a while back, mostly because I kept hitting issues that are accurately described and resolved by steps coming from Arch's community, even on distros like Debian and Fedora.
---
The power in Debian is still that Ubuntu has made it very popular for folks doing commercial/closed-source releases to provide a .deb by default. It won't always work... but at least they're targeting your distro (or, almost always, Ubuntu, but that's usually close enough).
Same for Fedora with the Red Hat enterprise connections.
But I've generally found that the community in Arch is doing a better job at actually dogfooding, testing, and fixing the commercial software than most of the companies that release it... which is sad, but reality.
Arch has plenty of its own issues, but "stale software" isn't the one to challenge it on. You're much better off passing on it due to arch/platform support limitations, security or stability needs, etc. All of those are entirely valid critiques, and reasonable drivers for sticking to something like Debian.
There are times when known bugs in Debian are purposely not fixed but instead documented and worked around. That’s part of the stability promise: the behaviour shall not change, which sometimes includes “bug as a feature”.
Servers and headless boxes use stable, and all machines are updated regularly. Most importantly, stable-to-stable (i.e. 12 to 13) upgrades take around 5 minutes, including the final reboot.
I reinstalled Debian once. I had to migrate my system to 64-bit, and there was no clear way to move from 32-bit to 64-bit at that time. Well, once in 20 years is not bad, if you ask me.
Arch is a wonderful daily driver distro for folks who can deal with even a small amount of configuration.
Excellent software availability through the AUR, excellent update times (pretty much immediate).
The only downside is that there's not a ton of commercial software packaged directly for it (e.g. most companies that care provide a .deb or a .rpm), but that's easily made up for by the rest of the AUR.
It's not even particularly hard to install anymore: run `archinstall` (https://wiki.archlinux.org/title/Archinstall), make some choices, and get a decent distro.
Throw in that Steam support is pretty great... and it's generally one of the best distros available right now for general use by even a moderate user.
Also fine as a daily driver for kids/spouses as long as there's someone in the house to run pacman every now and then, or help install new stuff.
It slows down for a couple of months around release, but generally provides a pretty reliable & up-to-date experience with a very good OS.
Dance dance the red spiral.
I'm not quite bold enough to recommend it to people, but if anyone asks I would definitely say yes to running sid. Apt-pinning testing at low priority is good to have, because sometimes when one library updates there's a lag before everything using it is updated too, and you can get unsatisfiable dependencies.
While I agree with your general point, RHEL stands out way, way more to me. Ubuntu 22.04 and RHEL 9 were both released in 2022. Where Ubuntu 22.04 has general support until mid-2027 and security support until mid-2032, RHEL 9 has "production" support through mid-2032 and extended support until mid-2034.
Wikipedia sources for Ubuntu [0] and RHEL [1]:
[0] https://en.wikipedia.org/wiki/Ubuntu#Releases
[1] https://upload.wikimedia.org/wikipedia/en/timeline/fcppf7prx...
So I was sad not to be able to run a text editor (let's be honest, Zed is nice but it's just displaying text). And somehow the non-accelerated version is eating 24 cores. Just for text.
https://github.com/zed-industries/zed/discussions/23623
I ended up buying a new graphics card in the end.
I just wish everyone could get along somehow.
Sublime Text spent over a decade tuning their CPU renderer and it still didn't cut it at high resolutions.
https://www.sublimetext.com/blog/articles/hardware-accelerat...
What does help is an industry-accepted benchmark, easily run by everyone. I remember browser CSS being all over the place until the Acid2 test (the one with the smiley face) demonstrated which emperors had no clothes. Everyone could surf to the test and check how well their favorite browser did. Scores went up quickly, and today CSS is in a lot better shape.
This was so much more practical before the market coalesced to just 3 players. Matrox, it's time for your comeback arc! And maybe a desktop PCIe package for Mali?
So this goes into Vulkan. Then it has to ship with the OS. Then it has to go into intermediate layers such as WGPU. Which will probably have to support both old and new mode. Then it has to go into renderers. Which will probably have to support both old and new mode. Maybe at the top of the renderer you can't tell if you're in old or new mode, but it will probably leak through. In that case game engines have to know about this. Which will cause churn in game code.
And Apple will do something different, in Metal.
Unreal Engine and Unity have the staffs to handle this, but few others do. The Vulkan-based renderers which use Vulkan concurrency to get performance OpenGL can't deliver are few. Probably only Unreal Engine and Unity really exploit Vulkan properly.
Here's the top level of the Vulkan changes.[1] It doesn't look simple.
(I'm mostly grumbling because the difficulty and churn in Vulkan/WGPU has resulted in three abandoned renderers in Rust land through developer burnout. I'm a user of renderers, and would like them to Just Work.)
[1] https://docs.vulkan.org/refpages/latest/refpages/source/VK_E...
it's not.
descriptor sets are realistically never getting deprecated. old code doesn't have to be rewritten if it works. there's no point.
if you're doing bindless (which you most certainly aren't if you're still stuck with descriptor sets), this offers a better way of handling that.
if you care to upgrade your descriptor set based path to use heaps, this extension offers a very nice pathway to doing so _without having to even recompile shaders_.
for new/future code, this is a solid improvement.
if you're happy where you are with your renderer, there isn't a need to do anything.
BDA, dynamic rendering and shader objects almost make Vulkan bearable. What's still sorely missing is a single-line device malloc, a default queue that can be used without ever touching the queue family API, and an entirely descriptor-free code path. The latter would involve making the NV bindless extension the standard, which simply gives you handles to textures without making you manage descriptor buffers/sets/heaps. Maybe also put an easy path for synchronization on that list and make the explicit API optional.
Until then I'll keep enjoying OpenGL 4.6, which has offered BDA with C-style pointer syntax in GLSL shaders since 2010 (via NV_shader_buffer_load), and which allows hassle-free buffer allocation and descriptor-set-free bindless textures.
- with DXVK to play games
- with llama.cpp to run local LLMs
Vulkan is already everywhere, from games to AI.
From the linked video, "Feature parity with OpenCL" is the thing I'm most looking forward to.
However, it looks like it's simpler to change your shaders (if you can) to use the new GLSL/SPIR-V functionality (or Slang) and not specify the root signature at all (it's complex and verbose).
Descriptor heaps really reduce the amount of setup code needed: with pipeline layouts gone, you can drop like a third of the code needed to get started.
Similar in magnitude to dynamic rendering.
The current OpenGL-like sediment-layer model (i.e. never remove old stuff) is extremely confusing when you haven't been following Vulkan development very closely since 2016, since there are often 5 ways to do the same thing, 3 of which are deprecated, and finding out whether a feature is deprecated is its own sidequest.
What I actually wrestled with most was getting the outer frame-loop right without validation layer errors. I feel like this should be the next thing which the "Eye of Khronos" should focus on.
All official tutorial/example code I've tried doesn't run without swapchain-sync-related validation errors on one configuration or another. Even this 'best practices' example code, which demonstrates how to do the frame-loop scaffolding correctly, produces validation layer errors, so it's also quite useless:
https://docs.vulkan.org/guide/latest/swapchain_semaphore_reu...
What's worse: different hardware/driver combos produce different validation layer errors (even in the swapchain-code which really shouldn't have different implementations across GPU vendors - e.g. shouldn't Khronos provide common reference code for those GPU-independent parts of drivers?). I wonder if there is actually any Vulkan code out there which is completely validation-layer-clean across all possible configs (I seriously doubt it).
Also the VK_[EXT/KHR]_swapchain_maintenance1 extension which is supposed to fix all those little warts has such a low coverage that it's not worth supporting (but it should really be part of the core API by now - the extension is from 2019).
Anyway... baby steps into the right direction, only a shame that it took a decade ;)
Like, these days game devs just use Unreal Engine, which abstracts away having to work with the PS5 / PS4, DirectX 12, and Vulkan APIs.
I imagine that unless it's either for A. edification or B. very bespoke, special-purpose code, you're not touching Vulkan.
This idea creates a serious chicken-egg-problem.
Two or three popular engine code bases sitting on top of Vulkan isn't enough 'critical mass' to get robust and high performance Vulkan drivers. When there's so little diversity in the code hammering on the Vulkan API it's unlikely that all the little bugs and performance problems lurking in the drivers will be triggered and fixed, especially when most Unity or Unreal game projects will simply select the D3D11 or D3D12 backend since their main target platform on PC is Windows.
Similar problem to when GLQuake was the only popular OpenGL game, as soon as your own code used the GL API in a slightly different way than Quake did all kinds of things broke since those GL drivers only properly implemented and tested the GL subset used by GLQuake, and with the specific function call patterns of GLQuake.
From what I've seen so far, the Mesa Vulkan drivers on Linux seem to be in much better shape than the average Windows Vulkan driver. The only explanation I have for this is that there are hardly any Windows games running on top of Vulkan (instead they use D3D11 or D3D12), while running those same D3D11/D3D12 games on Linux via Proton always goes through the Vulkan driver. So on Linux there may be more 'evolutionary pressure' to get high-quality Vulkan drivers indirectly via D3D11/D3D12 games that run via Proton.
Vulkan is mature. It has been used in production since 2013 (!) in the form of Mantle. I have no idea why all the Vulkan doomsayers here think it still needs a half-to-whole decade to be 'useful'.
I run all my windows games on Vulkan.
I don't think those lists are complete, but they seem to show the right relative amount of 3D API usage across PC games.
Proton is amazing and Wine project deserves your support.
There are literally dozens of in-house engines that run on Vulkan. Not everything is Unreal or Unity.
This is not true in the slightest. There are loads of custom 3D engines across many many companies/hobbyists. Vulkan has been out for a decade now, there are likely Vulkan backends in many (if not most) of them.
It's a similar challenge to the many different historical strata of C++ resources.
Well, all desktop hardware and drivers at least. God help you if you want to ship on Android.
https://docs.vulkan.org/tutorial/latest/00_Introduction.html
Additionally, most of these fixes aren't coming to Android, which is now getting WebGPU for Java/Kotlin [0] after so many refused to move away from OpenGL ES, nor, naturally, to any card not lucky enough to get new driver releases.
Still, better now than never.
[0] - https://developer.android.com/jetpack/androidx/releases/webg...
Do you work for Google or an Android OEM? If not, you have no basis to make the claim that Android will cease updating Vulkan API support.
WebGPU on Android runs on top of Vulkan.
If you knew about 3D programming on Android, you would know that there are ongoing efforts to have only Vulkan, with OpenGL ES on top.
However Java and Kotlin devs refuse to bother with the NDK for Vulkan, and keep reaching for OpenGL ES instead.
Please refer to Google talks on Vulkanised conferences.
Don't these compatibility layers run into issues with constant pipeline recompilation related performance issues, when emulating OpenGL?
Ok this made me laugh given that Vulkan support on Android is so bad that WebGPU needs a fallback mode to GLES ;)
The argument being that if Android only does Vulkan, that OEMs will be forced to care about their drivers.
There are talks done by Google on this, either Vulkanised, Google IO, or GDC, can't remember now the exact one.
The fuck are you talking about? Of course they'll come to Android
Everyone keeps telling me OpenCL is deprecated (which is true, although it's also true that it continues to work superbly in 2026) but there isn't a good / official OpenCL to Vulkan wrapper out there to justify it for what I do.
The main thing that's not possible at all on top of Vulkan is his signals API, which I would enjoy seeing - it could be done if timeline semaphores could be waited on/signalled inside a command buffer, rather than just on submission boundaries. Not sure how feasible that is with existing hardware though.
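For readers who haven't used them: a Vulkan timeline semaphore is essentially a monotonically increasing 64-bit counter that the host or a queue submission signals to a new value, and that waiters block on until it reaches at least a target value. Below is a minimal CPU-side Python model of those semantics only (it is not the Vulkan API; `TimelineSemaphore` and its methods are hypothetical names). The wish above amounts to letting a command buffer do the equivalent of `signal`/`wait` mid-stream rather than only at submission boundaries.

```python
import threading


class TimelineSemaphore:
    """CPU model of timeline-semaphore semantics (not the Vulkan API itself)."""

    def __init__(self, initial_value=0):
        self._value = initial_value
        self._cond = threading.Condition()

    @property
    def value(self):
        with self._cond:
            return self._value

    def signal(self, value):
        with self._cond:
            # Timeline values may only move forward.
            if value <= self._value:
                raise ValueError("timeline semaphore values must strictly increase")
            self._value = value
            self._cond.notify_all()

    def wait(self, value, timeout=None):
        """Block until the counter reaches at least `value`; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value >= value, timeout)


if __name__ == "__main__":
    sem = TimelineSemaphore()

    def submit_passes():
        for step in (1, 2, 3):
            sem.signal(step)  # e.g. "pass N finished"

    worker = threading.Thread(target=submit_passes)
    worker.start()
    sem.wait(3)  # blocks until the third pass has signaled
    worker.join()
    print(sem.value)  # 3
```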
> Vulkan’s VK_EXT_descriptor_buffer (https://www.khronos.org/blog/vk-ext-descriptor-buffer) extension (2022) is similar to my proposal, allowing direct CPU and GPU write. It is supported by most vendors, but unfortunately is not part of the Vulkan 1.4 core spec.
The new `VK_EXT_descriptor_heap` extension described in the Khronos post is a replacement for `VK_EXT_descriptor_buffer` which fixes some problems but otherwise is the same basic idea (e.g. "descriptors are just memory").
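To make "descriptors are just memory" concrete, here's a toy Python model: the heap is a plain byte array, a descriptor is a fixed-size record written at `index * stride`, and a shader only needs an integer index to find it. The 16-byte layout (64-bit address, 16-bit width/height, 32-bit format code) and the names `DescriptorHeap`, `write_image` and `read_image` are invented for illustration; real descriptor sizes and layouts are implementation-defined and have to be queried from the driver.

```python
import struct

# Hypothetical descriptor layout: 64-bit image address, 16-bit width/height, 32-bit format.
DESCRIPTOR_FORMAT = "<QHHI"
DESCRIPTOR_SIZE = struct.calcsize(DESCRIPTOR_FORMAT)  # 16 bytes


class DescriptorHeap:
    """A descriptor heap modeled as one flat, CPU-writable byte array."""

    def __init__(self, capacity):
        self.memory = bytearray(capacity * DESCRIPTOR_SIZE)

    def write_image(self, index, address, width, height, fmt):
        # "Binding" is just a memory write at a computed offset.
        struct.pack_into(DESCRIPTOR_FORMAT, self.memory, index * DESCRIPTOR_SIZE,
                         address, width, height, fmt)

    def read_image(self, index):
        # What a shader does with a bindless index: offset into the heap and decode.
        return struct.unpack_from(DESCRIPTOR_FORMAT, self.memory, index * DESCRIPTOR_SIZE)
```

With no sets, layouts or pools in between, updating a binding is a memcpy-sized operation the CPU (or a GPU pass) can do directly.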
I'm sure the comments will be all excuses and whys but they're all nonsense. It's just a poorly thought out API.
And Vulkan's unnecessary complexity doesn't stop at that issue; there are plenty of follow-up issues that I also have no intention of dealing with. Instead, I'll just use CUDA, which doesn't bother me with useless complexity until I actually opt in to it when it's time to optimize. CUDA lets you easily get stuff done first and then look at the more complex stuff to optimize, unlike Vulkan, which unloads the entire complexity on you right from the start, before you have any chance to figure out what to do.
That's not realistic on non-UMA systems. I doubt you want to go over PCIe every time you sample a texture, so the allocator has to know what you're allocating memory _for_. Even with CUDA you have to do that.
And even with unified memory, only the implementation knows exactly how much space is needed for a texture with a given format and configuration (e.g. due to different alignment requirements and such). "Just" malloc-ing GPU memory sounds nice and would be nice, but given many vendors and many devices, the complexity becomes irreducible. If your only use case is compute on Nvidia chips, you shouldn't be using Vulkan in the first place.
No you don't, cuMemAlloc(&ptr, size) will just give you device memory, and cuMemAllocHost will give you pinned host memory. The usage flags are entirely pointless. Why would UMA be necessary for this? There is a clear separation between device and host memory. And of course you'd use device memory for the texture data. Not sure why you're constructing a case where I'd fetch them from host over PCI, that's absurd.
> only the implementation knows exactly how much space is needed for a texture with a given format and configuration
OpenGL handles this trivially, and there is also no reason for a device malloc to not also work trivially with that. Let me create a texture handle, and give me a function that queries the size that I can feed to malloc. That's it. No heap types, no usage flags. You're making things more complicated than they need to be.
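A minimal Python sketch of the API shape being proposed, assuming a hypothetical driver that pads each texture row to a pitch alignment and requires a base alignment for the whole allocation: describe the texture, ask the implementation for its size and alignment, then make one plain allocation. All names and numbers (`texture_size`, `DeviceAllocator`, 256-byte row pitch, 64 KiB base alignment) are made up for illustration and don't reflect any particular vendor.

```python
from dataclasses import dataclass

ROW_PITCH_ALIGNMENT = 256        # hypothetical driver requirement
BASE_ALIGNMENT = 64 * 1024       # hypothetical driver requirement
BYTES_PER_TEXEL = {"rgba8": 4, "r32f": 4, "rgba16f": 8}


def align_up(value, alignment):
    return (value + alignment - 1) // alignment * alignment


@dataclass(frozen=True)
class TextureDesc:
    width: int
    height: int
    fmt: str


def texture_size(desc):
    """The query only the implementation can answer: (size_in_bytes, alignment)."""
    row_pitch = align_up(desc.width * BYTES_PER_TEXEL[desc.fmt], ROW_PITCH_ALIGNMENT)
    return row_pitch * desc.height, BASE_ALIGNMENT


class DeviceAllocator:
    """Toy bump allocator standing in for a single 'device malloc'."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.offset = 0

    def malloc(self, size, alignment):
        start = align_up(self.offset, alignment)
        if start + size > self.capacity:
            raise MemoryError("out of device memory")
        self.offset = start + size
        return start  # a 'device address' (offset into device memory)


if __name__ == "__main__":
    gpu = DeviceAllocator(capacity=256 * 1024 * 1024)
    size, alignment = texture_size(TextureDesc(100, 50, "rgba8"))
    ptr = gpu.malloc(size, alignment)
```

The vendor-specific knowledge stays behind the size query; the allocation itself needs no heap types or usage flags in this model.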
Once Vulkan is finally in good order, descriptor_heap and others, I really really hope we can get a WebGPU.next.
Where are we at with the "what's next for webgpu" post, from 5 quarters ago? https://developer.chrome.com/blog/next-for-webgpu https://news.ycombinator.com/item?id=42209272
It's also disappointing that OpenGL 4.6, released in 2017, is a decade ahead of WebGPU.
Web graphics have never been and will never be cutting edge. They can't be, as they have to sit on top of browsers that must already have those features available to them, so they can only ever build on top of something lower level. That's not inherently bad, not everything needs to be cutting edge, but "it's outdated" is always going to be true.
Also, some things could easily have been done differently and then implemented as efficiently as a particular backend allows. Like pipelines: just don't do pipelines at all. A web graphics API does not need them; WebGL worked perfectly fine without them. The WebGPU backends can use them if necessary, or not use them if more modern systems don't require them anymore. But now we're locked in to a needlessly cumbersome and outdated way of doing things in WebGPU.
Similarly, WebGPU could have done without that static binding mess. Just do something like commandBuffer.draw(shader, vertexBuffer, indexBuffer, texture, ...) and automatically connect the call with the shader arguments, like CUDA does. The backend can then create all that binding nonsense if necessary, or not if a newer backend does not need it anymore.
Except it didn't. In the GL programming model it's trivial to accidentally leak the wrong granular render state into the next draw call, unless you always reconfigure all states anyway (and in that case PSOs are strictly better; they just include too much state).
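A small Python model of that failure mode, with hypothetical names: in the GL-style context, render state is mutable and global, so a draw silently inherits whatever the previous draw left behind; in the state-object context, every draw names a complete, immutable state bundle, so nothing can leak between calls.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RenderState:
    blend: bool = False
    depth_test: bool = True
    cull_back_faces: bool = True


class GLStyleContext:
    """Mutable 'state soup': every setter changes state for all later draws."""

    def __init__(self):
        self.state = RenderState()
        self.draws = []

    def enable_blend(self):
        self.state = replace(self.state, blend=True)

    def draw(self, name):
        self.draws.append((name, self.state))


class StateObjectContext:
    """Each draw carries its own immutable state; nothing carries over."""

    def __init__(self):
        self.draws = []

    def draw(self, name, state):
        self.draws.append((name, state))


if __name__ == "__main__":
    gl = GLStyleContext()
    gl.enable_blend()
    gl.draw("transparent_window")
    gl.draw("opaque_wall")          # oops: blending is still on, the state leaked
    print(gl.draws[1][1].blend)     # True

    so = StateObjectContext()
    so.draw("transparent_window", RenderState(blend=True))
    so.draw("opaque_wall", RenderState())
    print(so.draws[1][1].blend)     # False
```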
The basic idea of immutable state group objects is a good one, Vulkan 1.0 and D3D12 just went too far (while the state group granularity of D3D11 and Metal is just about right).
> Similarly, WebGPU could have done without that static binding mess.
This I agree with, pre-baked BindGroup objects were just a terrible idea right from the start, and AFAIK they are not even strictly necessary when targeting Vulkan 1.0.
Even if those state group objects don't match the underlying hardware directly, they still rein in the combinatorial explosion dramatically and are more robust than the GL-style state soup.
AFAIK the main problem is state which needs to be compiled into the shader on some GPUs while other GPUs only have fixed-function hardware for the same state (for instance blend state).
This is where I think Vulkan and WebGPU are chasing the wrong goal: To make draw calls faster. What's even faster, however, is making fewer draw calls and that's something graphics devs can easily do when you provide them with tools like multi-draw. Preferably multi-draw that allows multiple different buffers. Doing so will naturally reduce costly state changes with little effort.
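As a rough illustration of the state-change argument, here's a Python sketch with made-up names: count how often the bound state changes across a draw list, then sort by state and collapse consecutive draws that share state into one batch per state, which is the kind of list a multi-draw (e.g. indirect multi-draw) call would consume.

```python
from itertools import groupby


def count_state_changes(draws):
    """Number of times the bound state differs from the previous draw's state."""
    changes, current = 0, None
    for state, _mesh in draws:
        if state != current:
            changes += 1
            current = state
    return changes


def build_multidraw_batches(draws):
    """Sort by state (stable), then emit one (state, [meshes]) batch per distinct state."""
    ordered = sorted(draws, key=lambda d: d[0])
    return [(state, [mesh for _s, mesh in group])
            for state, group in groupby(ordered, key=lambda d: d[0])]
```

For example, five draws alternating between an "opaque" and an "alpha" state cost five state changes as submitted, but only two batches after sorting. Transparent geometry usually has to keep its back-to-front order, so in practice only part of a draw list can be reordered this freely.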
They lag behind modern hardware, and after almost 15 years there are zero debugging tools from browser vendors, other than the old SpectorJS, which hardly counts.
Graphics people, here is what you need to do.
1) Figure out a machine abstraction.
2) Figure out an abstraction for how these machines communicate with each other and the cpu on a shared memory bus.
3) Write a binary spec for code for this abstract machine.
4) Compilers target this abstract machine.
5) Programs submit code to driver for AoT compilation, and cache results.
6) Driver has some linker and dynamic module loading/unloading capability.
7) Signal the driver to start that code.
AMD64, ARM, and RISC-V are all basically differing binary specs for a C-machine+MMU+MMIO compute abstraction.
Figure out your machine abstraction and let us normies write code that’s accelerated without having to throw the baby out with the bathwater every few years.
Oh yes, give us timing information so we can adapt workload as necessary to achieve soft real-time scheduling on hardware with differing performance.
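As an example of what that timing information enables, here's a hedged Python sketch of a dynamic-resolution controller: given the measured GPU time of the last frame and a frame budget, it nudges a render-scale factor toward the value that would just fit the budget. The proportional gain, the clamp range, the name `ResolutionController`, and the assumption that GPU time scales roughly with pixel count (scale squared) are illustrative simplifications.

```python
import math


class ResolutionController:
    """Adjusts a render-scale factor so measured GPU time tracks a frame budget."""

    def __init__(self, budget_ms, min_scale=0.5, max_scale=1.0, gain=0.5):
        self.budget_ms = budget_ms
        self.min_scale = min_scale
        self.max_scale = max_scale
        self.gain = gain
        self.scale = max_scale

    def update(self, gpu_time_ms):
        # Assume cost ~ pixel count ~ scale^2, so the scale that would exactly
        # hit the budget is scale * sqrt(budget / measured).
        target = self.scale * math.sqrt(self.budget_ms / gpu_time_ms)
        # Move only part of the way there to avoid oscillating frame to frame.
        self.scale += self.gain * (target - self.scale)
        self.scale = max(self.min_scale, min(self.max_scale, self.scale))
        return self.scale
```

If a frame at full resolution takes 32 ms against a 16 ms budget, repeated updates converge toward a scale of about 0.71 (half the pixels); on hardware that is comfortably under budget, the scale climbs back to 1.0 and stays clamped there.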
surprise, it's very difficult to do across many hw vendors and classes of devices. it's not a coincidence that metal is much easier to program for.
maybe consider joining khronos since you apparently know exactly how to achieve this very simple goal...
Tbf, Metal also works on non-Apple GPUs and with only minimal additional hints to manage resources in non-unified memory.