CUDA Tile Open Sourced
186 points | 6 days ago | 8 comments | github.com | HN
fooblaster
16 hours ago
[-]
Let's see if developers sleepwalk into another trap to keep us locked into nvidia's hardware for the next decade.
reply
pjmlp
8 hours ago
[-]
It is up to AMD, Intel and Khronos to offer APIs and tools that are actually nice to use.

They have had about 15 years to move beyond C99, stone age workflows to compile GLSL and C99 with their drivers, no libraries ecosystem, and printf debugging.

Eventually some of the issues were fixed, after they started seeing that only hardliners would put up with such a development experience, and by then it was too late.

reply
tester756
5 hours ago
[-]
Isn't there OneAPI with its huge ecosystem of tools, debuggers, etc?
reply
pjmlp
3 hours ago
[-]
Yes, that is part of "it was too late".

OneAPI builds on top of SYCL and is basically Intel's CUDA; SYCL is already the second attempt to have C++ in OpenCL, during OpenCL 2.x, an effort that worked so well that OpenCL 3.0 is basically a reboot back to OpenCL 1.0.

Also, even SYCL only got a proper kick-off after Codeplay came up with its implementation; nowadays they sell oneAPI support and tooling, after being acquired by Intel.

reply
the__alchemist
14 hours ago
[-]
IMO it's not Nvidia's fault the competing APIs are high friction.
reply
flyingcoder
14 hours ago
[-]
AMD screwed up so badly.
reply
fooblaster
14 hours ago
[-]
That is true, but that doesn't mean Nvidia is not engaging in engineering to intentionally kneecap competition. Triton and other languages like that are a huge threat, and cuTile is a means to combat that threat and prevent a hardware abstraction layer.
reply
positron26
13 hours ago
[-]
Hundreds of thousands of developers with access to a global communication network were not stopped by AMD. Why act like dependents or wait for some bright star of consensus unless the intent is really about getting the work for free?

We don't have to wait for singular companies or foundations to fix ecosystem problems. Only the means of coordination are needed. https://prizeforge.com isn't there yet, but it is already capable of bootstrapping its own development. Matching funds, joining the team, or contributing on MuTate will all make the ball pick up speed faster.

reply
nemothekid
5 hours ago
[-]
>We don't have to wait for singular companies or foundations to fix ecosystem problems.

Geohot has been working on this for about a year, and every roadblock he's encountered he has had to damn near pester Lisa Su about getting drivers fixed. If you want the CUDA replacement that would work on AMD, you need to wait on AMD. If there is a bug in the AMD microcode, you are effectively "stopped by AMD".

reply
OneDeuxTriSeiGo
12 hours ago
[-]
CUDA Tile is an open source MLIR Dialect so it wouldn't take much to write MLIR transforms to map it from the Tile IR to TOSA or gpu + vector + some amdgpu or other specialty dialects.

The Tile dialect is pretty much independent of the nvidia ecosystem so all it takes is one good set of MLIR transform passes to run anything on the CUDA stack that compiles to tile out of the nvidia ecosystem prison.

So if anything this is actually a massive opportunity to escape vendor lock in if it catches on in the CUDA ecosystem.
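
To make the idea of a transform pass concrete, here is a minimal Python sketch of a table-driven rewrite that maps ops from one dialect onto another. It is only a toy model of the concept: real MLIR passes are written against MLIR's own pattern-rewriting infrastructure, and every op name below (`tile.load`, `vec.matmul`, and so on), as well as the `Op` class and `lower` function, is a hypothetical placeholder, not an actual CUDA Tile IR or upstream MLIR name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A toy IR operation: a 'dialect.op' name plus operand names."""
    name: str
    operands: tuple = ()


# Hypothetical rewrite table: source-dialect op -> target-dialect op.
# The correspondence is invented purely for illustration.
LOWERING = {
    "tile.load": "vec.load",
    "tile.mma": "vec.matmul",
    "tile.store": "vec.store",
}


def lower(ops, table=LOWERING):
    """Rewrite every op that has an entry in the table.

    Ops without an entry are passed through unchanged; 'tile.' ops that
    could not be converted are also reported, so the caller can see what
    the pass does not cover yet.
    """
    lowered, unconverted = [], []
    for op in ops:
        if op.name in table:
            lowered.append(Op(table[op.name], op.operands))
        else:
            if op.name.startswith("tile."):
                unconverted.append(op.name)
            lowered.append(op)
    return lowered, unconverted


program = [
    Op("tile.load", ("%a",)),
    Op("tile.load", ("%b",)),
    Op("tile.mma", ("%a", "%b")),
    Op("tile.atomic_add", ("%c",)),  # no pattern for this one
    Op("tile.store", ("%c",)),
]
lowered, unconverted = lower(program)
```

The `unconverted` list is the part that matters for the portability argument: a set of passes only frees a program from one backend if it covers every op that program actually uses.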

reply
saagarjha
11 hours ago
[-]
Yes, but why would you want to use this over the other MLIR dialects that are already cross platform?
reply
RobotToaster
7 hours ago
[-]
Or it's Nvidia doing an Embrace Extend Extinguish on MLIR.
reply
trueismywork
5 hours ago
[-]
TileIR license means llvm can just fork and support it themselves as needed.
reply
trueismywork
12 hours ago
[-]
TileIR is Apache licensed so AMD can implement it as well.
reply
RicoElectrico
7 hours ago
[-]
Obviously they will, as with the mainframe and cloud.
reply
gaogao
2 hours ago
[-]
The compiler for CUDA Tile being Blackwell only is a baffling decision. I wanted to try it out, but it's only really easy to grab H100s quickly right now. I guess maybe I'll try it out on my 5070 Ti after traveling, but am more likely to stick to an IR that targets multiple platforms, since they couldn't be bothered.
reply
robobsolete
31 minutes ago
[-]
I was keen to try it too, but oh well
reply
opan
17 hours ago
[-]
>The CUDA Tile IR project is under the Apache License v2.0 with LLVM Exceptions
reply
boywitharupee
18 hours ago
[-]
shouldn't the title be "CUDA Tile IR Open Sourced"?
reply
OneDeuxTriSeiGo
12 hours ago
[-]
It's more or less the same thing. CUDA Tile is the name of the IR, cuTile is the name of the high level DSLs.
reply
jauntywundrkind
20 hours ago
[-]
Will be interesting to see if Nvidia and others have any interest & energy getting this used by others, if there actually is an ecosystem forming around it.

Google leading XLA & IREE, with awesome intermediate representations, used by lots of hardware platforms, and backing really excellent JAX & PyTorch implementations, having tools for layout & optimization folks can share: they really build an amazing community.

There's still so much room for planning/scheduling, so much hardware we have yet to target. RISC-V has really interesting vector instructions, for example, and it seems like there's so much exploration / work to do to better leverage that.

Nvidia has partners everywhere now. Nvlink is used by Intel, AWS Trainium, others. Yesterday the Groq exclusive license that Nvidia paid to give to Groq?! Seeing how and when CUDA Tiles emerges: will be interesting. Moving from fabric partnerships, up up up the stack.

reply
pjmlp
19 hours ago
[-]
For NVidia it suffices this is a Python JIT allowing programming CUDA compute kernels directly in Python instead of C++, yet another way how Intel and AMD, alongside Khronos APIs, lag behind in great developer experiences for GPU compute programming.

Ah, and Nsight debugging also supports Python CUDA Tiles debugging.

https://developer.nvidia.com/blog/simplify-gpu-programming-w...
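
For readers unfamiliar with the tile model: the kernel is written in terms of fixed-size blocks of data rather than individual threads. The plain-Python sketch below imitates that style on the CPU. It does not use the cuTile API, and `TILE`, `tile_kernel`, and `launch` are names made up for illustration.

```python
TILE = 4  # hypothetical tile size, chosen only for the example


def tile_kernel(block_id, x, y, out):
    """Process one tile: out[tile] = x[tile] + y[tile]."""
    start = block_id * TILE
    stop = min(start + TILE, len(out))  # the last tile may be partial
    x_tile = x[start:stop]              # "load" one tile of each input
    y_tile = y[start:stop]
    out[start:stop] = [a + b for a, b in zip(x_tile, y_tile)]  # "store"


def launch(kernel, n, *args):
    """Run the kernel once per tile; on a GPU the tiles would run in parallel."""
    num_blocks = (n + TILE - 1) // TILE  # ceiling division
    for block_id in range(num_blocks):
        kernel(block_id, *args)


x = list(range(10))
y = [10 * v for v in x]
out = [0] * len(x)
launch(tile_kernel, len(x), x, y, out)
```

The kernel body never mentions a thread index; how a tile gets split across threads is left to the compiler, which is the point of the model.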

reply
saagarjha
13 hours ago
[-]
Nsight does not have a debugger.
reply
dahart
8 hours ago
[-]
What do you mean? Are you unaware of Nsight VSE? https://developer.nvidia.com/nsight-visual-studio-edition
reply
saagarjha
6 hours ago
[-]
I was aware of their Visual Studio plugins but I did not know that they called their debugger support for Visual Studio “Nsight” as well.
reply
pjmlp
8 hours ago
[-]
Yes it does, apparently you never used it.
reply
Q6T46nT668w6i3m
18 hours ago
[-]
Slang is a fantastic developer experience.
reply
Conscat
14 hours ago
[-]
I work at Nvidia, and my team is using Slang for all of our (numerous and non-trivial) kernels because its automatic differentiation type system is so nice.
reply
pjmlp
17 hours ago
[-]
Especially when using the tooling from who created it, before offering it to Khronos as GLSL replacement, NVIDIA.
reply
Moosdijk
19 hours ago
[-]
> There's still so much room for planning/scheduling, so much hardware we have yet to target

this is nicely illustrated by this recent article:

https://news.ycombinator.com/item?id=46366998

reply
saagarjha
13 hours ago
[-]
Wrong type of scheduling.
reply
turtletontine
19 hours ago
[-]
On the RISC-V vector instructions, could you elaborate? Are the vector extensions substantially different from those in ARM or x86?
reply
adgjlsfhk1
19 hours ago
[-]
It's fairly similar to Arm's SVE2, but very different from the x86 side in that the vector length is variable rather than fixed.
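
What "variable length" means in practice for RISC-V vectors is that the same loop works for any hardware vector length: each iteration asks how many elements it may process and advances by that amount (strip-mining). A minimal Python model of that loop shape follows, where `vlmax` stands in for the hardware's maximum element count and `set_vl` is a made-up helper only loosely modeled on the `vsetvli` instruction.

```python
def set_vl(remaining, vlmax):
    """Return how many elements this iteration may process (simplified model)."""
    return min(remaining, vlmax)


def vla_add(a, b, vlmax):
    """Element-wise add written without assuming any particular vector length."""
    n = len(a)
    out = [0] * n
    i = 0
    steps = 0
    while i < n:
        vl = set_vl(n - i, vlmax)  # ask for the element count of this strip
        out[i:i + vl] = [p + q for p, q in zip(a[i:i + vl], b[i:i + vl])]
        i += vl                    # advance by however many were processed
        steps += 1
    return out, steps


a = list(range(10))
b = [1] * 10
narrow_out, narrow_steps = vla_add(a, b, vlmax=4)   # "small" hardware
wide_out, wide_steps = vla_add(a, b, vlmax=16)      # "wide" hardware
```

Both calls produce the same result; only the number of loop iterations changes with `vlmax`. With fixed-width x86 SIMD the width is baked into the instructions, so wider hardware typically needs a separate code path.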
reply
nl
11 hours ago
[-]
> Groq exclusive license

non-exclusive license actually.

reply
almostgotcaught
12 hours ago
[-]
> Google leading XLA & IREE

IREE hasn't been at G for >2 years.

reply
xmorse
18 hours ago
[-]
Writing this in Mojo would have been so much easier
reply
3abiton
18 hours ago
[-]
It's barely gaining adoption though. The lack of buzz is a chicken-and-egg issue for Mojo. I fiddled with it briefly (mainly to get some of my Python scripts working), and it was surprisingly easy. It'll shoot up one day for sure if Lattner doesn't give up early on it.
reply
ronsor
16 hours ago
[-]
Isn't the compiler still closed source? I and many other ML devs have no interest in a closed-source compiler. We have enough proprietary things from NVIDIA.
reply
0x696C6961
13 hours ago
[-]
Yeah, the mojo pitch is so good, but I don't think anyone has an appetite for the potential fuckery that comes with a closed source platform.
reply
3abiton
13 hours ago
[-]
Yes, but Lattner has said multiple times that it's closed until it matures (he apparently did this with LLVM and Swift too). So not unusual. His open source target is end of 2026. In all fairness, I have 0 doubts that he would deliver.
reply
pjmlp
7 hours ago
[-]
Given Swift for TensorFlow, let's see how this one goes.
reply
saagarjha
6 hours ago
[-]
That one did get open sourced but nobody ended up wanting to use it
reply
jacobgorm
5 hours ago
[-]
Why would anyone want to pair a subpar language with a subpar ML framework?
reply
pjmlp
5 hours ago
[-]
That is the thing, what lessons were learnt from it, and how will Mojo tackle them.
reply
boredatoms
14 hours ago
[-]
I feel like it's in AMD/Intel/G’s interest to pile a load of effort into (an open source) mojo
reply
ipsum2
12 hours ago
[-]
Mojo is not open source and would not get close to the performance of cuTile.

I'm tired of people shilling things they don't understand.

reply
almostgotcaught
12 hours ago
[-]
it's all over this thread (and every single other hn thread about GPU/ML compilers) - people quoting random buzzword/clickbait takes.
reply
llmslave2
16 hours ago
[-]
I really want Mojo to take off. Maybe in a few years. The lack of an stdlib holds it back more than they think, and since their focus is narrow atm it's not useful for the vast majority of work.
reply
bigyabai
18 hours ago
[-]
Use-cases like this are why Mojo isn't used in production, ever. What does Nvidia gain from switching to a proprietary frontend for a compiler backend they're already using? It's a legal headache.

Second-rate libraries like OpenCL had industry buy-in because they were open. They went through standards committees and cooperated with the rest of the industry (even Nvidia) to hear-out everyone's needs. Lattner gave up on appealing to that crowd the moment he told Khronos to pound sand. Nobody should be wondering why Apple or Nvidia won't touch Mojo with a thirty-nine and a half foot pole.

reply
xmorse
16 hours ago
[-]
Kernels now written in Mojo were all handwritten in MLIR, like in this repo. They made a full language because that's not scalable; a sane language is totally worth it. Nvidia will probably end up buying them in a few years.
reply
pjmlp
7 hours ago
[-]
NVidia is perfectly fine with C++ and Python JIT.

CUDA Tile was exactly designed to give parity to Python in writing CUDA kernels, acknowledging the relevance of Python, while offering a path researchers don't need to mess with C++.

It was announced at this year's GTC.

NVidia has no reason to use Mojo.

reply
bigyabai
15 hours ago
[-]
I don't think Nvidia would acquire Mojo when the Triton compiler is open source, optimized for Nvidia hardware and considered an industry standard.
reply
saagarjha
13 hours ago
[-]
Nobody is writing MLIR by hand, what are you on about? There are so many MLIR frontends
reply
oedemis
13 hours ago
[-]
How does Mojo with MAX optimize the process?
reply
itsthecourier
16 hours ago
[-]
what about a forty-foot pole? would it be viable?
reply
pjmlp
17 hours ago
[-]
It would help if they were not so much macOS and Linux focused.

Julia, Python GPU JITs work great on Windows, and many people only get Windows systems as default at work.

reply
saagarjha
13 hours ago
[-]
Approximately nobody writing high performance code for AI training is using Windows. Why should they target it?
reply
pjmlp
8 hours ago
[-]
As desktop, and sometimes that is the only thing available.

When is the Year of NPUs on Linux?

reply
saagarjha
6 hours ago
[-]
This targets Blackwell GPUs so I’m not sure what you are talking about
reply
pjmlp
6 hours ago
[-]
The same, hardware available for Windows users, as work devices at several companies, used by researchers that work at said companies,

https://www.pcspecialist.de/kundenspezifische-laptops/nvidia...

Which as usual, kind of work but not really, in GNU/Linux.

reply
bigyabai
14 hours ago
[-]
I've commissioned a board of MENSA members to devise a workaround for this issue; they've identified two potential solutions.

1) Install Linux

2) Summon Chris Lattner to play you a sad song on the world's smallest violin in honor of the Windows devs that refuse to install WSL.

reply
pjmlp
8 hours ago
[-]
I go with: customers keep using CUDA with Python and Julia and ignore that Chris Lattner's company exists, while Mojo repeats Swift for TensorFlow history.

What about that outcome?

reply
toolboxg1x0
18 hours ago
[-]
NVIDIA tensor core units, where the second column in kernel optimization is producing a test suite.
reply
CamperBob2
19 hours ago
[-]
Fun game: see how many clicks it takes you to learn what MLIR stands for.

I lost count at five or six. Define your acronyms on first use, people.

reply
saagarjha
13 hours ago
[-]
This is a GitHub repo for compiler engineers.
reply
CamperBob2
12 hours ago
[-]
Cool. This is a site for hackers of all stripes.
reply
saagarjha
12 hours ago
[-]
Yes, so given that you clearly had trouble figuring out what it was, maybe you could have shared with the class?
reply
rswail
5 hours ago
[-]
Based on the use of LLVM I guessed "Machine Learning Intermediate Representation"?

How close was I?

reply
ipnon
13 hours ago
[-]
GPU programming definitely is not beginner friendly. There's a much higher learning curve than most open source projects. To learn basic Python you need to know about definitions and loops and variables, but to learn CUDA kernels you need to know maybe an order of magnitude more concepts to write anything useful. It's just not worth the time to cater to people who don't RTFM, the README would be twice as long and be redundant to the target audience of the library.
reply
CamperBob2
12 hours ago
[-]
That's the whole problem. I had to "R" multiple "FMs" before one of them bothered to define the acronym.

Stop carrying water for poor documentation practice.

reply
__patchbit__
9 hours ago
[-]
Use the AI prompt to pinprick learn.

Just say to the AI, "Explain THIS".

reply
CamperBob2
18 minutes ago
[-]
HN: "Learning is good"

Just say to the AI, "Explain THIS".

Also HN: "Not like that"

reply
RobotToaster
6 hours ago
[-]
ChatGPT told me MLIR stands for "Modern Life Is Rubbish"
reply
reactordev
4 hours ago
[-]
YMMV
reply
ipnon
12 hours ago
[-]
It's kind of like if the Django README explained how SQL works, the structure of HTTP requests, best practices for HTML, and so on. If you don't know what MLIR is, you might not be the target audience for this library. Nvidia in general doesn't prioritize developer experience as much as companies like Meta do for open source projects like React.
reply
CamperBob2
12 hours ago
[-]
HTTP and HTML are very common acronyms; nobody should be getting out of high school these days without knowing them, and if they somehow managed to do so, they're darned sure not reading HN. Even SQL is pretty hard to avoid if you've been in an IT-adjacent industry for a while.

However, MLIR is a highly-specialized term. The problem with failing to define a term like that is that I don't know up front if I'm the target audience for the article. I had to Google it, and when I did that, all I found at first were yet more articles that failed to define it.

Wikipedia gets the job done, but these days, Wikipedia is often a long way down the Google search results list. I think they downranked it when they started force-feeding AI answers (which also didn't help).

reply
roughly
18 hours ago
[-]
The ol’ TMA problem.
reply
fragmede
18 hours ago
[-]
I did it in three. I selected it in your comment, and then had to hit "more" to get to the menu to ask Google about it, which brought me to https://www.google.com/search?q=MLIR which says: MLIR is an open-source compiler infrastructure project developed as a sub-project of the LLVM project. Hopefully

Get better at computers and stop needing to be spoon-fed information, people!

reply
reactordev
18 hours ago
[-]
In this day and age, asking questions about what something is is a minefield of “just ask AI” and “You should know this”. Let’s stop putting down people who ask questions and root out those that have shitty answers.
reply
ThrowawayTestr
17 hours ago
[-]
Google is nearly 30 years old
reply
pjmlp
17 hours ago
[-]
And we are not counting Yahoo, Altavista, Ask Jeeves, MSN,...
reply
fragmede
17 hours ago
[-]
I get why it feels frustrating when someone snaps "just google it." Nobody likes feeling dumb. That said, there’s a meaningful difference between asking a genuine question and demanding that every discussion be padded to accommodate readers who won’t even type four letters into a search bar. Expecting complete spoon-feeding in technical threads isn’t curiosity; it’s a refusal to engage. Learning requires participation.
reply
VTimofeenko
16 hours ago
[-]
> Learning requires participation

I won't argue, but there is a middle ground between articles consisting of pure JAFAs and this:

> accommodate readers who won’t even type four letters into a search bar

I think it helps if acronyms are expanded at least once or in a footnote so that the potential new reader can follow along and does not need to guess what ACMV^ means.

^: Awesome Combobulating Method by VTimofeenko, patent pending.

reply
reactordev
16 hours ago
[-]
Easy, if that’s how you feel, skip the comment and don’t engage.

Telling people who want to have that participation and discussion to “RTFM” is not a good response.

Often you’ll come across the authors on these posts that can shed direct, 1st person evidence, of what we’re talking about.

So please, when someone asks “what is that?” Don’t respond with “RTFM”.

reply
fragmede
11 hours ago
[-]
Asking "what is this?" is fine. Treating "I was unfamiliar with this" as evidence that the post is deficient is not.

HN already assumes a baseline of technical literacy. When something falls outside that baseline, the usual move is to ask for context or links, not to reframe personal unfamiliarity as an author failure.

So please, don’t normalize treating "I don’t know this yet" as a failure of the post.

reply
pluralmonad
1 hour ago
[-]
But not defining acronyms on first use is a failure of etiquette. It's your prerogative to not hold this to be true, but many of us do. There is little value in eliding the on-first-use definition.
reply
reactordev
4 hours ago
[-]
I agree but if someone asks “What is this?” and it’s not covered by the article, what we shouldn’t do is put that person down by telling them to “just google it”.

If that is your answer, please just don’t comment.

reply
CamperBob2
17 hours ago
[-]
You're posting a spirited defense of substandard technical writing. Just curious -- why is that?
reply
guipsp
17 hours ago
[-]
You cannot explain everything to everyone all the time. Besides, this is not even a paper. Sometimes you are not the target audience and have to put some words into Google.
reply
fragmede
11 hours ago
[-]
Because I think the norm we reinforce here actually matters.

When confusion gets framed as "this is substandard writing", it rewards showing up and performing a lack of context rather than engaging with the substance or asking clarifying questions. Over time that creates pressure to write to the lowest common denominator, instead of the audience the author is clearly aiming at.

HN already operates on an implicit baseline (CUDA, open source, LLVM, etc.) and mostly lets comments fill in gaps. That usually produces better discussions than treating every unfamiliar term as an author failure, especially when someone is just trying to share or explain something they care about.

So yeah, I am genuinely curious why you see personal unfamiliarity as something the entire discussion should reorganize itself around.

reply
CamperBob2
10 hours ago
[-]
> When confusion gets framed as "this is substandard writing", it rewards showing up and performing a lack of context rather than engaging with the substance or asking clarifying questions. Over time that creates pressure to write to the lowest common denominator, instead of the audience the author is clearly aiming at. ... So yeah, I am genuinely curious why you see personal unfamiliarity as something the entire discussion should reorganize itself around.

(Shrug) The fact is that all major style guides -- APA, MLA, AP, Chicago, probably some others -- call for potentially-unfamiliar acronyms to be defined on first use, and it's common enough to do so. For some reason, though, essentially nobody who writes about this particular topic agrees with that.

Which is cool -- it's not my field, so I don't really GAF. I'm mostly just remarking on how unusually difficult it was to drill down on this particular term. I'll avoid derailing the topic further than I already have.

reply
iaebsdfsh
17 hours ago
[-]
From Wikipedia: The name "Multi-Level Intermediate Representation" reflects the system’s ability to model computations at various abstraction levels and progressively lower them toward machine code.
reply
poita66
18 hours ago
[-]
And yet you didn’t tell us what it stands for, just what it is. The person you’re responding to was specifically talking about finding out what it stands for
reply
piskov
18 hours ago
[-]
If only there was a chat-based app that you could ask questions to.
reply