Ask HN: Why hasn’t AMD made a viable CUDA alternative?
181 points
1 day ago
| 37 comments
I appreciate that developing ROCm into something competitive with CUDA would require a lot of work, both internally within AMD and through external contributions to the relevant open-source libraries.

However, the amount of resources at stake is incredible. The delta between NVIDIA's market value and AMD's is bigger than the annual GDP of Spain. Even if they needed to hire a few thousand engineers at a few million dollars in comp each, it'd still be a good investment.

fancyfredbot
1 day ago
[-]
There is more than one way to answer this.

They have made an alternative to the CUDA language with HIP, which can do most of the things the CUDA language can.
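To give a sense of how close it is, here's a minimal HIP sketch (assuming a working ROCm/HIP toolchain; the code is purely illustrative):

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Same kernel syntax as CUDA: __global__, blockIdx/threadIdx, etc.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        const size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        // hipMalloc/hipMemset/hipFree mirror cudaMalloc/cudaMemset/cudaFree.
        hipMalloc((void**)&a, bytes);
        hipMalloc((void**)&b, bytes);
        hipMalloc((void**)&c, bytes);
        hipMemset(a, 0, bytes);   // host-side data setup skipped for brevity
        hipMemset(b, 0, bytes);
        // The triple-chevron launch syntax also works under hipcc.
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
        hipDeviceSynchronize();
        hipFree(a); hipFree(b); hipFree(c);
        printf("kernel launched and completed\n");
        return 0;
    }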

You could say that they haven't released supporting libraries like cuDNN, but they are making progress on this with AiTer for example.

You could say that they have fragmented their efforts across too many different paradigms, but I don't think that's it, because Nvidia also supports a lot of different programming models.

I think the reason is that they have not prioritised support for ROCm across all of their products. There are too many different architectures with varying levels of support. This isn't just historical. There is no ROCm support for their latest AI Max 395 APU. There is no nice cross architecture ISA like PTX. The drivers are buggy. It's just all a pain to use. And for that reason "the community" doesn't really want to use it, and so it's a second class citizen.

This is a management and leadership problem. They need to make using their hardware easy. They need to support all of their hardware. They need to fix their driver bugs.

reply
thrtythreeforty
1 day ago
[-]
This ticket, finally closed after being open for 2 years, is a pretty good microcosm of this problem:

https://github.com/ROCm/ROCm/issues/1714

Users complaining that the docs don't even specify which cards work.

But it goes deeper - a valid complaint is that "this only supports one or two consumer cards!" A common rebuttal is that it works fine on lots of AMD cards if you set some environment flag to force the GPU architecture selection. The fact that this is so close to working on a wide variety of hardware, and yet doesn't, is exactly the vibe you get with the whole ecosystem.

reply
iforgotpassword
1 day ago
[-]
What I don't get is why they don't at least assign a dev or two to make the poster child of this work: llama.cpp

It's the first thing anyone tries when trying to dabble in AI or compute on the GPU, yet it's a clusterfuck to get working. A few blessed cards work with the proper drivers and kernel; others just crash, perform horribly slowly, or output GGGGGGGGGGGGGG to every input (I'm not making this up!). Then you LOL, dump it and go buy Nvidia, et voilà, stuff works on the first try.

reply
wkat4242
16 hours ago
[-]
It does work, I have it running on my Radeon VII Pro
reply
Filligree
8 hours ago
[-]
It sometimes works.
reply
wkat4242
6 hours ago
[-]
How so? It's rock solid for me. I use ollama but it's based on llama.cpp

It's quite fast also, probably because that card has fast HBM2 memory (it has the same memory bandwidth as a 4090). And it was really cheap as it was on deep sale as an outgoing model.

reply
halJordan
1 hour ago
[-]
Aside from the fact that gfx906 is one of the blessed architectures mentioned (so why would it not work?): how do you look at your specific instance and then turn around and say "all of you are lying, it works perfectly"? How do you square that circle in your head?
reply
Filligree
4 hours ago
[-]
"Sometimes" as in "on some cards". You're having luck with yours, but that doesn't mean it's a good place to build a community.
reply
wkat4242
1 hour ago
[-]
Ah I see. Yes, but you pick the card for the purpose of course. I also don't like the way they have such limited support on ROCm. But when it works it works well.

I have Nvidia cards too by the way, a 4090 and a 3060 (the latter I use for AI also, but more for Whisper because faster-whisper doesn't do ROCm right now).

reply
mook
1 day ago
[-]
I suspect part of it is also that Nvidia actually does a lot of things in firmware that can be upgraded. The new Nvidia Linux drivers (the "open" ones) support Turing cards from 2018. That means chips that old already do much of the processing in firmware.

AMD keeps having issues because their drivers talk to the hardware directly, so their drivers are massive bloated messes, famous for pages of auto-generated register definitions. Likely it's much more difficult to fix anything.

reply
Evil_Saint
18 hours ago
[-]
Having worked at both Nvidia and AMD I can assure you that they both feature lots of generated header files.
reply
bgnn
22 hours ago
[-]
Hmm, that is interesting. Can you elaborate on what exactly is different between them?

I'm asking because I think firmware has to talk directly to hardware through the lower HAL (hardware abstraction layer), while customer-facing parts should be fairly isolated in the upper HAL. Some companies like to add direct HW access to the customer interface via more complex functions (often a recipe made out of lower-HAL functions), which I always disliked. I prefer to isolate lower-level functions and memory space from the user.

In any case, both Nvidia and AMD should have very similar FW capabilities. I don't know what I'm missing here.

reply
Evil_Saint
17 hours ago
[-]
I worked on drivers at both companies. The programming models are quite different. Both make GPUs, but they were designed by different groups of people who made different decisions. For example:

Nvidia cards are much easier to program in the user-mode driver. You cannot hang an Nvidia GPU with a bad memory access. You can hang the display engine with one, though. At least when I was there.

You can hang an AMD GPU with a bad memory access. At least up to the Navi 3x.

reply
raxxorraxor
12 hours ago
[-]
Why isolate these functions? That will always cripple capabilities. With well-designed interfaces it doesn't lead to a mess, and you get a more powerful device. Of course these lower-level functions shouldn't be essential, but especially in these times you almost have to provide an interface here or be left behind by other environments.
reply
citizenpaul
1 day ago
[-]
I've thought about this myself and come to a conclusion that your link reinforces. As I understand it, most companies doing (EE) hardware design and production consider (CS) software to be a second-class citizen at the company. It looks like AMD, after all this time competing with NVIDIA, has not learned the lesson. That said, I have never worked in hardware, so I'm going on what I've heard from other people.

NVIDIA, while far from perfect, has easily kept its software quality ahead of AMD's for over 20 years, while AMD repeatedly keeps falling on its face and getting egg all over itself again and again as far as software goes.

My guess is NVIDIA internally has found a way to keep the software people from feeling like they are "less than" the people designing the hardware.

Sounds easy, but apparently not. AKA management problems.

reply
bgnn
22 hours ago
[-]
This is correct, but one of the reasons is that the SWEs at HW companies live in their own bubble. They somehow don't follow the rest of the SW world's developments.

I'm a chip design engineer and I get frustrated with the garbage the SW/FW team comes up with, to the extent that I write my own FW library for my blocks. While doing that I try to learn the best practices and do quite a bit of research.

One other reason is that, until not long ago, SW was only FW serving the HW, so there was almost no input from SW into HW development. This is clearly changing, but some companies, like Nvidia, are ahead of the pack. Even Apple's SoC team is quite HW-centric compared to Nvidia's.

reply
CoastalCoder
1 day ago
[-]
I had a similar (I think) experience when building LLVM from source a few years ago.

I kept running into some problem with LLVM's support for HIP code, even though I had no interest in having that functionality.

I realize this isn't exactly an AMD problem, but IIRC it was they who contributed the troublesome code to LLVM, and it remained unfixed.

Apologies if there's something unfair or uninformed in what I wrote, it's been a while.

reply
tomrod
1 day ago
[-]
Geez. If I were Berkshire Hathaway looking to invest in the GPU market, this would be a major red flag in my fundamentals analysis.
reply
Covzire
1 day ago
[-]
That reeks of gross incompetence somewhere in the organization. Like a hosting company with a customer dealing with very poor performance who overpays greatly to avoid it, while the whole time nobody even thinks to check what the Linux swap file is doing.
reply
zombiwoof
1 day ago
[-]
This
reply
sigmoid10
1 day ago
[-]
>This is a management and leadership problem.

It's easy (and mostly correct) to blame management for this, but it's such a foundational issue that even if everyone up to the CEO pivoted on every topic, it wouldn't change anything. They simply don't have the engineering talent to pull this off, because they somehow concluded that making stuff open source means someone else will magically do the work for you. Nvidia on the other hand has accrued top talent for more than a decade and carefully developed their ecosystem to reach this point. And there are only so many talented engineers on the planet. So even if AMD leadership wakes up tomorrow, they won't go anywhere for a looong time.

reply
raxxorraxor
12 hours ago
[-]
Even top tier engineers can be found eventually. The problem is if you never even start.

Of course the specific disciplines require quite an investment in the knowledge of their workers, but it isn't anything insurmountable.

reply
jlundberg
1 day ago
[-]
I wonder what would happen if they hired John Carmack to lead this effort.

He would probably be able to attract some really good hardware and driver talent.

reply
sigmoid10
6 hours ago
[-]
Carmack has traditionally been anti-AMD and pro-Nvidia (at least regarding GPUs). I don't know if they could convince him even with all the money in the world unless they fundamentally changed everything first.
reply
pjc50
1 day ago
[-]
> This is a management and leadership problem. They need to make using their hardware easy. They need to support all of their hardware. They need to fix their driver bugs.

Yes. This kind of thing is unfortunately endemic in hardware companies, which don't "get" software. It's cultural and requires (a) a leader who does Get It and (b) one of those Amazon memos stating "anyone who does not Get With The Program will be fired".

reply
flutetornado
1 day ago
[-]
I was able to compile ollama for the AMD Radeon 780M GPU, and I use it regularly on my AMD mini-PC, which cost me $500. It does require a bit more work. I get pretty decent performance with LLMs - just making a qualitative statement, as I didn't do any formal testing, but I got comparable performance vibes to an NVIDIA 4050 GPU laptop I use as well.

https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M...

reply
vkazanov
1 day ago
[-]
Same here on a Lenovo ThinkPad 14s with an AMD Ryzen™ AI 7 PRO 360, which has a Radeon 880M iGPU. Works OK on Ubuntu.

Not saying it works everywhere, but it wasn't even that hard to set up - comparable to CUDA.

Hate the name though.

reply
Our_Benefactors
14 hours ago
[-]
Nobody will come after you for omitting the tm
reply
vkazanov
7 hours ago
[-]
You never know
reply
bn-l
4 hours ago
[-]
> There is no ROCm support for their latest AI Max 395 APU

Fire the ceo

reply
trod1234
1 day ago
[-]
It is a little bit more complicated than ROCm simply not having support, because ROCm has at points claimed support and they've had to walk it back painfully (multiple times). It's not a driver issue, nor a hardware issue on their side.

There has been a long-standing issue between AMD and its mainboard manufacturers. The issue has to do with features required for ROCm, namely PCIe Atomics. AMD has been unable or unwilling to hold the mainboard manufacturers to account for advertising features the mainboard does not support.

The CPU itself must support this feature, but the mainboard must as well (in firmware).

One of the reasons ROCm hasn't worked in the past is that the mainboard manufacturers have claimed and advertised support for PCIe Atomics, that claimed support has been shown to be false, and the software fails in non-deterministic ways when tested. This is nightmare fuel for the few AMD engineers tasked with ROCm.

PCIe Atomics requires non-translated direct IO to operate correctly, and in order to support the same CPU models from multiple generations they've translated these IO lines in firmware.

This has left most people who query their system seeing that PCIe Atomics is supported, while actual tests that rely on that support fail in chaotic ways. There is no technical specification or advertising from the mainboard manufacturers showing whether this is supported. Even boards with multiple x16 slots and the many related technologies and brandings such as Crossfire/SLI/mGPU don't necessarily show whether PCIe Atomics is properly supported.

In other words, the CPU is supported but the firmware/mainboard fails, with no way to differentiate between the two at the upper layers of abstraction.

All in all, you shouldn't be blaming AMD for this. You should be blaming the three mainboard manufacturers who chose to do this. Some of these manufacturers have upper-end boards where they actually did do this right; they just chose not to do it for any current-gen mainboard costing less than ~$300-500.

reply
fancyfredbot
1 day ago
[-]
Look, this sounds like a frustrating nightmare, but the way it seems to us consumers is that AMD chose to rely on poorly implemented and supported technology, and Nvidia didn't. I can't blame AMD for the poor support by motherboard manufacturers, but I can and will blame AMD for relying on it.
reply
trod1234
1 day ago
[-]
We won't know for sure unless someone from AMD comments on this, but in fairness there may not have been any other way.

Nvidia has a large number of GPU related patents.

The fact that AMD chose to design their system this way, in such a roundabout and brittle manner, which is contrary to how engineers approach things, may have been a direct result of being unable to design such systems any other way because of broad patents tied to the interface/GPU.

reply
fancyfredbot
1 day ago
[-]
I feel like this issue is to at least some extent a red herring. Even accepting that ROCm doesn't work on some motherboards, this can't explain why so few of AMD's GPUs have official ROCm support.

I notice that at one point there was a ROCm release which said it didn't require atomics for gfx9 GPUs, but the requirement was reintroduced in a later version of ROCm. Not sure what happened there but this seems to suggest AMD might have had a workaround at some point (though possibly it didn't work).

If this really is due to patent issues AMD can likely afford to licence or cross-license the patent given potential upside.

It would be in line with other decisions taken by AMD if they took this decision because it works well with their datacentre/high-end GPUs, and they don't (or didn't) really care about offering GPGPU to the mass/consumer GPU market.

reply
zozbot234
1 day ago
[-]
> why so few of AMD's GPUs have official ROCm support

Because "official ROCm support" means "you can rely on AMD to make this work on your system for your critical needs". If you want "support" in the "you can goof around with this stuff on your own and don't care if there's any breakage" sense, ROCm "supports" a whole lot of AMD hardware. They should just introduce a new "experimental, unsupported" tier and make this official on their end.

reply
wkat4242
16 hours ago
[-]
And why the support is dropped so quickly too.
reply
trod1234
1 day ago
[-]
> I feel like this issue is to at least some extent a red herring.

I don't see that; these two issues adequately explain why so few GPUs have official support. They don't want to get hit with a lawsuit as a result of issues outside their sphere of control.

> If this really is due to patent issues AMD can likely afford to license or cross-license the patent given potential upside.

Have you ever known any company willing to cede market dominance and license or cross-license a patent letting competition into a market that they hold an absolute monopoly over, let alone in an environment where antitrust is non-existent and fang-less?

There is no upside for NVIDIA to do that. If you want to do serious AI/ML work you currently need to use NVIDIA hardware, and they can charge whatever they want for that.

The moment you have a competitor, demand is halved at a bare minimum depending on how much the competitor undercuts you by. Any agreement on coordinating prices leads to price-fixing indictments.

reply
fancyfredbot
1 day ago
[-]
> I don't see that, these two issues adequately explain why so few GPUs have official support.

I'm sorry I don't follow this. Surely if all AMD GPUs have the same problem with atomics then this can't explain why some GPUs are supported and others aren't?

> There is no upside for NVIDIA to do that.

If NVIDIA felt this patent was actually protecting them from competition then there would be no upside. But NVIDIA has competition from AMD, Intel, Google, and Amazon. Intel have managed to engineer oneAPI support for their GPUs without licensing this patent or relying on PCIe atomics.

AMD have patents NVIDIA would be interested in. For example multi-chiplet GPUs.

reply
wongarsu
1 day ago
[-]
There are so many hardware certification programs out there, why doesn't AMD run one to fix this?

Create a "ROCm compatible" logo and a list of criteria. Motherboard manufacturers can send a pre-production sample to AMD along with a check for some token amount (let's say $1000). AMD runs a comprehensive test suite to check actual compatibility, if it passes the mainboard is allowed to be advertised and sold with the previously mentioned logo. Then just tell consumers to look for that logo if they want to use ROCm. If things go wrong on a mainboard without the certification, communicate that it's probably the mainboard's fault.

Maybe add some kind of versioning scheme to allow updating requirements in the future

reply
spacebanana7
1 day ago
[-]
How does NVIDIA manage this issue? I wonder whether they have a very different supply chain or just design software that puts less trust in the reliability of those advertised features.
reply
bigyabai
1 day ago
[-]
I should point out here, if nobody has already: Nvidia's GPU designs are extremely complicated compared to what AMD and Apple ship. The "standard" is to ship a PCIe card with display-handling drivers and some streaming multiprocessor hardware to process your framebuffers. Nvidia goes even further by adding additional accelerators (ALUs by way of CUDA cores and tensor cores), onboard RTOS management hardware (what Nvidia calls the GPU System Processor), and more complex userland drivers that very well might be able to manage atomics without any PCIe standards.

This is also one of the reasons AMD and Apple can't simply turn their ship around right now. They've both invested heavily in simplifying their GPU and removing a lot of the creature-comforts people pay Nvidia for. 10 years ago we could at least all standardize on OpenCL, but these days it's all about proprietary frameworks and throwing competitors under the bus.

reply
kimixa
1 day ago
[-]
FYI AMD also has similar "accelerators", with the 9070 having separate ALU paths for wmma ("tensor") operations much like Nvidia's model - older RDNA2/3 architectures had accelerated instructions but used the "normal" shader ALUs, if a bit beefed up and tweaked to support multiple smaller data types. And CUDA cores are just what Nvidia call their normal shader cores. Pretty much every subunit on a geforce has a direct equivalent on a radeon - they might be faster/slower or more/less capable, but they're there and often at an extremely similar level of the design.

AMD also have on-die microcontrollers (multiple, actually) that do things like scheduling or pipeline management, again just like Nvidia's GSP. It's been able to schedule new work on-GPU with zero host system involvement since the original GCN, something that Nvidia advertise as "new" with them introducing their GSP (which just replaced a slightly older, slightly less capable controller rather than being /completely/ new too)

The problem is that AMD are a software follower right now - after decades of under-investment they're behind on the treadmill just trying to keep up, so when the Next Big Thing inevitably pops up they're still busy polishing off the Last Big Thing.

I've always seen AMD as a hardware company, with the "build it and they will come" approach - which seems to have worked for the big supercomputers, who likely find it worth investing in their own modified stack to get that last few %, but clearly falls down when selling to "mere" professionals. Nvidia, however, supports the same software APIs on even the lowest-end hardware; while nobody is likely running much on their laptop's 3050m in anger, it offers a super easy on-ramp for developers - and it's easy to mistake familiarity for superiority - you already know to avoid the warts so you don't get burned by them. And believe me, CUDA has plenty of warts.

And marketing - remember "Technical Marketing" is still marketing - and to this day lots of people believe that the marketing name for something, or branding a feature, implies anything about the underlying architecture design - go to an "enthusiast" forum and you'll easily find people claiming that because Nvidia call their accelerator a "core" means it's somehow superior/better/"more accelerated" than the direct equivalent on a competitor, or actually believe that it just doesn't support hardware video encoding as it "Doesn't Have NVENC" (again, GCN with video encoding was released before a Geforce with NVENC). Same with branding - AMD hardware can already read the display block's state and timestamp in-shader, but Everyone Knows Nvidia Introduced "Flip Metering" With Blackwell!

reply
trod1234
1 day ago
[-]
It's an open question they have never answered, AFAIK.

I would speculate that their design is self-contained in hardware.

reply
zozbot234
1 day ago
[-]
AIUI, AMD documentation claims that the requirement for PCIe Atomics is due to ROCm being based on Heterogeneous System Architecture, https://en.wikipedia.org/wiki/Heterogeneous_System_Architect... which allows for a sort of "unified memory" (strictly speaking, a unified address space) across CPU and GPU RAM. Other compute APIs such as CUDA, OpenCL, SYCL or Vulkan Compute don't have HSA as a strict requirement, but ROCm apparently does.
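To make "unified address space" concrete, here's a rough HIP sketch using managed memory (purely illustrative - it shows what unified addressing buys you in user code, not the exact HSA mechanism behind the atomics requirement):

    #include <hip/hip_runtime.h>
    #include <cstdio>

    __global__ void increment(int* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;
    }

    int main() {
        const int n = 256;
        int* data = nullptr;
        // With a unified address space, one allocation is visible to both
        // CPU and GPU through the same pointer -- no explicit hipMemcpy.
        hipMallocManaged(&data, n * sizeof(int));
        for (int i = 0; i < n; ++i) data[i] = i;   // written by the CPU
        increment<<<1, n>>>(data, n);              // read/written by the GPU
        hipDeviceSynchronize();
        printf("data[0] = %d\n", data[0]);         // read back on the CPU
        hipFree(data);
        return 0;
    }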
reply
pjc50
1 day ago
[-]
So .. how's Nvidia dealing with this? Or do they benefit from motherboard manufacturers doing preferential integration testing?
reply
singhrac
1 day ago
[-]
I want to argue that graphics cards are really 3 markets: integrated, gaming (dedicated), and compute. Not only do these have different hardware (fixed function, ray tracing cores, etc.) but also different programming and (importantly) distribution models. NVIDIA went from 2 to 3. Intel went from 1 to 2, and bought 3 (trying to merge). AMD started with 2 and went to 1 (around Llano) and attempted the same thing as NVIDIA via GCN (please correct me if I'm wrong).

My understanding is that the reason is that the real market for 3 (GPUs for compute) didn't show up until very late, so AMD's GCN bet didn't pay off. Even in 2021, NVIDIA's revenue from gaming was above its data center revenue (a segment they basically had no competition in, and where 100% of their revenue was from CUDA). AMD meanwhile won the battle for PlayStation and Xbox consoles, and was executing a turnaround in data centers with EPYC CPUs (with Zen). So my guess as to why they might have underinvested is basically: for much of the 2010s they were just trying to survive, so they focused on battles they could win that would bring them revenue.

This high-level prioritization would explain a lot of "misexecution", e.g. if they underhired for ROCm, prioritized the APU SDK experience over the data center, or set their testing philosophy to "does this game work OK? great".

reply
brudgers
1 day ago
[-]
The market segmentation you describe makes a lot of sense to me. But I don't think the situation is a matter of under-investment; it's instead just fundamental market economics.

Nvidia can afford to develop a comprehensive software platform for the compute market segment because it has a comprehensive share of that segment. AMD cannot afford it because it does not have the market share.

Or to put it another way, I assume that AMD's efforts are motivated by rational economic behavior, and it has not been economically rational to compete heavily with Nvidia in the compute segment.

AMD was able to buy ATI because ATI could not compete with Nvidia. So AMD's graphics business started out trailing Nvidia. AMD has had a viable graphics strategy without trying to beat Nvidia...which makes sense since the traditional opponent is Intel and the ATI purchase has allowed AMD to compete with them pretty well.

Finally, most of the call for AMD to develop a CUDA alternative is based on a desire for cheaper compute. That's not a good business venture to invest in against a dominant player, because price-sensitive customers are poor customers.

reply
quickthrowman
22 hours ago
[-]
> Finally, most of the call for AMD to develop a CUDA alternative is based on a desire for cheaper compute. That's not a good business venture to invest in against a dominant player, because price-sensitive customers are poor customers.

Nvidia’s gross margins are 80% on compute GPUs, that is excessive and likely higher than what cocaine and heroin dealers have for gross margins. Real competition would be a good thing for everyone except Nvidia.

reply
brudgers
20 hours ago
[-]
Competing on price when the entrenched incumbent has 80% margins does not sell me on the idea.

Or to put it another way, it would not be good for AMD.

reply
bigyabai
20 hours ago
[-]
80% isn't a ridiculous margin if nobody else is selling the same compute. Software margins famously go much higher, to as much as 95% or high-nines for cloud products. People pay the price hand over fist, because there simply isn't equivalent hardware to compete with. It's practically a steal for certain HPC customers that want the latest and greatest out of TSMC.

I agree with both your comment and the parent comment - serious competition could spell the end for CUDA's dominance. But there will never be serious competition, CUDA has the head-start and their competitors threw in the towel with OpenCL. Khronos can't get Apple to sign onto a spec and they can't get AMD to change their architecture - open GPGPU compute is stuck in neutral while Nvidia is shifting into 6th gear. Reality is that Nvidia could charge cloud-level margins and get away with it, because Apple is the only other TSMC customer with equivalent leverage and they pretend the server market doesn't exist.

reply
joe_the_user
23 hours ago
[-]
> Finally, most of the call for AMD to develop a CUDA alternative is based on a desire for cheaper compute. That's not a good business venture to invest in against a dominant player, because price-sensitive customers are poor customers.

This is such a key point. Everyone wants cheaper and cheaper compute - I want cheaper and cheaper compute. But no large-ish company wants to simply facilitate cheapness - they want a significant return on their investment, and just making a commodity is generally not what they want. Back in the days of the PC clone, the clone makers were relatively tiny and so didn't have to worry about just serving the commodity market.

reply
brudgers
23 hours ago
[-]
The demand for clones was also diffuse… the potential market included SMBs and consumers, and the market was expanding exponentially. The compute market is scaling linearly among a relatively few established players who buy in bulk and have long B2B relationships with nVidia.
reply
danielmarkbruce
1 day ago
[-]
They likely haven't put even close to enough money behind it. This isn't a unique situation - you'll see in corporate America a lot of CEOs who say "we are investing in X" and really believe they are. But the required size is billions (like, hundreds of really insanely talented engineers being paid 500k-1m, led by a few being paid $3-10m), and they are instead investing low tens of millions.

They can't bring themselves to put so much money into it that it would be an obvious fail if it didn't work.

reply
spacebanana7
1 day ago
[-]
Given how the big tech companies are buying hundreds of thousands of GPUs at huge prices, most of which is pure margin, I wonder whether it'd make sense for Microsoft to donate a couple billion to make the market competitive.

https://www.datacenterdynamics.com/en/news/microsoft-bought-...

reply
danielmarkbruce
1 day ago
[-]
The big players are all investing in building chips themselves.

And probably not putting enough money behind it... it takes enormous courage as a CEO to walk into a boardroom and say "I'm going to spend $50 billion, I think it will probably work, I'm... 60% certain".

reply
spacebanana7
1 day ago
[-]
You're probably correct, but I feel like I have to raise the issue of Zuckerberg spending a comparable amount on VR which was much more speculative.
reply
wavemode
1 day ago
[-]
Zuck is founder and owner. So is Huang (Nvidia CEO). They call all the shots.

Whereas AMD's CEO was appointed, and can be fired. Huge difference in their risk appetite.

I'm reminded of pg's article "founder mode": https://paulgraham.com/foundermode.html

I think some companies simply aren't capable of taking big risks and innovating in big ways, for this reason.

reply
hnlmorg
1 day ago
[-]
Zuckerberg owns Facebook though. It’s a lot easier to make bold decisions when you’re the majority shareholder.

Edit: though emphasis should be put on “easIER” because it’s still far from easy.

reply
danielmarkbruce
1 day ago
[-]
This. Without knowing the guy, he seems to be a) very comfortable taking a lot of risk and b) it's actually not that risky for him to blow $20 billion.

There aren't many cases like this. Larry/Sergey were more than comfortable risking $10 billion here and there.

reply
DanielHB
1 day ago
[-]
It amazes me how little of what these companies make actually gets spent on R&D; you see the funnel charts on Reddit and I'm like, what the hell. Microsoft only spends ~6bn USD on R&D with a total of 48bn in revenue and 15bn in profits?

What the hell is going on? They should be able to keep an army of PhDs doing pointless research even if only one paper in 10 years leads to a profitable product. But instead they are cutting down the workforce like there's no tomorrow...

(I know, I know, market dynamics, value extraction, stock market returns)

reply
disgruntledphd2
1 day ago
[-]
R&D in the financial statements I've seen basically covers the entire product, engineering, etc. org. Lots and lots of people, but not what regular people consider R&D.
reply
DanielHB
6 hours ago
[-]
I know, R&D is like that in every company. It is mostly "development" not research.

What I am pointing out is that they could be doing a shit ton of research. What happened to big companies sponsoring fringe research? That used to be a thing, even at Microsoft.

reply
danielmarkbruce
1 day ago
[-]
aka "Big D, little r"
reply
laweijfmvo
1 day ago
[-]
Well, look at Meta... they're spending Billions with a capital B on stuff, and they get slaughtered every earnings call because it hasn't paid off yet. If Zuckerberg wasn't the majority shareholder it probably wouldn't be sustainable.
reply
spmurrayzzz
1 day ago
[-]
CUDA isn't the moat people think it is. NVIDIA absolutely has the best dev ergonomics for machine learning, there's no question about that. Their driver is also far more stable than AMD's. But AMD is also improving, they've made some significant strides over the last 12-18 months.

But I think, more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using pytorch (or maybe something even more abstracted/multi-backend like keras 3.x) and letting the library deal with the implementation details related to their GPU.

That doesn't mean there aren't footguns in that particular land of abstraction, but the delta between the two providers is not nearly as stark as it's often portrayed. At least not for the average programmer working with ML tooling.

(EDIT: also worth noting that the work being done in the MLIR project has a role to play in closing the gap as well for similar reasons)

reply
martinpw
1 day ago
[-]
> But I think, more importantly, what is often missed in this analysis is that most programmers doing ML work aren't writing their own custom kernels. They're just using pytorch (or maybe something even more abstracted/multi-backend like keras 3.x) and letting the library deal with the implementation details related to their GPU.

That would imply that AMD could just focus on implementing good PyTorch support on their hardware and they would be able to start taking market share. Which doesn't sound like much work compared with writing a full CUDA competitor. But that does not seem to be the strategy, which implies it is not so simple?

I am not an ML engineer so don't have first hand experience, but those I have talked to say they depend on a lot more than just one or two key libraries. But my sample size is small. Interested in other perspectives...

reply
spmurrayzzz
1 day ago
[-]
> But that does not seem to be the strategy, which implies it is not so simple?

That is exactly what has been happening [1], and not just in pytorch. Geohot has been very dedicated in working with AMD to upgrade their station in this space [2]. If you hang out in the tinygrad discord, you can see this happening in real time.

> those I have talked to say they depend on a lot more than just one or two key libraries.

There's a ton of libraries out there, yes, but if we're talking about Python and the libraries in question are talking to GPUs, it's going to be exceedingly rare that they're not using one of these under the hood: pytorch, tensorflow, jax, keras, et al.

There are of course exceptions to this, particularly if you're not using python for your ML work (which is actually common for many companies running inference at scale and want better runtime performance, training is a different story). But ultimately the core ecosystem does work just fine with AMD GPUs, provided you're not doing any exotic custom kernel work.

(EDIT: just realized my initial comment unintentionally borrowed the "moat" commentary from geohot's blog. A happy accident in this case, but still very much rings true for my day to day ML dev experience)

[1] https://github.com/pytorch/pytorch/pulls?q=is%3Aopen+is%3Apr...

[2] https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...

reply
martinpw
1 day ago
[-]
Thanks for the additional information. I am still puzzled though. This sounds like it is a third party (maybe just a small group of devs?) doing all the work, and from your link they have had to beg AMD just to send them hardware? If this work was a significant piece of what is required to get ML users onto AMD hardware, wouldn't AMD just invest in doing this themselves, or at least provide much more support to these guys?
reply
spmurrayzzz
1 day ago
[-]
> This sounds like it is a third party (maybe just a small group of devs?) doing all the work

Just as a quantitative side note here — tinygrad has almost 400 contributors, pytorch has almost 4,000. This might seem small, but both projects have a larger people footprint than most tech companies' headcount that are operating at significant scale.

On top of that, consider that pytorch is a project with its origins at Meta, and Meta has internal teams that spend 100% of their time supporting the project. Coupled with the fact that Meta just purchased nearly 200k units worth of AMD inference gear (MI300X), there is a massive groundswell of tech effort being pushed in AMD's direction.

> wouldn't AMD just invest in doing this themselves, or at least provide much more support to these guys?

That was actually the point of George Hotz's "cultural test" (as he put it). He wanted to see if they were willing to part with some expensive gear in the spirit of enabling him to help them with more velocity. And they came through, so I think that's a win no matter which lens you analyze this through.

Since resources are finite, especially in terms of human capital, there's only so much to go around. AMD naturally can now focus more on the software closer to the metal as a result, namely the driver. They still have significant stability issues in that layer they need to overcome, so letting the greater ML community help them shore up the deltas in other areas is great.

reply
Vvector
1 day ago
[-]
Back in 2015, they were a quarter or two from bankruptcy, saved by the Xbox and PlayStation contracts. Those years saw several significant layoffs and talent leaving for greener pastures. Lisa Su has done a great job at rebuilding the company. But it's not in a position to hire 2,000 engineers at a few million in comp each (~$4 billion annually), even if those people were readily available.

"it'd still be a good investment." - that's definitely not a sure thing. Su isn't a risk taker, seems to prefer incremental growth, mainly focused on the CPU side.

reply
fancyfredbot
1 day ago
[-]
Where does the idea that engineers cost "a few million" come from? You might pay that much to senior engineering management, big names who can attract other talent, but normal engineers cost much less than a million dollars a year.
reply
Vvector
23 hours ago
[-]
OP said "Even if they needed to hire a few thousand engineers at a few million in comp each". That's where the number came from.

Nvidia seems to pay the bulk of their engineers 200k-400k. If the fully loaded cost is 2.2x that, then it's closer to 440k-880k per engineer. 500k would probably be a good number to use.

reply
red-iron-pine
1 day ago
[-]
they're not hiring 4 engineers, they're hiring a team.

and this isn't just developers, R&D and design are iterative and will require proofing, QA, prototyping -- and that means bodies who can do all of that.

reply
Zardoz89
1 day ago
[-]
They literally closed a deal yesterday hiring 1,100+ ZT Systems engineers.
reply
Vvector
5 hours ago
[-]
Those are mostly hardware engineers, not software engineers, right?
reply
ninetyninenine
1 day ago
[-]
This is the difference between Jensen and Su. It’s not that Jensen is a risk taker. No. Jensen focused on incremental growth of the core business while slowly positioning the company for growth in other verticals as well should the landscape change.

Jensen never said… hey I’m going to bet it all on AI and cuda. Let’s go all in. This never happened. Both Jensen and Su are not huge risk takers imo.

Additionally there’s a lot of luck involved with the success of NVIDIA.

reply
kbolino
1 day ago
[-]
I think this broaches the real matter, which is that nVidia's core business is GPUs while AMD's core business is CPUs. And, frankly, AMD has lately been doing a great job at its core business. The problem is that GPUs are now much more profitable than CPUs, both in terms of unit economics and growth potential. So they are winning a major battle (against Intel) even as they are losing a different one (against nVidia). I'm not sure there's a strategy they could have adopted to win both at the same time.

However, the next big looming problem for them is likely to be the shrinking market for x86 vs. the growing market for Arm etc. So they might very well have demonstrated great core competence, that ends up being completely swept away by not just one but two major industry shifts.

reply
dlewis1788
1 day ago
[-]
CUDA is an entire ecosystem - not a single programming language extension (C++) or a single library, but a collection of libraries & tools for specific use cases and optimizations (cuDNN, CUTLASS, cuBLAS, NCCL, etc.). There is also tooling support that Nvidia provides, such as profilers, etc. Many of the libraries build on other libraries. Even if AMD had the decent, reliable language extensions for general-purpose GPU programming, they still don't have the libraries and the supporting ecosystem to provide anything to the level that CUDA provides today, which is a decade plus of development effort from Nvidia to build.
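For a sense of what leaning on that ecosystem looks like in practice, here is a minimal cuBLAS matrix-multiply call (a rough sketch; error checking omitted):

    #include <cuda_runtime.h>
    #include <cublas_v2.h>
    #include <vector>
    #include <cstdio>

    int main() {
        const int n = 4;                          // small square matrices
        const size_t bytes = n * n * sizeof(float);
        std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

        float *dA, *dB, *dC;
        cudaMalloc((void**)&dA, bytes);
        cudaMalloc((void**)&dB, bytes);
        cudaMalloc((void**)&dC, bytes);
        cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);
        const float alpha = 1.0f, beta = 0.0f;
        // C = alpha * A * B + beta * C, column-major as cuBLAS expects.
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, dA, n, dB, n, &beta, dC, n);

        cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
        printf("C[0] = %f\n", hC[0]);             // 1*2 summed over k=4 -> 8.0

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }

No hand-written kernel anywhere - which is exactly the kind of convenience the rest of the ecosystem layers on top of.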
reply
guywithahat
1 day ago
[-]
The counterpoint is that they could make a higher-level version of CUDA which wouldn't necessitate all the other supporting libraries. The draw of cuBLAS is that CUDA is a confusing pain. It seems reasonable to think they could write a better, higher-level language (in the same vein as triton) and not have to write as many support libraries.
reply
dlewis1788
1 day ago
[-]
100% valid - Nvidia is trying to address that now with cuTile and the new Python front-end for CUTLASS.
reply
Cieric
1 day ago
[-]
I can't contribute much to this discussion due to bias and NDAs, but I just wanted to mention, technically HIP is our CUDA competitor. ROCm is the foundation that HIP is being built on.
reply
stuaxo
5 hours ago
[-]
OT: The thing where I have to choose between ROCm or AmdGPU drivers is annoying.

Mostly I stick to AmdGPU as it seems to work for other stuff; I'd like to be able to run the HIP stuff on there without having to change drivers.

reply
johnnyjeans
1 day ago
[-]
I wonder what the purpose is behind creating a whole new API? Why not just focus on getting Vulkan compute on AMD GPUs to have the data throughput of CUDA?
reply
Const-me
1 day ago
[-]
I don’t know answer to your question, but I recalled something relevant. Some time ago, Microsoft had a tech which compiled almost normal looking C++ into Direct3D 11 compute shaders: https://learn.microsoft.com/en-us/cpp/parallel/amp/cpp-amp-o... The compute kernels are integrated into CPU-running C++ in the similar fashion CUDA does.

As you can see, the technology was deprecated in Visual Studio 2022. I don't know why, but I would guess people just didn't care. Maybe because it only ran on Windows.
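For reference, a C++ AMP kernel looked roughly like this (a sketch from memory; it needs the old MSVC toolchain to build now that the feature is deprecated):

    #include <amp.h>
    #include <vector>
    #include <iostream>
    using namespace concurrency;

    int main() {
        std::vector<int> v(1024, 1);
        // array_view wraps host data; the runtime moves it to the GPU as needed.
        array_view<int, 1> av(static_cast<int>(v.size()), v);
        // The restrict(amp) lambda is compiled into a D3D11 compute shader.
        parallel_for_each(av.extent, [=](index<1> idx) restrict(amp) {
            av[idx] *= 2;
        });
        av.synchronize();                 // copy results back to the host vector
        std::cout << v[0] << std::endl;   // prints 2
        return 0;
    }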

reply
Wumpnot
23 hours ago
[-]
Yes, I found C++ AMP really interesting, but since it only ran on Windows... I never used it for anything.
reply
Const-me
22 hours ago
[-]
It’s unfortunate they have deprecated it. We how have DXVK which implements D3D11, including compute shaders, for any platform which supports Vulkan. Making that (or a conceptually similar) thing work across platforms is no longer prohibitively expensive.

I believe that approach, i.e. the compute shaders, is the correct thing to do because modern videogames use them a lot, the runtime support is stable and performant now. No need for special HPC-only drivers or runtime components.

reply
fransje26
1 day ago
[-]
So if someone would like to, say, port a CUDA codebase to AMD, you would use HIP for a more or less one-to-one translation?

Any card you would recommend, when trying to replace the equivalent of a 3090/4090?

reply
markstock
16 hours ago
[-]
I can't recommend cards, but you are absolutely correct about porting CUDA to HIP: there was (is?) a hipify program in ROCm that does most of the work.
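Roughly speaking, hipify's output is a mechanical rename of the runtime API while kernel code and launch syntax stay as-is - a hedged sketch of what ported code tends to end up looking like:

    #include <hip/hip_runtime.h>   // was: #include <cuda_runtime.h>
    #include <cstdio>

    // Kernel body is unchanged by the port.
    __global__ void scale(float* x, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) x[i] *= s;
    }

    int main() {
        const int n = 1024;
        float* d_x = nullptr;
        hipMalloc((void**)&d_x, n * sizeof(float));    // was: cudaMalloc
        hipMemset(d_x, 0, n * sizeof(float));          // was: cudaMemset
        scale<<<(n + 255) / 256, 256>>>(d_x, 2.0f, n); // launch syntax unchanged
        hipDeviceSynchronize();                        // was: cudaDeviceSynchronize
        hipFree(d_x);                                  // was: cudaFree
        printf("ported kernel ran\n");
        return 0;
    }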
reply
dagmx
1 day ago
[-]
AMD have actually made several attempts at it.

The first time, they went ahead and killed off their effort to consolidate on OpenCL. OpenCL went terribly (in no small part because NVIDIA held out on OpenCL 2 support) and that set AMD back a long ways.

Beyond that, AMD does not have a strong software division, or one with the teeth to really influence hardware to their needs. They have great engineers, but leadership doesn't know how to get them to where they need to be.

reply
WithinReason
1 day ago
[-]
This is it, it's an organisational skill issue. To be fair, being a HW company and a SW company at the same time is very difficult.
reply
dagmx
1 day ago
[-]
It is but you have to be.

It’s been key to the success of their peers. NVIDIA and Apple are the best examples but even Intel to a smaller degree.

reply
latchkey
3 hours ago
[-]
https://x.com/AnushElangovan is now in charge of the software part of things and is making great progress very quickly.
reply
gmm1990
1 day ago
[-]
The idea that CUDA is the main reason behind Nvidia's dominance seems strange to me. If most of the money is coming from Facebook and Microsoft, they have their own teams writing code at a lower level than CUDA anyway. Even DeepSeek was writing stuff lower than that.
reply
brcmthrowaway
1 day ago
[-]
What is lower than CUDA? Compute shaders?
reply
a5ehren
1 day ago
[-]
PTX assembly. Deepseek used some of it to do a little bit of work that CUDA didn't have APIs for.
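For context, "dropping to PTX" usually means inline assembly inside CUDA C++ rather than writing whole .ptx files by hand - a toy sketch reading the %laneid special register (purely illustrative, not what DeepSeek actually did):

    #include <cuda_runtime.h>
    #include <cstdio>

    // Read the calling thread's lane ID within its warp via an inline PTX instruction.
    __device__ unsigned lane_id() {
        unsigned ret;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(ret));
        return ret;
    }

    __global__ void show_lanes() {
        if (threadIdx.x < 4)
            printf("thread %d is lane %u\n", threadIdx.x, lane_id());
    }

    int main() {
        show_lanes<<<1, 32>>>();
        cudaDeviceSynchronize();
        return 0;
    }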
reply
brcmthrowaway
18 hours ago
[-]
Sadly platform specific
reply
johnnyjeans
1 day ago
[-]
> The delta between NVIDIA's value and AMD's is bigger than the annual GDP of Spain.

Nvidia is massively overvalued right now. AI has rocketed them into absolute absurdity, and it's not sustainable. Put aside the actual technology for a second and realize that the public image of AI is at rock bottom. Every single time a company puts out AI-generated materials, they receive immense public backlash. That's not going away any time soon, and it's only likely to get worse.

Speaking as someone that's not even remotely anti-AI, I wouldn't touch the shit with a 10 foot pole because of how bad the public image is. The moment that capital realizes this, that bubble is going to pop and it's going to pop hard.

reply
jjordan
1 day ago
[-]
Interesting perspective, I haven't noticed much if any public backlash against AI generation. What are some examples?
reply
runako
1 day ago
[-]
There's a spate of articles like this:

https://www.zdnet.com/article/how-to-remove-copilot-from-you...

https://www.tomsguide.com/computing/software/how-disable-cop...

https://www.asurion.com/connect/tech-tips/turn-off-apple-int...

https://www.reddit.com/r/GooglePixel/comments/1aunsyk/how_to...

https://mashable.com/article/how-to-turn-off-gemini-gmail-go...

etc. I know these are anecdotal, but think how odd it is for there to be "how to disable" articles about any tech. I don't remember seeing similar articles about how to remove e.g. the search feature from apps. Some people are definitely against this tech.

reply
spacebanana7
1 day ago
[-]
> AI has rocketed them into absolute absurdity, and it's not sustainable

Why isn't it sustainable? Their biggest customers all have strong finances and legitimate demand. Google and Facebook would happily run every piece of user generated content through an LLM if they had enough GPUs. Same with Microsoft and every enterprise document.

The VC backed companies and Open AI are more fragile, but they're comparatively small customers.

reply
ndiddy
1 day ago
[-]
IMO the closest analogue for Nvidia now is Cisco during the dot-com boom. Cisco sold the physical infrastructure required for Internet companies to operate. Investors all bought in because they figured it was a safe bet. Individual companies may come and go, but if the Internet keeps growing, companies will always need to buy networking equipment. Despite the Internet being way bigger than it was in 2000, and Cisco being highly profitable, Cisco's share price has never exceeded the peak it was at during the dot-com boom.
reply
fancyfredbot
1 day ago
[-]
Google may well want to run more of their content through an LLM, but they will not be using Nvidia hardware to do it, they'll be using their TPUs.

Amazon are on their third generation of in-house AI chips and Anthropic will be using those chips to train the next generation of Claude.

In other words, their biggest customers are looking for cheaper alternatives and are already succeeding in finding them.

reply
a5ehren
1 day ago
[-]
Google and Amazon still have to buy tons of Nvidia HW to provide in their clouds. No one writes to their custom chips besides internal teams because the software stack doesn't exist.
reply
pjc50
1 day ago
[-]
> Google and Facebook would happily run every piece of user generated content through an LLM if they had enough GPUs. Same with Microsoft and every enterprise document.

.. But how much actual value derives from this?

reply
spacebanana7
1 day ago
[-]
YouTube could conceivably put multi-language subtitles on every video. Potentially even dub them.

But the "real value" would come from making adverts better targeted and more interactive. It's hard to quantity as a person outside of the companies, but the intuition for a positive value is pretty strong.

reply
johnnyjeans
1 day ago
[-]
> Youtube could conceivably put multi language subtitles on every video.

They already do this, it's opt-in.

> But the "real value" would come from making adverts better targeted and more interactive.

Is there any evidence to suggest that a transformer would be better at collaborative filtering than the current deep learning system that was custom engineered and built for this?

reply
the__alchemist
1 day ago
[-]
You state this very confidently. Are you shorting Nvidia stock? If not, why not?
reply
bee_rider
1 day ago
[-]
I see this form of argument sometimes here but I really don’t get it.

Lots of people don’t play the stock market or just invest in funds. It seems like just a way of challenging somebody that looks vaguely clever, or calls them out in a “put your money where your mouth is” sense, but actually presents no argument.

Anyway, if you want to short Nvidia you have to know when their bubble is going to pop to get much benefit out of it, right? The market can remain stupid for longer than you can remain solvent or whatever.

reply
the__alchemist
1 day ago
[-]
Spot on about the timing being important. I don't think you need to fine-tune it that much; short and hold until the pop happens. But if the pop could happen at an indefinite time - maybe very far from now - then I think that invalidates the individual prediction.

One frustrating aspect of investing is that confident information is tough to come by. It's my take that if you have any (I personally rarely do), you should act on it. So, when someone claims confidently (e.g. with adjectives that imply confidence) that something's going to happen, then that's better than the default.

I don't have the insight the claimer does; my thought is: "I am jealous. I wish I could be that confident about a stock's trajectory. I would act on it."

reply
themaninthedark
1 day ago
[-]
I was a student up until 2009; watching people talk about buying houses for 50K and selling them for 100K, everyone talking about easy money.

I knew things were bad when a friend of my sister was complaining that her father (a building framer) was not able to get a loan for a 500K house, something that his colleagues had been able to get. It took another 6 months before the collapse started to hit and the banks went under.

Timing is hard.

reply
mancerayder
18 hours ago
[-]
I agree with most everything you said, but the timing doesn't have to be exact and you don't have to short the stock to profit on its downfall. You can buy long-dated puts. Alas, they're not cheap because the risk is very real.
reply
jayd16
1 day ago
[-]
You can believe a stock is overvalued and also believe it will stay that way.
reply
the__alchemist
1 day ago
[-]
I agree; that's not what the post implies: > that bubble is going to pop and it's going to pop hard.
reply
the__alchemist
1 day ago
[-]
> But they didn't say soon.

Setting an indefinite timeline devalues any claim. You could prove this to yourself using Reductio ad absurdum, or by applying it to various general cases.

reply
jayd16
1 day ago
[-]
But they didn't say soon.
reply
chrisan
1 day ago
[-]
You imply you either would believe his word or would short nvidia yourself if he said he was. If not, why not?
reply
the__alchemist
1 day ago
[-]
Close - If I had the degree of confidence that post implies about Nvidia being overvalued, I would take an aggressive short position.
reply
dagw
1 day ago
[-]
> I would take an aggressive short position.

Lots of very smart people have lost a lot of money by being completely right about the destination, but wrong about the path and how long it will take to get there.

reply
the__alchemist
1 day ago
[-]
> Lots of very smart people have lost a lot of money by being completely right about the destination, but wrong about the path and how long it will take to get there.

If you make a habit of this and still lose money, then either you statistically were very unlucky, or did not have a history of being right.

reply
dagw
1 day ago
[-]
The 'fun' thing with shorts is that they have a fixed upside and infinite downside (i.e. if you go short $1000, the most you can earn is $1000, but you could lose any amount of money, much more than you invested. This is the opposite of buying a stock, where if you invest $1000, the most you can lose is $1000, but there is no limit to how much you can earn). You can be perfectly right 9 times out of 10, but that 1 time you're wrong can quickly wipe out everything you made from being right those 9 times.
reply
spacebanana7
1 day ago
[-]
The market can stay stupid for longer than we can stay liquid.
reply
johnnyjeans
1 day ago
[-]
I forbid myself from speculative trading as a consequence of idiosyncratic principles that I live my life by. One of many symbolic rejections of toxic profiteering that infests our neo-mercantile society. I have enough digits in my bank account that adding any more would be unambiguously greedy and distasteful, so in the end it would be violating my principles simply to debase myself. No thanks.

Anyways you'd need some kind of window of when a stock is going to collapse to short it. Good luck predicting this one.

reply
the__alchemist
1 day ago
[-]
I respect, and adore, your philosophy.

For a short, I think you don't need that strong of a window. For an options combination, yes.

reply
cmrdporcupine
1 day ago
[-]
One thing I've learned the hard way is that industry trends -- and the stock valuations that go with them -- can stay irrational far longer than you can imagine.
reply
mixmastamyk
1 day ago
[-]
Too late, the stock is already down ~40%. This is not the worst time to buy.
reply
bee_rider
1 day ago
[-]
NVIDIA does GPUs and software. Intel does CPUs and software. AMD does GPUs and CPUs.
reply
btown
1 day ago
[-]
AMD was investing in a drop-in CUDA compatibility layer & cross-compiler!

Perhaps in keeping with the broader thread here, they had only ever funded a single contract developer working on it, and then discontinued the project (for who-knows-what legal or political reasons). But the developer had specified that he could open-source the pre-AMD state if the contract was dissolved, and he did exactly that! The project is active with an actively contributing community, and is rapidly catching up to where it was.

https://www.phoronix.com/review/radeon-cuda-zluda

https://vosen.github.io/ZLUDA/blog/zludas-third-life/

https://vosen.github.io/ZLUDA/blog/zluda-update-q4-2024/

IMO it's vital that even if NVIDIA's future falters in some way, the (likely) collective millennia of research built on top of CUDA will continue to have a path forward on other constantly-improving hardware.

It's frustrating that AMD will benefit from this without contributing - but given the entire context of this thread, maybe it's best that they aren't actively managing the thing that gives their product a future!

reply
bryanlarsen
1 day ago
[-]
ZLUDA is built on HIP which is built on ROCm. Both of the latter are significant efforts that AMD is pouring significant resources into.
reply
bryanlarsen
1 day ago
[-]
There's a massive amount of effort in https://github.com/orgs/ROCm/repositories?type=all

Throwing a vast amount of effort at something isn't sufficient.

reply
whywhywhywhy
1 day ago
[-]
The answer is in the question: if they had the foresight to do such a thing, the tech would already be here. Instead they thought one-dimensionally about their product, were part of the group that fumbled OpenCL, and now they're a decade behind playing catch-up.
reply
bluGill
1 day ago
[-]
A good group can catch up significantly in 2 years. They will still be behind, but if they are cheaper (or simply available to buy) that would still go a long way.
reply
whatever1
1 day ago
[-]
I think that even with the trashy API and drivers, if they released graphics cards with 4x the memory of the Nvidia equivalents, the community would put in the effort to make them work.
reply
JohnBooty
1 day ago
[-]
Yeah. Easier said than done, I know, but they need to not just catch up to nVidia but leapfrog them somehow.

I would have said that releasing cards with 32GB+ of onboard RAM, or better yet 128GB, would have gotten things moving. They'd be able to run/train models that nVidia's consumer cards couldn't.

But I think nVidia closed that gap with their "Project Digits" (or whatever the final name is) PCs.

reply
DrNosferatu
1 day ago
[-]
This: provide cards with much larger VRAM pools than the competition - a real edge in LLM inference - and the users will come.

This happened with Bitcoin mining.

reply
jsight
1 day ago
[-]
I think that it is really hard to be cheaper in the ways that really matter. Performance per watt matters a lot here, and NVidia is excellent at this. It doesn't seem like anyone else will be able to compete within at least the next couple of years.
reply
bryanlarsen
1 day ago
[-]
"good group" is carrying a lot of weight here. You can't buy that. You can buy good small groups, but AMD needs a good large group, and that can't be bought.
reply
Symmetry
1 day ago
[-]
The second article I ever submitted to Hacker News back in 2011 was on AMD's efforts to build a CUDA competitor[1]. I don't think it's lack of foresight.

[1]https://www.semiaccurate.com/2011/06/22/amd-and-arm-join-for...

reply
mayerwin
13 hours ago
[-]
Tinycorp (owned by George Hotz, also behind Comma.ai) is working on it after AMD finally understood that it was a no-brainer: https://geohot.github.io/blog/jekyll/update/2025/03/08/AMD-Y... Exciting times ahead!
reply
spellbaker
1 day ago
[-]
If AMD developers use AI deployed on Nvidia hardware to create tools that compete against Nvidia as a company, but overall improve outcomes because of competition, would this be an example of co-evolution observable on human time scales... I feel like AI is evolving, taking a stable form in this complex, multi-dimensional, multi-paradigm sweet spot of an environment we have created, on top of this technical, social and governmental infrastructure, and we're watching it live on Discovery Tech, filtered into a 2D video narrated by some idiot who has no right to be as confident as he sounds. I'm sorry, I'm in withdrawal from quitting mass media and I'm very bored.
reply
fulladder
1 day ago
[-]
>I'm sorry, I'm in withdrawal from quitting mass media and I'm very bored.

Good choice! So many people doing that these days.

reply
rinka_singh
9 hours ago
[-]
There's OpenCL https://en.wikipedia.org/wiki/OpenCL - which, BTW, also runs on NVIDIA GPUs...

OpenCL is an open standard, so why wouldn't we, all of us, throw our weight behind it?

(No, I have no connection with them, other than having learned a bit of it.)

reply
fulladder
1 day ago
[-]
I've been telling people for years that NVIDIA is actually a software company, but nobody ever listens. My argument is that their silicon is nothing special and could easily be replicated by others, and therefore their real value is in their driver+CUDA layer.

(Maybe "nothing special" is a little bit strong, but as a chip designer I've never seen the actual NVIDIA chips as all that much of a moat. What makes it hard to find alternatives to NVIDIA is their driver and CUDA stack.)

Curious to hear others' opinions on this.

reply
latchkey
19 hours ago
[-]
Nobody has mentioned this, but https://docs.scale-lang.com/ is doing some amazing work on this front. Take CUDA code, compile it and output a binary that runs on AMD. Michael and his team working on this are brilliant engineers.
reply
euos
1 day ago
[-]
CUDA is over a decade of investment. I left the CUDA toolkit team in 2014, and it was already around 10 years old back then. You can't build something comparable quickly.
reply
cavisne
22 hours ago
[-]
The problem is the hardware, not the software, and specifically not CUDA. Triton, for example, emits PTX directly (a level below CUDA). Trying to copy Nvidia's hardware exactly means you will always be a generation behind, so they are forced to guess at a different direction that will turn out to be useful.

So far those guesses haven't worked out (not surprising as they have no specific ML expertise and are not partnered with any frontier lab), and no amount of papering over with software will help.

That said, I'm hopeful the rise of reasoning models can help: no one wants to bet the farm on their untested clusters, but buying some chips for inference is much safer.

reply
geor9e
23 hours ago
[-]
If it's a question about entrenched corporate dysfunction, I can't answer it. Most people's answers are wild guesses at best.

If it's a question of first principles, there is a small glimmer of hope in a company called tinygrad making the attempt - https://geohot.github.io//blog/jekyll/update/2025/03/08/AMD-...

If the current 1:16 AMD:NVIDIA stock value difference is entirely due to the CUDA moat, you might make some money if the tide turns. But who can say…

reply
WithinReason
1 day ago
[-]
This is the best article on why OpenCL failed:

https://www.modular.com/blog/democratizing-ai-compute-part-5...

reply
dachworker
1 day ago
[-]
Another possible reason is outreach. NVIDIA spends big money on getting people to use their products. I have worked at two HPC centers, and at both we had NVIDIA employees stationed there whose job it was to help us get the most out of the hardware. Besides that, they also organize hackathons, and they have dedicated developer programs for each common application, be it LLMs, weather prediction or protein folding, not to mention dedicated libraries for pretty much every domain.
reply
z3phyr
1 day ago
[-]
In turn I will raise you the following: why are GPU ISAs trade secrets at all? Why not open them up like CPU ISAs, get rid of specialized cores, and let compiler writers port their favorite languages to compile into native GPU programs? Everyone would be happy: game devs would be happy with more control over the hardware, compiler devs would be happy to run Haskell or Prolog natively on GPUs, ML devs would be happier, and NVIDIA/AMD would be happier taking the main stage.
reply
Lerc
1 day ago
[-]
I made a post here a while back suggesting an investment strategy: spend one billion on AMD shares and one billion on third-party software developers to write a quality support stack for their hardware. I'm still not sure if it's a crazy idea.

Actually, it might be better to spend $1B on shares and 10x $100M on development: take ten attempts in parallel and use the best of them.

reply
whiplash451
1 day ago
[-]
What's the point of buying the $1B worth of AMD shares in your strategy? (I get the other half.)
reply
Lerc
1 day ago
[-]
That's your profit source. Make AMD as viable an option as Nvidia and you massively increase their stock price (and probably reduce Nvidia's at the same time, if you're in the shorting game).
reply
mancerayder
19 hours ago
[-]
Is this why the stock has been in the toilet for years now? It seems it missed the AI bubble, at least from an investor standpoint, like there's tremendous skepticism.
reply
x0nr8
1 day ago
[-]
HIP is definitely a viable option. In fact, with some effort you can port large CUDA projects to be compilable with the HIP/AMD-clang toolchain. This way you don't have to rewrite the world from scratch in a new language, and you can still run GPU workloads on AMD hardware.
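As a rough illustration (a sketch of my own, not from any particular project), here is a trivial CUDA-style SAXPY after porting. Only the header and the cuda* -> hip* runtime calls change; the kernel body and the launch stay the same, and it builds with hipcc:

    // Minimal HIP port of a CUDA SAXPY (illustrative sketch; error checks omitted).
    // Build: hipcc saxpy_hip.cpp -o saxpy_hip
    #include <hip/hip_runtime.h>   // was <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // same kernel body as CUDA
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
        float *dx = nullptr, *dy = nullptr;
        hipMalloc((void**)&dx, n * sizeof(float));         // was cudaMalloc
        hipMalloc((void**)&dy, n * sizeof(float));
        hipMemcpy(dx, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);  // was cudaMemcpy
        hipMemcpy(dy, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);  // hipcc accepts the <<<...>>> launch syntax
        hipMemcpy(hy.data(), dy, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);                      // expect 4.0
        hipFree(dx);
        hipFree(dy);
        return 0;
    }

The hipify-perl / hipify-clang tools that ship with ROCm do most of this renaming mechanically on larger codebases.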
reply
noboostforyou
1 day ago
[-]
Maybe this is an overly cynical response, but the answer is simply that they cannot (at least not immediately). They have not invested enough in engineering talent with this specific goal in mind.
reply
christkv
1 day ago
[-]
I don't think it's that bad. The focus will turn to inference going forward, and that means a place for AMD and maybe even Intel. Eventually it will all be about inference efficiency per watt.

That switch will reduce NVIDIA's margins by a lot. NVIDIA probably has 2 years left of being the only one with golden shovels.

reply
electriclove
1 day ago
[-]
Until they have another hit (gaming -> mining -> AI -> ?)
reply
londons_explore
1 day ago
[-]
AMD's CEO is the cousin of Nvidia's CEO.

Neither will encroach too much on the other's turf. The two companies don't want to directly compete on the things that really drive the share price.

reply
a5ehren
1 day ago
[-]
Distant cousins, and they had never met until they were both CEOs. This is a stupid trope and people need to stop using it.
reply
londons_explore
11 hours ago
[-]
The long history of both companies doing a half-assed job of entering each other's markets says otherwise.

Every 'run CUDA on AMD' project gets bought and then cancelled. Every CUDA alternative is understaffed and cancelled as soon as it starts to gain any traction, etc.

To me it looks like a clear attempt not to hurt each other's businesses while staying deniable if anyone accuses them of collusion or unfair market practices.

reply
shmerl
1 day ago
[-]
What about Rust for GPU programming? I wonder why AMD doesn't back that kind of effort as an alternative.
reply
echelon
1 day ago
[-]
Leadership. At the end of the day, the buck stops with leadership.

If they wanted to prioritize this, they would. They're simply not taking it seriously.

reply
spacebanana7
1 day ago
[-]
I always thought of Lisa Su as an effective leader, but this does make me question it.
reply
setgree
1 day ago
[-]
I thought so too until I read a Stratechery interview that painted a pretty bad portrait of her time at Sony [0].

I tend to think that AMD looks well run when compared with Intel [1], but when you consider Nvidia as the relevant counterfactual [2], things don't look so good.

I wrote about this here [3].

[0] https://news.ycombinator.com/item?id=40697341

[1] https://news.ycombinator.com/item?id=41446766

[2] https://news.ycombinator.com/item?id=39344815

[3] https://setharielgreen.com/blog/amd-also-seems-to-be-flounde...

reply
adra
1 day ago
[-]
Why? Because 90% of her job is talking to and appeasing shareholders, grandstanding with fat whales, and so on. What do you think a CEO at these companies actually does? They aren't in the trenches of each subdivision, nurturing and cracking whips. She likely attends a 2-hour briefing with a line item: "CUDA parity project: on schedule, release date not set."
reply
fulladder
1 day ago
[-]
Fair points. I think the difference is that AMD has other businesses that are much larger than GPUs, whereas NVIDIA was always just about GPUs.
reply
grg0
20 hours ago
[-]
And why do you think that? Have you worked with her personally, or is it maybe because Fortune spent a decade preaching "women in tech" regardless of what those women may actually be doing?
reply
acdha
20 hours ago
[-]
Do you think AMD out-performing Intel for years might have something to do with it? I know that doesn’t fit whatever axe you have to grind against “women in tech” but it seems more in keeping with the business world’s widely recognized tendency to credit CEOs for the decisions made by large numbers of people who work for them but don’t get up on stage.
reply
grg0
19 hours ago
[-]
Intel mostly just shot itself in the foot; easy to "outperform" a handicapped opponent, and there is no merit in doing so. x86 is also a piece of shit in the grander scheme of things, and will be eaten away by alternative hardware soon. So no, I don't think that has anything to do with it, which is why I was asking a legitimate question. Do people just regurgitate the crap they read online, or do they actually put thought into the beliefs that they form?

Agree with your last statement, though. For-hire CEOs--as opposed to founder-CEOs, who often have an actual vision--take way more credit for their "work" than they deserve.

Also rofl at that "axe to grind". That one took me offline for a few seconds.

reply
petesergeant
1 day ago
[-]
Also, why hasn't MS released a decent set of ML tooling for TypeScript?
reply
atemerev
1 day ago
[-]
HIP is now somewhat viable (and ROCm is now all HIP).

But it's too late. The first versions of ROCm were terrible: too much boilerplate, 1200 lines of template-heavy C++ for a simple FFT. You couldn't just start hacking around.

Since then, the CUDA way has been cemented in developers' minds. Intel now has oneAPI, which is not too bad and quite hackable, but there is no hardware and no one will learn it. And HIP is "CUDA-like", so why not just use CUDA, unless you _have to_ use AMD hardware?

Tl;dr: the first versions of ROCm were bad. Now they are better, but it is too late.
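To be fair to present-day ROCm, that particular pain point has improved: a forward complex FFT through the hipFFT wrapper is now a page of code rather than 1200 lines. A rough sketch, assuming a ROCm install with hipFFT; the cuFFT-mirrored names (hipfftPlan1d, hipfftExecC2C) and the header path below are my assumptions about the current API, so double-check against your ROCm version:

    // Illustrative sketch only: 1D complex-to-complex forward FFT via hipFFT.
    // Build (roughly): hipcc fft_demo.cpp -lhipfft -o fft_demo
    #include <hip/hip_runtime.h>
    #include <hipfft/hipfft.h>     // header path may differ between ROCm versions
    #include <vector>

    int main() {
        const int n = 1024;
        std::vector<hipfftComplex> host(n, hipfftComplex{1.0f, 0.0f});  // constant signal

        hipfftComplex* dev = nullptr;
        hipMalloc((void**)&dev, n * sizeof(hipfftComplex));
        hipMemcpy(dev, host.data(), n * sizeof(hipfftComplex), hipMemcpyHostToDevice);

        hipfftHandle plan;
        hipfftPlan1d(&plan, n, HIPFFT_C2C, 1);            // mirrors cufftPlan1d
        hipfftExecC2C(plan, dev, dev, HIPFFT_FORWARD);    // in-place forward transform
        hipDeviceSynchronize();

        hipMemcpy(host.data(), dev, n * sizeof(hipfftComplex), hipMemcpyDeviceToHost);
        // host[0] now holds the DC component (n, 0); the other bins are ~0.

        hipfftDestroy(plan);
        hipFree(dev);
        return 0;
    }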

reply
postalrat
1 day ago
[-]
Because they dont want nvidia to be in control of their own development efforts.
reply
jsheard
1 day ago
[-]
The question was why don't they have anything as good as CUDA, not why don't they adopt CUDA itself.
reply
cmrdporcupine
1 day ago
[-]
Isn't the "goodness" of CUDA really down to its mass adoption -- and therefore its community and network effects -- not strictly its technical attributes?

If I recall correctly, various "GPU programming" and "AI" efforts have existed for AMD GPUs, but none of them have had the same success, in large part because they're simply not the "standard"?

reply
fulladder
1 day ago
[-]
I don't think it's just adoption and network effects, though that is part of the equation. The other big (bigger?) piece is that the CUDA landscape is very complete, with libraries and examples for many different kinds of use cases, and they are well documented and easy to get started with. Ctrl+F this page for "ecosystem" and you'll find another comment that explains it better than I can.
reply
em500
1 day ago
[-]
I thought OpenCL was supposed to be the "standard"? From the Wikipedia page, it's largely vendor neutral and not that much younger than CUDA (initial release Aug 2009 vs Feb 2007). Maybe some more knowledgeable people can comment on why it seems to have been outcompeted by the proprietary CUDA?
reply
the__alchemist
1 day ago
[-]
CUDA has a comparatively nicer user experience. If you'd like to understand this first-hand and you have an Nvidia GPU, try writing a simple program using both (something highly parallel, like n-body, for example).
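To make the contrast concrete, here's a rough sketch of my own (OpenCL 1.2-style C API, all error checking omitted) of the host-side ceremony OpenCL needs before a trivial vector add can run. In CUDA, steps 1 and 2 below essentially disappear, and step 3 collapses to a couple of cudaMemcpy calls plus a one-line vadd<<<blocks, threads>>>(...) launch:

    // Illustrative OpenCL host boilerplate for a vector add (no error checks).
    // Build (roughly): g++ vadd_cl.cpp -lOpenCL -o vadd_cl
    #include <CL/cl.h>
    #include <vector>
    #include <cstdio>

    static const char* kSrc =
        "__kernel void vadd(__global const float* a,"
        "                   __global const float* b,"
        "                   __global float* c) {"
        "  int i = get_global_id(0);"
        "  c[i] = a[i] + b[i];"
        "}";

    int main() {
        const size_t n = 1 << 20;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

        // 1. Discover a platform and device, build a context and queue.
        cl_platform_id platform; clGetPlatformIDs(1, &platform, nullptr);
        cl_device_id device; clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);
        cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, nullptr);
        cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, nullptr);

        // 2. Compile the kernel source at runtime and look the kernel up by name.
        cl_program prog = clCreateProgramWithSource(ctx, 1, &kSrc, nullptr, nullptr);
        clBuildProgram(prog, 1, &device, "", nullptr, nullptr);
        cl_kernel kernel = clCreateKernel(prog, "vadd", nullptr);

        // 3. Create buffers, set arguments one by one, enqueue, read back.
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, n * sizeof(float), a.data(), nullptr);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, n * sizeof(float), b.data(), nullptr);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), nullptr, nullptr);
        clSetKernelArg(kernel, 0, sizeof(cl_mem), &da);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), &db);
        clSetKernelArg(kernel, 2, sizeof(cl_mem), &dc);
        clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &n, nullptr, 0, nullptr, nullptr);
        clEnqueueReadBuffer(queue, dc, CL_TRUE, 0, n * sizeof(float), c.data(), 0, nullptr, nullptr);

        printf("c[0] = %f\n", c[0]);  // expect 3.0
        return 0;
    }

None of this is hard individually, but it is a lot of ritual before the first kernel runs.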
reply
fulladder
1 day ago
[-]
OpenCL was definitely supposed to be the standard, but as the sibling suggests, it's just much harder to use than CUDA. Also, it doesn't feel like OpenCL has much of a community around it, documentation doesn't seem great, and just the overall experience is very frustrating. I tried to implement something in OpenCL about 5 years ago, thinking it would be fairly trivial to port a simple compute shader from CUDA, and ended up giving up. Just too difficult.
reply