FilterHN

RTX 5090 and M4 MacBook Air: Can It Game?

280 points

by allenleee

3 hours ago

| 16 comments

| scottjg.com

▲

matthewfcarlson

2 hours ago

[-]

I have been bothering the VM team for years for VM GPU pass through. I worked on the Apple Silicon Mac Pro and it would have made way more sense if you could run a linux VM and pass through the GPU that goes inside the case!

Sadly, as you can tell, they have not taken me up on my requests. Awesome that other people got it working!

▲

scottjg

14 minutes ago

[-]

two semi interesting things to note around this:

1. Virtualization.framework seems to support some form of GPU passthrough from the host (granted, not eGPU - it's for the integrated GPU). I think the primary use case is having macOS guests get acceleration, while still sharing GPU time with the host. There is also a patch that recently hit QEMU mainline that supports using the "venus server" with virtio-gpu to support similar functionality for Linux guests under Hypervisor.framework.

2. Apple internally has some kind of PCI Passthrough support available in Hypervisor.framework. It seems like the code is shipped to customers in the framework, but it relies on some kind of kext or kernel component that isn't shipped in retail macOS. I can't say if that's intended to ever be released to customers, but clearly someone at Apple has thought about this feature.

▲

m132

1 hour ago

[-]

It looks like the pass through part here was implemented using standard DriverKit interfaces, if I'm not mistaken. That is, the PCIe BAR can already be mapped from the user-space, without any extra modifications to macOS. It's just a matter of VMMs, such as QEMU, adopting this interface in addition to Linux VFIO and the like (unless you're talking about Virtualization.framework, which is kind of a VMM of its own).

What exactly do you feel like macOS is missing?

▲

anp

1 hour ago

[-]

I’m not very familiar with the specifics of pass through but IIUC only being able to map 1.5 GB of active DMA buffers at a time is pretty limiting.

▲

caycep

1 hour ago

[-]

What are the chances there will be another Mac Pro in the future?

Will Apple ever make a computer that makes Siracusa happy? (and do you have the "Believe" shirt?)

▲

pjmlp

27 minutes ago

[-]

Never. A couple of years ago Apple gave up on the server market, which is why having Swift on Linux is so relevant for app developers.

Now they gave up on the workstation market that really enjoys their slots for a myriad of cards.

Having a Thunderbolt cable salad is only for those that miss external extensions from 8 and 16 bit home computer days.

Which is clearly what Apple is nowadays focused on, if you look back at the vertical integrations before the PC clones market took off.

So now if you really need a workstation, it is either Windows, or one of those systems sold with Red-Hat Enterprise/Ubuntu from IBM, Dell, HP.

▲

kahrl

40 minutes ago

[-]

It's been a LOOOONG time since final cut pro was a killer app. In the foreseeable future, they will be selling TokTok machines to illiterates with dopamine problems. Highly doubt.

▲

crdrost

2 hours ago

[-]

It feels like half the problem in this blog post is dealing with memory access issues induced by QEMU and the VM boundary... it's probably something dumb I'm missing, but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load? And then you wouldn't have to fight Apple about the memory management because OSX would still own the memory?

▲

swiftcoder

1 hour ago

[-]

> but if you boot up Ubuntu in Docker, wouldn't the NVIDIA drivers still load?

Even if the drivers loaded, they can't talk to the GPU from within docker (unless one implements PCI passthrough). MacOS owns the PCI bus in this scenario.

▲

smw

59 minutes ago

[-]

docker on macos runs in a linux vm

▲

jmalicki

1 hour ago

[-]

The problem is that the driver wants to own the memory.

▲

brcmthrowaway

1 hour ago

[-]

I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

Anyway, the Mac Pro is dead now. There are only so many sales audio and video professionals can provide.

▲

pjmlp

24 minutes ago

[-]

The missed opportunity is like with the server market, now giving the workstation market to Windows and Linux.

It isn't only audio and video.

▲

Aurornis

1 hour ago

[-]

> I still believe the lack of NVIDIA GPU support in the Mac Pro will go down as one of the greatest missed opportunities in tech.

I don’t know about that. Apple supported some full size GPUs in past product lines and the number of users was very small. Granted, LLMs change that demand but the audience for Mac Pro buyers who would use a full-size GPU that is impossible to obtain is almost nothing compared to their laptop sales.

▲

bigyabai

1 hour ago

[-]

The audience for Mac Pro buyers is almost nothing, full stop. It failed to find a niche, and now Apple is getting rid of it: https://www.macrumors.com/2026/03/26/apple-discontinues-mac-...

Part of the reason the new Mac Pro failed to find an audience can definitely be blamed on macOS' hostility to third party hardware. Who knows what Apple would be worth if they beat Nvidia's Grace CPU to the datacenter market. It was certainly their opportunity.

▲

pjmlp

22 minutes ago

[-]

Yes, because they already moved on to workstations powered by either Windows or Red-Hat Linux/Ubuntu.

The only ones left were people like John Siracusa that still hoped to the very last minute, that Apple would change their mind.

▲

brcmthrowaway

58 minutes ago

[-]

True, they could do any number of things. But a datacenter play would appear quite random to investors and their core audience. Broadcom + Nvidia however...

▲

trollbridge

33 minutes ago

[-]

Apple seems to be content to sell shovels in the AI gold rush.

Admittedly… what’s on my desk? A MacBook M4 Air, a Mac Studio, and there’s an x86 iMac in the corner.

What goes in the travel bag? A MacBook Pro or the Air.

Every time I look at buying something else the math doesn’t add up.

The 5090 sits in a commodity PC chassis. It’s not like I need a model running on my own computer.

▲

jbverschoor

1 hour ago

[-]

I guess that little problem with the Nvidia chips overheating in the MacBook Pro didn’t give Apple a lot of confidence

▲

bigyabai

1 hour ago

[-]

The Mac Pro isn't a Macbook Pro. It has socketed PCI slots and should be able to support the user's hardware in macOS' software, regardless of how Apple feels.

▲

Aurornis

1 hour ago

[-]

Excellent article.

The game benchmarks are fun but the LLM improvements are where this gets really interesting for practical use. I love Apple platforms as an approachable way to run local models with a lot of RAM, but their relatively slow prompt processing speed is often overlooked.

> Here you can see the big issue with Macs: the prompt processing (aka “prefill”) speed. It just gets worse and worse, the longer the prompt gets. At a 4K-token prompt, which doesn’t seem very long, it takes 17 seconds for the M4 MacBook Air to parse before we even start generating a response. Meanwhile, if you strap the eGPU to it, it’ll only take 150ms. It’s 120x faster.

The prefill problem goes unnoticed when you’re playing around with the LLM with small chats. When you start trying to use it for bigger work pieces the compute limit becomes a bottleneck.

The time to first token (TTFT) charts don’t look bad until you notice that they had to be shown on a logarithmic scale because the Mac platforms were so much slower than full GPU compute.

▲

superlopuh

1 hour ago

[-]

I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac? To elaborate, the article just says that this step is compute bound, but I'm wondering whether it is just that simple or if it might also be less optimised in MLX?

▲

ademeure

1 hour ago

[-]

Apple GPUs didn’t have tensor cores until the M5 (aka “a neural accelerator in each core”), and in the article’s charts you can see that an M5 Pro significantly beats an M4 Max (while in other workloads it would be much smaller since Pro is ~1/2 Max).

EDIT: since Aurornis beat me by 3 minutes, I’ll add another interesting tidbit instead :)

NVIDIA tensor cores on consumer GPUs are massively less powerful per SM core than on their datacenter counterparts (which also makes them easier to get to peak efficiency on consumer GPUs because the rest of the pipeline becomes a bottleneck much more quickly, as per Amdahl’s Law).

This is potentially changing with Vera Rubin CPX which looks an awful lot like an RTX 5090 replacement but with the full-blown datacenter tensor cores (that won’t be available unless you pay for the datacenter SKU) - so it will have very high TFLOPS relative to its bandwidth.

The target market for the CPX is exactly this: prefill and Time To First Token. You can basically just throw compute at the problem for (parts of) prefill performance (but it won’t help anything else past a certain point) and the 5090/M5 are nowhere near that limit.

So the design choice for NVIDIA/Apple/etc of how much silicon to spend for this on consumer GPUs is mostly dictated by economics and how much they can reuse the same chips for the different markets.
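A minimal sketch of the Amdahl's Law point above, for anyone who wants to plug in numbers. The 80% matrix-math share is a made-up illustration, not a measurement from the article or from any real GPU; the function name is likewise hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, unit_speedup: float) -> float:
    """Overall speedup when only part of the runtime gets faster.

    accelerated_fraction: share of the original runtime spent in the part
        being accelerated (e.g. the matrix math), between 0 and 1.
    unit_speedup: how many times faster that part becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if unit_speedup <= 0:
        raise ValueError("unit_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / unit_speedup)


# Hypothetical workload: 80% of the time is matrix math (assumed figure).
for tensor_speedup in (2, 4, 8, 1000):
    overall = amdahl_speedup(0.8, tensor_speedup)
    print(f"tensor units {tensor_speedup}x faster -> overall {overall:.2f}x")
```

With an 80% share the overall gain can never reach 5x no matter how fast the tensor units get, which is the sense in which the rest of the pipeline takes over as the bottleneck.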

▲

Aurornis

1 hour ago

[-]

Prefill (prompt processing) is compute bound doing large matrix operations. Token generation (aka tokens/s) is memory bandwidth bound.

The RTX 5090 has an incredible amount of compute performance for matrix operations and a lot of memory bandwidth. The Apple Silicon parts have unusually high memory bandwidth for general purpose compute chips, which is why they can generate tokens so fast. Their raw matrix compute performance is amazing for their power envelope but not nearly as fast as a dedicated GPU consuming 400-500W.

Apple added tensor cores on the M5 generation which help with those matrix operations, which is why the M5 performs so much better than the M4 Max in that article.

Dedicated GPUs like the RTX 5090 are in another league, though.

You can see the divergence in the high resolution gaming benchmarks, too. Once he starts benchmarking at 4K or 6K where the CPU emulation stops being a bottleneck, the raw compute of the 5090 completely crushes any of the Apple Silicon GPUs.
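A rough back-of-the-envelope sketch of the compute-bound vs. bandwidth-bound split described above. It assumes a dense model needing about 2 FLOPs per parameter per token, and decode reading every weight once per generated token. All device and model numbers below are invented placeholders, not specs or benchmarks of any real chip, and the function names are hypothetical.

```python
def prefill_seconds(params: float, prompt_tokens: int, flops_per_second: float) -> float:
    """Compute-bound estimate: ~2 FLOPs (multiply + add) per parameter per prompt token."""
    return 2.0 * params * prompt_tokens / flops_per_second


def decode_tokens_per_second(model_bytes: float, bytes_per_second: float) -> float:
    """Bandwidth-bound estimate: every generated token streams all weights from memory once."""
    return bytes_per_second / model_bytes


# Assumed model: 8e9 parameters stored at 4 bits each -> 4e9 bytes of weights.
PARAMS = 8e9
MODEL_BYTES = 4e9

# Assumed devices (placeholder numbers only).
devices = {
    "low-compute device": {"flops": 10e12, "bandwidth": 100e9},
    "high-compute device": {"flops": 200e12, "bandwidth": 1000e9},
}

for name, spec in devices.items():
    ttft = prefill_seconds(PARAMS, 4096, spec["flops"])
    tps = decode_tokens_per_second(MODEL_BYTES, spec["bandwidth"])
    print(f"{name}: ~{ttft:.2f} s prefill for 4096 tokens, ~{tps:.0f} tokens/s decode")
```

The two estimates depend on different hardware numbers, so a chip can look fine on tokens/s and still be slow on time to first token. Real results land below these ceilings.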

▲

mathisfun123

1 hour ago

[-]

> I'm curious and not an expert here, do you know why the TTFT is so much worse on Mac?

because the GPUs aren't as fantastic as everyone assumes?

> might also be less optimised in MLX?

prefill has gotta be one of the most optimized paths in MLX...

▲

Moosdijk

38 minutes ago

[-]

It feels pedantic to point it out, but it’s actually 113x faster.

Seeing the author present their results like this gives off the impression that they’re biased, which I am sure they aren’t.
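For reference, the arithmetic behind the 113x figure, using only the two numbers quoted from the article (17 seconds and 150 ms). Those are presumably rounded, so the article's unrounded measurements could give a somewhat different ratio.

```python
mac_prefill_s = 17.0    # quoted: M4 MacBook Air, 4K-token prompt
egpu_prefill_s = 0.150  # quoted: same prompt with the eGPU attached

ratio = mac_prefill_s / egpu_prefill_s
print(f"{ratio:.1f}x")  # prints "113.3x"
```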

▲

djmips

47 minutes ago

[-]

> Because OpenGL is not well-supported anymore on macOS, the game is completely unplayable there, even with CrossOver. Ironically, it plays totally fine on a Windows PC, but this is a game you literally can’t play on Mac without this eGPU setup.

I understand that this is true, but it seems that Doom does support Vulkan; you would need to add VK_NV_glsl_shader to MoltenVK. Probably much less work than what went into hanging an RTX 5090 off of an M4. Still, kudos to scott, and the local AI inference speeds are pretty cool. What a crazy project! <applause>

▲

mywittyname

2 hours ago

[-]

> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.

Or, more likely, it will tell you something it doesn't know.

Reminds me of yesterday, when I was arguing with ChatGPT that the 5070TI was an actual video card. It kept trying to correct me by saying I must have meant a 4070ti, since no such 5070ti card exists.

▲

collabs

2 hours ago

[-]

Or, it will acknowledge that it made a mistake and continue to make the same mistake again.

I asked Claude to generate an HTML page about PowerShell 7. It gave me a page saying 7.4 was the latest LTS release. I corrected it with links showing 7.6 was released in March and asked it to regenerate with the latest information.

It generated basically the same page with the same claim that 7.4 was the latest release.

▲

ericmay

1 hour ago

[-]

> Or, it will acknowledge that it made a mistake and continue to make the same mistake again.

People do this too though. At least the AI generally tries to follow instructions that you give it even when you are lacking clarity in the details.

I feel like it's similar to the self-driving car problem. The car could have 99.9999% reliability, drive much better and safer than a human, yet folks will still freak out about a single mistake that's made even though you have actual humans today driving the wrong way down the highway, crashing in to buildings, drunk driving, stealing cars, and all sorts of other just absolutely stupid things.

We need to move away from this idea that because it's an AI system it should give you perfect responses. It's not a deterministic system and it can be wrong, though it should get better over time. Your Google search results are wrong all the time too. The NYT writes things that are factually incorrect. Why do we have such a high standard for these models when we don't apply them elsewhere?

▲

bryceacc

1 hour ago

[-]

>I corrected it with links

it should be reasonably expected that you can give a source and fix an error in the AI output.

I would even go as far as to say if a human directly told the AI "no, use 7.6 as the latest version", the AI should absolutely follow direct instructions no matter what it thinks is true. What if this human was working on a slide about the upcoming release of 7.6 that has no public documentation?

▲

applfanboysbgon

1 hour ago

[-]

> Your Google search results are wrong all the time too. The NYT writes things that are factually incorrect.

This is also very bad and people complain about these things all the fucking time.

> Why do we have such a high standard for these models

Because Altman and Amodei are defrauding investors out of hundreds of billions of dollars on the promise that they will replace the entire workforce. Of course people are going to point out the emperor has no clothes when half of our society is engaged in mass hysteria worshipping these fucking things as the next industrial revolution, diverting massive amounts of resources to them, and ruining HN with 10 articles on the front page per day about how software engineering is dead.

▲

ericmay

1 hour ago

[-]

> This is also very bad and people complain about these things all the fucking time.

So at worst these AI tools are as bad as the existing system. Worth complaining about? Absolutely. Worth holding to much higher standards? Nah I don't think so. Not at this stage at least. And folks are just disappointing themselves by setting up straw men expectations.

These tools are non-deterministic systems (like humans) which sometimes don't do exactly what you want (like humans) but are also extremely fast, much cheaper (for now), and have domain knowledge generation that is much broader than any single human has. Like anything else, there are pros and cons.

▲

applfanboysbgon

1 hour ago

[-]

They aren't "straw man expectations" when the entire US economy is now oriented around those expectations.

▲

dvlsg

1 hour ago

[-]

> ruining HN with 10 articles on the front page per day about how software engineering is dead.

Even this article, which is theoretically about playing games on a MacBook and not about AI, has devolved into AI discussions. It's honestly kind of tiring.

I suppose the article invites it by putting an AI blurb up top, and I suppose I'm also not helping by adding my own comment, but _still_.

▲

reaperducer

38 minutes ago

[-]

> The NYT writes things that are factually incorrect. Why do we have such a high standard for these models when we don't apply them elsewhere?

The New York Times publishes a "corrections" section in each issue. Let me know where I can view the 60TB file where ChatGPT fesses up to its daily fails.

▲

corry

1 hour ago

[-]

LLMs are (broadly-speaking) poorly-positioned to give you a strong verdict on plausibility of a frontier topic. That said - ChatGPT was exactly right in its response to OP!

"Very deep", "border-line impractical" "in a research-sense" is the perfect summary of this article itself! :)

▲

perarneng

2 hours ago

[-]

This is why I use Grok expert mode. It aggressively goes out searching the web for info. It's so much better than relying on year-old data.

▲

_blk

2 hours ago

[-]

Yes, I really like that about Grok. It had a few good qualities but it was too verbose so now it's mostly Claude.

▲

JumpCrisscross

2 hours ago

[-]

Solid compromise is Kagi's research assistant. Aggressively cites, unlike Claude. Concise, unlike Grok.

▲

funimpoded

1 hour ago

[-]

Watching the entire economy of a superpower and ~all of online culture go absolutely ga-ga over Furbys has been one of the weirdest things I've ever witnessed.

▲

Apocryphon

1 hour ago

[-]

Eh, in this use case it's more like a goofy search engine.

▲

amluto

1 hour ago

[-]

At least ChatGPT is now aware that Codex exists. I have a chat, still in my history, from a few months ago, in which I asked for help wrangling npm to get @openai/codex working, and ChatGPT said:

> Important: Codex CLI no longer exists

> OpenAI discontinued the Codex model + CLI a while back. There is no official binary named codex in any current OpenAI npm packages. OpenAI’s current CLI tool is:

    npm install -g openai

> which installs the openai command, not codex.

The world knowledge of these models is not necessarily up to date :)

edit: I replayed the same prompt into current ChatGPT and it is less clueless now. Maybe OpenAI noticed that it was utterly dumb that GPT-5.whatever didn't believe that Codex existed and fine-tuned it.

▲

sigmoid10

1 hour ago

[-]

>The world knowledge of these models is not necessarily up to date :)

It's amazing how this still needs to be said. Codex was released in April 2025. The initial GPT-5 and 5.1 still had a knowledge cutoff in late 2024. Like, what did you expect? Always beware the knowledge cutoff for LLMs (although recent releases have gotten much better with researching the web for updates before answering modern software topics).

▲

38 minutes ago

[-]

OpenAI being more aware of the implications would help too--last year I also struggled with using Codex to write scripts to run Codex headless, because it kept insisting that Codex was a retired model from the GPT-3 days and not a program that could be called by a script.

▲

simonh

2 hours ago

[-]

Its training data only goes up to late 2024 or early 2025 so that might be why, though it does have access to the internet.

▲

mywittyname

1 hour ago

[-]

Yeah, the solution was to link it to the nvidia page of the card, then it was like, 'oh, okay.' But at that point, I lost faith in its ability to provide me with the information I was looking for. If its information is so out of date that it doesn't know about the 5000 series, how could I be confident that it knew the details I was asking about (game engine related research)?

▲

asats

1 hour ago

[-]

Are you using the instant model?

▲

reaperducer

32 minutes ago

[-]

You're holding it wrong.

▲

weird-eye-issue

2 hours ago

[-]

Depending on your ChatGPT settings...

▲

divbzero

2 hours ago

[-]

This is pretty impressive. My impression was that eGPUs simply do not work with Apple Silicon.

(EDIT: Apple agrees with my impression. “To use an eGPU, a Mac with an Intel processor is required.” And, on top of that, the officially supported eGPUs were all AMD not NVIDIA. https://support.apple.com/en-us/102363)

▲

arjie

38 minutes ago

[-]

Wait, this is incredible. I have a spare 5090 lying around and run a claw-like on my M4 Mini. Just plugging it into some sort of 3D print frame for stability and plugging it into the TB port might get me a pretty viable tool for local inference. Would need something neat to ensure the power etc. is well fed.

The problem is `max-num-seqs` and `max-model-len` fight each other, and unless you're in the pure single-client mode you'll need multiple slots so to speak.
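A rough sketch of why those two settings pull against each other, assuming they refer to the inference server options of the same names (vLLM has flags called `--max-num-seqs` and `--max-model-len`). The KV cache costs a fixed number of bytes per token, so for a fixed memory budget, longer contexts leave room for fewer concurrent sequences. The model shape and the 16 GiB budget are hypothetical, and this is a worst-case reservation estimate, not how any particular engine actually schedules memory.

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: keys and values (factor 2) for every layer."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


def max_concurrent_seqs(kv_budget_bytes: int, max_model_len: int, per_token_bytes: int) -> int:
    """How many full-length sequences fit if each one reserves max_model_len tokens."""
    return kv_budget_bytes // (max_model_len * per_token_bytes)


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_bytes_per_token(layers=32, kv_heads=8, head_dim=128)

# Hypothetical budget: 16 GiB of GPU memory left over for the KV cache.
budget = 16 * 2**30

for context_len in (8192, 32768, 131072):
    slots = max_concurrent_seqs(budget, context_len, per_token)
    print(f"max-model-len {context_len}: room for {slots} full-length sequences")
```

Engines that allocate KV cache on demand can fit more sequences in practice when most requests are short, so treat this as a planning bound.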

▲

SamiahAman

43 minutes ago

[-]

Very nice effort. This has incredible technical depth, particularly in the DMA and QEMU sections. I also like that you didn't oversell it as the ideal Mac gaming solution. I found the AI inference results to be the most fascinating. Overall, it was a great read.

▲

swiftcoder

2 hours ago

[-]

This is proper mad science, love it

▲

delbronski

2 hours ago

[-]

Nicely done! Glad to see real hacking is still alive in the age of AI.

▲

coder68

2 hours ago

[-]

This seems pretty useful for AI inference if it can pass Apple approval. I've wanted to use my Nvidia GPUs with a Mac Mini, this would enable it to run CUDA directly. Very cool!

▲

frollogaston

2 hours ago

[-]

I'm guessing the x86 emu is cause Windows games are rarely built for ARM, right? Was kinda curious how an ARM VM would fare. Anyway awesome article.

▲

hparadiz

2 hours ago

[-]

Yes. Valve has done a ton of work here because it's required to be able to run x86 games on a Steam Frame which has an ARM cpu.

▲

hypercube33

2 hours ago

[-]

Steam Deck runs a full x86-64 AMD APU. The work Valve has done for that was to get Windows games to run seamlessly on Linux.

Hopefully in 2026, with the Valve Index VR headset, which is ARM (Qualcomm?), we get what you're talking about here - basically Proton for Win32/64 to Linux ARM64.

Side note: Windows on ARM isn't bad, it's just priced out of its league, and cooling is awful for gaming on current laptops. The only issue I had was OpenGL needing some obscure GL-on-DirectX thing for Maya3D to get games to work.

▲

delecti

1 hour ago

[-]

To keep the chain of Cunningham's Law going, Valve's 2026 headset is called the Steam Frame, not the Index (which came out in 2019).

But Valve's ARM efforts even mean that Android devices can play some (mostly less graphically intensive) Steam games. That makes me very excited about the prospects for the future of gaming handhelds.

▲

sva_

2 hours ago

[-]

As sibling pointed out, the Steamdeck basically runs a Ryzen 3 7335U which is x86.

▲

bigyabai

2 hours ago

[-]

The Steam Deck is pure x86, it's not an ARM-based CPU. The Steam Frame might be what you're thinking of.

▲

hparadiz

2 hours ago

[-]

You're right. I was thinking of what I was reading about the Steam Frame

▲

zer0zzz

1 hour ago

[-]

Once eGPUs work on Apple Silicon there will be little reason to own a PC

▲

bel8

1 hour ago

[-]

Mac GPU isn't the bottleneck for most games. Compatibility is.

▲

_blk

46 minutes ago

[-]

I assume your reasons are different to mine so for your reasons it might very well be true. But for my reasons definitely not as long as Apple Silicon can't run Linux somewhat decently natively - and even then, it's still an Apple..

▲

sharathdoes

1 hour ago

[-]

damn

▲

sharathdoes

1 hour ago

[-]

lol, is there a list of games tho, which mac pro's can support

▲

moralestapia

2 hours ago

[-]

Wow, phenomenal project and write-up, thanks for sharing it.

"no - not in any practical sense today, and "maybe" only in a very deep, borderline-impractical research sense."

This is why humans will always rule over crappy LLMs.

▲

falcor84

2 hours ago

[-]

Wait, why? This is exactly what I as a human would have said in this situation.

Or if you're referring to how the OP still decided to go ahead, I've seen AIs go ahead on impractical courses of action many times, and surprisingly succeed on some of them.

▲

moralestapia

2 hours ago

[-]

And I see that you succeeded in not doing it.

Congrats! Each one got what they wanted :).

▲

csours

2 hours ago

[-]

I believe that LLM (and ML in general) tools really shine when they are developed and used AS tools.

Unfortunately, I also believe that market forces may push away from this direction, as LLM companies try to capture the value stream

▲

rvz

2 hours ago

[-]

Exactly. AI psychosis is real.

Never let an AI tell you that you cannot do something practical for your own self for research, discovery or for fun.

The only thing that is close to impractical is expecting your non-technical friends or others to follow you without any incentive or benefit.

▲

nothinkjustai

2 hours ago

[-]

> As much as I hate to admit it, step one in most of my projects now is to ask AI about it. Maybe it’ll tell me something I don’t know.

It’s these people, not the ones who refuse to use LLMs, who are as they say, “cooked”.

▲

linkregister

1 hour ago

[-]

The author of the blog is not cooked; they're raw. Their inventive, multi-chain setup was tuff. Their PCI passthrough and qemu patches were straight fire. Unless you can point to something you've done this impressive, you're just an unc bro.