You need 96 GB or 128 GB to do non-trivial things. Machines with that much aren't down to 749 USD yet.
However, I agree with the article that people will run big LLMs on their laptop N years down the line. Especially if hardware outgrows best-in-class LLM model requirements. If a phone could run a 512GB LLM model fast, you would want it.
Uber is economical, too; but folks prefer to own cars, sometimes multiple.
And there's a market for all kinds of vanity cars, fast sports cars, expensive supercars... I imagine PCs and laptops will have such a market, too: in probably less than a decade, maybe a £20k laptop running a 671B+ LLM locally will be the norm among pros.
One time I took an Uber to work because my car broke down and was in the shop and the Uber driver (somewhat pointedly) made a comment that I must be really rich to commute to work via Uber because Ubers are so expensive
> However, for the average laptop that’s over a year old, the number of useful AI models you can run locally on your PC is close to zero.
This straight up isn’t true.
Most laptops can run at best a 7-14B model, even if you buy one with a high-spec graphics chip. These are not useful models unless you're writing spam.
Most desktops have a decent amount of system memory but that can't be used for running LLMs at a useful speed, especially since the stuff you could run in 32-64GB RAM would need lots of interaction and hand holding.
And that's for the easy part, inference. Training is much more expensive.
Though maybe it depends on what you're doing? (Although if you're doing something simple like embeddings, then you don't need the Apple hardware in the first place.)
Also, macOS only has around 10% desktop market share globally.
A Lenovo T15g with a 16gb 3080 mobile doesn’t do too badly and will run more than just Windows.
The article kinda sucks at explaining that NPUs aren't really even needed; they just have the potential to make things more efficient in the future, rather than paying the power cost of running your GPU.
What's he talking about? It's trivial to calculate that.
[0]: https://www.edge-ai-vision.com/2024/05/2024-edge-ai-and-visi...
> hundreds of millions of parameters
lol
lmao, even
- Addition of more—and faster—memory.
- Consolidation of memory.
- Combination of chips on the same silicon.
All of these are also happening for non AI reasons. The move to SoC that really started with the M1 wasn't because of AI, but unified memory being the default is something we will see in 5 years. Unlike 3D TV.
probably not after scam altman bought up half the world's supply for his shit company
No, it did not. There were numerous SoCs that came before it, and the move was inevitable in this space.
- People wanting more memory is not a novel feature. I am excited to find out how many people immediately want to disable the AI nonsense to free up memory for things they actually want to do.
- Same answer.
- I think the drive towards SOCs has been happening already. Apple's M-series utterly demolishes every PC chip apart from the absolute bleeding-edge available, includes dedicated memory and processors for ML tasks, and it's mature technology. Been there for years. To the extent PC makers are chasing this, I would say it's far more in response to that than anything to do with AI.
They might not use Apple silicon often. Other options are encouraging.
"SNAPDRAGON X PLUS PROCESSOR - Achieve more everyday with responsive performance for seamless multitasking with AI tools that enhance productivity and connectivity while providing long battery life"
I don't want this garbage on my laptop, especially when it's running off its battery! Running AI on your laptop is like playing Starcraft Remastered on the Xbox or Factorio on your Steam Deck. I hear you can play DOOM on a pregnancy test too. Sure, you can, but it's just going to be a tedious, inferior experience.
Really, this is just a fine example of how overhyped AI is right now.
>I don't want this garbage on my laptop, especially when its running of its battery!
The one bit of good news is it's not going to impact your battery life because it doesn't do any on-device processing. It's just calling an LLM in the cloud.
"AI PC" branded devices get "Copilot+" and additional crap that comes with that due to the NPU. Despite desktops having GPUs with up to 50x more TOPs than the requirement, they don't get all that for some reason https://www.thurrott.com/mobile/copilot-pc/323616/microsoft-...
When is Wintel going to finally happen?
Microsoft has roughly $102 billion in cash (+ short-term investments). Intel’s market value is approximately $176 billion.
I've never really understood why Microsoft helped Intel's bottom line over decades.
With Azure, Microsoft has even more reason to buy Intel.
MS wants everyone to run Copilot on their shiny new data centre, so they can collect the data on the way.
Laptop manufacturers are making laptops that can run an LLM locally, but there's no point in that unless there's a local LLM to run (and Windows won't have that because Copilot). Are they going to be pre-installing Llama on new laptops?
Are we going to see a new power user / normal user split? Where power users buy laptops with LLMs installed, that can run them, and normal folks buy something that can call Copilot?
Any ideas?
For example, the LG gram I recently got came with just such an app named Chat, though the "ai button" on the keyboard (really just right alt or control, I forget which) defaults to copilot.
If there's any tension at all, it's just who gets to be the default app for the "ai button" on the keyboard that I assume almost nobody actually uses.
MS doesn't care where your data is. They're happy to go digging through your C drive to collect/mine whatever they want (assuming you can avoid all the dark patterns they use to push you to save everything on OneDrive anyway), and they'll record all your interactions with any other AI using Recall.
For a slightly more charitable perspective, agentic AI means that there is still a bunch of stuff happening on the local machine, it's just not the inference itself.
But unified memory IS truly what makes an AI ready PC. The Apple Silicon proves that. People are willing to pay the premium, and I suspect unified memory will still be around and bringing us benefits even if no one cares about LLMs in 5 years.
If you don't use it, it will have no impact on your device. And it's not sending your data to the cloud except for anything you paste into it.
Windows is going more and more into AI and embedding it into the core of the OS as much as it can. It’s not “an app”, even if that was true now it wouldn't be true for very long. The strategy is well communicated.
Linux hears your cry. You have a choice. Make it.
AAA Games with anti-cheat that don't support Linux.
Video editing (DaVinci Resolve exists but is a pain to get up and running on many distros; Kdenlive/OpenShot don't really cut it for most)
Adobe Suite (Photoshop/Lightroom specifically, and Premiere for video editing) - would like to see Affinity support Linux, but it hasn't happened so far. GIMP and darktable aren't really substitutes unless you pour a lot of time into them.
Tried moving to Linux on my laptop this past month; made it a month before reinstalling Windows 11. Had issues with the WiFi chip (managed to fix it, but had to edit config files deep in the system, not ideal). On Fedora with LUKS encryption, after a kernel update the keyboard wouldn't work to enter the encryption key. No Windows Hello-like support (face ID). Had the most success with EndeavourOS, but running Arch is a chore for most.
It's getting there, best it's ever been, but there's still hurdles.
I really don't understand people that want to play games so badly that they are willing to install a literal rootkit on their devices. I can understand if you're a pro gamer but it feels stupid to do it otherwise.
But a lot of the time it's peer-pressure for wanting to play with friends who couldn't care less.
GIMP isn't a solution, sure, but it works for what I need. Darktable does way more than I've ever wanted, so I can forgive it for the one time it crashed. Inkscape and Blender both exceed my needs as well.
And Adobe is so user hostile, that I feel I need to call you a mean name to prove how I feel.... dummy!
Yes, I already feel bad, and I'm sorry. But trolling aside, listing applications that treat users like shit, aren't reasons to stay on the platform that also treats you like shit.
I get it: sometimes being treated like shit is worth it because it's easier, now that you're used to being disrespected. But an aversion to the effort it'd take to climb the learning curve of something different isn't a valid reason to help the disrespectful trash companies making the world worse recruit more people to treat like trash.
Just because you use it, doesn't make it worth recommending.
I know Adobe are... c-words, but their software is industry standard for a reason.
We definitely play very different games; I wouldn't touch it if you paid me. So I'm sure we both have a bit of sample bias in our expected rates of Linux compatibility, especially since EA is another company like Adobe. Also, the internet seems to think they have a cheating problem. I wonder how bad it really is, and if it's worth the cost of the anti-cheat.
They're industry standard because they were first. Not necessarily because they were better. They do have a feature set that's near impossible to beat, not even I can pretend like they don't. I'm just saying, respect and fairness is more important to me, than content aware fill ever will be.
Also, doesn't the Adobe suite work on Linux?
Photoshop CC 2024 apparently works somewhat, but no GPU support and the removal tool doesn't work apparently.
https://appdb.winehq.org/objectManager.php?sClass=version&iI...
Basically, no.
This is a nice companion to the article: https://www.pcworld.com/article/2965927/the-great-npu-failur...
The thing is nowhere near the performance of a MacBook, but it's silent and the battery lasts ages, which is a far cry from the same laptop with an Intel CPU, which is what many are running.
Company removes a lot of the AI bloat though.
A great analogy because there is Starcraft for a console - Nintendo 64 - and it is quite awkward. Split-screen multiplayer included.
But yeah, fresh install of OS is a must for any new computer.
If they made the M series fully open for Linux (I know Asahi is working away) I probably would never buy another non-M series processor again.
Like Big Data, LLMs are useful in a small niche of areas, like poorly summarizing meeting notes, or grammar check at a middle-school level.
On LLMs for coding tasks: I asked a programmer why they loved Claude and he showed me the output. Twenty years ago, that kind of code would have gotten someone PIP'd. Today it's considered better than most junior programmers...which is a sign of how far programming standards have fallen, and explains why most programs and apps are such buggy pieces of sh$t these days.
32 GB ram will be for enthusiasts with deep pockets, and professionals. Anything over that, exclusively professionals.
The conspiracy theorist inside me is telling me that big AI companies like OpenAI would rather see that people are using their puny laptops as terminals / shells only, to reach sky-based models, than to let them have beefy laptops and local models.
In three years we will be swimming in more ram than we know what to do with.
The fact that nowadays there are few to no laptops with 4 RAM slots is entirely artificial.
Lesson learned: you should always listen to that voice inside your head that says: “but I need it…” lol
With graphics processing, you need a lot of bandwidth to get stuff in and out of the graphics card for rendering on a high-resolution screen, lots of pixels, lots of refreshes, lots of bandwidth... With LLMs, a relatively small amount of text goes in and a relatively small amount of text comes out over a reasonably long amount of time. The amount of internal processing is huge relative to the size of input and output. I think NVIDIA and a few other companies already started going down that route.
But graphics cards will probably still be useful for Stable Diffusion, especially for AI-generated video, where the input and output bandwidth is much higher.
First, GPGPU is powerful and flexible. You can make an "AI-specific accelerator", but it wouldn't be much simpler or much more power-efficient - while being a lot less flexible. And since you need to run traditional graphics and AI workloads both in consumer hardware? It makes sense to run both on the same hardware.
And bandwidth? GPUs are notorious for not being bandwidth starved. 4K@60FPS seems like a lot of data to push in or out, but it's nothing compared to how fast modern PCIe 5.0 x16 goes. AI accelerators are more of the same.
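To put rough numbers on the 4K@60 vs. PCIe comparison, here is a minimal Python sketch. It assumes 4 bytes per pixel with no compression, and uses the PCIe 5.0 per-lane rate of 32 GT/s with 128b/130b encoding, counting one direction only; real links lose a little more to protocol overhead.

```python
def display_bytes_per_second(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Raw, uncompressed framebuffer traffic for one display stream."""
    return width * height * bytes_per_pixel * fps


def pcie_bytes_per_second(gt_per_s: float, lanes: int) -> float:
    """One-direction PCIe bandwidth after 128b/130b line encoding (Gen 3 and later)."""
    bits_per_s = gt_per_s * 1e9 * lanes * 128 / 130
    return bits_per_s / 8


# Assumption: 4 bytes per pixel (8-bit RGBA), 3840x2160 at 60 frames per second.
display = display_bytes_per_second(3840, 2160, 4, 60)
pcie5_x16 = pcie_bytes_per_second(32.0, 16)

print(f"4K@60 raw:    {display / 1e9:.2f} GB/s")
print(f"PCIe 5.0 x16: {pcie5_x16 / 1e9:.2f} GB/s")
print(f"headroom:     {pcie5_x16 / display:.1f}x")
```

Under these assumptions the display stream is about 2 GB/s against roughly 63 GB/s of link bandwidth, about 30x headroom.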
24-bit pixels give 16 million possible colors... For tokens, that's probably enough to represent every word of the entire vocabulary of every major national language on earth combined.
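A quick sanity check on the token side, as a sketch: the number of bits needed to index a vocabulary is ceil(log2(vocab size)). The vocabulary sizes below are illustrative assumptions in the range current tokenizers use, not figures from the thread.

```python
import math


def bits_for_vocab(vocab_size: int) -> int:
    """Smallest whole number of bits that can index every token ID in [0, vocab_size)."""
    return max(1, math.ceil(math.log2(vocab_size)))


# Illustrative vocabulary sizes (assumptions).
for vocab in (32_000, 50_257, 128_000, 256_000):
    print(f"{vocab:>7} tokens -> {bits_for_vocab(vocab)} bits")

print("distinct 24-bit values:", 2 ** 24)
```

Typical vocabularies fit in 15 to 18 bits, well under the 24 bits of a pixel.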
> You have to shuffle your 800GB neural network in and out of memory
Do you really though? That seems more like a constraint imposed by graphics cards. A specialized AI chip would just keep the weights and all parameters in memory/hardware right where they are and update them in-situ. It seems a lot more efficient.
I think that it's because graphics cards have such high bandwidth that people decided to use this approach but it seems suboptimal.
But if we want to be optimal; then ideally, only the inputs and outputs would need to move in and out of the chip. This shuffling should be seen as an inefficiency; a tradeoff to get a certain kind of flexibility in the software stack... But you waste a huge amount of CPU cycles moving data between RAM, CPU cache and Graphics card memory.
Yes.
It stays in the HBM, but it needs to get shuffled to the place where the computation actually happens. It's a lot like a normal CPU: the CPU can't do anything with data in system memory; it has to be loaded into a CPU register. For every token that is generated, a dense LLM has to read every parameter in the model.
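The point that every parameter is read once per generated token gives a simple ceiling on decode speed: memory bandwidth divided by the size of the weights. A minimal sketch, assuming a dense model and ignoring KV-cache reads and compute limits; the bandwidth figures in the usage lines are hypothetical round numbers, not measurements of any particular machine.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_gb_s: float) -> float:
    """Bandwidth ceiling on single-stream decode speed for a dense model.

    Every generated token streams all weights from memory once, so
    tokens/s <= bandwidth / weight bytes. KV-cache reads, caching effects
    and compute limits are ignored, so real speeds will be lower.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes


# Hypothetical: an 8B model quantized to ~4 bits (0.5 bytes per parameter).
for label, bw in (("~100 GB/s (system RAM, assumed)", 100),
                  ("~1000 GB/s (GPU VRAM, assumed)", 1000)):
    print(f"{label}: <= {max_decode_tokens_per_s(8, 0.5, bw):.0f} tokens/s")
```

Capacity decides whether a model fits; bandwidth decides how fast it can generate once it does.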
This is why they use high bandwidth memory for VRAM.
- The house is the disk
- You are the RAM
- The truck is the VRAM
There won't be a single time you can observe yourself carrying the weight of everything being moved out of the house because that's not what's happening. Instead you can observe yourself taking many tiny loads until everything is finally moved, at which point you yourself should not be loaded as a result of carrying things from the house anymore (but you may be loaded for whatever else you're doing).
Viewing active memory bandwidth can be more complicated to set up than it seems, so the easier way is to just watch your VRAM usage as you load the model fresh into the card. The "nvtop" utility can do this for almost any GPU on Linux, and it shows other stats you might care about as you watch LLMs run.
I feel like the reverse has been true since after the Pascal era.
Don't worry! Sam Altman is on it. Making sure there never is healthy competition that is.
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram...
Margins. AI usage can pay a lot more. Even if they sell less, they can still be more profitable.
In the past there wasn’t a high-margin use. Servers didn’t command such a high premium.
Maybe for creative suggestions and editing it’d be ok.
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly.
Why not extrapolate from the open-source AIs that are available? The most powerful open-source AI (that I know of) is Kimi K2, at >600 GB. Running it at acceptable speed requires 600+ GB of GPU/NPU memory. Even $2000-3000 AI-focused PCs like the DGX Spark or Strix Halo typically top out at 128 GB. Frontier models will only run on something that costs many times a typical consumer PC, and that's only going to get worse with RAM pricing.
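The size figures follow from parameter count times bytes per parameter. A sketch that ignores KV cache, activations and runtime overhead (so real requirements are higher); the 671B example reuses the figure mentioned earlier in the thread, and the bit widths are assumptions about quantization.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """GB (1e9 bytes) needed just to store the weights.

    Ignores KV cache, activations and runtime overhead.
    """
    return params_billion * bits_per_param / 8


def max_params_billion(memory_gb: float, bits_per_param: float) -> float:
    """Largest model, in billions of parameters, whose weights alone fit in memory_gb."""
    return memory_gb * 8 / bits_per_param


for bits in (16, 8, 4):
    print(f"671B @ {bits}-bit: {weight_memory_gb(671, bits):.1f} GB")
print(f"128 GB @ 4-bit fits at most ~{max_params_billion(128, 4):.0f}B parameters")
```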
In 2010 the typical consumer PC had 2-4 GB of RAM. Now the typical PC has 12-16 GB. This suggests RAM size doubling perhaps every 5 years at best. If that's the case, we're 25-30 years away from the typical PC having enough RAM to run Kimi K2.
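The extrapolation can be reproduced with a fixed doubling period. A minimal sketch, assuming "now" is roughly 15 years after 2010 and using the comment's own figures (2-4 GB then, 12-16 GB now, ~600 GB target).

```python
import math


def implied_doubling_years(start_gb: float, end_gb: float, years: float) -> float:
    """Doubling period implied by growth from start_gb to end_gb over `years`."""
    return years / math.log2(end_gb / start_gb)


def years_until(target_gb: float, current_gb: float, doubling_years: float) -> float:
    """Years to grow from current_gb to target_gb at a fixed doubling period."""
    return doubling_years * math.log2(target_gb / current_gb)


# Assumption: ~15 years between "2010" and "now".
print(f"fastest doubling: {implied_doubling_years(2, 16, 15):.1f} years")
print(f"slowest doubling: {implied_doubling_years(4, 12, 15):.1f} years")
for now_gb in (12, 16):
    print(f"{now_gb} GB -> 600 GB: {years_until(600, now_gb, 5):.1f} years at 5 years/doubling")
```

At the optimistic 5-year doubling, this gives roughly 26 to 28 years, inside the comment's 25-30 year range.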
But the typical user will never need that much RAM for basic web browsing, etc. The typical computer RAM size is not going to keep growing indefinitely.
What about cheaper models? It may be possible to run a "good enough" model on consumer hardware eventually. But I suspect that for at least 10-15 years, typical consumers (HN readers may not be typical!) will prefer capability, cheapness, and especially reliability (not making mistakes) over being able to run the model locally. (Yes AI datacenters are being subsidized by investors; but they will remain cheaper, even if that ends, due to economies of scale.)
The economics dictate that AI PCs are going to remain a niche product, similar to gaming PCs. Useful AI capability is just too expensive to add to every PC by default. It's like saying flying is so important, everyone should own an airplane. For at least a decade, likely two, it's just not cost-effective.
10-15 years?!!!! What is the definition of good enough? Qwen3-8B or Qwen3-30B-A3B are quite capable models that run on a lot of hardware even today. SOTA is not just getting bigger; models are also getting more intelligent and running more efficiently. There have been massive gains in intelligence at the smaller model sizes. It is just highly task dependent. Arguably some of these models are "good enough" already, and their intelligence and instruction following are much better than even a year ago. Sure, not Opus 4.5 level, but still much could be done without that level of intelligence.
> it is highly task dependent... much could be done without that level of intelligence
This is an enthusiast's glass-half-full perspective, but casual end users are gonna have a glass-half-empty perspective. Qwen3-8B is impressive, but how many people use it as a daily driver? Most casual users will toss it as soon as it screws up once or twice.
The phrase you quoted in particular was imprecise (sorry) but my argument as a whole still stands. Replace "consumer hardware" with "typical PCs" - think $500 bestseller laptops from Walmart. AI PCs will remain niche luxury products, like gaming PCs. But gaming PCs benefit from being part of gaming culture and because cloud gaming adds input latency. Neither of these affects AI much.
Maybe 100% of computer users wouldn't have one, but maybe 10-20% of power users would, including programmers who want to keep their personal code out of the training set, and so on.
I would not be surprised though if some consumer application made it desirable for each individual, or each family, to have local AI compute.
It's interesting to note that everyone owns their own computer, even though a personal computer sits idle half the day, and many personal computers hardly ever run at 80% of their CPU capacity. So the inefficiency of owning a personal AI server may not be as much of a barrier as it would seem.
Isn't that the Mac Studio already? Ok, it seems to max at 512 GB.
Part of the reason RAM isn't growing faster is that there's no need for that much RAM at the moment. Technically you can put multiple TB of RAM in your machine, but no one does that because it's a complete waste of money [0]. Unless you're working in a specialist field, 16 GB of RAM is enough, and adding more doesn't make anything noticeably faster.
But given a decent use case, like running an LLM locally, you'd find demand for lots more RAM; that would drive supply and new technology developments, and in ten years it'll be normal to have 128TB of RAM in a baseline laptop.
Of course, that does require that there is a decent use-case for running an LLM locally, and your point that that is not necessarily true is well-made. I guess we'll find out.
[0] apart from a friend of mine working on crypto who had a desktop Linux box with 4TB of RAM in it.
> How many TOPS do you need to run state-of-the-art models with hundreds of millions of parameters? No one knows exactly. It’s not possible to run these models on today’s consumer hardware, so real-world tests just can’t be done.
We know exactly the performance needed for a given responsiveness. TOPS is just a measurement, independent of the type of hardware it runs on.
The fewer TOPS, the slower the model runs, so the user experience suffers. Memory bandwidth and latency play a huge role too. And context: increase the context and the LLM becomes much slower.
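On the context point: one reason longer context slows generation is that the key/value cache grows linearly with context, and each new token has to read all of it. A sketch of the usual size estimate; the model shape below (32 layers, 8 KV heads, head dimension 128, 16-bit values) is a hypothetical example, not any specific model.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int) -> int:
    """Approximate KV-cache size for one sequence.

    The factor 2 is one key and one value vector per KV head, per layer, per token.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 2-byte values.
for ctx in (4_096, 32_768, 131_072):
    gib = kv_cache_bytes(32, 8, 128, ctx, 2) / 2**30
    print(f"{ctx:>7} tokens -> {gib:.1f} GiB of KV cache to read for each new token")
```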
We don't need to wait for consumer hardware to know how much is needed. We can calculate that for given situations.
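As an example of that kind of calculation, a common rule of thumb is about 2 operations (a multiply and an add) per weight per token for a dense transformer. The sketch below turns a TOPS figure into a compute-only ceiling on tokens/s and on prompt-processing time. The 40 TOPS figure and the utilization factors are assumptions, and quoted TOPS are often for low-precision integer math that a given model may not use; for token-by-token generation, memory bandwidth usually bites first.

```python
def compute_bound_tokens_per_s(params_billion: float, tops: float,
                               utilization: float = 1.0) -> float:
    """Ceiling on tokens/s if arithmetic were the only limit.

    Rule of thumb: ~2 operations (one multiply, one add) per weight per token.
    """
    ops_per_token = 2 * params_billion * 1e9
    return tops * 1e12 * utilization / ops_per_token


def prefill_seconds(params_billion: float, prompt_tokens: int, tops: float,
                    utilization: float = 1.0) -> float:
    """Lower bound on time to process a prompt under the same rule of thumb."""
    return prompt_tokens / compute_bound_tokens_per_s(params_billion, tops, utilization)


# Hypothetical: 8B dense model on a 40 TOPS accelerator.
for util in (1.0, 0.3):
    print(f"utilization {util:.0%}: <= {compute_bound_tokens_per_s(8, 40, util):.0f} tokens/s, "
          f"4000-token prompt in >= {prefill_seconds(8, 4000, 40, util):.1f} s")
```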
It also pretends small models are not useful at all.
I think the massive cloud investments will put pressure away from local AI unfortunately. That trend makes local memory expensive and all those cloud billions have to be made back so all the vendors are pushing for their cloud subscriptions. I'm sure some functions will be local but the brunt of it will be cloud, sadly.
A basic last-generation PC with something like a 3060 (12 GB) is more than enough to get started. My current rig pulls less than 500 W with two cards (3060+5060). And, given the current temperature outside, the rig helps heat my home. So I am not contributing to global warming, water consumption, or any other datacenter-related environmental evil.
lol