FilterHN

Ask HN: Will local models on normal hardware ever compete?

1 points

2 hours ago

| 2 comments

I have a Macbook Air M3 with 24gb RAM. The other day, I wanted to try running an LLM locally for the first time ever. I ran gemma-4-e4b and threw some chats at it.

It reminded me of my very first experiences with ChatGPT a bit. Clearly less capable than something like Opus 4.6, but I made me excited about the possibilities.

I know that fairly capable models can be run by mere mortals who have a fancy GPU.

My real question is, will some combination of hardware and software optimizations get us anywhere close to "state of the art" models running on truly basic hardware?

With all the ridiculous capex being spent on datacenters etc, what if something akin to Moore's Law, or other algorithmic breakthroughs, will get us super capable LLMs that can run on the average machine?

▲

anuramat

1 hour ago

[-]

I'm pretty sure an average machine will always be less capable than a datacenter, the rest depends on your definition of super capable

▲

rvz

2 hours ago

[-]

> With all the ridiculous capex being spent on datacenters etc, what if something akin to Moore's Law, or other algorithmic breakthroughs, will get us super capable LLMs that can run on the average machine?

It is more a software problem and the next breakthrough will come from clever algorithms.

You have just seen TurboQuant create promising efficiency gains and there many other papers being released that propose more optimisations from software that make it possible to run 100B+ LLMs on device.

▲

bigyabai

1 hour ago

[-]

> It is more a software problem and the next breakthrough will come from clever algorithms.

I don't know if I can agree. The hardware side is extremely sub-optimal on raster-focused GPU architectures like Apple Silicon. If I had to bet, the hardware will improve a lot more than the software will over the next 10 years as more vendors adopt GPGPU characteristics.

> You have just seen TurboQuant create promising efficiency gains

TurboQuant looks like a vibe-laundered implementation of EDEN quantization: https://openreview.net/forum?id=tO3ASKZlok