Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM
26 points | 6 hours ago | 2 comments | arxiv.org | HN
martinald
2 hours ago
Why is this a paper? Isn't it just using the --n-cpu-moe option in llama.cpp? What am I missing here?
Farmadupe
1 hour ago
It's amazingly vacuous, isn't it? I think the most interesting part was that they were surprised llama.cpp crashed when they used a bad set of command-line arguments.

Although in the section immediately above that observation, they claimed to have run 10 whole completions with a 100% success rate. So who knows.

I have to admit I slightly miss the flood of AI-psychosis research papers that seemed to be popping up a couple of months ago. Good to know there's still one or two new ones floating around.

LoganDark
1 hour ago
Apparently the author has a patent on it, too.
sandworm101
2 hours ago
Um, doesn't the 4060 laptop card have the ability to share system memory?

Wait... My mistake. Google AI says the 4060 mobile can access system memory, but the tech sheets say no.
