Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM

26 points by dryarzeg 6 hours ago | 2 comments | arxiv.org

martinald 2 hours ago

Why is this a paper? Isn't it just using llama.cpp's --n-cpu-moe option? What am I missing here?

Farmadupe 1 hour ago

It's amazingly vacuous, isn't it? I think the most interesting part was that they were surprised llama.cpp crashed when they passed it a bad set of command-line arguments.

Although in the section immediately above that observation, they claimed to have run 10 whole completions with a 100% success rate. So who knows.

I have to admit I slightly miss the flood of AI-psychosis research papers that seemed to be popping up a couple of months ago. Good to know there's still one or two new ones floating around.

LoganDark 1 hour ago

Apparently the author has a patent about it, too.

sandworm101 2 hours ago

Um, doesn't the 4060 laptop card have the ability to share system memory?

Wait... My mistake. Google AI says the 4060 mobile can access system memory, but the tech sheets say no.