FilterHN

Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offload

1 point

by imalomder

5 hours ago

| 1 comment

| tide-paper.vercel.app

imalomder

5 hours ago

Hi HN, this is my research project, which lets people deploy MoE diffusion LLMs locally more efficiently. With this method, you can fit the 100B LLaDA2.0-flash model on a PC with an RTX 5090 and run it faster than with other methods.