FilterHN
Efficient and Lossless MoE Diffusion LLM Inference with I/O-Aware Expert Offload
1 point by imalomder 5 hours ago | 1 comment | tide-paper.vercel.app
imalomder 5 hours ago
Hi HN, this is my research project, which lets people deploy MoE diffusion LLMs locally more efficiently. With this method, you can fit the 100B LLaDA2.0-flash model on a PC with an RTX 5090 and run it faster than other methods.