Show HN: Nanbeige 4.1-3B running in the browser via WebGPU
2 points
1 hour ago
| 2 comments
| huggingface.co
| HN
maxxmini
26 minutes ago
Really cool to see a 3B model running fully client-side via WebGPU. The claim that it beats Qwen3-32B on Arena-Hard while being 10x smaller is interesting. Do you know if those benchmark results hold up for more practical tasks like summarization or instruction following? Also curious about inference speed: what kind of tokens/sec are you seeing on a typical laptop GPU?
victormustar
1 hour ago
This is a 3B-parameter model from Nanbeige with surprisingly strong benchmarks. It beats Qwen3-32B on Arena-Hard and LiveCodeBench despite being 10x smaller. (Also, be warned: it thinks a lot.)

I wrapped it in a simple browser demo using Transformers.js + WebGPU. It downloads the q4 ONNX weights (~1.7GB) and runs fully client-side, with no server required. It falls back to WASM if WebGPU isn't available.
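
For anyone curious what the WebGPU-to-WASM fallback might look like, here is a rough sketch, not the demo's actual code. The helper names `pickDevice` and `buildOptions` and the `NavigatorLike` type are made up for illustration. The `device` and `dtype` values are the ones I understand Transformers.js v3 to accept, so treat that part as an assumption too.

```typescript
// Sketch only: choose a backend as the post describes
// (WebGPU when available, otherwise WASM). Names here are hypothetical.

type Device = "webgpu" | "wasm";

// Minimal structural type for the part of `navigator` used below, so the
// function can also be exercised outside a browser.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<object | null> };
}

async function pickDevice(nav: NavigatorLike | undefined): Promise<Device> {
  // `navigator.gpu` is undefined in browsers without WebGPU support.
  if (!nav?.gpu) return "wasm";
  try {
    // requestAdapter() resolves to null when no suitable GPU adapter exists.
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    // Treat any failure while probing the GPU as "not available".
    return "wasm";
  }
}

// Assumed shape of the options object for a q4-quantized model.
function buildOptions(device: Device): { device: Device; dtype: "q4" } {
  return { device, dtype: "q4" };
}
```

In a browser, the result of `buildOptions(await pickDevice(navigator))` could presumably be passed as the third argument to Transformers.js's `pipeline("text-generation", modelId, options)`. The model id is not given in the post, so it is left out here.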
