The DeepSeek provider may train on your prompts: https://openrouter.ai/deepseek/deepseek-v3.1-terminus
It's good that they made this improvement. But are there any advantages at this point to using DeepSeek over Qwen?
That's basically choosing at random with extra steps!
I'm curious if my experience was unusual (it very much could be!) and I'd be interested to hear from anyone who's used both.
There's a GitHub bug about it that leads to more discussion here: https://github.com/deepseek-ai/DeepSeek-V3/issues/849
Good to see a fix, and that it comes with some benchmark gains!
Also, given a partly Chinese prompt, Qwen will sometimes run its whole thinking trace in Chinese, which anecdotally seems to perform slightly worse than an English thinking trace on the same prompt.
Twitter/X post link: https://twitter.com/deepseek_ai/status/1970117808035074215
Also Hugging Face model link: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus
https://huggingface.co/mlx-community/Qwen3-Next-80B-A3B-Inst...
I usually use GPT-oss-120B with CPU MoE offloading. It writes at about 10 tps, which is useful enough for the limited things I use it for. But I'm curious how Q3 Next will work (or whether I'll be able to offload and run it with GPU acceleration at all).
(4090)
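For reference, a minimal sketch of that CPU-MoE-offload setup, launching llama.cpp's llama-server from Python (the GGUF filename is hypothetical, and this assumes a build with --override-tensor support):

    # Pin the large, sparsely-activated expert FFN tensors to CPU RAM while
    # -ngl keeps attention and shared layers on the GPU -- the usual way to
    # fit a big MoE model alongside a 24GB card.
    import subprocess

    subprocess.run([
        "llama-server",
        "-m", "models/gpt-oss-120b-Q4_K_M.gguf",  # hypothetical local path
        "-ngl", "99",                      # offload every layer to the GPU...
        "-ot", r"\.ffn_.*_exps\.=CPU",     # ...except MoE expert tensors
        "-c", "16384",                     # context length
    ])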
1) We haven't managed to distill models down enough to get good performance out of something that fits on a typical gaming desktop (say, 7B-24B class models). Even then, though, most consumers don't have high-end desktops, so even a 3060-class GPU requirement would exclude a lot of people.
2) Nothing is stopping you (or anyone else) from buying ~24 5090s (a consumer hardware product) to get the required ~600GB-1TB of VRAM to run unquantized DeepSeek, except time, money, and know-how. Sure, it's unreasonably expensive, but it's not like the labs are conspiring to prevent people from running these models; it's just expensive for everyone, and the common person doesn't have the funding to get into it.
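For what it's worth, the back-of-the-envelope math checks out (a sketch: DeepSeek-V3.1 ships FP8-native, so 1 byte/param counts as "unquantized" here, and the overhead factor is an assumption):

    # Rough check on the ~600GB-1TB / 24x 5090 figures above.
    # DeepSeek-V3.1 has ~671B total parameters at FP8 (1 byte/param).
    weights_gb = 671e9 * 1 / 1e9        # ~671 GB of weights alone
    total_gb   = weights_gb * 1.15      # ~772 GB with KV cache/activation headroom (assumption)
    pool_gb    = 24 * 32                # 24x RTX 5090 at 32 GB each = 768 GB
    print(f"need ~{total_gb:.0f} GB, have {pool_gb} GB")  # right at the edge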
That really depends on what "good enough" means. Qwen3-30B runs absolutely fine at q4 on a 24GB card, although that's also stretching "typical gaming desktop". It's competent as a code-completion or aider-type coding-agent model in that scenario.
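The arithmetic backs that up (a sketch: ~4.5 effective bits/weight is a typical figure for Q4_K_M-style quants, not an exact one):

    # Why a 30B model at q4 fits on a 24 GB card.
    weights_gb = 30e9 * 4.5 / 8 / 1e9   # ~16.9 GB of quantized weights
    print(f"~{weights_gb:.1f} GB weights, leaving ~7 GB for KV cache on a 24 GB card")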
But really we need both. Yes it would be nice to have things targeted to our own particular niche, but there are only so many labs cranking these things out. Small models will only get better from here.