I shared a recording of audio I generated with that here: https://simonwillison.net/2026/Jan/22/qwen3-tts/
There are far more good and interesting use cases for this technology. Games will let users clone their voices and create virtual avatars and heroes. People will have access to creative tools that let them make movies and shows with their likeness. People who can't sing will make music.
Nothing was scarier than the invention of the nuclear weapon. And we're all still here.
Life will go on. And there will be incredible benefits that come out of this.
That said, I am likewise looking forward to the cool things to come out of this.
I was with you, until
But, yeah. Life will go on.
I'm a filmmaker. I've done photons-on-glass production for fifteen years. I'm Meisner-trained and have performed every role from cast to crew. I'm elated that these tools are going to enable me to do more with a smaller budget. To have more autonomy and creative control.
Now, maybe the results were cherry-picked. I know everyone else who has released one of these cherry-picks which samples to publish. However, this is the first time I've considered it plausible to use AI TTS to remaster old radio plays and the like, where a section of audio is unintelligible but can be deduced from context - like a tape glitch where someone says "HEY [...]LAR!" and it's an episode of Yours Truly, Johnny Dollar...
I have dozens of hours of audio of Bob Bailey and other people of that era.
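The mechanical part of that restoration workflow is simple: generate the missing phrase with a voice-cloning TTS model, then splice it over the damaged stretch. A minimal sketch of the splice step using only Python's stdlib `wave` module (assumes 16-bit WAV files at matching sample rates; the patch clip itself would come from whatever TTS model you're using):

```python
import wave

def splice_patch(original_path, patch_path, start_s, end_s, out_path):
    """Replace original audio between start_s and end_s with a patch clip."""
    with wave.open(original_path, "rb") as orig:
        params = orig.getparams()
        rate = orig.getframerate()
        frames = orig.readframes(orig.getnframes())
    with wave.open(patch_path, "rb") as patch:
        # Resample the TTS output beforehand if rates differ.
        assert patch.getframerate() == rate, "resample the patch first"
        patch_frames = patch.readframes(patch.getnframes())

    bytes_per_frame = params.sampwidth * params.nchannels
    start = int(start_s * rate) * bytes_per_frame
    end = int(end_s * rate) * bytes_per_frame
    spliced = frames[:start] + patch_frames + frames[end:]

    with wave.open(out_path, "wb") as out:
        out.setparams(params)  # header frame count is fixed up on close
        out.writeframes(spliced)
```

In practice you'd also want a short crossfade at the seams and loudness matching, but for a tape glitch of a word or two a hard cut is often already passable.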
Although I like the model, I don't like that company's leadership: how closed it is, how divisive they are in terms of politics.
China would need an architectural breakthrough to leapfrog American labs, given the huge compute disparity.
1. Chinese researcher in China, to be more specific.
A financial jackknifing of the AI industry seems to be one very plausible outcome as the promises/expectations of the AI companies start meeting reality.
What do you mean by this?
https://www.bloomberg.com/news/articles/2026-01-20/anthropic...
And that's the rub.
Many of us are not.
I prefer to have more open models. On the other hand, Chinese labs close up their open models once they start to show a competitive edge.
Being critical of favorable actions towards a rival country shouldn't be divisive, and if it is, well, I don't think the problem is in the criticism.
Also, the link doesn't mention open source. From a Google search, he doesn't seem to care much for it.
Have you tested alternatives? I grabbed Open Code and a MiniMax M2.1 subscription - even just the $10/mo one - to test with.
Result? We designed, from scratch, a slight variation of a tool I had previously spec'd with Claude - same problem (a process supervisor tool).
Honestly, it worked great. I've since played a little further with generating code (this time Go), and again I'm happy.
Beyond that, GLM 4.7 should also be great.
See https://dev.to/kilocode/open-weight-models-are-getting-serio...
It's a recent case study of vibe-coding a smaller tool with Kilo Code, comparing output from MiniMax M2.1 and GLM 4.7.
Honestly, just give it a whirl - no need to send money to companies/nations you disagree with.
$20/month is a bit of an insane ask when the most valuable thing Anthropic makes is the free Claude Code CLI.
I spent 20 minutes yesterday trying to get GLM 4.7 to understand that a simple modal on a web page (vanilla JS and HTML!) wasn't displaying when a certain button was clicked. I hooked it up to Chrome MCP in Open Code as well.
It constantly told me that it fixed the problem. In frustration, I opened Claude Code and just typed "Why won't the button with ID 'edit' work???!"
It fixed the problem in one shot. This isn't even a hard problem (and I could have just fixed it myself but I guess sunk cost fallacy).
My experience is that all of the models seem to do a decent job of writing a whole application from scratch, up to a certain point of complexity. But as soon as you ask them for non-trivial modifications and bugfixes, they _usually_ go deep into rationalized rabbit holes to nowhere.
I burned through a lot of credits to try them all and Gemini tended to work the best for the things I was doing. But as always, YMMV.
That evening, for kicks, I brought the problem to GLM 4.7 Flash (Flash!) and it one-shot the right solution.
It's not apples to apples, because when it comes down to it LLMs are statistical token extruders, and it's a lot easier to extrude the likely tokens from an isolated query than from a whole workspace that's already been messed up somewhat by said LLM. That, and data is not the plural of anecdote. But still, I'm easily amused, and this amused me. (I haven't otherwise pushed GLM 4.7 much and I don't have a strong opinion about it.)
But seriously, given the consistent pattern of knitting ever larger carpets to sweep errors under that Claude seems to exhibit over and over instead of identifying and addressing root causes, I'm curious what the codebases of people who use it a lot look like.
I use Opus 4.5 for planning; when I reach my usage limits I fall back to GLM 4.7, but only for implementing the plan. It still struggles, even though I configure GLM 4.7 as both the small model and the main model in Claude Code.
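For anyone wanting to try the same setup, here's a sketch of pointing Claude Code at an Anthropic-compatible GLM endpoint via environment variables. The variable names are the ones Claude Code reads; the base URL and model names below are placeholders you'd replace with your provider's actual values:

```shell
# Route Claude Code to a third-party Anthropic-compatible API (example values).
export ANTHROPIC_BASE_URL="https://your-provider.example/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-api-key"
export ANTHROPIC_MODEL="glm-4.7"             # main ("heavier") model
export ANTHROPIC_SMALL_FAST_MODEL="glm-4.7"  # small/background model
claude
```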
Using speaker Ryan seems to be the most consistent, I tried speaker Eric and it sounded like someone putting on a fake exaggerated Chinese accent to mock speakers.
If it wasn't for the unpredictable level of emotions from each chunk, I'd say this is easily the highest quality TTS model I've tried.
Parakeet is pretty good, but there are times it struggles. Would be interesting to see how Qwen compares once Handy has it in.
And if you ask me, I think these models were trained on tween fiction podcasts. (My kids listen to a lot of these and dramatic over-acting seems to be the industry standard.)
Also, their middle-aged adult with an "American English" accent doesn't sound like any American I've ever met. More like a bad Sean Connery impersonator.
100% I was thinking the same thing.
Edit: "Cross-lingual Voice Clone" https://qwen.ai/blog?id=qwen3tts-0115#voice-clone