Kokoro was posted and it works on WebGPU; absolutely incredible quality for where it can run.
I can go back and try to repro and get a recording....
Interesting, because the hero image is a Mac App screenshot.
Umm, it does.
> We don't currently support Apple Silicon, as there is not yet a Kokoro implementation in MLX. As soon as it will be available, we will support it.
I thought that meant that it didn't support Apple Silicon in general, but they were just talking about GPU support.
I might try using F5-TTS-MLX instead actually (https://github.com/lucasnewman/f5-tts-mlx) and see how that does.
Companies won't stop pulling this garbage unless we stop supporting them.
I know the dev team's decisions were disappointing, but it's also worth pointing out that the site was kept up until around last month, despite the warning stating it'd be down in November.
Omnivore could have closed up their code base and prevented self-hosting entirely. I'm glad they didn't.
As for contribution model, it’s still something I’m trying to figure out. For the moment, it was just trying to get a self host build ready and working.
But I have admin rights to the repo, and I am not working for ElevenLabs, nor officially for Omnivore. I was just a contributor before.
Open source models drive proprietary foundation models' margin to zero.
The only reason elevenlabs became a unicorn was their margin. If they became a commodity, they'd find themselves in a deep pit.
I was really hoping they would have fixed these issues by now, because it was promising. This app truly does feel like a portfolio demo app for a text-to-speech engine company rather than an actual reader app.
UPDATE: yes, I have actually used the app, no it does not work well. See replies for details.
Fwiw, I would use their app way more if it were better. Right now I use it for 1-2 long-form articles at a time. I'm sometimes willing to push buttons in order to stay focused, but I'll bail out to e.g. my podcast app if that becomes untenable.
Some of their voices sound very artificial, some very real. I've been slowly making a list of the good ones.
I use it to convert long articles into audio, and have a script to add it to my podcast feed to listen to while driving:
https://blog.nawaz.org/posts/2024/Apr/reading-articles-via-p...
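The feed-update step of a pipeline like this can be quite small. Here is a minimal sketch of appending a generated MP3 as a new episode to an RSS feed; it is not taken from the linked post, and the file names, URLs, and function names are all illustrative:

```python
import xml.etree.ElementTree as ET

# Append a generated MP3 as a new <item> (episode) to a podcast RSS feed.
# The feed string and MP3 URL below are placeholders, not real resources.
def add_episode(feed_xml: str, title: str, mp3_url: str, size: int) -> str:
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = title
    enc = ET.SubElement(item, "enclosure")  # podcast apps read the enclosure tag
    enc.set("url", mp3_url)
    enc.set("length", str(size))
    enc.set("type", "audio/mpeg")
    return ET.tostring(root, encoding="unicode")

feed = "<rss version='2.0'><channel><title>Articles</title></channel></rss>"
updated = add_episode(feed, "Some long-form article",
                      "https://example.com/ep1.mp3", 12345)
print(updated)
```

A real feed would also want `pubDate`, `guid`, and a description, but most podcast clients will pick up an item with just a title and an enclosure.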
One other feature I'd really like: Having the AI figure out who is saying what and use different voices (e.g. one voice for overall narrator, and separate voices for each person who is quoted in the article).
Not sure if any of the solutions out there do that automatically without my guidance.
(Still probably wouldn't pay more than $2/mo for it - I just don't use it often enough to justify paying much).
And, you know, this is not a service I'd provide others. Just for my own use running from my PC. Audible won't know or care, just as no one cares if you borrow a book from the library and photocopy it for your own use.
It's unfortunate that I can't export audio clips locally; otherwise I would immediately look into using this for generating my Finnish flashcard decks from the same material [2]. I've thought about doing the same with the audio and video feeds included with this news broadcast, but getting Whisper to sync up properly with what's written down and cutting up the raw audio in that way still seems like more effort than I'm willing to invest right now.
elevenlabs has an API which seemed quite reasonable when I looked into it. A bit of python should get you what you want pretty quickly.
If you are looking to convert very short texts or words into speech, I had the best results with eleven_multilingual_v2 and the following TTS input: "Hän sanoo rauhallisesti ja hitaasti: <break time=\"1.0s\" /> '${text}'" ("He says calmly and slowly: ..."). Then I use post-processing to split at the silence.
This was necessary because you cannot set the language explicitly; it is detected from the input.
With eleven_turbo_v2_5 you can set the language, but the results are not as good.
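The split-at-silence post-processing mentioned above can be done on the raw PCM samples once the audio is decoded. This is a rough sketch with illustrative thresholds that would need tuning against real output:

```python
# Find loud segments in a list of PCM samples, separated by runs of at
# least `min_gap` consecutive quiet windows. Threshold values here are
# illustrative; real TTS output needs tuning.
def split_at_silence(samples, window=100, threshold=500, min_gap=3):
    """Return (start, end) sample ranges of the non-silent segments."""
    segments, start, quiet_run = [], None, 0
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        peak = max(abs(s) for s in chunk)
        if peak >= threshold:
            if start is None:
                start = i          # a loud segment begins here
            quiet_run = 0
        else:
            quiet_run += 1
            if start is not None and quiet_run >= min_gap:
                # close the segment where the silence started
                segments.append((start, i - (min_gap - 1) * window))
                start = None
    if start is not None:
        segments.append((start, len(samples)))
    return segments

# Synthetic check: loud / long silence / loud should give two segments.
audio = [1000] * 1000 + [0] * 1000 + [1000] * 1000
print(split_at_silence(audio))
```

In practice the injected `<break time="1.0s" />` produces a silence long enough that a generous `min_gap` cleanly separates the framing sentence from the target word.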
E.g. saying "1963" when the actual year in the text was 1967. Yeah, the voices sound very realistic, but I'm not sure how useful that is if you can't trust the spoken words.
Does anyone know if it got better in the last weeks?
So much of my time for "reading" is in a context where I can't physically read, so audiobooks are incredibly useful. But being limited to the set of books that gets recorded by the publisher is a real shame.
Haven't tried it yet, but AI TTS seems basically perfect now, so I'm very optimistic this will work great.
Since then, they’ve released a few cheaper models, but the quality suffers greatly (they still have the old models though so it’s not an issue). They’ve also been releasing a ton of different products around TTS.
I don’t mean this as a criticism — I just am curious why SOTA TTS has not improved from one model by one company several years ago, and why even said company isn’t able to improve on that model.
Which is why it's especially ridiculous ElevenLabs allows professionals to upload their voices, charges users of those voices a minimum of $50 per million characters, likely pays under $1 for the compute... and then passes on a whopping $2 back to the professional.
I think the next disruptive TTS competitor is going to form out of just offering to pay better rates than ElevenLabs to their PVC users.
Finetuning established architectures on cleaner synthetic data is already getting open source models increasingly competitive, so getting top PVC samples from the source would likely put you right about where they are today.
Edit: And since you're concerned we might not be aware of Elevenlabs' generous terms... why is your documentation so cagey about them? https://elevenlabs.io/docs/product-guides/voices/payouts#thi...
I see users need to keep paying you a subscription fee in order to even get their payouts... but "up to 20%" isn't saying particularly much without the kind of details that should probably be on that page.
-
Considering how much your company owes to an open source model, it's also impressive how little you've returned to the commons.
But no worries, the top comment under this post is an open source model that was finetuned for a couple of thousand dollars by a single dude soliciting the public for random voice samples.
If Google has no moat, you're out to sea.
Maybe you're in a bubble devoid of that kind of thinking, so it seems very foreign or quaint.
Even then it's short-sighted thinking at best: the "market rate" is not some magic self-optimizing number.
Underpaying their creators is just creating the opportunity for someone to take the best of them on better terms.
-
Elevenlabs is also able to raise trivially in this environment: you'd think while they're still floating out here without a moat other than high quality data, they'd overpay if anything and make narrators feel like royalty until they're replaced.
This isn't unlike Uber initially paying drivers massive bonuses and undercharging riders until they were able to leverage their massive network to increase prices past what the taxi providers they had decimated were charging. But in this case the marginal cost of providing the service is so low they don't even have to lose money to run a similar play, just take less of it. (in other words, even ruthless greed is not antithetical to paying these folks better)
I hire someone to paint a fence. We agree on $200 for the job and I pay them $200. We both know that it’s undifferentiated work and I could find a dozen other people who would do it for $200.
Where’s the lack of integrity? Or does it just appear if I know that I actually could afford to pay them $10,000, but chose not to?
You're building a fence-painting robot and need someone to teach it how to paint fences by example.
You decide you won't pay the fence painters to teach the robot upfront.
Instead, painters will pay you $20 to even visit the factory.
Then, if a particular painter's fence painting is especially highly rated, you'll pay them a small royalty.
So you send your fence painting robot to compete with fence painters for $20 a fence, passing on a tiny slice of the $20 to the ones who helped teach it.
-
We can consider the creation of fence robot and the competition with the existing market just another piece of the steady march of progress, but there's still obvious room to act with more integrity in this situation.
There was no established rate for fair wages to teach the robot to paint, and you can't pay $200 because you don't charge $200... but it's also probably not "-$22/month + 1% royalties".
Does anyone here actually have positive results doing this? It seems to me listening to anything that's even remotely complex with the intent of learning it just isn't something that's feasible.
But just to clarify, do you listen to dense material on commutes or while on walks or are you listening and taking notes at a table or something?
That's my main point. As someone who doesn't have dyslexia, sitting at a table and taking notes while reading dense material is already quite difficult when I'm trying to actually learn the material. I couldn't imagine learning effectively by just listening on an early morning commute or something like the promotional video shows.
I get easily distracted and lose attention while listening to an audiobook. This is usually problematic with fiction, because suddenly I don't know who this new character is or what's happening. And rewinding to the precise position where I stopped paying attention is of course much more difficult than in written text.
I found that non-fiction books work great for me, because even if you ignore a page or two it makes no difference, the author keeps repeating their point and propping it up with many arguments anyway.
No audiobook exists? Drop the epub into ElevenReader and have Burt Reynolds read it to you; honestly better than some human narrators.
Ideally, I'd be able to strip out the text content and send it to my kindle in readable form. Since apparently that's science fiction, this looks like a really good plan B! Will definitely give it a go.
By automating it, it lowers the barrier to accessing this type of audio content for the masses. If you want to pay someone to read something for you, the market still allows that. This feels like a net gain.
I can't even remotely agree.
Narrating a book is absolutely an art. Listen to a book narrated by Stephen Fry, and all other books will sound awful. Considerable care and craft goes into a well-read book.
But this is why I'm actually excited about good TTS tools. Not because I want to displace Stephen Fry, but because there are so many books read by awful narrators and something like ElevenReader would be a huge step up in quality.
I share the parent commenter's concerns about the displacement of artists, but I'm less convinced that TTS tools are a net negative.
We already lived this with social networks. Initially, we tech enthusiasts were all like "it will democratize access to news, it will democratize producing the news! Curated work will still be there; it's a net gain." And we all saw how that actually developed. As someone on the Internet said: I want AI to do my laundry and repetitive tasks so I can do art and other more interesting things; I don't want AI to do art and force me to do laundry by hand, because AI took my job and now I don't have money to pay for a washing machine.
So I guess in your worldview a concert violinist also doesn’t make art, when they are playing a Mozart composition?
> can't find better things to do, such that it makes them poorer, or anti-social its a loss
I feel like this misses the point a bit - lost income/sustainability for artists is obviously a big issue we'll be facing, but looking for a performance indicator in an artistic endeavour doesn't really get you anywhere. There's more ways to value a painting than "what the market would pay" and "potential heat output as firewood", right?
Do you people ever step out of the abstract and think about the actual context you're living in?
I agree with your criticism, just not sure you understand who you were criticizing. But I hope you can think about actual context and see if that tempers what seems like a pretty emotional take on AI.
Having natural sounding TTS enhances accessibility for blind users, enables language localizations, etc. It's 100% a win even though there will be (and already is) disruption in the VA community.
My main use case is comp. sci and philosophy books. I download PDFs of varying quality off the internet onto my phone and import them into this app. The text translation is always solid but for the former, graphs and diagrams really break it. It’s a tricky problem because these often are important to the text so skipping them (for the app) isn’t ideal but the current solution just makes the reader goof up. I think it would be cool if the model could identify these objects and maybe generate some text describing the object and TTSing that. Minor gripe and for the latter, it’s perfect.
I’ve probably used this app for 70 reading hours at 1.5x speed across long road trips and walking my dog at the park. I’ve gotten through numerous books I wouldn’t have and for free. I’m happy!
(annoying bug I find often: it seems certain characters or tokens just break it and it freezes. I need to manually skip ahead hoping it doesn’t get stuck again. Really detracts from the hands free nature and is difficult to manage while driving)
The text to speech is alright, but it lacks almost any emotion, and it reads everything literally, which when the article/pdf has a weird layout, or has figures, doesn't sound natural. Though I expect they're just not using their top-of-the-line models for this - I've had much more luck pushing a pdf through Claude to generate the "verbal version" (which is mostly literal, but also describes the layout and figures) and then the result through the top-of-the-line ElevenLabs model.
Now, I've also checked out the podcast feature, and it's pretty clear they first do a textual generation, and then a simple text to speech. Again, lack of emotion, very mechanical flow.
I made a podcast of a technical article[0] in both ElevenLabs reader and Google's NotebookLM, and the NotebookLM podcast is a night-and-day improvement - maybe they use a better model, maybe they use straight "article to podcast" end-to-end multimodal generation, I don't know, but the quality, flow, emotion, is just on a completely different level. I had to quickly turn off the ElevenLabs-generated podcast cause I couldn't keep listening to it, while NotebookLM's one is legitimately enjoyable.
Now to finish on a more positive note, fingers crossed for the ElevenLabs team improving this, and us getting some competition in the area of article-to-audio, both podcast-style, and direct! I think, in general, it's a very promising product direction. Feature-wise, I would also love to get a daily overview podcast based on all my RSS feed articles for a given day.
That said, even in their cherry-picked examples, the emphasis still isn't quite right in the Tolkien one.
For a long time I wanted to make a game - think The Stanley Parable or Thomas Was Alone - that would be narrated by the voice of either David Attenborough or Morgan Freeman. You know, it's a low hanging fruit, you can have a two hours long footage of zebras running around narrated by either of these and it's suddenly eerily fascinating.
So far I'm an AI skeptic, but this voice thing really makes me think about an actual shift in how certain jobs can become irrelevant in the foreseeable future.
Services are expensive and in most cases the voices are easily detectable as not human. I would find it very hard to listen to such voices for a long period of time.
Even ElevenLabs voices which seem to be known as the best have only a few that are really good quality but even then they're very, very far from the capabilities of a human.
Edit: I think the effect of the invention of vinyl on live performers is more akin to how the commoditisation of HQ TTS will be detrimental to audiobook narrators.
Two audiobooks that come to my mind:
- The Lord of the Rings series read by Andy Serkis; not only does he switch perfectly between each character's voice, but the feeling of listening to "Gollum" for hours is something else altogether
- David Goggins' books; the audiobook version is completely different from the book, since he's not just reading it, and overall it makes the content easier to digest
https://chasingperfection.co.uk/post/2013/01/14/text-to-spee...
Can anyone else confirm?
[1] - https://unicornriot.ninja/2024/sextortion-coms-inside-a-vile...
I use it to listen to PDFs. It works, but has plenty of hiccups with headers, footers and colons.
The first impression is not that great. There's nothing natural about the voice. While individual words and phrases sound good, there's still no decent cadence and intonation. Feels flat and robotic.
However, I will definitely experiment some more.
Here is an example: https://youtube.com/shorts/UKjqrydITLA?si=iC7ehp6LmlLH0M-U
Probably because I have WebGL disabled in this browser. Not exactly sure what they're doing with it on the landing page, maybe the fluffy effects.
Could you expand upon this? Any milestones towards that which we should be mindful of?
With LLMs, "knowing things" is already starting to feel like a thing of the past; not to me, but for a lot of others there's no longer an incentive to "switch on".
Why should a kid learn anything if a robot is instantly better at everything? Maths got replaced by calculators, and deep critical thinking will get replaced by LLMs a lot of the time, which are word calculators, the closest thing we have to a logic calculator.
This is more passive autopilot software, which further promotes learning as something you 'consume' rather than something you seek.
The public consciousness has absolutely taken a September 11-tier nosedive since social media; we're approaching what I term cultural schizophrenia, which I posted about on my blog. I had deleted it, but I've re-added it if you're interested [https://substack.com/home/post/p-156983317]. There are no contextualisers in the media to give the right emphasis to the right things.
This is just my perspective, from what I've seen from other people my age. We are heading into extremely interesting times; every profoundly destabilising thing we've speculated about is happening at the exact same time. We desperately need visionaries in politics.
Basically I'm not doing too hot
The world is changing, but then again it always has been. IMO some things will get better, some will get worse, but the overall arc of human health and prosperity will continue upwards. There is less poverty, less starvation, more opportunity today than ever… even though some aspects of the world are bad and getting worse. That's the way it's always been.
I found that it’s my preferred way to use their reader, as it makes the reading more neutral and transparent for my brain.
I need to find some AI-assisted OCR to fix the tons of mistakes like "186o" for 1860 or "gla)" for glad.
Also check out my open source project for that:
Hyphenated words, page numbers and chapter titles seem to be my main issue. I can easily do search and replace on chapter titles though.
Before, we were hiring people to translate, and then hiring others to dub the audio. Now, our files are automatically translated and spoken in the voice of the actual speaker, and we just have a small Quality Control team of native speakers quickly verify the results are accurate. We've reduced costs and increased the quality of our translated media.
> Is the app free?
> Yes. The app is completely free to download and use today. Listening to content on the app will not consume credits from your monthly web plan. We do plan to eventually launch some premium version of the app, but even then we will maintain a generous free plan.
you could build your local TTS using kokoro browser though — https://huggingface.co/spaces/webml-community/kokoro-webgpu
Ingests URLs in a variety of ways, converts to natural language audio, puts it in your podcast feed.
Free to use.
1.2 million people die in road accidents every year, many of them children and young people. Even more are seriously injured.
If that's the case, maybe a driver's license isn't your thing?
If you are reading for information, I guess if this helps, sure, go ahead.
When reading for pleasure, though, this is not it.