Technology does not just sprout out of the ground on its own. Humans make it. So if technology is helpful, it was humans who helped.
> Let’s not mention the fact the particular large language model, LLM called Chat GPT they chose, was never the right kind of machine learning for the task of describing images.
Weird. I would think LLMs are exactly the right kind of tool to describe images. Sadly, there's no further detail about what they think a better approach would be.
> I fully predict that blind people will be advocating to make actual LLM platforms accessible
Absolutely. LLM platforms very much should be accessible; I don't think anyone would have beef with that.
> I also predict web accessibility will actually get worse, not better, as coding models will spit out inaccessible code that developers won’t check or won’t even care to check.
Who knows. Either that, or some pages will become more accessible because making them accessible will take less effort on the part of the devs. It will probably be a mixed bag, with a little bit of column A and a little bit of column B.
> Now that AI is a thing now, I doubt OCR and even self-driving cars will get any significant advancements.
These are all AI, and they are all improving by leaps and bounds.
> An LLM will always be there, well, until the servers go down
Of course, that is a concern. This is why models you can run yourself are so important. Local models are good for latency and reliability, but even if the model runs on a remote server, as long as you control that server you decide when it gets shut down.
> Weird. I would think LLMs are exactly the right kind of tool to describe images.
TFA is from 2023, when multimodal LLMs were just picking up. I do agree that that prediction (flat capability increase) has aged poorly.
> I doubt OCR and even self-driving cars will get any significant advancements.
This particular prediction has also aged quite poorly. Mistral OCR, an OCR-focused LLM, is working phenomenally well in my experience compared to "non-LLM OCRs".
> Absolutely. The LLM platforms indeed very much should be accessible. I don't think anyone would have beef with that.
The AIs I have used have fairly basic interfaces - input some text or an image and get back some text or an image - so isn't that something accessibility tools can already handle? Or do they mean something else by "actual LLM platform"? This isn't a rhetorical question; I don't know much about interfaces for the blind.
In some languages, pronunciation(a+b) == pronunciation(a) + pronunciation(b). Polish mostly belongs to this category, for example. For these, it's enough to go token-by-token.
For English, it is not that simple, as e.g. the "uni" in "university" sounds completely different to the "uni" in "uninteresting."
In English, even going word-by-word isn't enough, as words like "read" or "live" have multiple pronunciations, and speech synthesizers rely on the surrounding context to choose which one to use. This means you probably need to go by sentence.
Then you have the problem of what to do with code, tables, headings, etc. While screen readers can announce roles as you navigate text, they cannot do so when announcing the contents of a live region, so if that's something you want, you'd need to build a micro screen reader of sorts.
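To make the sentence-by-sentence idea concrete, here's a minimal sketch. It uses the browser's Web Speech API purely for illustration (my assumption; a real screen-reader or live-region setup would drive its own synthesizer), and the regex splitter is a placeholder:

```typescript
// Minimal sketch: chunk text by sentence before handing it to the synthesizer,
// so heteronyms like "read" and "live" get enough surrounding context to be
// pronounced correctly. The regex splitter is deliberately naive; real text
// needs a proper segmenter (abbreviations, decimals, ellipses, etc.).
function speakBySentence(text: string): void {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [text];
  for (const sentence of sentences) {
    const utterance = new SpeechSynthesisUtterance(sentence.trim());
    window.speechSynthesis.speak(utterance); // utterances are queued and spoken in order
  }
}

// "read" is pronounced differently in each sentence; per-token or per-word
// chunking would lose the context the synthesizer needs to pick the right one.
speakBySentence("I read that book last year. Now I read a new one every week.");
```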
Image understanding is still drastically worse than text performance, with glaring mistakes that are hard to understand, but the Gemini 2.5 models are far and away the best of what I've tried.
> Weird. I would think LLMs are exactly the right kind of tool to describe images.
Not sure, but I've experimented with the Grok avatars (or characters, whatever), though I hate the defaults xAI made, because they're not a generic, simple AI robot or w/e. After you tell them to stop flirting and calling you babe (seriously, what the heck lol), they can really hold a conversation. I talked to one about a musician I like, in a very niche genre of music, and it was able to suggest an insanely relatable song from a different artist I did not know, all in real time.
I think it was last year or the year before? They did a demo with two phones, one that could see and one that could not, and the two ChatGPT instances talked to each other, one describing the room to the other. I think the technology is probably there by now to describe a room.
If their numbers are significant, they could themselves be the foundation for some AI business, even if all other consumers turn away from AI.
As for web accessibility, I must say it is terrible even for a sighted person, but AI could change this for the better too.
What I mean: you may have heard of PalmPilot organizers. They were very limited in hardware, but there was a private company that provided a proxy browser, which took ordinary web sites and showed a version optimized for the PalmPilot's small display, plus had a mode for offline reading. With today's AI, you could do much better.
Between people, it's very commonly considered impolite to ask for excessive help. So having an info retrieval / interactive chat that will patiently answer questions is a boon for everyone.
I guess you can try and frame all 'helping' as "you're +1 if you're being helpful", but don't be surprised if not everyone sees things that way all the time.
Transformer models making screen readers better is cool. Companies deciding to fire their human voice actors and replace all audiobooks with slop is decidedly not cool.
You can really see this happening in translation right now. Companies left and right are firing human translators and replacing their work with slop, and it's a huge step down in quality because AI simply cannot match the previous level of quality. (Mr Chad Gippity isn't going to preserve puns or add notes for references the new audience won't catch.)
And that's in a market where there is commercial pressure to have quality work. Sloppy AI translations are already hurting sales.
In accessibility, it's a legal checkbox. Companies broadly do not care. It's already nearly impossible to get people to do things like use proper ARIA metadata. "We're a startup, we gotta go fast, ain't got no time for that."
AI is already being used to provide a legally-sufficient but practically garbage level of accessibility. This is bad.
But the coverage of audiobooks is… also not great? Of the books I've purchased recently, maybe 30% or less have audiobooks? What if I want to listen to an obscure book? Should I be paying a human narrator to narrate my personal library?
The copyright holders are incentivized to make money. It does not make financial sense to pay humans to narrate their entire catalog. As long as they're the only ones allowed to distribute derivative works, we're kind of in a pickle.
You weren't doing that before AI either, were you?
The practical answer has already been "you pipe an ebook through a narrator/speech synthesizer program".
> The copyright holders are incentivized to make money.
Regulations exist. It'd be rather trivial to pass a law mandating that every ebook sold be usable with screen readers. There are already laws for websites, albeit a bit poorly enforced.
That said, I use it with pretty direct prompting, and I strongly prefer the "AI Partners with a Human" model.
But as far as my previous comment is concerned: it doesn't really matter what the "state of the art" AI is, because companies simply do not use it. They just pipe everything through the easiest and cheapest models, with human review (which doesn't actually get the time to be meaningful) optional.
AI is going great for the blind.
That . (not present in the Hacker News posting) made me think it was sarcastic, combined with the author's clear dislike of generative AI. There seems to be a sizable crowd of cryptocurrency hype critics who have pivoted to criticizing the AI hype (claiming that the hype itself is largely driven by the same actors, and that accordingly neither crypto nor AI have much object-level merit), ironically and sadly in a quite groupthink-heavy way, considering how many valid points of criticism there are to be made of both.
Ugh, this is the sort of thing that bothers me about the accessibility community. Something about it always comes off as preachy, like a moral argument. That's the worst way to get folks to actually care; you're just making them feel bad.
Look, the fact is everyone needs to use technology to live these days, and us devs suck ass at making those things accessible, even in the slightest. It won't be until we all age into needing it that it finally becomes a real issue that gets tackled. Until then, tools like LLMs are going to be amazingly helpful. Posts like this are missing the forest for the trees.
My mom has been using ChatGPT for a ton of things, and it's been helpful. It's a massive net positive. The LLM alt tags Facebook added a long time ago? Massively helpful. Perfect? Hell no. But we gotta stop acting like these improvements aren't helpful and aren't progress. It comes across as whiny. I say this as someone who is in this community.
Great to read that blind folks get so much benefit from LLMs. But this one quote seemed odd: the most amazing OCR and document attribution products are becoming available due to LLMs.
I recall speaking to a girl who thanked these voice assistants for helping her order food and cook.
Right now I'm using AI while traveling; it gets stuff about 85% right, which is enough for lunch.
Other blind people are all in on the AI hype, describing themselves as partially sighted because of AI with their Meta Ray-Ban glasses. Side note: the Ray-Ban glasses report that I died last year. I somehow missed my funeral; sorry to all those who were there without me. I do like brains, though...
Meanwhile many LLM sites are not blind-friendly, like Claude and Perplexity, and there are sites that try but fail so exasperatingly hard that I lose any motivation to file reports, because I can't even begin to explain what's breaking so badly. It's evident that OpenWebUI has not tested its accessibility with a screen reader. Anyway, blindness organizations (mainly the NFB) have standardized on "just use ChatGPT", and everything else is the wild west that they absolutely do not care about.

Gemini could be more accessible, on the web and especially on Android, but all reports have been ignored, so I'm not going to bother with them anymore. It's sad, since their AI describes images well. Thank goodness for the API, and tools like [PiccyBot](https://www.piccybot.com/) on iOS/Android and [viewpoint](https://viewpoint.nibblenerds.com/) and [OmniDescriber](https://audioses.com/en/yazilimlar.php) on Windows.

I'm still waiting for iOS to catch up to Android in LLM image descriptions built into the screen reader. Meanwhile, at least we have [this shortcut](https://shortcutjar.com/describe/documentation/). It uses GPT-4o, but at least it's something. Apple could easily integrate with their own Apple Intelligence to call out to ChatGPT or whatever, but I guess competition has to happen. Or something. Maybe next year lol. In the meantime I'll either spend my own cents to get descriptions or share to Seeing AI or something, like a caveman.
I help people with very mundane and human tasks: cooking, gardening, label identification.
Did the volume of calls change meaningfully with the introduction of AI into Be My Eyes?
`autocomplete="off"` is an instance of something that user agents willfully ignore based on their own heuristics, and I'm assuming accessibility tools have always ignored a lot of similar things.
I don't understand what's going on here. Is he angry at us horrible sighteds for refusing to give them incorrect information? Or because we refuse to tell them when their LLMs give them incorrect information? Or does he think we're refusing to give them correct information, which makes it okay that the LLM gives them incorrect information?
It's not about you, so there's no need to be personally offended.
Have a bit of empathy and do a bit of research, and it's not hard to understand that accessibility is limited.
My empathy is not the problem here. Having a disability doesn't give you a free pass to be a bitter asshole.