A few other examples:
- The Touch Bar was much more than an OLED strip; it was Apple’s first move in the transition to Apple Silicon on Macs. The Apple T1 chip in the 2016 Touch Bar MacBooks was the first solely Apple-designed processor to appear in a Mac and took over several responsibilities from Intel chipsets, like power management, fans, sleep/wake, access to the camera & mic, and the secure enclave powering Touch ID. Then the T2 added encryption of the SSD, audio management, image processing for the camera, and protection against tampering with the boot process.
- The iPhone 3G shipped with a Liquidmetal SIM eject tool, made from a strong custom metal alloy that is "practically unbendable by hand unless you want to hurt or cut your fingers." Although Apple hasn’t released anything with the alloy since then, now, nearly 20 years later, Apple is rumored to be using Liquidmetal in its upcoming foldable iPhone.
- RealityKit had 3D scanning and a lot of other cool AR capabilities for years which didn’t make sense until the Apple Vision Pro was released.
If the bar had been added on top of those, I don't think there would've been the same kind of hate for it.
I read that when their glasses are taking video or pics, the lens will light up and/or flash more prominently than Meta's. Maybe that will help with the whole privacy issue, and also it's not Meta (I do love my Metas, or smart glasses as a whole, but I will ditch the Metas for Apple quickly, as both pairs of Metas broke and there's no store for support).
This should really be the last resort.
Input on the iPhone is so dreadful nowadays. Their palm rejection is definitely worse than before, so mistyping is more frequent. Their text-correction algorithm for typing is worse than before, and it frequently makes incorrect corrections that I don't notice, because it changes words a few words back from where I typed. And STT hasn't improved. On top of that, my fingers are tired of the phone form factor. Please make the iPhone not a chore to use, Apple.
Looks like Wispr Flow uses a cloud model [0]:
> Cloud based speech processing infrastructure for 1B users
It gets to be a messy comparison because my iPhone can do STT with no latency pretty well, fully on device, while Wispr Flow requires a cloud model; to be fair, older Apple devices do as well. It's not an apples-and-oranges comparison, but I think those technical details make this a non-direct comparison in a few ways.
For on-device with low system resource usage, Apple's is pretty damn good.
I found another dreadful iPhone input "feature" yesterday. If you are browsing around in third-party CarPlay apps and are ready to tap your selection, but press the accelerator first, it truncates the list to only a few items and scrolls to the top.
Way to reduce driving distractions, guys! What's next? If the car is moving, Maps changes destinations?
I really wish human-computer interaction research were more broadly applied, and that if you did dumb stuff like all of the automotive / CarPlay world does, you'd be liable in court.
I once had a car that hid the backup cam behind a legal disclaimer every time you turned it on. I'm sure at least one pedestrian was hit by a car in reverse while that screen was on. The manufacturer should be 100% liable for the poor UI decision.
My go-to example of this is this talk by Saqib Shaikh (a blind software engineer at Microsoft) about Visual Studio. Link is timestamped.
I wish more people would watch videos like this just because having a realistic idea of how blind people do certain tasks can help you move from pity or even compassion to a more productive kind of understanding. I think sometimes when you haven't seen it, you can't really even imagine how it can be done.
Likewise, YouTube’s “premium” feature of not displaying ads is laughable when displaying content is literally an internal browser function.
I pay anyway, because I was going to pay for an on-demand streaming music service anyway.
What really frustrates me is watching/listening to discussion of music, because I am forced to listen to the talking at 1x because the music sounds wrong (and is wrong) at anything other than 1x.
Ideally it should be done while encoding.
Maybe it’s just a matter of practice.
It's not rare among the blind in general.
Unless you're completely technologically illiterate, the kind of person who has no idea how to install an app or sign up for an online account, you're probably doing something of the sort.
I'm not even sure what to say, but discoveries like this are why I use Hacker News. I'd never have known this otherwise.
I can easily understand Eloquence (the speech synthesizer he's using) at that speed, but I struggled a bit with this one.
You have two modes: "focus mode", where you can edit text in text fields and keys are passed straight to the browser, and "browse mode", where keys move a virtual cursor around the page.
In browse mode, navigating with just arrow keys all the time would be just as slow as you might imagine, so you use single-key keyboard shortcuts to move by role, e.g. to the next heading, button, table or unvisited link.
The keyboard layout is optimized for memorizability and not efficiency (you use the actual arrow keys instead of hjkl, for example), but the concepts are eerily similar.
There are a couple of other approaches to solving this problem. macOS's VoiceOver is much more Emacs-like, for example, and each approach has its own pros and cons, but that's definitely one way to do it.
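To make the two modes described above concrete, here is a minimal sketch of a browse-mode virtual cursor with single-key navigation by role. It is only an illustration: the `Node` and `VirtualCursor` classes and the key bindings in `QUICK_NAV` are hypothetical, not taken from any real screen reader, and the "unvisited link" shortcut is simplified to "any link".

```python
from dataclasses import dataclass


@dataclass
class Node:
    role: str  # e.g. "heading", "button", "link", "text"
    text: str


# Hypothetical key bindings for illustration; real screen readers define their own.
QUICK_NAV = {"h": "heading", "b": "button", "t": "table", "u": "link"}


class VirtualCursor:
    def __init__(self, nodes):
        self.nodes = nodes
        self.pos = 0
        self.focus_mode = False  # False means browse mode

    def press(self, key):
        """Return the text to speak, or None if the key goes straight to the page."""
        if self.focus_mode:
            return None  # focus mode: keys are passed through for text editing
        role = QUICK_NAV.get(key)
        if role is None:
            return None  # not a quick-navigation key
        # Browse mode: jump forward to the next node with the requested role.
        for i in range(self.pos + 1, len(self.nodes)):
            if self.nodes[i].role == role:
                self.pos = i
                return f"{role}: {self.nodes[i].text}"
        return f"no next {role}"


page = [
    Node("heading", "Inbox"),
    Node("text", "3 unread"),
    Node("button", "Compose"),
    Node("heading", "Folders"),
]
cursor = VirtualCursor(page)
print(cursor.press("b"))
print(cursor.press("h"))
```

The point of the design is that one keypress skips everything that isn't the role you asked for, which is what makes browse mode faster than arrowing through every element.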
We all do that, I mean unless you’re hearing impaired.
Everyone’s familiar with dropping a coin or such and knowing exactly where it landed without looking.
That’s more passive sonar though.
Do I recall seeing videos of guys mountain biking and making a hissing sound for an active sonar style echo location?
Or am I making that memory up?
RIP kid https://youtu.be/fnH7AIwhpik
If he’d like your humor I like it too :dolphin:
Even better, fire up Orca (or whatever screen reader your OS comes with) yourself and try to use your computer with your eyes shut. It's kind of eye-opening (no pun intended) what kind of experience these users typically get. You also quickly start to understand why they set the speech rate of their voice synthesizer so fast: it's almost unbearable navigating applications (and particularly lists) otherwise.
Unfortunately it seems impossible to get all that much funding for accessibility work :/ I wonder what ever happened to the Newton accessibility bus intended to supplement Wayland...
Hm, never heard about it, but now I'm wondering too. I just finished implementing proper accessibility support for my native app toolkit for Linux, macOS and Windows, but I've only done it for X11 so far; I was just about to get started with Wayland. What is the accessibility story on Wayland? Couldn't people rely on the same protocols as with X11? That was my impression, but I haven't really dug into it yet.
There are apps I use semi-regularly that less-experienced screen reader users thought were inaccessible, and I couldn't even explain what they were doing wrong from memory. The ways of working around accessibility issues are just so ingrained in me that all I can usually remember is "yeah I did this somehow, but it was six months ago and I have absolutely no idea which specific tricks I needed for this one."
I imagine that for coding it also helps deal with the fundamental problem of an ephemeral stream rather than a persistent document that you can navigate visually in multiple dimensions. Working memory is limited, and getting more text in in a short period of time probably helps you work within that better. I also imagine that working with text via audio all the time gradually stretches and improves memory.
You can show a lot more info on a screen than you can transmit through speech in a short period of time. That doesn't mean you read faster than you listen, just that sighted people essentially use their eyeballs as an "input device" to decide what information to look at.
If there's an object on the screen that you want to examine but that you don't need to click, you can just "navigate to it" with your eyeballs, without ever touching a mouse or keyboard. We don't have that luxury.
This means we need a much more efficient system for navigating what's on the screen, but that only gets you so far. Eventually, the easiest way to deal with this problem is just to increase the bandwidth of your channel, and you do that by increasing the speech rate.
Wouldn’t the opposite mean you listen at sub-1x speed?
Whereas your definition seems to be ”I’m the same, but less so.”
I'm not getting my hopes up though, given Apple's history with Siri, which is truly awful.
This has been the typical pattern for Apple for the last few years. The flashy features are announced at WWDC, accessibility has a dedicated, earlier press release. Before this practice, accessibility announcements would usually be tucked in some WWDC slide that most people wouldn't even notice.
I just would not wanna promise anything. Except “available for download this Friday“ once the gold master is passing tests.
Twenty years on, and text input & manipulation on iPhone still sucks a big fat hairy pair of dog's balls.
The last time I daily drove Android was over two years ago and it was immeasurably less God-damn-I-wanna-dig-Jobs-corpse-up-n-give-the-guy-a-piece-of-my-mind, only problem is his grave is unmarked. Arsehole!
After a few more years of Thanksgivings and Christmases and Mothers' Days, we'll finally train her up to a reasonable speed lmao.
Whether that control you see visually is actually accessible to a blind user is a different matter entirely. Further, it maxes out at 2x, but a blind person would typically screen read at the equivalent of 3-6x.
Related, it seems like YouTube recently paywalled speed increase beyond 2x. Another way in which it's not cheap to lose sight, I guess.
True.
We can frame it even more strongly: "default societal practices actively discriminate against people with disabilities; they intentionally, consciously choose to make life harder for people who're disadvantaged".
And by you I mean you specifically. And you, and you. You’re next.
Literally all we do is sit around in meetings all day making charts of who we’re going to fuck over next, and graphing how much fucking-you-over we’ve achieved in the last quarter. In fact, it’s the major KPI our Jewish overlords rate us on.
Give me a break.
Seems like it would be a win-win to have a user setting to opt out of video in exchange for ungating that feature.
Pretty sure there are enough blind people who don't listen to voice at insane speeds, because they listen in a non-native second language or for whatever other reason. What's wrong with using the lowest common denominator that's 100% accessible to those people as well as to people who want faster speeds? Unlike "too fast", "too slow" doesn't become entirely inaccessible; it's just boring.
Such a random thing to criticize.
Some blind people listen to things at superhuman speeds, but not all blind people. Using a normal reading speed is a sensible choice for an ad trying to appeal to blind people since you don't want to intimidate those who don't use superhuman speeds.
Going from that to "heh a sighted person made this because it's normal speed" is simply incorrect.
It was the sort of statement an HNer might make to showcase some trivia they have about some other group, but they oversold it.
Yes, for lots of reasons. It takes practice to get up to a high speed with a given TTS. People who go blind later in life are just beginning, and it can take a long time for them to get up to really high speeds. You may also need to reset somewhat when you change from one TTS to another. And blind people's ears are subject to problems just like anyone else's; if your hearing isn't great you may need slower speeds or higher volumes or both. That's why even though most people use screenreaders at much higher speeds, the defaults when you turn on a new device are painfully slow. You have to set a conservative default so people with less experience/worse ears/whatever can get by.
Anyway, I don't think it's a criticism. It's just noting that it doesn't depict how most people will end up using it, and if you're curious about what typical usage sounds like, you should look for another example.
It's like how in videos that teach people a foreign language, everyone speaks slowly and uses simple words, even though native speakers don't talk like that at all. The GP is simply saying that an actual blind person would be way more efficient at it, but they made the video with inefficient settings so sighted people could understand what was going on.
What does this mean?
I wish more companies focused on how they can help humans instead of replacing us or squeezing us as hard as possible in the name of productivity.
My experience is limited to my elderly parents who have trouble seeing. With the text size Apple allows them to set it to, their phones are unreadable. Text runs off the screen in every app, 1st and 3rd party.
In their bill example, the user is told to confirm with the provider. Why not offer to call the number on the bill? Instead of telling them to use text detection, do it for them? Presumably Apple Intelligence would already have that capability. I’m afraid this will be a gimmick at best.
EDIT: Forgot to mention, the grip is good to see. Hopefully they don’t charge the apple tax on it.
I have a problem with astigmatic halation that makes ‘dark mode’ difficult to read. Since iOS 26, multiple aspects of the system have been made dark only, contrary to the system setting. Writing text correctly should be the lowest of low-hanging fruit.
I suspect this is more of a flashy ‘AI’ promotion rather than reflective of any real commitment.
They treat new industry advancements as technology, not as products in themselves.
AI will be a feature to improve the customer experience, not the product itself.
https://blog.google/products-and-platforms/platforms/android...
https://android-developers.googleblog.com/2024/09/talkback-u...
Increasing their productivity is helping humans.
My one hope is that this eventually becomes widespread enough to stop alt text scolds.
Don't get me wrong, Apple using these technologies to help humans who are in need of help is laudable. But let's not pretend we don't know why most corporations don't look into this kind of thing. I think if we're being honest, we all very much know why they leave this sort of thing to the always nebulous "others".
> “When we work on making our devices accessible by the blind,” he said, “I don’t consider the bloody ROI.” It was the same thing for environmental issues, worker safety, and other areas that don’t have an immediate profit. The company does “a lot of things for reasons besides profit motive. We want to leave the world better than we found it.”
— https://www.forbes.com/sites/stevedenning/2014/03/07/why-tim...
I was just answering the question of why other corporations don't.
Money.
There's relatively little money in helping the visually impaired. You have to do it because you want to do it. Not because you're going to get rich.
I assume almost everyone looks into spending less money than more money for equivalent goods and services.
this sort of thing really needs input from someone that uses it before we can judge it
Full VR hasn't done well, but it does continue to make me wonder if there's a market for a stripped and slimmed device. I'd maybe be interested in a device that does optical controls if it fit in regular-sized glasses. I'd be super interested if it had a HUD system (even a super basic one that can only show a handful of symbols). Better still if it had some basic audio, but maintaining the "regular glasses" form factor is more important to me than the HUD or audio.
1. Use AI to determine how much a bill is for
2. Call up the people who billed you and ask them how much they billed you
3. Pay billed amount
(I'm also picturing the poor CSR at the other end of the phone wading through hundreds and hundreds of call logs over the years for simple requests and managers up above screaming 'why is this guy calling us all the damn time costing us money'...)
The form factor is a significant issue for real-world usage, and it's kind of unclear whether there is a plan for a future product line given its (pretty abysmal) initial reception.
The price and lack of content and developer interest have been the main problems.
And ultimately, people just don’t seem that interested in this product category. Meta ran into the same issue, though at least they targeted gaming where there is a decent niche.
VR/AR tech seems cool and futuristic, but hasn’t quite found its killer app yet.
https://www.youtube.com/watch?v=B3SmsSCvoss
Those made the ad stand out in my opinion.
Without that, there wouldn’t really be great vlm and conversational models.
The AI companies might have paid for the dictation of some videos on their own but voice assistants etc wouldn’t have existed and our ability to have AI that eventually understands the world would be much much harder.
You however…. Maybe need to switch to decaf?
Edit: was thinking about this feature https://support.apple.com/guide/iphone/get-live-captions-of-...
I think the trap in creating anything is doing it for a crowd. Art, software, anything... it turns out better when it is made with a specific, named individual in mind.
Accessibility features are almost always championed and field-tested with one specific loved one in mind and I think that's what keeps the technical solutions personable and grounded.
Maybe just don't wear them in a car?
I use those motion cues on my iPhone even though I don't struggle with motion sickness https://www.youtube.com/shorts/OxbjggMcKrk
Someone using this feature will want motion cues as well.
And in your quote: Dwell Control is a feature set to interact with an Apple Vision Pro using only your eyes. Lingering your gaze on a button will press it. An AVP is now more comfortable to use in more situations because of motion cues.
Maybe just rethink your "maybe just" comment...?
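For anyone unsure what "lingering your gaze on a button will press it" means mechanically, here is a rough sketch of a dwell timer. This is not Apple's implementation; the `DwellDetector` class, its `update` method and the one-second threshold are all made up for illustration.

```python
class DwellDetector:
    """Hypothetical dwell-to-activate logic: fire once after a steady gaze."""

    def __init__(self, dwell_seconds=1.0):
        self.dwell_seconds = dwell_seconds  # assumed threshold, not a real default
        self.target = None   # what the gaze is currently resting on
        self.since = None    # when the gaze arrived on that target
        self.fired = False   # whether this dwell has already activated

    def update(self, target, now):
        """Feed the gazed-at target (or None) and a timestamp in seconds.

        Returns the target to activate, or None if nothing should happen yet.
        """
        if target != self.target:
            # Gaze moved: restart the timer on the new target.
            self.target, self.since, self.fired = target, now, False
            return None
        if (
            target is not None
            and not self.fired
            and now - self.since >= self.dwell_seconds
        ):
            self.fired = True  # activate only once per continuous dwell
            return target
        return None


detector = DwellDetector(dwell_seconds=1.0)
for t, looked_at in [(0.0, "OK"), (0.5, "OK"), (1.0, "OK"), (1.5, "OK")]:
    activated = detector.update(looked_at, t)
    if activated is not None:
        print(f"t={t}: press {activated}")
```

Glancing away resets the timer, which is also why anything that pulls your eyes off the control (or makes the gaze estimate jitter) matters so much for this kind of input.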
Still somewhat odd when a bus drives out from behind your Terminal mind.
Why not?
Don’t be so scared of variety. You just keep subjecting yourself to more of the same. The unending familiarity makes you dull.
I haven't tested it yet, but blind users may be more likely to dominate command-line interfaces, which are becoming increasingly popular due to their easy integration with LLMs.
The above caption for Apple Vision Pro is for a video that to me, as an Apple Vision Pro user, is discomforting.
More questions are raised than are answered by the short video: Is the user able to fit the Apple Vision Pro by him/herself? What happens when dwelling on a directional control misregisters? Can the user recalibrate the "Eyes and Hands" setting? Dwelling on a control displaces focus and there may be impeding objects in the path of the power wheelchair. Is this really a good idea?
To my sensibility, the video is unsettling (at best), especially given how cumbersome Apple Vision Pro is.
[0] https://www.apple.com/newsroom/2026/05/apple-unveils-new-acc...
Through that lens, this all looks a bit performative to me, but again, maybe I'll be pleasantly surprised.
The one thing I'm mildly excited to see is the improvement to Voice Control, as guessing what the programmatic name of a button is or having to constantly use a numbers grid to target elements doesn't sound fun.
To respond to what I see in some of the comments:
- On speech rate: It does take quite a bit of practice to crank up the speech rate and there's a degree of retraining you need to do when you switch voices. A lot of more "human" sounding voices are harder to follow at super high speeds which is why a lot of people prefer more robotic but consistent speech and generally aren't convinced by AI-powered TTS yet; they often fall apart if you raise the speech rate past a certain point.
- Re: actually waiting for the target audience's verdict: This is so important. I see more and more companies, individuals etc. talk about accessibility, build accessibility solutions and evangelize AI for accessibility without EVER talking to the people they claim to help. This will almost certainly mean mistakes will be made, up to and including doing more harm than good. If you want to do accessibility right, that includes AI products of any kind, hire people with lived experience or you'll get the equivalent of machine-translated text, hackerproof security in one click or an AI-powered coffee bar that orders thousands of rubber gloves.
Coincidental note: I have time for new projects right now :P
https://developer.apple.com/documentation/accessibility/brai...
Probably 80% of "LLMs are below expectations" complaints (from the general population) involve some form of image analysis.
Image tokenization is hard because, unlike language tokenization, where every token is extremely dense with meaning, image tokens tend to be meaningless or irrelevant but are processed all the same.
Give a SOTA LLM a picture of toothpicks and ask it to move one to make a square, and it will probably struggle and fumble it. But give a mid-size LLM from 2 years ago the same problem in verbal form, and it will nail it almost every time.
The takeaway is: do everything you can to avoid having the LLM rely on images for the answer.
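A back-of-the-envelope sketch of the density gap, assuming a ViT-style tokenizer that cuts the image into non-overlapping 16x16 patches (production multimodal models use various schemes, so treat the numbers as illustrative only). The `patch_token_count` helper and the example sentence are made up for this comparison, and the word count is a crude stand-in for a real text tokenizer.

```python
def patch_token_count(width, height, patch=16):
    """Number of non-overlapping patch tokens for a ViT-style image tokenizer."""
    return (width // patch) * (height // patch)


# A 224x224 image becomes a 14 x 14 grid of patches.
image_tokens = patch_token_count(224, 224)

# The same puzzle stated verbally; whitespace splitting is only a rough proxy
# for a real tokenizer, which would usually produce somewhat more tokens.
description = "Four toothpicks form three sides of a square; one lies apart."
text_tokens = len(description.split())

print(image_tokens, text_tokens)
```

Most of those patches would be blank background, yet each one still costs a token, while nearly every word in the description carries part of the puzzle.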
The other thing is that if you're around others, voice input means you have no privacy. Even if you're not doing anything particularly private, it's a bit awkward and potentially embarrassing. If you use touch input in conjunction with a screen reader, you can be more like a "normal" user in that what you're doing is just between you and your phone.
iOS is just painfully good. I can pause a video, put my finger on text inside the video, and copy it. Until they added it, I didn’t even know how much I needed that.
People talk a lot about how macOS has gone downhill, but I feel like it would have been a good start if developers could continue to patch over Apple's shortcomings like they used to be able to.
I imagine that we would be a few years into a spectrum of tools like this if they didn't lock it down like they do.
Totally aware that plenty of HN commenters are very glad that Apple keeps this locked down. I'm just the other opinion, that's all.
I have fond memories of an old coworker 10 years ago who is blind. He would use his phone no problem, texting, going about his day, he was even on Tinder (credit to Tinder for making their app so accessible long ago). He would commute on his own, walk to the train station, even transfer to another train during peak rush hour. I’m not saying it was all easy for him, but nothing in this video really stood out to me more than what shirt was on the bed. I know other services/apps have long existed to be the “eyes” for people who need support, but this video feels….uneventful?
I may be cynical about this though, as I often hate how Apple’s marketing makes these emotional bids about how life-critical they are to society - which is fair to a degree..but it just feels cheap to be glamorising “look! we saved this person from pending doom, cool right??”
If this includes improvements to the screen recognition feature in VoiceOver, it could provide accessibility for apps where the developer doesn't care about accessibility, which is extremely common.
The vision capabilities could be useful if they are done well, but I suspect that will always be covered better by 3rd party apps.
Additionally, I don't believe this is just marketing. This is adaptation to a changing market. Apple's customer base is aging, and having these kinds of features will allow them to keep using Apple products for longer.