The second demo seems to be a wash: there's no time saved in saying "move this" versus "move crab".
The third demo doesn't seem to warrant the use of a pointer at all, since there is only one way to interpret the prompt.
None of this means the approach won't be successful, but there's a reason so many attempts to revolutionize user interfaces have gone nowhere. For example, talking to your computer was always supposed to be the future, but in practice it's slower and more finicky than typing.
In fact, the only new UI paradigm of the past 28+ years appears to have been touchscreens and swipe gestures on phones. And those were a matter of necessity: no one wants to finger-paint on a desktop screen.
Sometimes I go to a different page to take a screenshot, other times I'm browsing for a file, and other times I'm highlighting some log lines. Cursor did this well: selecting text in the terminal auto-focused the Cursor agent textbox, so you could talk to the agent, then select more text, without having to re-focus the agent textbox each time. The agent is a top-level function in that system, not "just another app I have to switch to" to carry my context along.
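For what it's worth, the mechanism is simple enough to sketch: listen for a selection anywhere in the workspace, stash it in the agent's context, and move focus to the agent's input. Here's a minimal TypeScript sketch assuming a web-based UI; the `#agent-input` id and `agentContext` array are hypothetical, not Cursor's actual implementation:

```typescript
// Hypothetical sketch of "selection feeds a top-level agent":
// any text selected in the workspace becomes agent context, and
// keyboard focus jumps to the agent's input automatically.
const agentInput =
  document.querySelector<HTMLTextAreaElement>("#agent-input")!;
const agentContext: string[] = [];

document.addEventListener("mouseup", () => {
  const selected = window.getSelection()?.toString().trim();
  if (!selected) return;

  // Carry the selection into the agent's context automatically...
  agentContext.push(selected);
  // ...and hand focus to the agent, so the next keystrokes are
  // already a message about what was just selected.
  agentInput.focus();
});
```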
I have some small amount of bias because I've always felt input-constrained on computers. Having to move my hands to go places is exasperating. I've tried head tracking, had a vim pedal for a while, and used tiling WMs and the like to help, but while my vim-fu is pretty good and I work very well inside individual applications, my cross-application interface isn't nearly as good.
In the end, perhaps we'll all sit in our home offices with our Apple Vision Pros and talk to them like this to maneuver faster through our machines and get our ideas into them.
Cool research. I wonder what we'll end up with.
Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.
Most of their examples seem like they could have been done with a right-click drop-down menu, so they don't really need to "re-invent the mouse pointer".
So is this thing talking to Google's servers all the time for the AI integration? So it won't work if you're not connected to the internet? The privacy concerns are obvious: now Google wants an AI watching literally everything you do on your computer?
Does the LLM use cost the user anything? If it's free, will it stay free forever? That's quite a lot to give away if they expect people to use it to change a single word, as in one of their examples. I guess they're expecting to make the money back by gathering data about literally everything you do on your computer.
There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.
But Google is a poorly positioned candidate for such an OS. I would rather trust Apple and local-first, on-device models.
It reminds me of Microsoft Recall in the sense that some portion of the screen is going to be continuously transmitted outside of the user's control.
What happens when someone browses something very private (planning a surprise engagement, looking at medical data, planning a protest)? All that data gets slurped up by Google, where it's subject to a warrant or discovery, or used to build your advertising fingerprint.
Maybe the idea is that data is sent to the AI only when you right-click, but that seems like a very thin firewall, one a product manager will breach in the interest of delivering "predictive AI" via some kind of precomputed results.
I like text selection exactly how it is. I want precise controls.
It's fine for a touch interface like a phone, but on a computer I expect precision. As much as I can get.
(Not going to happen)
Perhaps a text box and file upload isn't the perfect interface for every use case, but it is versatile, and that versatility is a huge barrier to overcome.
They have so many great software engineers but seem unable to use them to speed up their AI research. Hopefully with Sergey's focus it will get better.
This cursor thing is just another experiment nobody cares about.
At some point I fully expect eye tracking (or attention tracking) to be common enough to be a first-class input method.
On a less serious note, the audience for this is people who want to optimize for what seems like the least amount of effort.
Aaaaand now I can't remember the name of it
I'm hoping for a const-reference joke.
Nightmares are dreams too, and this is a nightmare, like Windows Recall.
Technically wonderful though.
We couldn't quite track you well enough before. So we're fixing that under the guise of "AI-powered capabilities."