Reimagining the mouse pointer for the AI era
52 points | 2 hours ago | 29 comments | deepmind.google | HN
chromacity
7 minutes ago
My reaction to the first demo (recipe) is that it was slower than typing the same thing on your keyboard.

The second demo seems to be a wash: there's no time saved in saying "move this" versus "move crab".

The third demo doesn't seem to warrant the use of a pointer at all, since there is only one way to interpret the prompt.

None of this means that this approach will not be successful, but there's a reason why so many attempts to revolutionize user interfaces ended up going nowhere. For example, talking to your computer was always supposed to be the future, but in practice, it's slower and more finicky than typing.

In fact, the only new UI paradigm of the past 28+ years appears to have been touchscreens and swipe gestures on phones. But they are a matter of necessity. No one wants to finger-paint on a desktop screen.

arjie
22 minutes ago
Oh interesting, this is very cool. At first I thought it was just focus-follows-mouse, but it's more interesting. You have certain keywords trigger "add to prompt". Ignoring the voice functionality (which is admittedly crucial for now because other inputs take over focus), I've often wanted to just have a continuous conversation with the LLM as I 'point and click' (or tab over and select) at various things. Might be neat to have text input focus continue to go to the LLM where I'm typing text etc.

Sometimes I go to a different page to take a screenshot and other times I'm browsing for a file, and other times I'm highlighting some log lines. Cursor did this well, with selecting text in the terminal auto-focusing the Cursor agent textbox so you could talk to the agent and then select some text and you didn't have to re-select the original agent textbox again. The agent is a top-level function in that system not "just another app I have to switch to" to take my context with.

I have some small amount of bias because I've always felt input-constrained on computers. I have to move my hands to go places and that's exasperating. I've tried head tracking, had a vim pedal for a while, and used tiling WMs and things like that to help, but while my vim-fu is pretty good and I function very well inside applications with it, my cross-application interface isn't as good.

In the end, perhaps we all have our home offices with our Apple Vision Pros and we talk to them like this to maneuver faster through our machines and get our ideas into them.

Cool research. I wonder what we'll end up with.

why_at
37 minutes ago
My first impression coming away from this is skepticism.

Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.

Most of their examples seem like they could have been done with a right-click drop-down menu, so they don't really need to "re-invent the mouse pointer".

So is this thing talking to Google's servers all the time for the AI integration? So it won't work if you're not connected to the internet? Privacy concerns are obvious; now Google wants to have an AI watching literally everything you do on your computer?

Does it cost the user anything for the LLM use? If it's free will it stay free forever? That's quite a lot to give away if they're expecting people to use it to change a single word like in one of their examples. I guess they're expecting to make the money back by gathering data about literally everything you do on your computer.

There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.

AirMax98
17 minutes ago
Right — it does seem cool but the voice is patching over a major gap. If I'm talking already, why wouldn't I just describe what I'm looking at and have the AI grab it for me?

nolist_policy
32 minutes ago
The "Edit an Image" Demo at the bottom is pretty fun. Maybe this is just Google flexing their LLM inference capacity.

dandaka
6 minutes ago
The next generation of OSes should have constant video and audio recognition by an on-device LLM. This will provide valuable context for a lot of scenarios. So instead of the frequent copy-pasting we are used to, we can let agents access the context of our whole workflows across different apps.

But Google is a very ill-positioned candidate for such an OS. I would rather trust Apple and local-first on-device models.

kjellsbells
46 minutes ago
I sense a privacy problem brewing.

It reminds me of Microsoft Recall in the sense that some portion of the screen is going to be continuously transmitted outside of the user's control.

What happens when someone browses something very private (planning a surprise engagement, looking at medical data, planning a protest)? All that data gets slurped to Google and is subject to a warrant or discovery, or used to build your advertising fingerprint.

Maybe the idea is that the data is sent to AI only when you right click, but that seems like a very thin firewall that a product manager will breach in the interests of delivering "predictive AI" via some kind of precomputed results.

juancn
18 minutes ago
Please don't.

I like text selection exactly how it is. I want precise controls.

It's fine for a touch interface like a phone, but on a computer I expect precision. As much as I can get.

maheenaslam
12 minutes ago
The concept is good, but accuracy in cluttered environments can be a concern, and misinterpreting context can also be a problem.

nolist_policy
50 minutes ago
Wiggle at CAPTCHAs, wiggle at Termux, wiggle at Emacs, wiggle at the Godot Editor, wiggle at my remote desktop.

(Not going to happen)

jpatten
49 minutes ago
Reminds me of Put That There https://m.youtube.com/watch?v=RyBEUyEtxQo

loaderchips
1 hour ago
It's beautiful how the human mind can take something very obvious but overlooked and make it into this fantastic innovation. Fab stuff.

tintor
1 hour ago
Of course, it isn't a Google demo if you can't use it to book a table at a restaurant. (shown at the bottom of the page)

jaccola
55 minutes ago
This seems like one of those things that is usable infrequently enough to be forgotten/poorly developed/never used (even before accounting for the actual failure rate of the LLM, which will be non-zero).

Perhaps a text box and file upload isn’t the perfect interface for every use case but it is versatile which is a huge barrier to overcome.

hmokiguess
28 minutes ago
Don't build these things. Instead, build protocols and expose system-level APIs for application developers to build things.

AbuAssar
1 hour ago
So will Google be monitoring whatever is on the screen continuously, or only when the user says the magic words (this, that, here, there)?

EdgeExplorer
1 hour ago
Indeed. "AI-enabled pointer" is misdirection. This isn't an AI-enabled pointer; it's sending the screen to the AI, which, yes, includes the pointer position. The AI doesn't live in the pointer. The AI lives, apparently, so thoroughly in the system that it can see and do anything, and the pointer is just a way of giving it context.

OtomotO
51 minutes ago
Google Recall. Hey, it's all about the marketing.

xiphias2
12 minutes ago
Google needs to beat OpenAI and Anthropic in coding models because that's where the big money is going. I love using the Gemini Pro model for quick questions, but that's not where I'm spending the real money.

They have so many great software engineers but are unable to use them to speed up coding AI research. Hopefully with Sergey's focus it will get better.

This cursor thing is just another experiment nobody cares about.

iridione
1 hour ago
Interesting! I wonder how UI will evolve in the long term. If there are browser-use/computer-use and clicky-clones automating pointer actions, do we really need complex UI anymore? If yes, when?

Ancapistani
48 minutes ago
I've been playing with writing a visionOS app that allows an AI agent to be aware of what you're looking at at any given time.

At some point I fully expect eye tracking (or attention tracking) to be common enough to be a first-class input method.

strgrd
1 hour ago
No thanks

SirFatty
1 hour ago
It only took Google and their AI offering to come up with Graffiti.

mcookly
1 hour ago
I wonder what sort of monstrous power would be unleashed if Google used Plan9 as a foundation.

bitwize
42 minutes ago
They'd half-finish it and then bury it, like they did with Fuchsia, which is heavily Plan 9-inspired.

Joker_vD
32 minutes ago
Just seven hours ago there was a plea on HN [0] to please not do this. Seriously, what are they smoking at Google right now?

[0] https://news.ycombinator.com/item?id=48107027

jinkuan
54 minutes ago
being able to make precise edits would be huge for AI

mvdtnz
1 hour ago
Both of the text-based demos would have been simpler and faster with traditional mouse and keyboard interactions. What is the AI adding?

hyperhello
51 minutes ago
They’re going to take your abilities to do anything and spread them across many places so you have to run around to use them, same as all the moneyed technology.

wartywhoa23
17 minutes ago
Hype-flavored surveillance!

dfxm12
42 minutes ago
It tracks what's on the screen and sends it back to Alphabet. If you're watching a video about BBQ, enjoy a bunch of ads for Omaha Steaks and Big Green Egg in your Gmail.

On a less serious note, the audience for this is people who want to optimize for what seems like the least amount of effort.

slopinthebag
1 hour ago
It feels like everything modern is like this. No value added, just the appearance of it.

pmarreck
29 minutes ago
There's already a product that does this lol

Aaaaand now I can't remember the name of it

simondw
46 minutes ago
Maybe I'm misunderstanding, but what is new about the pointer itself? Seems to be functionally the same as selecting + tooltips / context menus.

kwertyoowiyop
41 minutes ago
Shush, how is anyone going to get promoted with that kind of talk!?

DaiPlusPlus
30 minutes ago
> but what is new about the pointer itself?

I'm hoping for a const-reference joke.

LocalH
55 minutes ago
do not want

OtomotO
52 minutes ago
Like a dream come true...

Nightmares are dreams as well and this is a nightmare like Windows Recall.

Technically wonderful though.

themafia
1 hour ago
> We’ve been exploring new AI-powered capabilities to help the pointer not only understand what it’s pointing at, but also why it matters to the user.

We couldn't quite track you well enough before. So we're fixing that under the guise of "AI-powered capabilities."

brgsk
34 minutes ago
what the hell is going on at google

SirMaster
46 minutes ago
Thanks, I hate it