Ace: Realtime Computer Autopilot
87 points | 22 hours ago | 8 comments | generalagents.com | HN
joshstrange
2 hours ago
Every time I see "Book me a trip to X" I immediately shut down. I have yet to see any LLM handle all the cases that a human would/needs to. Those sites are hard for a human to navigate; doing a search and clicking on the first result is not "autopilot". If that's all I was going to do then I'd just do it myself.

Instead I need to read through tens of listings, keeping track of cleaning and other fees, and weighing location and price as well as amenities, if any. [0]

I have yet to see a model do any of that (yes, I'm aware it's possible and maybe someone is doing that).

> We're just beginning. As we increase training resources, Ace will become more intelligent and capable.

Can we not? "As time goes on we will get better because magic" - I truly hate this hopium in the LLM community. LLM problems are not like normal software problems; you cannot code your way out of a hole. You can prompt or re-train, and both options suck: training at least has a long turnaround time and is not cheap.

I really enjoy LLMs and love trying new things that use them. This idea/product/service/whatever is just not compelling. It feels like I'd need to babysit this process to make sure it didn't do something stupid. It's the same reason I have never in my life bought something through my Amazon Echos, the upside is minimal and the downside can be massive.

[0] OpenAI's Deep Research is closer to what I'm talking about, but even that is laughably bad sometimes. It looks impressive as hell, it impressed me, and then I went to ask it an "easy" question so I could share the result with a friend to show them how cool it was. The "easy" question was on something I was familiar with, and the final results were lacking (to be nice). I asked it to research local bakeries and it missed a ton of places that show up in 1-2 Google searches. -- This is the problem with LLMs across the board: they are great at producing good-sounding output, but that doesn't make it right/true/complete.

quantumHazer
57 minutes ago
OpenAI should have called it Wide Research (but that's not as catchy). In all my tests on topics where I have some expertise, it seems to provide a wide range of information but never goes really deep. It lacks an understanding of what is more valuable and what is not.
vivzkestrel
12 hours ago
There was a very nice article yesterday titled "The case against conversational interfaces". What do you have to say about that? https://julian.digital/2025/03/27/the-case-against-conversat...
quantumHazer
9 hours ago
Very good article.

The founder of generalagents even says that they want to free humans from digital labor. I can't stand these takes. Leave my computer to me!

iamleppert
20 hours ago
The recruitment use-case was hilarious, thanks I needed that.
dilDDoS
18 hours ago
That example made me wonder if this was satire... I'm guessing not, but it's pretty funny nonetheless. I bet my boss would love me texting him the names of random people from LinkedIn.
unfunnytard
10 hours ago
I'd love to see it being real (someday).
the__alchemist
21 hours ago
This isn't what I was expecting at all based on the title!
sherjilozair
21 hours ago
I'm the founder and CEO of General Agents. Happy to answer questions!
chews
21 hours ago
From your site, "Ace works like we do—performing mouse clicks and keystrokes based on the screen and prompt—trained with <3 by our team of software specialists and domain experts on over a million tasks."

Is there a way to train or augment training on applications you've never seen before? We have a bunch of custom Java applications that we use in finance, and I'm curious about some additional automation.

sherjilozair
20 hours ago
Ace is actually uniquely designed to support that. Our training staff simply record their screens and mouse+keyboard events. We transform that into behavior cloning data to train the model. It's quite easy for us to do custom agents for enterprise or other lesser-known software and workflows. Reach out to us at contact@generalagents.com if you're interested.
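For readers unfamiliar with the approach: "behavior cloning" here just means turning recorded demonstrations into supervised (observation, action) pairs, so the model learns to predict the human's next input from the current screen. A minimal sketch of that transformation, assuming a hypothetical `RecordedStep` schema (General Agents' actual data format is not public):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RecordedStep:
    # One captured moment of a demonstration: what the screen looked
    # like, and what the human did next. (Illustrative schema only.)
    screenshot: bytes          # raw screen capture at this instant
    event: str                 # e.g. "click", "key", "scroll"
    payload: Tuple[int, ...]   # e.g. (x, y) for a click

def to_behavior_cloning_pairs(
    recording: List[RecordedStep],
) -> List[Tuple[bytes, Tuple[str, Tuple[int, ...]]]]:
    """Turn a raw demonstration into (observation, action) training
    pairs: the model is later trained to map each screenshot to the
    action the human took, which is the core of behavior cloning."""
    return [(step.screenshot, (step.event, step.payload))
            for step in recording]

# Hypothetical two-step demonstration: click a field, press Enter.
demo = [
    RecordedStep(b"<frame0>", "click", (120, 340)),
    RecordedStep(b"<frame1>", "key", (13,)),
]
pairs = to_behavior_cloning_pairs(demo)
```

A real pipeline would also need to align event timestamps with frames and encode actions into a model-friendly vocabulary; this sketch only shows the supervised-pair structure.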
gkolli
20 hours ago
Hi! Looks pretty interesting - a few questions/thoughts:

1. Could you talk a bit more about your behavioral training? If ace-control is trained on behavioral recordings, would it choose the most efficient path to complete a task? I'm guessing humans naturally take less-than-optimal steps.

2. What causes the huge speed increase? I'm guessing a lot of optimizations were made, especially since this behavioral training seems very different from vision models. I'm guessing the model is smaller, so it's interesting that accuracy is highest. I'd be interested to see a comparison vs. 4o-mini.

3. Would be neat for it to handle instructions offline/locally - like "connect me to wifi" ;)

4. Would be cool if agent could work in the background so I can do something else in the meantime. ;)

timabdulla
20 hours ago
How does it perform on e.g. WebVoyager, WebArena, or OSWorld? These seem to be the oft-cited benchmarks when comparing computer-use agents.
xfr
21 hours ago
First, I am extremely impressed by the demo. It looks truly groundbreaking.

Could you elaborate on the types of tasks and data sources used to train Ace, and how these contribute to its performance on desktop automation?

Ace is said to outperform other models on your suite of computer use tasks. Can you provide more details on these benchmarks and how Ace compares to existing automation tools?

martin_
18 hours ago
Amazing performance! Do you anticipate making the model available for commercial use or are you primarily focused on releasing agents built upon it?
lilyhills03
10 hours ago
so excited!!
cbiscuit
20 hours ago
cool!!
misbah143
20 hours ago
This is super fast. Future of computer agents. Bullish on this.