Ace: Realtime Computer Autopilot (generalagents.com)

87 points by huerne 22 hours ago | 8 comments

joshstrange 2 hours ago

Every time I see "Book me a trip to X" I immediately shut down. I have yet to see any LLM handle all the cases that a human would/needs to. Those sites are hard for a human to navigate; doing a search and clicking on the first result is not "autopilot". If that's all I was going to do, I'd just do it myself.

Instead I need to read through 10s+ of listings, keeping track of cleaning/other fees, and weighing location and price as well as amenities, if any. [0]

I have yet to see a model do any of that (yes, I'm aware it's possible and maybe someone is doing that).

> We're just beginning. As we increase training resources, Ace will become more intelligent and capable.

Can we not? "As time goes on we will get better because magic" - I truly hate this hopium in the LLM community. LLM problems are not like normal software problems; you cannot code your way out of a hole. You can prompt or re-train, and both suck: training, at least, has a long turnaround time and is not cheap.

I really enjoy LLMs and love trying new things that use them. This idea/product/service/whatever is just not compelling. It feels like I'd need to babysit this process to make sure it didn't do something stupid. It's the same reason I have never in my life bought something through my Amazon Echos, the upside is minimal and the downside can be massive.

[0] OpenAI's Deep Research is closer to what I'm talking about, but even that is laughably bad sometimes. It looks impressive as hell, it impressed me, and then I went to ask it an "easy" question so I could share it with a friend to show them how cool it was. The "easy" question was about something I was familiar with, and the final results were lacking (to be nice). I asked it to research local bakeries and it missed a ton of places that show up in 1-2 Google searches. This is the problem with LLMs across the board: they are great at producing good-sounding output, but that doesn't make it right/true/complete.

quantumHazer 57 minutes ago

OpenAI should have called it Wide Research (but that's not as catchy). In all my tests on topics where I have some expertise, it seems to provide a wide range of information but it never goes really deep. It lacks an understanding of what is more valuable and what is not.

vivzkestrel 12 hours ago

There was a very nice article yesterday titled "The case against conversational interfaces". What do you have to say about that? https://julian.digital/2025/03/27/the-case-against-conversat...

quantumHazer 9 hours ago

Very good article.

The founder of generalagents even says that they want to free humans from digital labor. I can’t stand these takes. Leave my computer to me!

iamleppert 20 hours ago

The recruitment use-case was hilarious, thanks I needed that.

dilDDoS 18 hours ago

That example made me wonder if this was satire... I'm guessing not, but it's pretty funny nonetheless. I bet my boss would love me texting him the names of random people from LinkedIn.

unfunnytard 10 hours ago

I'd love to see it become real (someday).

the__alchemist 21 hours ago

This isn't what I was expecting at all based on the title!

sherjilozair 21 hours ago

I'm the founder and CEO of General Agents. Happy to answer questions!

chews 21 hours ago

From your site, "Ace works like we do—performing mouse clicks and keystrokes based on the screen and prompt—trained with <3 by our team of software specialists and domain experts on over a million tasks."

Is there a way to train or augment training on applications you've never seen before? We have a bunch of custom Java applications that we use in finance, curious about some additional automation.

sherjilozair 20 hours ago

Ace is actually uniquely designed to support that. Our training staff simply record their screens and mouse+keyboard events. We transform that into behavior cloning data to train the model. It's quite easy for us to do custom agents for enterprise or other lesser-known software and workflows. Reach out to us at contact@generalagents.com if you're interested.

gkolli 20 hours ago

Hi! Looks pretty interesting - few questions/thoughts:

1. Could you talk a bit more about your behavioral training? If ace-control is trained on behavioral recordings, would it choose the most efficient path for the agent to take to complete a task? I'm guessing humans naturally take less-optimal steps.

2. What causes the huge speed increase? I'm guessing there were a lot of optimizations made, especially since this behavioral training seems very different from vision models. I'm guessing the model is smaller, so it's interesting that its accuracy is the highest. I'd be interested to see a comparison vs. 4o-mini.

3. Would be neat for it to handle instructions offline/locally - like "connect me to wifi" ;)

4. Would be cool if agent could work in the background so I can do something else in the meantime. ;)

timabdulla 20 hours ago

How does it perform on e.g. WebVoyager, WebArena, or OSWorld? These seem to be the oft-cited benchmarks when comparing computer-use agents.

xfr 21 hours ago

First, I am extremely impressed by the demo. It looks truly groundbreaking.

Could you elaborate on the types of tasks and data sources used to train Ace, and how these contribute to its performance on desktop automation?

Ace is said to outperform other models on your suite of computer use tasks. Can you provide more details on these benchmarks and how Ace compares to existing automation tools?

martin_ 18 hours ago

Amazing performance! Do you anticipate making the model available for commercial use or are you primarily focused on releasing agents built upon it?

lilyhills03 10 hours ago

so excited!!

cbiscuit