Instead I need to read through tens of listings, keeping track of cleaning/other fees, weighing location and price as well as amenities, if any. [0]
I have yet to see a model do any of that (yes, I'm aware it's possible and maybe someone is doing that).
> We're just beginning. As we increase training resources, Ace will become more intelligent and capable.
Can we not? "As time goes on we will get better because magic" - I truly hate this hopium in the LLM community. LLM problems are not like normal software problems; you cannot code your way out of a hole. You can prompt or re-train. Both suck, and training at least has a long turnaround time and is not cheap.
I really enjoy LLMs and love trying new things that use them. This idea/product/service/whatever is just not compelling. It feels like I'd need to babysit this process to make sure it didn't do something stupid. It's the same reason I have never in my life bought something through my Amazon Echos: the upside is minimal and the downside can be massive.
[0] OpenAI's Deep Research is closer to what I'm talking about, but even that is laughably bad sometimes. It looks impressive as hell, it impressed me, and then I went to ask it an "easy" question so I could share the question with a friend to show them how cool it was. The "easy" question was something I was familiar with, and the final results were lacking (to be nice). I asked it to research local bakeries and it missed a ton of places that show up in 1-2 Google searches. -- This is the problem with LLMs across the board: they are great at producing good-sounding output, but that doesn't make it right/true/complete.
The founder of generalagents even says that they want to free humans from digital labor. I can’t stand these takes. Leave my computer to me!
Is there a way to train or augment training on applications you've never seen before? We have a bunch of custom Java applications that we use in finance, curious about some additional automation.
1. Could you talk a bit more about your behavioral training? If ace-control is trained on behavioral recordings, would it choose the most efficient path for the agent to take to complete a task? I'm guessing humans naturally take less-optimal steps.
2. What causes the huge speed increase? I'm guessing there were a lot of optimizations made, especially since this behavioral training seems very different from vision models. I'm guessing the model is smaller, so it's interesting that accuracy is highest. I'd be interested to see a comparison vs. 4o-mini.
3. Would be neat for it to handle instructions offline/locally - like "connect me to wifi" ;)
4. Would be cool if the agent could work in the background so I can do something else in the meantime. ;)
Could you elaborate on the types of tasks and data sources used to train Ace, and how these contribute to its performance on desktop automation?
Ace is said to outperform other models on your suite of computer use tasks. Can you provide more details on these benchmarks and how Ace compares to existing automation tools?