Show HN: SpongeCake – open-source SDK for OpenAI computer use agents

13 points

by theonlyt3

1 month ago

| 3 comments

| github.com

Hey HN! We wanted to put this together quickly after seeing OpenAI launch their new computer use agent.

We were excited to get our hands on it, but quickly realized there was still quite a bit of setup required to actually spin up a VM and have the model do things. So we put together an easy way to deploy these OpenAI computer use VMs in an SDK format and open-sourced it.

Hopefully this tooling is helpful to other folks building AI agents! Here’s a link to the repo (https://github.com/aditya-nadkarni/spongecake). Please try it out and give us a star. If you have any feedback, add it as a comment on this post! Or if you just love spongecake, show some support for the delicious treat.

pradhit

1 month ago

Looks really cool! Curious, do you have any thoughts/advice on how to improve agent reliability? I usually run into a lot of inconsistency when I need it to execute a workflow at even a small scale.

Also, how do you guys think about multi-agent workflows, i.e., having a couple of agents take actions in parallel? Wondering if it's possible to have two share a VM.

theonlyt3

1 month ago

Yep, we've also run into inconsistency issues when building with these agents. The biggest thing we've seen help is breaking the task down into smaller actions, effectively writing a script for the agent (e.g., go to google.com, type 'hello world', etc.). The more loaded the prompt, the more likely it is to go off the rails. We want to build more tools to help with reliability, but it's also something I hope improves relatively soon as the foundation model companies invest more here.
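
To make the "script for the agent" idea concrete, here is a minimal sketch. It does not use the SpongeCake API: `run_step` is a hypothetical callable standing in for whatever sends one small instruction to the agent and reports success, and `max_retries` is an assumed knob, not something from the SDK.

```python
from typing import Callable, List


def run_script(
    steps: List[str],
    run_step: Callable[[str], bool],  # hypothetical: sends one instruction, returns True on success
    max_retries: int = 2,             # assumed retry budget per step
) -> List[str]:
    """Run small steps in order, retrying each one, and stop at the first step that never succeeds."""
    completed: List[str] = []
    for step in steps:
        for _attempt in range(max_retries + 1):
            if run_step(step):
                completed.append(step)
                break
        else:
            # Every attempt failed: stop here rather than let later steps run on a bad state.
            break
    return completed


# One short instruction per step instead of a single loaded prompt.
steps = ["go to google.com", "type 'hello world'", "press Enter"]
```

Returning the list of completed steps makes it easy to see where a run stopped and resume from there.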

In terms of multi-agent workflows, it's something we've been thinking about! We think this could be especially helpful for speeding up form filling. It's hard for me to think of other use cases where multiple agents would need to share a VM (as opposed to just spinning up another VM with another agent), but I'm curious to hear your thoughts!
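
For the "another VM with another agent" option, a rough sketch of running independent tasks in parallel might look like this. `run_agent_in_new_vm` is a hypothetical function (not part of SpongeCake) that would start a fresh VM, run one task in it, and return a result string.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_parallel(
    tasks: List[str],
    run_agent_in_new_vm: Callable[[str], str],  # hypothetical: one VM and one agent per task
    max_workers: int = 4,                       # assumed cap on concurrent VMs
) -> Dict[str, str]:
    """Run each task with its own agent and VM, and collect results keyed by task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input tasks.
        results = list(pool.map(run_agent_in_new_vm, tasks))
    return dict(zip(tasks, results))
```

Because each task gets its own VM, the agents never contend for the same mouse, keyboard, or screen, which is the main complication with sharing one.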

shanmohta

1 month ago

Looks pretty interesting! Would this be something I can set up to have an agent fill out forms online? Or monitor a specific inbox and auto-respond to emails? There's stuff like that which I need to do manually today that could be cool to automate with agents.

theonlyt3

1 month ago

Yes, and yes!

Filling out forms is, I think, one of the biggest use cases for computer use agents. We're working on some stuff to make that use case specifically faster and more accurate.

Regarding monitoring a specific inbox and auto-responding, you could also set up an agent to do that, but I would advise having the agent automate more formulaic/simple emails rather than having it use too much of its own judgment when writing the copy. You can also add checks so the agent checks in with you before hitting send.
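
A small sketch of both suggestions, formulaic copy plus a check-in before sending. Everything here is illustrative: `Draft`, `REPLY_TEMPLATE`, `build_reply`, and the `approve`/`send` callables are hypothetical names, not part of SpongeCake or any email library.

```python
from dataclasses import dataclass
from string import Template
from typing import Callable


@dataclass
class Draft:
    to: str
    subject: str
    body: str


# Fixed copy with a few slots, so the agent fills in fields instead of writing free-form text.
REPLY_TEMPLATE = Template(
    "Hi $name,\n\nThanks for your message about $topic. "
    "We'll get back to you within $days business days.\n"
)


def build_reply(to: str, name: str, topic: str, days: int = 2) -> Draft:
    body = REPLY_TEMPLATE.substitute(name=name, topic=topic, days=days)
    return Draft(to=to, subject=f"Re: {topic}", body=body)


def send_with_approval(
    draft: Draft,
    approve: Callable[[Draft], bool],  # hypothetical: shows the draft to a human, returns their decision
    send: Callable[[Draft], None],     # hypothetical: actually sends the email
) -> bool:
    """Send only if the human approves; return whether the email was sent."""
    if not approve(draft):
        return False
    send(draft)
    return True
```

In practice `approve` could be as simple as printing the draft and reading a yes/no from the terminal.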

abhardwaj_1

1 month ago

This looks pretty cool. Quick question: are you guys thinking about adding support for other computer-use agents like Claude computer use? Could be interesting to have this with Claude's larger context window.

theonlyt3

1 month ago

Yes, agreed! We're planning on adding Claude as well as any other computer use models that pop up.

abhardwaj_1

1 month ago

Awesome...looking forward to that