Show HN: Autoresearch@home
74 points | 1 day ago | 9 comments | ensue-network.ai
autoresearch@home is a collaborative research network where AI agents share GPU resources to collectively improve a language model. Think SETI@home, but for model training.

How it works: agents read the current best result, propose a hypothesis, modify train.py, run the experiment on your GPU, and publish the results back. When an agent beats the current best validation loss, its run becomes the new baseline for every other agent. Agents learn from both great runs and failures, since we're using Ensue as the collective memory layer.
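A toy, self-contained sketch of that loop (the `Collective` class and the simulated experiment are stand-ins for illustration, not the project's actual API):

```python
import random

class Collective:
    """Toy stand-in for the shared memory layer (Ensue in the real project)."""
    def __init__(self):
        self.best_bpb = float("inf")   # current best validation bits-per-byte
        self.history = []              # every published run, success or failure

    def publish(self, hypothesis, val_bpb):
        self.history.append((hypothesis, val_bpb))
        if val_bpb < self.best_bpb:    # a better run becomes the new baseline
            self.best_bpb = val_bpb
            return True
        return False

def run_experiment(hypothesis, rng):
    # Stand-in for a real 5-minute training run of a modified train.py.
    return 1.0 + rng.random()

def agent_step(collective, rng):
    hypothesis = f"tweak-{rng.randrange(1000)}"  # agent proposes a change
    val_bpb = run_experiment(hypothesis, rng)
    return collective.publish(hypothesis, val_bpb)

rng = random.Random(0)
c = Collective()
for _ in range(20):
    agent_step(c, rng)
print(len(c.history), round(c.best_bpb, 3))
```

The key property is that every run, winner or not, lands in the shared history, so later agents can learn from failures as well as from the baseline.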

This project extends Karpathy's autoresearch by adding the missing coordination layer so agents can actually build on each other's work.

To participate, you need an agent and a GPU. The agent handles everything: cloning the repo, connecting to the collective, picking experiments, running them, publishing results, and asking you to verify you're a real person via email.

Send this prompt to your agent to get started: "Read https://github.com/mutable-state-inc/autoresearch-at-home, follow the instructions, join autoresearch, and start contributing."

The whole experiment is meant to show that agents work better when they can build on other agents' work. The timeline is live, so you can watch experiments land in real time.

HanClinto
8 hours ago
Trigger warning: very stupid question to follow.

To my smooth-brain naiveté, this feels like the sort of thing we could reward with some sort of cryptocurrency on a blockchain? It's difficult to achieve gains, but relatively quick to independently verify (5-minute training runs).

I know "blockchain all the things" is sooooo 8 years ago, but I'm looking at the descending graph of progress here, and wondering if being able to claim improvement tokens (even for no reason other than NFT-esque bragging rights) wouldn't be a cool thing here?

I'm asking this as someone who knows next to nothing about crypto or blockchain or any of those things, but mainly thinking of trying to gamify assigning GPU rigs to "mining" these measurable improvements.

gavinray
8 hours ago
I had this idea in 2019 -- "p2p machine learning" with some sort of incentive.

It makes a lot of sense to incentivize contributing. My idea was that you could earn "credits" for participating, which you could exchange for compute resources from the "swarm" later, on any given problem (training, local inference, etc).

austinbaggio
7 hours ago
I worked on building blockchains for about 4 years, and this is not a stupid question at all. The verification problem is real. A 5-minute training run produces an objective val_bpb score that anyone can reproduce from the published source code. And this is actually valuable work, unlike most proof-of-work workloads.

The practical challenge is that adding a blockchain means agents also need to participate in consensus, store and sync the ledger, and run the rest of the network infrastructure on top of the actual research. So it needs a unit-economics analysis. That said, all results already include full source code and deterministic metrics, so the hard part of verifiable compute is already solved. You could take this further with a zkVM to generate cryptographic proofs that the code produced the claimed score, so nobody needs to re-run anything to verify. Verification becomes checking a proof, not reproducing the compute.
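To make the "re-run to verify" point concrete, here's a toy sketch where a deterministic hash stands in for the 5-minute training job, and verification is just running the published code again and comparing metrics (all names hypothetical):

```python
import hashlib

def deterministic_run(source_code: str) -> float:
    """Toy stand-in for a deterministic training run: identical source
    always yields the identical val_bpb, so anyone can re-run the
    published code and reproduce the claimed score."""
    digest = hashlib.sha256(source_code.encode()).digest()
    return 0.5 + digest[0] / 256  # fake val_bpb somewhere in [0.5, 1.5)

def verify_claim(source_code: str, claimed_bpb: float, tol: float = 1e-9) -> bool:
    # Verification = re-run the published code, compare against the claim.
    return abs(deterministic_run(source_code) - claimed_bpb) < tol

honest_score = deterministic_run("print('train')")
print(verify_claim("print('train')", honest_score))        # honest claim passes
print(verify_claim("print('train')", honest_score - 0.1))  # inflated claim fails
```

In the real system the expensive part is the re-run itself, which is exactly what a zk-style proof would let verifiers skip.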

Compute-credits are interesting. Contribute GPU time now, draw on the swarm later for training, inference, whatever you need. That's a real utility token with intrinsic value tied to actual compute, not speculation.

HanClinto
6 hours ago
> The verification problem is real. A 5-minute training run produces an objective val_bpb score that anyone can reproduce from the published source code. And this is actually valuable work, unlike most proof-of-work workloads.

Yes, thank you for the validation! That was the core of what sparked this for me -- my cartoon drawing of blockchain is that it's dependent on problems that are difficult to solve (improve this codebase), but easy to verify (loss went down).

Like you noted, this is also cool in that it's valuable work (unlike most of these workloads).

I appreciate the opportunities for optimization you've laid out (such as zkVM) but it feels like that would be optional compared to the basic thing here?

And yeah -- what one _does_ with the crypto-credits is pretty open-ended. Like you said, drawing on the swarm for training or inference or whatever you need -- it feels like the sort of thing that one could use as a GPU battery of sorts. Most of my personal GPU work goes in bursts -- but most of the time my GPU is sitting idle.

Most of the other GPU share-cropping sorts of ideas I've seen floating around lack the ability to independently prove that work was done. Having a global metric for a shared target like this seems to solve what has been lacking in a lot of other distributed systems I've seen.

Looking at the graph on the website, it looks like it's already got a bit of a scoreboard and independent verification / validation of results. Feels like it would be a relatively small jump to crowdsource this and put it into a formal blockchain.

But the next natural question is: Would we stand to gain anything by adding blockchain to this?

Lerc
1 day ago
When training lots of models with subtly different parameters like this, is there anything to be learned from the differences in logprobs between them for the same input? Obviously a model with a lower loss has better logprobs, but are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?
itissid
1 day ago
> are they fairly uniformly similar with gains in one or a few areas, or is it noisier with a lower overall loss?

It seems like you want to know what the median and 5-95 or 1-99 percentile differences might be? I also wonder what the "residual" plot looks like... If there are too many residual data points for a scatter plot, then a histogram might be useful to visualize the modes. I suspect that as loss decreases, multiple modes should condense or collapse into one altogether.
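One way to poke at that question, sketched here with synthetic per-token logprobs (real ones would come from two checkpoints' forward passes over the same validation set):

```python
import random
import statistics

rng = random.Random(0)

# Synthetic per-token logprobs for two models on the same input; model B is
# constructed to be slightly better on average, with per-token noise.
logprobs_a = [-2.0 + rng.gauss(0, 0.3) for _ in range(10_000)]
logprobs_b = [lp + 0.1 + rng.gauss(0, 0.05) for lp in logprobs_a]

# Per-token improvement of B over A (positive = B assigns higher logprob).
deltas = sorted(b - a for a, b in zip(logprobs_a, logprobs_b))

# The percentile spread answers "uniform gain or noisy gain?": a tight
# 5-95 band means the improvement is spread evenly across tokens; a wide
# band (or a multimodal histogram of deltas) means gains concentrate in
# a few areas while other tokens get worse.
p5 = deltas[int(0.05 * len(deltas))]
p50 = statistics.median(deltas)
p95 = deltas[int(0.95 * len(deltas))]
print(f"median {p50:+.3f}, 5-95 band [{p5:+.3f}, {p95:+.3f}]")
```

Binning `deltas` into a histogram would then show whether the modes collapse as the overall loss drops.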

Bossie
16 hours ago
What is being researched? Any objective?
austinbaggio
11 hours ago
The objective is to train a small GPT language model to the lowest possible validation bits-per-byte (val_bpb) in 5-minute runs, using AI agents to autonomously iterate on the code. This builds on Karpathy's autoresearch: https://x.com/AustinBaggio/status/2031888719943192938?s=20
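For reference, a minimal sketch of the metric itself, assuming the usual definition of bits-per-byte (total negative log-likelihood converted from nats to bits, normalized by the raw byte count of the validation text):

```python
import math

def val_bpb(total_nll_nats: float, n_bytes: int) -> float:
    """Validation bits-per-byte: total negative log-likelihood of the
    validation text (in nats), converted to bits and divided by the
    number of raw bytes. Normalizing by bytes rather than tokens makes
    the score comparable across different tokenizers."""
    return total_nll_nats / (math.log(2) * n_bytes)

# e.g. a mean loss of 1.2 nats/token over 1000 tokens covering 4000 bytes:
print(round(val_bpb(1.2 * 1000, 4000), 4))  # -> 0.4328
```

Lower is better, which is why the baseline updates whenever a run beats the current best val_bpb.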
gavinray
8 hours ago
Is there any way to "follow" the current state?

Like a live dashboard with swarm stats, best current result, etc?

I think that would be really neat, and it would get more people to contribute.

austinbaggio
7 hours ago
Great idea. On it.
ahmedhawas123
1 day ago
First time I am seeing this or autoresearch in general. Incredibly cool. I can think of plenty of use cases this can apply to (e.g., drug research, trading).
austinbaggio
1 day ago
Yeah, the obvious workloads are training. I want to point this at RL next, but drug research is a really strong common-good target too. We were heavily inspired by Folding@home and BOINC.
miligauss
1 day ago
The agents also monitor and follow research strategies regardless of the performance baseline, so everything in the knowledge base, including local minima, is considered during strategy ideation. In theory you could use a Mac mini, for instance, and still have results that help the aggregate.
gabia
1 day ago
Cool! However, when I click the commit_url links I get a 404 page on GitHub.
austinbaggio
1 day ago
We thought about storing all of the commits on Ensue too, but we wanted to match the spirit of Andrej's original design, which leans heavily on GitHub. Curious what you were looking for when trying to inspect the code?
gabia
7 hours ago
I was hoping to see the code change the agent made! I thought that when I clicked the commit link I would see it on GitHub (since it is a GitHub URL...), but the links don't seem to work; they take me to a GitHub 404, e.g. https://github.com/mutable-state-inc/autoresearch-at-home/co... I'm not sure what that has to do with Ensue, so I've probably misunderstood how this works.
zmanian
1 day ago
Could the website also make it clearer that you need a GPU to contribute!
austinbaggio
1 day ago
I know it's a bit of a barrier... but I set one up on vast.ai really quickly and ran it for a day for the price of lunch. One of our teammates ran it from their old gaming PC too, and it still found novel strategies.
miligauss
1 day ago
fwiw the agents just drop their whole solutions