How it works: Agents read the current best result, propose a hypothesis, modify train.py, run the experiment on your GPU, and publish the results back. When an agent beats the current best validation loss, that result becomes the new baseline for every other agent. Because Ensue serves as the collective memory layer, agents learn from both great runs and failures.
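The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: `SharedMemory` is a hypothetical in-memory stand-in for Ensue (not its API), `agent_step` uses a placeholder label instead of really editing train.py, and `fake_run` replaces a real training run with random jitter.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Result:
    agent: str
    hypothesis: str
    val_loss: float


@dataclass
class SharedMemory:
    """Hypothetical in-memory stand-in for the collective memory layer (not Ensue's API)."""
    results: List[Result] = field(default_factory=list)
    best: Optional[Result] = None

    def publish(self, result: Result) -> bool:
        # Record every run, good or bad, so other agents can learn from failures too.
        self.results.append(result)
        # A run that beats the current best validation loss becomes the new shared baseline.
        if self.best is None or result.val_loss < self.best.val_loss:
            self.best = result
            return True
        return False


def agent_step(name: str, memory: SharedMemory,
               run_experiment: Callable[[str, float], float]) -> bool:
    # 1. Read the current best result (if any) as the baseline to beat.
    baseline = memory.best.val_loss if memory.best else float("inf")
    # 2. Propose a hypothesis (a real agent would modify train.py here; this is only a label).
    hypothesis = f"{name}: idea #{len(memory.results)}"
    # 3. Run the experiment, then 4. publish the outcome back to shared memory.
    val_loss = run_experiment(hypothesis, baseline)
    return memory.publish(Result(name, hypothesis, val_loss))


def fake_run(hypothesis: str, baseline: float, rng: random.Random) -> float:
    # Toy stand-in for a 5-minute training run: jitter around the current baseline.
    start = 3.0 if baseline == float("inf") else baseline
    return start + rng.uniform(-0.05, 0.05)


rng = random.Random(0)
memory = SharedMemory()
for _ in range(10):
    for name in ("agent-a", "agent-b", "agent-c"):
        agent_step(name, memory, lambda h, b: fake_run(h, b, rng))
print(f"{len(memory.results)} runs, best val_loss {memory.best.val_loss:.4f} by {memory.best.agent}")
```

Because every agent reads the same `best` before starting, an improvement found by one agent immediately raises the bar for all the others, which is the coordination the project adds.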
This project extends Karpathy's autoresearch by adding the missing coordination layer so agents can actually build on each other's work.
To participate, you need an agent and a GPU. The agent handles everything: cloning the repo, connecting to the collective, picking experiments, running them, publishing results, and asking you to verify you're a real person via email.
Send this prompt to your agent to get started: "Read https://github.com/mutable-state-inc/autoresearch-at-home, follow the instructions, join autoresearch, and start contributing."
The point of this whole experiment is to prove that agents work better when they can build on other agents' work. The timeline is live, so you can watch experiments land in real time.
To my smooth-brain naiveté, this feels like the sort of thing we could reward with some kind of cryptocurrency on a blockchain? Gains are difficult to achieve, but they're relatively quick to verify independently (5-minute training runs).
I know "blockchain all the things" is sooooo 8 years ago, but I'm looking at the descending graph of progress here and wondering whether being able to claim improvement tokens (even for nothing more than NFT-esque bragging rights) would be a cool addition?
I'm asking this as someone who knows next to nothing about crypto or blockchain or any of those things, but mainly thinking of trying to gamify assigning GPU rigs to "mining" these measurable improvements.
It makes a lot of sense to incentivize contributing. My idea was that you could earn "credits" for participating, which you could exchange for compute resources from the "swarm" later, on any given problem (training, local inference, etc).
The practical challenge is that adding a blockchain means agents also need to participate in consensus, store and sync the ledger, and run the rest of the network infrastructure on top of the actual research. So it needs a unit-economics analysis. That said, all results already include full source code and deterministic metrics, so the hard part of verifiable compute is already solved: anyone can re-run the code and check the score. You could take this further with a zkVM to generate cryptographic proofs that the code produced the claimed score, so nobody needs to re-run anything to verify. Verification becomes checking a proof, not reproducing the compute.
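For the simpler path that exists today (before any zkVM), verification by re-running can be sketched as below. This is a hypothetical illustration, not how autoresearch-at-home actually checks results: `Claim`, `verify_by_rerun`, and `is_new_best` are made-up names, and the `rerun` callable stands in for actually executing the submitted train.py. The tolerance is an assumption in case bitwise determinism doesn't hold across different GPUs.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Claim:
    source: str         # full source submitted with the result (e.g. train.py)
    source_sha256: str  # hash published alongside the result
    val_loss: float     # claimed validation loss


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def verify_by_rerun(claim: Claim, rerun: Callable[[str], float], tol: float = 1e-4) -> bool:
    # 1. The published hash must match the code that was actually submitted.
    if sha256_hex(claim.source) != claim.source_sha256:
        return False
    # 2. Re-run the submitted code and check that the metric reproduces within a tolerance.
    return abs(rerun(claim.source) - claim.val_loss) <= tol


def is_new_best(claim: Claim, current_best: float, rerun: Callable[[str], float]) -> bool:
    # Only a verified claim that beats the current best should become the new baseline.
    return verify_by_rerun(claim, rerun) and claim.val_loss < current_best


code = "# contents of train.py"
rerun = lambda src: 2.8710  # hypothetical: pretend the 5-minute re-run reproduced this loss
honest = Claim(code, sha256_hex(code), 2.8710)
inflated = Claim(code, sha256_hex(code), 2.5000)
print(verify_by_rerun(honest, rerun), verify_by_rerun(inflated, rerun))
```

The cost of this check is one training run per claim; a zkVM proof would replace the `rerun` call with proof verification, which is the trade-off described above.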
Compute-credits are interesting. Contribute GPU time now, draw on the swarm later for training, inference, whatever you need. That's a real utility token with intrinsic value tied to actual compute, not speculation.
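The credit idea reduces to simple bookkeeping, sketched below without any blockchain, consensus, or networking. Everything here is hypothetical: `ComputeLedger`, the rule of one credit per verified GPU-second, and the names in the usage lines are illustrative assumptions, not part of the project.

```python
from collections import defaultdict
from typing import Dict


class ComputeLedger:
    """Hypothetical credit ledger: 1 credit per verified GPU-second contributed."""

    def __init__(self) -> None:
        self.balances: Dict[str, float] = defaultdict(float)

    def contribute(self, user: str, gpu_seconds: float, verified: bool) -> None:
        # Only verified runs earn credits, so unverifiable claims can't mint anything.
        if verified and gpu_seconds > 0:
            self.balances[user] += gpu_seconds

    def draw(self, user: str, gpu_seconds: float) -> bool:
        # Spend credits on swarm compute later; refuse if the balance is too low.
        if gpu_seconds <= 0 or self.balances[user] < gpu_seconds:
            return False
        self.balances[user] -= gpu_seconds
        return True


ledger = ComputeLedger()
ledger.contribute("alice", 300, verified=True)   # one verified 5-minute run
ledger.contribute("bob", 300, verified=False)    # an unverified run earns nothing
print(ledger.draw("alice", 120), ledger.balances["alice"])
```

Tying minting to the verification step is what gives the credits their link to real compute; a distributed version would add the consensus and ledger-sync costs raised earlier in the thread.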
Yes, thank you for the validation! That was the core of what sparked this for me -- my cartoon picture of blockchain is that it depends on problems that are difficult to solve (improve this codebase) but easy to verify (loss went down).
As you noted, this is also cool because it's valuable work (unlike most mining workloads).
I appreciate the optimization opportunities you've laid out (such as a zkVM), but those feel optional compared to the basic idea here?
And yeah -- what one _does_ with the crypto-credits is pretty open-ended. Like you said, drawing on the swarm for training or inference or whatever you need feels like something one could use as a kind of GPU battery. Most of my personal GPU work comes in bursts, and the rest of the time my GPU sits idle.
Most of the other GPU-sharecropping ideas I've seen floating around lack a way to independently prove that work was done. Having a global metric for a shared target like this seems to solve what has been missing from a lot of other distributed systems I've seen.
Looking at the graph on the website, it looks like it's already got a bit of a scoreboard and independent verification / validation of results. Feels like it would be a relatively small jump to crowdsource this and put it into a formal blockchain.
But the next natural question is: Would we stand to gain anything by adding blockchain to this?
It seems like you want to know what the median, 5-95, or 1-99 percentile differences might be? I also wonder what the "residual" plot looks like... If there are too many residual data points for a scatter plot, a histogram might be useful for visualizing the modes. I suspect that as loss decreases, the multiple modes should condense or collapse altogether into one.
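Those statistics are cheap to compute with the standard library alone. A minimal sketch, assuming a "residual" is each run's val_loss minus the baseline it started from (that definition, the function names, and the toy bimodal data are all assumptions for illustration):

```python
import random
import statistics
from typing import Dict, List


def spread_summary(residuals: List[float]) -> Dict[str, float]:
    """Median plus the 5-95 and 1-99 percentile spreads of a list of residuals."""
    # 99 cut points; pct[k - 1] is the k-th percentile.
    pct = statistics.quantiles(residuals, n=100, method="inclusive")
    return {
        "median": statistics.median(residuals),
        "p5_p95": pct[94] - pct[4],
        "p1_p99": pct[98] - pct[0],
    }


def histogram(residuals: List[float], bins: int = 10) -> List[int]:
    """Equal-width bin counts, useful when a scatter plot has too many points."""
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for r in residuals:
        idx = min(int((r - lo) / width), bins - 1)  # clamp the maximum value into the last bin
        counts[idx] += 1
    return counts


rng = random.Random(42)
# Toy bimodal residuals: half near 0.00, half near 0.05 (made-up data for illustration).
residuals = [rng.gauss(0.0, 0.01) for _ in range(500)] + [rng.gauss(0.05, 0.01) for _ in range(500)]
print(spread_summary(residuals))
for count in histogram(residuals, bins=20):
    print("#" * (count // 5))
```

Two separated clusters of `#` rows show up as two modes; if the suspicion above holds, re-running this on later, lower-loss runs should show them merging into one.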
Like a live dashboard with swarm stats, best current result, etc?
I think that would be really neat and would get more people to contribute.