Launch HN: Spine Swarm (YC S23) – AI agents that collaborate on a visual canvas
26 points | 1 hour ago | 9 comments | getspine.ai
Hey HN! We're Ashwin and Akshay from Spine AI (https://www.getspine.ai).

Spine Swarm is a multi-agent system that works on an infinite visual canvas to complete complex non-coding projects: competitive analysis, financial modeling, SEO audits, pitch decks, interactive prototypes, and more.

Here's a video of Spine Swarm in action: https://youtu.be/R_2-ggpZz0Q

We've been friends for over 13 years. We took our first ML course together at NTU, in a part of campus called North Spine, which is where the name comes from. We went through YC in S23 and have spent about 3 years building Spine across many product iterations.

The core idea: chat is the wrong interface for complex AI work. It's a linear thread, and real projects aren't linear. Sure, you can ask a chatbot to reference the financial model from earlier in the thread, or run research and market sizing together, but you're trusting the model to juggle that context implicitly. There's no way to see how it's connecting the pieces, no way to correct one step without rerunning everything, and no way to branch off and explore two strategies side by side. ChatGPT was a demo that blew up, and chat stuck around as the default interface because of that momentum, not because it's the right abstraction. We thought humans and agents needed a real workspace where the structure of the work is explicit and user-controllable, not hidden inside a context window.

So we built an infinite visual canvas where you think in blocks instead of threads. Each block is our abstraction on top of AI models. There are dedicated block types for LLM calls, image generation, web browsing, apps, slides, spreadsheets, and more. Think of them as Lego bricks for AI workflows: each one does something specific, but they can be snapped together and composed in many different ways. You can connect any block to any other block, and that connection guarantees that context is passed along regardless of block type. The whole system is model-agnostic, so in a single workflow you can go from an OpenAI LLM call, to an image generation model like Nano Banana Pro, to Claude generating an interactive app, each block using whatever model fits best. Multiple blocks can fan out from the same input, analyzing it in different ways with different models, then feed their outputs into a downstream block that synthesizes the results.
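
To make the fan-out and fan-in pattern concrete, here is a minimal Python sketch. It is an illustration only, not Spine's implementation: the `Block` class, the `run` function, and the model labels are all hypothetical, and the lambdas stand in for real model or tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Block:
    """Hypothetical canvas block: a name, a model label, and a function over upstream outputs."""
    name: str
    model: str
    fn: Callable[[List[str]], str]
    inputs: List["Block"] = field(default_factory=list)

    def connect_from(self, *upstream: "Block") -> "Block":
        # A connection means the upstream output is passed in as context.
        self.inputs.extend(upstream)
        return self


def run(block: Block, cache: Dict[str, str]) -> str:
    """Evaluate upstream blocks first, memoizing each output by block name."""
    if block.name not in cache:
        context = [run(b, cache) for b in block.inputs]
        cache[block.name] = block.fn(context)
    return cache[block.name]


# Stand-in functions; a real block would call a model or a tool here.
idea = Block("idea", "none", lambda ctx: "note-taking app")
prd = Block("prd", "model-a", lambda ctx: f"PRD for {ctx[0]}").connect_from(idea)
critique = Block("critique", "model-b", lambda ctx: f"critique of {ctx[0]}").connect_from(idea)
summary = Block("summary", "model-c", lambda ctx: " | ".join(ctx)).connect_from(prd, critique)

outputs: Dict[str, str] = {}
result = run(summary, outputs)
print(result)  # PRD for note-taking app | critique of note-taking app
```

The shared `outputs` cache is what lets two branches reuse the same upstream result: `idea` is evaluated once even though both `prd` and `critique` depend on it.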

The first version of the canvas was fully manual. Users entered prompts, chose models, ran blocks, and made connections themselves. It clicked with founders and product managers because they could branch in different directions from the same starting point: take a product idea and generate a prototype in one branch, a PRD in another, a competitive critique in a third, and a pitch deck in a fourth, all sharing the same upstream context. But new users didn't want to learn the interface. They kept asking us to build a chat layer that would generate and connect blocks on their behalf, to replicate the way we were using the tool. So we built that, and in doing so discovered something we didn't expect: the agents were capable of running autonomously for hours, producing complete deliverables. It turned out agents could run longer and keep their context windows clean by delegating work to blocks and storing intermediary context on the canvas, rather than holding everything in a single context window.

Here's how it works now. When you submit a task, a central orchestrator decomposes it into subtasks and delegates each to specialized persona agents. These agents operate on the canvas blocks and can override default settings, primarily the model and prompt, to fit each subtask. Agents pick the best model for each block and sometimes run the same block with multiple models to compare and synthesize outputs. Multiple agents work in parallel when their subtasks don't have dependencies, and downstream agents automatically receive context from upstream work. The user doesn't configure any of this. You can also dispatch multiple tasks at once and the system will queue dependent ones or start independent ones immediately.
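
One common way to run independent subtasks in parallel while holding back dependent ones is to schedule them in topological "waves". The sketch below uses Python's standard `graphlib` module (Python 3.9+). The subtask names and the `parallel_waves` helper are assumptions made for illustration, and nothing here is confirmed to match how Spine's orchestrator is built.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def parallel_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group subtasks into waves; every task in a wave has all of its dependencies finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only to make the output deterministic
        waves.append(ready)
        sorter.done(*ready)  # marking tasks done unlocks their dependents
    return waves


# Hypothetical decomposition: each key depends on the tasks in its value set.
subtasks = {
    "seo_audit": set(),
    "competitor_research": set(),
    "growth_roadmap": {"seo_audit", "competitor_research"},
    "slide_deck": {"growth_roadmap"},
}
waves = parallel_waves(subtasks)
for i, wave in enumerate(waves, 1):
    print(f"wave {i}: {wave}")
```

Tasks in the same wave have no dependencies on each other, so they could be dispatched to separate agents at once. Each later wave waits for the one before it.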

Agents aren't fully autonomous by default. Any agent can pause execution and ask the user for clarification or feedback before continuing, which keeps the human in the loop where it matters. And once agents have produced output, you can select a subset of blocks on the canvas and iterate on them through the chat without rerunning the entire workflow.

The canvas gives agents something that filesystems and message-passing don't: a persistent, structured representation of the entire project that any agent can read and contribute to at any point. In typical multi-agent systems, context degrades as it passes between agents. The canvas addresses this because agents store intermediary results in blocks rather than trying to hold everything in memory, and they leave explicit structured handoffs designed to be consumed efficiently by the next agent in the chain. Every step is also fully auditable, so you can trace exactly how each agent arrived at its conclusions.
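
As a rough sketch of what "store intermediary results in blocks and keep them traceable" could look like, the snippet below models the canvas as an append-only store in which every record names the blocks it was derived from. The `Canvas` and `Record` classes, the agent names, and the block contents are hypothetical and are not taken from Spine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    """One stored result: who wrote it, what it says, and which blocks it was derived from."""
    author: str
    content: str
    derived_from: Tuple[str, ...] = ()


@dataclass
class Canvas:
    records: Dict[str, Record] = field(default_factory=dict)

    def write(self, block_id: str, author: str, content: str,
              derived_from: Tuple[str, ...] = ()) -> None:
        if block_id in self.records:
            raise ValueError(f"block already exists: {block_id}")  # append-only
        missing = [b for b in derived_from if b not in self.records]
        if missing:
            raise KeyError(f"unknown upstream blocks: {missing}")
        self.records[block_id] = Record(author, content, derived_from)

    def trace(self, block_id: str) -> List[str]:
        """Return the block ids that contributed to block_id, upstream first."""
        seen: List[str] = []

        def visit(bid: str) -> None:
            if bid in seen:
                return
            for parent in self.records[bid].derived_from:
                visit(parent)
            seen.append(bid)

        visit(block_id)
        return seen


canvas = Canvas()
canvas.write("sources", "researcher", "3 competitor pricing pages")
canvas.write("pricing_table", "analyst", "normalized price comparison", ("sources",))
canvas.write("deck_slide", "writer", "slide summarizing pricing gaps", ("pricing_table",))
print(canvas.trace("deck_slide"))  # ['sources', 'pricing_table', 'deck_slide']
```

Because each agent reads its inputs from the store instead of from another agent's context window, the audit trail comes for free: `trace` simply walks the recorded `derived_from` links.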

We ran benchmarks to validate what we were seeing. On Google DeepMind's DeepSearchQA, which is 900 questions spanning 17 fields, each structured as a causal chain where each step depends on completing the previous one, Spine Swarm scored 87.6% on the full dataset with zero human intervention. For the benchmark we used a subset of block types relevant to the questions (LLM calls, web browsing, table) and removed irrelevant ones like document, spreadsheet, and slide generation. We also disabled human clarification so agents ran fully independently. The agents were not just auditable but also state of the art. The auditability also exposed actual errors in an older benchmark (GAIA Level 3), cases where the expected answer was wrong or ambiguous, which you'd never catch with a black-box pipeline. We detail the methodology, architecture, and benchmark errors in the full writeup: https://blog.getspine.ai/spine-swarm-hits-1-on-gaia-level-3-...

Benchmarks measure accuracy on closed-ended questions. Turns out the same architecture also leads to better open-ended outputs like decks, reports, and prototypes with minimal supervision. We've seen early users split into two camps: some watch the agents work and jump in to redirect mid-flow, others queue a task and come back to a finished deliverable. Both work because the canvas preserves the full chain of work, so you can audit or intervene whenever you want.

A good first task to try: give it your website URL and ask for a full SEO analysis, competitive landscape, and a prioritized growth roadmap with a slide deck. You'll see multiple agents spin up on the canvas simultaneously. People have also used it for fundraising pitch decks with financial models, prototyping features from screenshots and PRDs, competitive analysis reports, and deep-dive learning plans that research a topic from multiple angles and produce structured material you can explore further.

Pricing is usage-based: credits are tied to block usage and the underlying models used. Agents tend to use more credits than manual workflows because they're tuned to get you the best possible outcome, which means they pick the best blocks and do more work. Details here: https://www.getspine.ai/pricing. There's a free tier, and one honest caveat: we sized it to let you try a real task, but tasks vary in complexity. If you run out before you've had a proper chance to explore, email us at founders@getspine.ai and we'll work with you.

We'd love your feedback on the experience: what worked, what didn't, and where it fell short. We're also curious how others here approach complex, multi-step AI work beyond coding. What tools are you using, and what breaks first? We'll be in the comments all day.

BloondAndDoom
14 minutes ago
I didn’t read the post, I checked out the website just like 99% of the people will do.

Simple advice: if you are selling a product whose selling point is being visual, show it on your website. Not in a YouTube video, but with actual screenshots or a short 10-second video/GIF.

a24venka
12 minutes ago
Definite miss on our part, we're working on making the product experience more visible upfront on our landing page.

pqs
12 minutes ago
I had to read this text in order to understand what this tool does, because I could not know from the website (without watching a video). You should use Spine to improve your website. ;-)

gravity2060
29 minutes ago
In the demo video you shared (YouTube link), how many credits did that whole project take? What is the price to fix elements of it? For example, if you dislike a minor aspect of the generated spreadsheet, do follow-up instructions use only the narrow subset of agents that worked on that subtask, or does it create new agents that have to build new context for the narrow follow-up task?

a24venka
21 minutes ago
Credits are consumed by the blocks that get generated, not by the agents themselves. Some blocks are cheaper than others. A simple prompt or image block is a single model call, while browser use or deliverable blocks like documents and spreadsheets run models in a loop and cost more. Blocks also cost more when they have more blocks connected to them (more input tokens).

In the demo video I shared, the task cost about 7,000 credits since it ran around 10 BrowserUse blocks and produced multiple deliverables.

If you want to fix a specific block (or set of blocks), you can select them and the chat will scope itself to primarily work on those. In that case fewer blocks run, so it's cheaper.
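
The cost drivers described above (number of model calls per block, and how many blocks feed into it) can be put into a toy estimate. Every number and name below is made up for illustration: `UNITS_PER_CALL`, `UNITS_PER_INPUT_PER_CALL`, and the call counts are not Spine's real pricing, which is on the pricing page.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class BlockCost:
    model_calls: int       # 1 for a single-call block, more for blocks that loop
    connected_inputs: int  # upstream blocks feeding context into this one


# Made-up rates for illustration only; these are NOT Spine's real prices.
UNITS_PER_CALL = 10
UNITS_PER_INPUT_PER_CALL = 2


def block_units(cost: BlockCost) -> int:
    """More connected inputs means more input tokens on every call."""
    per_call = UNITS_PER_CALL + UNITS_PER_INPUT_PER_CALL * cost.connected_inputs
    return cost.model_calls * per_call


def total_units(blocks: Dict[str, BlockCost],
                selected: Optional[Iterable[str]] = None) -> int:
    """Sum the cost of all blocks, or only of the selected ones for a scoped fix."""
    names = blocks.keys() if selected is None else selected
    return sum(block_units(blocks[n]) for n in names)


canvas = {
    "prompt": BlockCost(model_calls=1, connected_inputs=0),
    "browser": BlockCost(model_calls=8, connected_inputs=1),
    "spreadsheet": BlockCost(model_calls=5, connected_inputs=2),
}
full_run = total_units(canvas)
scoped_fix = total_units(canvas, ["spreadsheet"])
print(full_run, scoped_fix)  # 176 70
```

The point of the sketch is the shape of the calculation, not the values: rerunning one selected block costs a fraction of the full run because the looping browser block is not touched.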

nusl
9 minutes ago
7000 credits, ouch. The tool is really cool, I do think it's super useful. I also like the swarm particle animations in the background.

sebmellen
1 hour ago
Just as a tiny first piece of feedback, the main marketing website is very hard to understand or grok without a demo of how the tool works. Even just the quick YouTube video that you added in your post here, if embedded, would make a difference.

There are so many "agentic tools" out there that it's really hard to see what differentiates this just based on the website.

a24venka
1 hour ago
Thanks for the feedback! Definitely agree that we could do more with the marketing site. We're working on a gallery page to showcase some demos.

gravity2060
37 minutes ago
Is it possible to build self-improving swarm loops? (i.e., swarm x builds a thing, swarm y critiques and improves x’s work, repeat…)

a24venka
32 minutes ago
We've only partially explored this so far, but it's a great suggestion.

The canvas architecture naturally supports this kind of loop since agents can already read and build on each other's outputs — so the plumbing is there, it's more about building the right orchestration on top. Definitely something we're exploring.
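
For readers wondering what such a build-and-critique loop looks like in the abstract, here is a generic sketch. It is not a Spine feature or API: `refine`, `toy_build`, and `toy_critique` are hypothetical names, and the toy functions stand in for real agents. The round cap matters because nothing else guarantees that a critic ever runs out of complaints.

```python
from typing import Callable, List, Tuple


def refine(
    build: Callable[[str, str, List[str]], str],
    critique: Callable[[str], List[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate build and critique until the critic has no issues or rounds run out."""
    draft = ""
    issues: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = build(task, draft, issues)
        issues = critique(draft)
        if not issues:
            return draft, round_no
    return draft, max_rounds


# Toy stand-ins: the builder patches each reported issue, the critic wants two parts.
def toy_build(task: str, draft: str, issues: List[str]) -> str:
    if not draft:
        return task
    return draft + "".join(f" +{issue}" for issue in issues)


def toy_critique(draft: str) -> List[str]:
    return [part for part in ("title", "summary") if f"+{part}" not in draft]


final, rounds = refine(toy_build, toy_critique, "deck")
print(final, rounds)  # deck +title +summary 2
```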

jpbryan
50 minutes ago
Why do I need a canvas to visualize the work that the agents are doing? I don't want to see their thought process, I just want the end product like how ChatGPT or Claude currently work.

a24venka
40 minutes ago
That is definitely a valid way of using Spine as well. You can just work in the chat and consume the deliverables similar to how you would in other tools.

The canvas helps when you want to trace back why an output wasn't what you expected, or if you're curious to dig deeper.

Even beyond auditability, the canvas also helps agents do better work: they can generate in parallel, explore branches, and pass context to each other in a structured way (especially useful for longer-running tasks).

dude250711
37 minutes ago
Dark UI pattern: pretends that it is immediately usable only to redirect for sign-up.

a24venka
30 minutes ago
Fair point, we should be more upfront about the sign-up step. Given that tasks are long-running and token-intensive, we do need an auth barrier to protect against abuse, but we can definitely do a better job signaling that before you hit the canvas.

garciasn
20 minutes ago
Or, just show us in an animated GIF how the product works in practice. Then, should we somehow find benefit in a visual representation of a swarm's workflow, we could sign up rather than having to, unintuitively, scroll down to watch a YouTube video.

e: 'be' to 'we'; oops.

a24venka
14 minutes ago
Good call and noted. We're working on making the product experience more visible upfront.

esafak
22 minutes ago
Is the value prop that I can see what the agent is doing? This is not the way: https://youtu.be/R_2-ggpZz0Q?t=158

How am I supposed to get anything out of this? Consider that agents are going to get faster and run more and more tasks in parallel. This is not manageable for a human to follow in real time. I can barely keep up with one agent in real-time, let alone a swarm.

What I could see being useful is if you monitored the agents and notified me when one is in the middle of something that deserves my attention.

a24venka
16 minutes ago
This is a fair point. We are exploring progressive disclosure on the canvas to make better use of the space and make the key artifacts more readily visible. We do have other panels (chat, task, and deliverable) that offer alternate views of what the agents did and of the key deliverables.

Beyond human auditability, the canvas helps the agents do a better job by generating in parallel, exploring branches and passing context to each other in a structured way.

stuckkeys
57 minutes ago
That is a bold claim for a wrapper lol

mlnj
26 minutes ago
Elaborate?