Here's a demo video: https://www.tella.tv/video/stage-demo-1pph. You can play around with some example PRs here: https://stagereview.app/explore.
Teams are moving faster than ever with AI these days, but more and more engineers are merging changes that they don't really understand. The bottleneck isn't writing code anymore, it's reviewing it.
We're two engineers who got frustrated with GitHub's UI for code review. As coding agents took off, we saw our PR backlog pile up faster than we could handle. Not only that, the PRs themselves were getting larger and harder to understand, and we found ourselves spending most of our time trying to build a mental model of what a PR was actually doing.
We built Stage to make reviewing a PR feel more like reading chapters of a book, not an unorganized set of paragraphs. We use it every day now, not just to review each other's code but also our own, and at this point we can't really imagine going back to the old GitHub UI.
What Stage does: when a PR is opened, Stage groups the changes into small, logical "chapters". These chapters are ordered in the way that makes the most sense to read. For each chapter, Stage tells you what changed and flags specific things to double-check. Once you've reviewed all the chapters, you're done reviewing the PR.
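To make that concrete, here's a rough sketch of what a chapter amounts to (illustrative field names only, not our actual schema):

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of a review "chapter" -- field names are
# illustrative, not Stage's real internal types.
@dataclass
class Chapter:
    title: str                  # e.g. "Add rate limiting to the API"
    order: int                  # reading order within the PR
    summary: str                # what changed in this chapter
    double_checks: List[str] = field(default_factory=list)  # things to verify
    reviewed: bool = False

def pr_review_done(chapters: List[Chapter]) -> bool:
    """The PR is fully reviewed once every chapter has been reviewed."""
    return all(c.reviewed for c in chapters)
```

The point is that the unit of progress becomes the chapter, not the file.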
You can sign in to Stage with your GitHub account, and everything is synced seamlessly (commenting, approving, etc.), so it fits into the workflows you're already used to.
What we're not building: a code review bot like CodeRabbit or Greptile. These tools are great for catching bugs (and we use them ourselves!), but at the end of the day humans are responsible for what gets shipped. It's clear that reviewing code hasn't scaled the way writing it has, and reviewers (us included!) need better tooling to keep up with the onslaught of AI-generated code, which is only going to grow.
We've had a lot of fun building this and are excited to take it further. If you're like us and are also tired of using GitHub for reviewing PRs, we'd love for you to try it out and tell us what you think!
Your solution here seems to exclusively surface "what" changed, but it's impossible for me to know if it's right or not unless I also see the "how", first and/or together with the change itself. So the same problem remains: instead of reviewing in git/GitHub/Gerrit and separately figuring out the documents/resources that lay out the task itself, I still have to switch between the two and cross-check.
Currently on Stage we also generate a PR summary next to the chapters, and that's where we want to do more of the "why" that pulls in context from Linear, etc.
And I know there are a lot of cool teams like Mesa and Entire working on embedding agent context into git history itself, so that could be an interesting area to explore as well.
You cannot solve this problem by adding more AI on top. If lack of understanding is the problem, moving people even further away will only worsen the situation.
We don't think of Stage as moving people further away from code review, but rather as using AI to guide human attention through the review process itself.
AI guiding human attention means that humans aren't guiding human attention, which means less human understanding of their reviews.
If AI is good enough to explain what the change is and call out what to focus on in the review, then why isn't AI good enough to just do the review itself?
I understand that the goal of this is to ensure there's still a human in the review cycle, but the problem I see is that suggestions will quickly turn into todo lists. Devs will read the summary, look at the what to review section, and stop reviewing code outside of things called out in the what to focus on section. If that's true, it means customers need to be able to trust that the AI has enough context to generate accurate summaries and suggestions. If the AI is able to generate accurate summaries and suggestions, then why can't we trust it to just do the review itself?
I'm not saying that to shit on the product, because I do get the logic behind it, but I think that's a question you should have a prepared answer for since I feel like I can't be the only one thinking that.
I think our perspective is that software design has always had a subjective element to it. There's never been a single "right" way to design a system; there are always trade-offs that depend on things like business context.
To that end, most engineers probably still want to be part of that decision-making process and not just let agents make all the high-level decisions, especially if they're responsible for the code that ultimately gets merged.
Most of the human review I see of AI code is rubber-stamping at this point; the volume is too big for humans to keep up. What used to take developers a few days now takes a few hours, so PR volume is higher and human review can't keep up. At this point, human review seems like CYA more than anything else: "Why yes, SOC 2 auditor, we review all PRs."
I'm also seeing a lot more outages, but management is bouncing around all happy about the feature velocity they're shipping, so :shrug:
The much more interesting part is how exactly you map Context/Why/Verify to a product spec / acceptance criteria.
And I already posted how to do this. SCIP indexes go from product spec -> ACs -> E2E tests -> Evidence Artifacts -> Review (approve/reject, reason) -> and if all green, we make a commit that has #context + #why + #verify (I believe this just points to the E2E specs that belong to this AC).
Here's full schema: https://tinyurl.com/4p43v2t2 (-> https://mermaid.ai/live/edit)
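As I read that chain, it boils down to something like the following (a simplified sketch with guessed names, not the actual schema at the link):

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical sketch of the spec -> AC -> E2E -> evidence -> review
# chain described above. All names are illustrative guesses.
@dataclass
class Review:
    approved: bool
    reason: str

@dataclass
class AcceptanceCriterion:
    ac_id: str
    text: str
    e2e_tests: List[str]             # E2E specs that verify this AC
    evidence: List[str]              # artifacts from running those specs
    review: Optional[Review] = None  # approve/reject with a reason

def commit_message(spec: str, acs: List[AcceptanceCriterion]) -> Optional[str]:
    """Commit only when every AC's review is green; the message carries
    #context (the spec), #why (the ACs), and #verify (their E2E specs)."""
    if not all(ac.review and ac.review.approved for ac in acs):
        return None
    why = "; ".join(ac.ac_id for ac in acs)
    verify = ", ".join(t for ac in acs for t in ac.e2e_tests)
    return f"#context: {spec}\n#why: {why}\n#verify: {verify}"
```

The interesting property is that the commit itself ends up pointing back at the artifacts that justified it.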
What I'm trying to visualize is exactly where the cognitive bottleneck happens. So far I've identified three edges:
1. Spec <-> AC (User can shorten URL -> which ACs make this happen?)
2. AC <-> Plan (POST /urls/new must create a new DB record and respond with 200) -> how exactly should this code look?
3. Plan/Execute/Verify -> given this E2E test, how can I verify the test is doing what the AC assumes?
The cognitive bottleneck is when we transform artifacts:
- Real-world requirements (users want to use a browser) -> Spec (what exactly matters?)
- Spec -> AC (which scenarios exactly are we supporting?)
And you can see that at every step we are "compressing" something ambiguous into something deterministic. That's exactly what goes on in an engineer's head. And so the tooling I'm going to release soon is targeted exactly at eliminating the parts we spend the most time on: "figuring out how this file connects to the spec I have in my head, which I built from poorly described commit messages, outdated documents, Slack threads from 2016, and that guy who seemingly knew everything before he left the company".
This argument reminds me of the HN Dropbox announcement top comment:
Isn't that what commits are for? I see no reason for adding this as an after-thought. If the committers (whether human or LLM) are well-behaved, this info is already available in the PR.
We thought git wasn't the right level of abstraction and decided to tackle things at the PR level instead. Curious to hear your experiences!
Sure, it is. But it's worth it, not just for code review, but for a myriad other things: bisect, blame, log, etc.
Your tool makes one thing (the code review) easier, while decreasing people's motivation to make well-mannered commits, thus making everything else (bisect etc) worse.
I'm sure it's net positive in some cases, and I think it's net negative in other cases.
Have you heard about `rebase -i` ?
The frick is a PR abstraction? Is this a GitHub PR abstraction where the commits are squashed and the PR description is whatever was hallucinated at 5 am? Yes, that’s certainly an abstraction, aka loss of information.
You either have the information stored in the version control database or you don’t. You can curate and digest information but once it’s lost it’s lost.
People layering stuff on top of Git or Subversion makes no sense. Your AI is not so dainty and weak that it cannot write a commit message. And if it can't, then you can't recuperate the information that you trashed.
It is certainly possible to do topic-grouping in commits, but it requires significant effort to get that consistent on a team level.
It keeps a repository with markdown files as the agent context, makes those available (via a simple search and summarise MCP) and when closing a merge request it checks whether the context needs updating based on the review comments. If it needs updating a PR is opened on the context repository with suggested changes/additions.
So getting assistance in the review, in making the decisions and giving me more clarity feels interesting.
Maybe it's people like me, who got into coding after the LLMs, who might be your niche.
One thing I don't understand is the UI/UX. Is this visible only on git itself? Or can I get it working in Codex?
We've wondered about what the review experience should look like for newly technical or non-technical people now that they are increasingly putting up PRs themselves. These people will be less opinionated about certain technical decisions in general so maybe the future looks like review processes very personalized to your experience level and your background. Definitely a lot to think about
Right now the chapters UI is only available on our website but we're exploring possible integrations and/or a desktop app
and a code tour feature about to ship: https://x.com/backnotprop/status/2043759492744270027/video/1
- integrated comment feedback for agents
- inline chat
- integrated AI review (uses codex and claude code defaults)
Stage's (the OP's product) navigation tour is nice UX, about a day's worth of work on top of the incoming code tour.
Sort of related to that, we've been thinking a lot about the future of code review for OSS. It's clear, with Cal.com going closed source, that something needs to change. Would love to hear any thoughts you have!
I personally see the value of code review but I promise you the most vocal vibe coders I work with don’t at all and really it feels like something that could be just automated to even me.
The age of someone gatekeeping the codebase and pushing their personal coding-style foibles on the rest of the team via reviews doesn't feel like something that will exist anymore if your CEO is big on vibe coding.
In our view, even vibe coders should understand how the codebase works, and we think review is a natural place to pause and make sure you know what you and your coworkers are shipping. And we should have tools to reduce the mental load as much as possible.
Do you think there's a problem of cognitive debt among your coworkers who aren't reading the code or reviewing PRs?
Do you see a world where it splits them up on the git level?
Can't you push back on that? I feel like this tool is trying to fix misbehaved colleagues...
But I see it working together with chapters, not instead of them, because it's still good to see the granularity within a PR.
https://sscarduzio.github.io/pr-war-stories/
Basically it's distilling knowledge from PR reviews back into Bugbot fine-tuning and CLAUDE.md.
So the automatic review catches more, and the code assistant produces more aligned code.
Do you find that this list of learnings that ends up in BUGBOT.md or LESSONS.md ever gets too long? Or does it do a good job of deduplicating redundant learnings?
The deduplication and generalisation steps really help, and the extra Bugbot context ends up at just about 2,000 tokens.
The global LESSONS.md has fewer than 20 "pearls" with brief examples.
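The dedup pass is conceptually pretty simple; a naive version (purely a sketch of the idea, not the actual pipeline) looks something like:

```python
import re
from typing import List

def normalize(lesson: str) -> str:
    """Crude normalisation key: lowercase, strip punctuation and
    extra whitespace, so near-identical lessons collide."""
    return re.sub(r"[^a-z0-9 ]", "", lesson.lower()).strip()

def dedupe_lessons(lessons: List[str], max_items: int = 20) -> List[str]:
    """Keep the first occurrence of each normalised lesson, capped at
    max_items (mirroring the ~20 "pearls" mentioned above)."""
    seen, kept = set(), []
    for lesson in lessons:
        key = normalize(lesson)
        if key and key not in seen:
            seen.add(key)
            kept.append(lesson)
    return kept[:max_items]
```

The real generalisation step (merging two lessons into one broader rule) is the part that needs an LLM; the cap and exact-ish dedup are just bookkeeping.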
In the ideal world, each PR is as small and self-contained as possible but we've noticed people struggling to justify the extra overhead every time.
Do you or your team use stacking in your workflows?
Looks inside.
Now that we are all eating Soylent it can get a little bland sometime. That’s why we are releasing our international, curated spice package for your Soylent...
Reconstituting messy things is exactly where LLMs can help.
You can regenerate the chapters anytime, but it might produce results similar to the first time.
I think one thing we've seen from early users that surprised us is how quickly chapters became the unit of review for them, as opposed to files; they've asked us to add functionality to mark chapters as viewed and comment on them as a whole.
Another big surprise: now that agents are writing most (if not all) of the code, we've found that a lot of early users are using Stage not only to review others' PRs but also their own, before they have others review them.