To try it, replace github.com with 0github.com in any pull-request URL. Under the hood, we split the PR into individual files, and for each file, we ask an LLM to annotate each line with a data structure that we parse into a colored heatmap.
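Roughly, the data structure the LLM returns per file looks something like this sketch in TypeScript (field names are illustrative, not the real schema; see the repo for the actual one):

    // Hypothetical shape of the per-line annotation for one file.
    interface LineAnnotation {
      lineNumber: number;        // line within the file's diff
      shouldReviewScore: number; // 0..1, drives the heatmap intensity
      explanation: string;       // shown on hover
    }

    interface FileHeatmap {
      path: string;
      lines: LineAnnotation[];
    }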
Examples:
https://0github.com/manaflow-ai/cmux/pull/666
https://0github.com/stack-auth/stack-auth/pull/988
https://0github.com/tinygrad/tinygrad/pull/12995
https://0github.com/simonw/datasette/pull/2548
Notice how all the example links have a 0 prepended to github.com. This takes you to our custom diff viewer, which handles the same URL path parameters as github.com. Darker yellows indicate that an area might require more investigation. Hover over the highlights to see the LLM's explanation. There's also a slider at the top left to adjust the "should review" threshold.
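Conceptually, the slider is just a cutoff on each line's score, with darker yellow for higher scores. A sketch, assuming the 0..1 score shape above (the real color scale lives in the viewer code):

    // Map a 0..1 "should review" score to a highlight color,
    // or null if the score falls below the slider threshold.
    function highlightFor(score: number, threshold: number): string | null {
      if (score < threshold) return null;
      const alpha = 0.15 + 0.85 * score; // darker yellow = higher score
      return `rgba(255, 200, 0, ${alpha.toFixed(2)})`;
    }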
Repo (MIT license): https://github.com/manaflow-ai/cmux
cmux-agent requires access to your GitHub account:
    Verify your GitHub identity
    Know what resources you can access
    Act on your behalf
    View your email addresses
Just tested these example links in incognito and they seemed to work?
https://0github.com/manaflow-ai/cmux/pull/666
https://0github.com/stack-auth/stack-auth/pull/988
https://0github.com/tinygrad/tinygrad/pull/12995
https://0github.com/simonw/datasette/pull/2548
> you've disabled logging issues on the repo
Sorry, wasn't aware. Turning it on right now. EDIT: https://github.com/manaflow-ai/cmux/issues seems to be fine?
To keep it short, GitHub has OAuth Apps and "GitHub Apps". GitHub Apps are the new model: they can be installed on particular repos instead of having wide access to your account, and GitHub recommends you use them. There is one catch, however: GitHub architected these apps so that they can "act on the user's behalf". Even if your app only asks for an email address, it will still have that "permission", even though it applies to nothing.
Thus, the scary popup. I've found the only solution to this is to "complicate" your flow. If you go to https://codeinput.com (my app) and click "Log in with GitHub", you'll be taken to a less scary popup that only asks for your email (it's an OAuth App!). This, however, comes at the expense of having to do the "authenticate + install" dance again after you log in! So I had to create an onboarding step, partly to explain to the user the different steps they have to take.
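For reference, the "less scary" path is just the classic OAuth App web flow with a minimal scope; roughly like this (a sketch, not codeinput.com's actual code; the client ID and redirect URI are placeholders):

    // Classic GitHub OAuth App authorize URL that asks only for email access.
    const params = new URLSearchParams({
      client_id: process.env.GITHUB_CLIENT_ID ?? "CLIENT_ID",
      redirect_uri: "https://example.com/auth/callback",
      scope: "user:email", // only scope requested, hence the tamer consent screen
    });
    const authorizeUrl = `https://github.com/login/oauth/authorize?${params}`;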
But still, this is very creative and a nice application of LLMs that isn't strictly barf.
I loaded https://0github.com/laravel/framework/pull/57499. Completely random, it's a PR in the last github repo I had open.
At 60%, it highlights significantly more test code than the material changes that need review. Strike one.
At no threshold (0-100) does it highlight the deleted code in UniqueBroadcastEvent.php, which seems highly important to review. The maintainer even comments about the removal in the actual PR! Strike two.
The only line that gets highlighted at > 50% in the material code diffs is one that hasn't changed. Strike three.
So, honest attempt, but it didn't work out for me.
I’m not sure an LLM can really capture project-specific context yet from a single PR diff.
Honestly, a simple data-driven heatmap showing which parts of the code change most often or correlate with past bugs would probably give reviewers more trustworthy signals.
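For example, the change-frequency half of that is just a git log walk, no LLM needed (a rough Node sketch; assumes you run it inside the repo):

    // Count how often each file changed recently: a crude "hotspot" signal.
    import { execSync } from "node:child_process";

    const log = execSync(
      "git log --since='1 year ago' --name-only --pretty=format:",
      { encoding: "utf8" },
    );
    const counts = new Map<string, number>();
    for (const file of log.split("\n").filter(Boolean)) {
      counts.set(file, (counts.get(file) ?? 0) + 1);
    }
    const hotspots = [...counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, 20);
    console.table(hotspots);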
> I’m not sure an LLM can really capture project-specific context yet from a single PR diff.
We had an even more expensive approach that cloned the repo into a VM and prompted codex to explore the codebase and run code before returning the heatmap data structure. Decided against it for now due to latency and cost, but I think we'll revisit it to help the LLM get project context.
Distillation should help a bit with cost, but I haven't experimented enough to have a definitive answer. Excited to play around with it though!
> which parts of the code change most often or correlate with past bugs
I can think of a way to do the correlation, but it would require LLMs. Maybe I'm missing a simpler approach? But I agree that conditioning on past bugs would be great.
As for interactive reviews, one workflow I’ve found surprisingly useful is letting Claude Code simulate a conversation between two developers pair-programming through the PR. It’s not perfect, but in practice the dialogue and clarifying questions it generates often give me more insight than a single-shot LLM summary. You might find it an interesting pattern to experiment with once you revisit the more context-aware approaches.
At first I thought this too, but now I doubt that's a good heuristic. That's probably where people would be careful and/or look anyway. If I were to guess, regressions are less likely to occur in "hotspots".
But this is just a hunch. There are tons of well-reviewed open source projects with bug trackers; it would be interesting if someone tested it.
I mean these tools are fine. But let's be on the same page that they can only address a sub-class of problems.
Very fun to see my own PR on Hacker News!
This looks great. I'm probably gonna keep the threshold set to 0%, so a bit more gradient variety could be nice. Red-yellow-green maybe?
Also, can I use this on AI-generated code before creating a PR somehow? I find myself spending a lot of time reviewing Codex and Claude Code edits in my IDE.
What form factor would make the most sense for you? Maybe a CLI command that renders the diff in the terminal or as HTML?
A CLI command with two output options, console (color) and HTML, opens all doors, right?
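Something like this is what I'd picture (names and flags purely hypothetical):

    // Hypothetical CLI surface:
    //   git diff main...HEAD | heatmap-diff --format=console
    //   git diff main...HEAD | heatmap-diff --format=html > review.html
    type Format = "console" | "html";

    function parseFormat(argv: string[]): Format {
      const arg = argv.find((a) => a.startsWith("--format="));
      return arg?.endsWith("html") ? "html" : "console"; // default: colored terminal output
    }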
After we add the heatmap diff viewer into cmux, I expect that I'll be spending most of my time in between the heatmap diff and a browser preview: https://github.com/manaflow-ai/cmux/raw/main/docs/assets/cmu...
You likely will be able to keep it without trouble, but many corporate security systems would flag it.
For the most part, it seems to draw the eye to the general area where you need to look closer. It found a near-invisible typo in a coworker's PR which was kind of interesting as well.
https://0github.com/geldata/gel-rust/pull/530
It seems to flag _some_ deletions as needing attention, but I feel like a lot of them are ignored.
Is this using some sort of measure of distance between the expected token in this position vs the actual token?
EDIT: Oh, I guess it's just an LLM prompt? I would be interested to see an approach where the expected token vs actual token generates a heatmap.
> Is this using some sort of measure of distance between the expected token in this position vs the actual token?
The main implementation is in this file: https://github.com/manaflow-ai/cmux/blob/main/apps/www/lib/s...
EDIT: yeah it's just an LLM prompt haha
Just a simple prompt right now, but I think we could try an approach where we directly see which tokens might be hallucinated. Gonna try to find the paper for this idea. Might be kinda analogous to the "distance between the expected token in this position vs the actual token."
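One concrete version of that idea: score the diff with per-token log-probabilities and treat surprising (low-probability) tokens as hot spots. A rough sketch using the OpenAI SDK's logprobs option (the model name and the echo-the-code trick are just one hacky way to get token probabilities over the diff):

    // Surprisal-style heatmap: tokens the model finds unlikely get more weight.
    import OpenAI from "openai";

    const client = new OpenAI();

    async function tokenSurprisal(code: string) {
      const res = await client.chat.completions.create({
        model: "gpt-4o-mini",
        messages: [{ role: "user", content: `Repeat this code verbatim:\n${code}` }],
        logprobs: true,
      });
      const tokens = res.choices[0].logprobs?.content ?? [];
      // Higher surprisal (= -logprob) ~ more "unexpected" token at that position.
      return tokens.map((t) => ({ token: t.token, surprisal: -t.logprob }));
    }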
I think most reviewers do this to some degree by looking at points of interest. It'd be cool if this could look at your prior reviews and try to learn your style.
Is this the correct commit to look at? https://github.com/manaflow-ai/cmux/commit/661ea617d7b1fd392...
This file has most of the logic, the commit you linked to has a bunch of other experiments.
> look at your prior reviews and try to learn your style.
We're really interested in this direction too; maybe setting up a DSPy system to automatically fit reviews to your preferences.
Another area where this exact feature would be useful is security review.
For example - there are many static security analyzers that look for patterns, and they're useful when you break a clearly predefined rule that is well known.
However, there are situations that static tools miss, where a highlight tool like this could help bring a reviewer's eyes to a high-risk "area": i.e., scrutinize this code more because it deals with user input and there is a chance of SQL injection here, etc.
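E.g. something like the line below is obviously high risk to a reviewer who knows the value comes from user input, even though no single static rule may fire on it (illustrative only):

    // User input flowing into a query string via concatenation: a classic
    // SQL injection risk that context-aware highlighting could surface.
    function findUser(db: { query: (sql: string) => unknown }, name: string) {
      return db.query(`SELECT * FROM users WHERE name = '${name}'`); // should be parameterized
    }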
I think that would be very useful as well.
Now, how does any of my experience translate to building tools like cmux? I genuinely want to understand how.
Is the answer to go line by line through the cmux codebase, or to attempt to open a PR on one of the bug issues on cmux and, by magic and time, eventually understand?
Then, you can point Claude Code to a file/a function/a few lines and ask follow-up questions.
After that, there are even more things to do. If you want a different perspective, you could try completely reimplementing the thing. My guess is that Claude will use Next.js. You can ask Claude not to do that and instead use a different UI framework (or no framework) combined with C#, if that's something you're interested in. If you want to actually learn all the details, you can start setting things up yourself and write the website. You can add features or try making the site scalable, in AI-assisted or vibe-coding mode.
It will not produce the most elegant code or have the best architecture, but will be good enough for your purpose. I think it's the most efficient way to get some learning that is specifically suited to your needs in this age.
Besides, this is just a thin layer on an LLM, with questionable actual quality. Learn to do the real work, no magic machine can take learning and skill building off your shoulders.
Use Claude Code or Codex for everything, learn how to prompt well. >90% of cmux and 0github.com was written by LLMs. Most of it was just me asking the LLM to implement something, testing it to see if it works, and if it doesn't, I'll ask the LLM to write logs, and I'll paste the logs back to the LLM. Ask gpt-5-pro for architecture choices, like what tech/dependencies to use.
But if your goal is to learn React, I'd recommend going through the official getting started documentation, it's pretty good.
If you want to learn web apps, start with the docs, e.g. the official React docs, or even just learn vanilla JavaScript if you don’t know it.
Start with little pieces, like hitting the GitHub API and displaying some JSON in the terminal.
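For that GitHub API piece, the whole exercise is only a few lines (public endpoint, Node 18+ built-in fetch, no token needed for light use; the PR below is just one from this thread):

    // Fetch a pull request from the public GitHub API and print a few fields.
    const res = await fetch("https://api.github.com/repos/manaflow-ai/cmux/pulls/666");
    const pr = await res.json();
    console.log(pr.title, pr.state, pr.changed_files);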
You could also just start prompting an LLM to scaffold a project for you and then try to debug whatever issues come up (and they will).
File `apps/client/electron/main/proxy-routing.ts` line 63
Would adding a comment to explain why the downgrade is done have prevented the issue from being raised?
Also two suggestions on the UI
- anchors on lines
- anchors on files and ability to copy a filename easily
> Adding a comment to explain why the downgrade is done would have resulted in not raising the issue?
Trying it out here with a new PR on same branch: https://0github.com/manaflow-ai/cmux/pull/809
Will check back on it later!
EDIT: seems like my comment on line 62 got highlighted. Maybe we should surface the ability to edit the prompt.
Thinking about it with the feedback, I'm not sure what I would have liked to see, actually.
First I was expecting no highlight once you added a comment explaining why.
But then, seeing the highlight, I'm thinking that a comment shouldn't be a magical tool that lets you do crazy stuff.
I don't know anything about the Electron wrapper, so maybe it is actually possible to do HTTPS, and someone could point out how to achieve it. Having the downgrade highlighted could help that someone find it.
I'll keep thinking about it! Thanks!