The problem was that context kept disappearing between tasks. With multiple Claude agents running in parallel, I’d lose track of specs, dependencies, and history. External PM tools didn’t help because syncing them with repos always created friction.
The solution was to treat GitHub Issues as the database. The "system" is ~50 bash scripts and markdown configs that:
- Brainstorm with you to create a markdown PRD, spin up an epic, decompose it into tasks, and sync them with GitHub Issues
- Track progress across parallel streams
- Keep everything traceable back to the original spec
- Run fast from the CLI (commands finish in seconds)
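To give a flavor of the sync step, here's a rough sketch of the kind of thing the scripts do. The script below is illustrative only: the directory layout follows the repo conventions, but the label scheme and the script itself are simplified stand-ins, not the actual ccpm code.

```bash
#!/usr/bin/env bash
# Illustrative sketch only, not the actual ccpm scripts.
# Push each local task file under epics/<epic-name>/ to GitHub as an issue,
# using the task file's first heading as the issue title.
set -euo pipefail

EPIC="${1:?usage: sync-epic.sh <epic-name>}"

for task in epics/"$EPIC"/*/*.md; do
  title=$(head -n 1 "$task" | sed 's/^#* *//')   # strip leading markdown '#'s
  gh issue create \
    --title "[$EPIC] $title" \
    --body-file "$task" \
    --label "epic:$EPIC"      # label must already exist in the repo
done
```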
We’ve been using it internally for a few months and it’s cut our shipping time roughly in half. Repo: https://github.com/automazeio/ccpm
It’s still early and rough around the edges, but has worked well for us. I’d love feedback from others experimenting with GitHub-centric project management or AI-driven workflows.
> 89% less time lost to context switching
> 5-8 parallel tasks vs 1 previously
> 75% reduction in bug rates
> 3x faster feature delivery
The rest of the README is llm-generated so I kinda suspect these numbers are hallucinated, aka lies. They also conflict somewhat with your "cut shipping time roughly in half" quote, which I'm more likely to trust.
Are there real numbers you can share with us? Looks like a genuinely interesting project!
Every epic gets its own branch, so if multiple developers are working on multiple epics, merging back to the main branch will, in most cases, need to be done patiently by humans.
To be clear, I am not suggesting that this is a fix-all system; it is a framework that helped us a lot and should be treated just like any other tool or project management system.
The test-runner sub-agent knows exactly how to run tests, summarize failures, etc. It loads up all the context specific to running tests and frees the main agent's context from all of that. And so on...
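For anyone who hasn't set one up: a sub-agent like that is just a small markdown file with a scoped prompt. A minimal sketch, assuming the usual Claude Code sub-agent layout (the file path and frontmatter fields here are my assumptions and may differ from your setup):

```bash
# Minimal sketch: define a test-runner sub-agent for Claude Code.
# Path and frontmatter fields are assumptions; adjust to your setup.
mkdir -p .claude/agents
cat > .claude/agents/test-runner.md <<'EOF'
---
name: test-runner
description: Runs the test suite and reports a concise summary of failures.
tools: Bash, Read, Grep
---
You are the test runner. When invoked:
1. Run the project's test suite (e.g. `npm test` or `pytest`, whichever applies).
2. Do not dump raw output; summarize failing tests, assertion messages,
   and the most likely file responsible for each failure.
3. Return only that summary, so the main agent's context stays small.
EOF
```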
How are people using auto-edits and these kinds of higher-level abstractions?
When using agents like this, you only see a speedup because you're offloading the time you'd spend thinking about / understanding the code. If you can review code faster than you can write it, you're cutting corners on your code reviews. That's normally fine with humans (this is why we pay them), but not with AI. Most people just code review for nitpicks anyway (rename a variable, add some whitespace, use map/reduce instead of forEach) instead of taking the time to understand the change (you'll be looking at lots of code and docs that aren't present in the diff).
That is, unless you type really slowly, which I've recently discovered is actually a bottleneck for some professionals (slow typing, syntax issues, constantly checking docs, etc.). I'll add that I experience this too when learning a new language, and AI is immensely helpful there.
Claude! Get off HN and get back to work.
I keep wondering why. Every project I've ever seen needed lines of code, nuts and bolts removed rather than added. My best libraries consist of a couple of thousand lines.
Of course, there are many, many other kinds of development: when developing novel low-level systems for complicated requirements, you're going to get much poorer results from an LLM, because the project won't fit as neatly into one of the "templates" it has memorized, and the LLM's reasoning capabilities are not yet sophisticated enough to handle arbitrary novelty.
Great engineers who pick up vibe coding without adopting the ridiculous "it's AI so it can't be better than me" attitude are the ones who become incredibly proficient, able to move mountains in very little time.
People stuck in the "AI can only produce garbage" mindset are unknowingly saying something about themselves. AI is mainly a reflection of how you use it. It's a tool, and learning how to use that tool proficiently is part of your job.
Of course, some people have the mistaken belief that by taking the worst examples of bullshit-coding and painting all vibe coders with that same brush, they'll delay the day they lose their job a tiny bit more. I've seen many of those takes by now. They're all blind, and they get upvoted by people who either haven't had the experience (or the correct setup) yet, or who are in pure denial.
The secret? The secret is that just as before you had a large amount of "bad coders", now you also have a large amount of "bad vibe coders". I don't think it's news to anyone that most people tend to be bad or mediocre at their job. And there's this mistaken thinking that the AI is the one doing the work, so the user cannot be blamed… but yes they absolutely can. The prompting & the tooling set up around the use of that tool, knowing when to use it, the active review cycle, etc - all of it is also part of the work, and if you don't know how to do it, tough.
I think one of the best skills you can have today is to be really good at "glance-reviews" in order to be able to actively review code as it's being written by AI, and be able to interrupt it when it goes sideways. This is stuff non-technical people and juniors (and even mediors) cannot do. Readers who have been in tech for 10+ years and have the capacity to do that would do better to use it than to stuff their head in the sand pretending only bad code can come out of Claude or something.
You can. People do. It's not perfect at it yet, but there are success stories of this.
I mean, the parent even pointed out that it works for vibe coding and stuff you don't care about; ...but the 'You can't' refers to this question by the OP:
> I really need to approve every single edit and keep an eye on it at ALL TIMES, otherwise it goes haywire very very fast! How are people using auto-edits and these kinds of higher-level abstractions?
No one I've spoken to is just sitting back writing tickets while agents do all the work. If it was that easy to be that successful, everyone would be doing it. Everyone would be talking about it.
To be absolutely clear, I'm not saying that you can't use agents to modify existing code. You can. I do; lots of people do. ...but that's using it like you see in all the demos and videos; at a code level, in an editor, while editing and working on the code yourself.
I'm specifically addressing the OPs question:
Can you use unsupervised agents, where you don't interact at the code level at all, only at a higher level of abstraction?
...and, I don't think you can. I don't believe anyone is doing this. I don't believe I've seen any real stories of people doing this successfully.
My view, after having gone all-in with Claude Code (almost only Opus) for the last four weeks, is "no". You really can't. The review process needs to be diligent and all-encompassing and is, quite frankly, exhausting.
One improvement I have made to my process for this is to spin up a new Claude Code instance (or clear context) and ask for a code review based on the diff of all changes. My prompt for this is carefully structured. Some issues it identifies can be fixed with the agent, but others need my involvement. It doesn’t eliminate the need to review everything, but it does help focus some of my efforts.
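Something in the same spirit can be done straight from the shell; a rough sketch, assuming the `claude` CLI's non-interactive print mode (`-p`) and a much-simplified prompt rather than my actual one:

```bash
# Rough sketch: feed the branch's full diff to a fresh, non-interactive
# Claude session for review. Assumes `claude -p` (print mode) reads stdin;
# the prompt here is deliberately simplified.
git diff main...HEAD | claude -p "Act as a skeptical senior reviewer of this diff. \
List: (1) likely correctness bugs, (2) missing or weak tests, \
(3) security or data-handling concerns, (4) decisions that need a human. \
Reference specific files and hunks from the diff."
```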
It really depends on the area though. Some areas are simple for LLMs, others are quite difficult even if objectively simple.
Granted, atm I'm not a big believer in vibe coding in general, but IMO it requires quite a bit of knowledge to be hands-off and not have it fall into wells of confusion.
For me, that's "just one", and that's why LLM coding doesn't scale very far for me with these tools.
if you have to understand the code to progress, it's regular fucking programming.
I don't go gushy about code generation when I use yasnippet or a vim macro, why should super autocomplete be different?
this is an important distinction because if Karpathy's version becomes real we're all out of a job, and I'm sick of hearing developers role play publicly towards leaders that their skills aren't valuable anymore
The question is how much do you review, and how much does your experience help it? Even if you didn't know code you're still going to review the app. Ideally incrementally or else you won't know what's working and what isn't. Reviewing the technical "decisions" from the LLM is just an incremental step towards reviewing every LOC. There's a large gulf between full reviews and no reviews.
Where in that gulf you decide to call it "vibe coding" is up to you. If you only consider it vibing if you never look at the code though, then most people don't vibe code imo.
I think of "vibe coding" as synonymous with "sloppy/lazy coding". Eg you're skipping details and "trusting" that the LLM is either correct or has enough guardrails to be correct in the impl. How many details you skip though is variable, imo.
Is that where the goalposts are now?
There are a lot of people who are entering programming via this thing.
In my experience, if you can't review the code and point out the LLM's mistakes to it, the codebase gets brittle fast. Maybe other people are better vibe coders than me, but I never managed to solve that problem, not even with Opus 4.1.
There is no magic way. It boils down to less strict inspection.
I try to maintain an overall direction and try to care less about the individual line of code.
You want to periodically have coverage improvement -> refactor loops after implementing a few features. You can figure out the refactors you want while the agent is implementing the code, after you've sussed out any test issues, then just queue up instructions on how to refactor once the tests are passing.
Essentially, I'm treating Claude Code as a very fast junior developer who needs to be spoon-fed with the architecture.
With that being said, a video will be coming very soon.
"We follow a strict 5-phase discipline" - So we're doing waterfall again? Does this seem appealing to anyone? The problem is you always get the requirements and spec wrong, and then AI slavishly delivers something that meets spec but doesn't meet the need.
What happens when you get to the end of your process and you are unhappy with the result? Do you throw it out and rewrite the requirements and start from scratch? Do you try to edit the requirements spec and implementation in a coordinated way? Do you throw out the spec and just vibe code? Do you just accept the bad output and try to build a new fix with a new set of requirements on top of it?
(Also, the LLM-authored README is hard for me to read. Everything is a bullet point or an emoji, and it's not structured in a way that makes it clear what it is. I didn't even know what a PRD (product requirements document) was until halfway through.)
I think the big difference between this and waterfall is that waterfall talked about the execution phase before the testing phase, and we have moved past defining the entire system as a completed project before breaking ground. Nothing in defining a feature in documentation up front stops continuous learning and adaptation.
However, LLMs and code break the "Working software over comprehensive documentation" component of agile. It breaks because documentation now matters in a way it didn't when working with small teams.
However, it also breaks because writing comprehensive documentation is now cheaper in time than it was three years ago. The big problem now is maintaining that documentation. Nobody is doing a good job of that yet - at least that I've seen.
(Note: I think I have an idea here if there are others interested in tackling this problem.)
The waterfall we know was always a mistake. The downhill-only flow we know and (don't) love came from someone at the DOD who only glanced at the second diagram (Figure 2) in the original 1970 Royce paper and said "This makes sense, we'll do it!" and... we're still doing waterfall.
So, go to the paper that started it all, but was arguing against it:
- https://www.praxisframework.org/files/royce1970.pdf
I encourage you to look at the final diagram in the paper and see some still controversial yet familiar good ideas:
- prototype first
- coding informs design
- design informs requirements
- iterate based on tests -> design -> requirements (~TDD)
Crucially, these arrows go backwards. See also the "Spiral Model" that attempts to illustrate this a different way: https://en.wikipedia.org/wiki/Spiral_model#/media/File:Spira...
Amazing that waterfall arguably spread from this paper, where it's actually an example of "what not to do."
Here's what Royce actually says about the waterfall diagram:
> The implementation described above is risky and invites failure. … The testing phase which occurs at the end of the development cycle is the first event for which timing, storage, input/output transfers, etc., are experienced as distinguished from analyzed. These phenomena are not precisely analyzable. … Yet if these phenomena fail to satisfy the various external constraints, then invariably a major redesign is required. … The required design changes are likely to be so disruptive that the software requirements upon which the design is based and which provides the rationale for everything are violated. … One can expect up to a 100-percent overrun in schedule and/or costs.
This is 55 years ago.
That's not to say you shouldn't still have good engineering practices, like short-lived branches and continuous integration. But you should be merging branches in on a schedule that is independent of sprints (and hopefully faster than the sprint length).
One of the benefits of using AI is that these processes, which I personally never followed in the pre-AI era, are now easy and frictionless to implement.
Especially discovering unknown unknowns that lead to changes in your original requirements. This often happens at each step of the process (e.g. when writing the PRD, when breaking down the tickets, when coding, when QAing, and when documenting for users).
That’s when the agent needs to stop and ask for feedback. I haven’t seen (any) agents do this well yet.
I was impressed that someone took it up to this level, till I saw the telltale signs of AI-generated content in the README. Now I have no faith that this is a system that was developed, iterated, and tested to actually work, and not just a prompt to an AI to dress up a more down-to-earth workflow like mine.
Evidence of results improvement using this system is needed.
Kidding aside, of course we used AI to build this tool and get it ready for the "public". This includes the README.
I will post a video here and on the repository over the weekend with an end-to-end tutorial on how the system works.
P.S.: And it wasn't the em-dashes, it's the general structure and the recognizable bullet points with emojis.
Hopefully, your GitHub tickets are large enough: covering one vertical scope, one cross-cutting function, or some reactive work such as bug fixing or troubleshooting.
The reason is that coding agents are good at decomposing work into small tasks/TODO lists. IMO, too many tickets on GitHub will interfere with this.
When we break an epic down into tasks, we get CC to analyze what can be run in parallel and use each issue as a conceptual grouping of smaller tasks, so multiple agents can work on the same issue in parallel.
The issues are relatively large, and depending on the feature, every epic has between 5 and 15 issues. When it's time to work on an issue, your local Claude Code will break it down into minute tasks to carry out sequentially.
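As a rough illustration of that granularity (the `epic:<name>` label here is illustrative, not necessarily how ccpm actually tags things):

```bash
# Illustrative: list the open issues that make up one epic.
# Assumes issues were labeled "epic:<name>" when synced; the real
# labeling scheme may differ.
gh issue list --label "epic:user-auth" --state open \
  --json number,title \
  --jq '.[] | "#\(.number)\t\(.title)"'
```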
I've come across several projects that try to replicate agile/scrum/SAFe for agents, and I'm trying to understand the rationale. Since these frameworks largely address human coordination and communication challenges, I'm curious about the benefits of mapping them to AI systems. For instance, what advantages does separating developer and tester provide versus having unified agents that handle both functions?
I talked to an extremely strong engineer yesterday who is basically doing exactly this.
Would love to see a video/graphic of this in action.
I point Claude to my codebase, and Claude writes up a PRD that matches/reflects/describes the codebase. Then I iteratively (a) edit the PRD to reflect where I want my codebase to go, and (b) have Claude execute on it.
Huge rules systems, all-encompassing automations, etc all assume that more context is better, which is simply not the case given that "context rot" is a thing.
Are people really doing this? My brain gets overwhelmed if I have more than 2 or 3.
That being said, if a task requires editing three different files, I would launch three different sub-agents, each editing one file, cutting down implementation time by two-thirds.
I recently launched https://letsorder.app, https://github.com/brainless/letsorder.
100% of the product (2 web UI apps, 1 backend, 1 marketing site) was generated by LLMs, including the deployment scripts. I follow a structured approach. My workflow is a mix of Claude Code, Gemini CLI, Qwen Code, and other coding CLI tools with GitHub (issues, documentation, branches, worktrees, PRs, CI, CodeRabbit, and other checks). I have recently started documenting my thoughts about user flows by voice and transcribing them. It has shown fantastic results.
Now I am building https://github.com/brainless/nocodo as the most ambitious project I have tried with LLMs (vibe coding). It runs the entire developer setup on a managed Linux server and gives you access through desktop and mobile apps. All self-hosted on your cloud accounts. It would basically take an idea all the way to going live with full-stack software.
Maybe the ordering flow does work, but how much traction are you going to really get without the demo actually doing what it's supposed to?
Not trying to be snarky - just trying to understand if people actually pay for mediocre or low-quality products like these
This has nothing to do with being LLM-generated. I work on about 4-5 projects at the moment, https://github.com/brainless. All of them are there to test how far LLM-driven development can go. This, alongside making time daily to reach out to people, create posts, and host lessons on vibe coding: https://lu.ma/user/brainless
I will get these bugs sorted when I get some time. Let's Order is not a commercial project, it is an exercise to show what a solo founder can get done these days with LLMs.
And also, I am not selling Let's Order. I am selling vibe coding, building a product around it, content and coaching - and yes I have customers for this.
I understand people getting mad. I am an engineer with 16 years of building products. I would be mad if I had a cushy job in the US that I felt was under threat from AI. I switched to vibe coding because I see the benefits for the rest of the world, which most engineers with cushy jobs never cared about. And there is money in this for someone like me, driving this solo, from a life that is far away from VC-funded startups.
- Brainstorm a PRD via guided prompts (prds/[name].md).
- Transform PRD into epics (epics/[epic-name]/epic.md).
- Decompose epic into tasks (epics/[epic-name]/[feature-name]/[task].md).
- Sync: push epics & tasks to GitHub Issues.
- Execute: Analyze which tasks can be run in parallel (different files, etc). Launch specialized agents per issue.
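A very rough sketch of what the execute step boils down to, assuming headless `claude -p` workers and the same illustrative `epic:<name>` label convention as above (simplified; the real system does the parallel/dependency analysis first, while this just fans out one worker per open issue):

```bash
# Simplified sketch of the execute step: one headless worker per open issue.
# Assumes the `gh` and `claude` CLIs; the label convention is illustrative.
EPIC="user-auth"
mkdir -p logs

for num in $(gh issue list --label "epic:$EPIC" --state open \
               --json number --jq '.[].number'); do
  (
    gh issue view "$num" --json title,body --jq '.title, .body' \
      | claude -p "Implement this GitHub issue on a dedicated branch, run the tests, and stop to ask if anything is ambiguous." \
      > "logs/issue-$num.log" 2>&1
  ) &
done
wait   # block until all issue workers finish
```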