Over the last year or so, my development speed relative to my own baseline from ~2019 is easily 20x, sometimes more. Not because I type faster, or because I cut corners, but because I changed how I use AI.
The short version: I don’t use AI inside my editor. I use two AIs in parallel, in the browser, with full context.
Here’s the setup.
I keep two tabs open:
One AI that acts as a “builder”. It gets a lot of context and does the heavy lifting.
One AI that acts as a “reviewer”. It only sees diffs and tries to find mistakes.
That’s it. No plugins, no special tooling. Just browser tabs and a terminal.
The important part is context. Instead of asking for snippets, I paste entire files or modules and explain the goal. I ask the AI to explain the approach first, including tradeoffs, before it writes code. That forces me to stay in control of architecture instead of accepting a blob I don’t understand.
A typical flow looks like this:
1. Paste several related files (often across languages).
2. Describe the change I want and ask for an explanation of the options; have it read and summarize relevant concepts, Wikipedia articles, etc.
3. Pick an approach. Have extensive conversations about trade-offs, underlying concepts, adversarial security, etc., and find ways to do things within what the OS allows.
4. Let the AI implement it across all files.
5. Copy the diff into the second AI and ask it to look for regressions, missing arguments, or subtle breakage.
6. Fix whatever it finds.
7. Ship.
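A rough sketch of what step 5 looks like if you script the copy/paste part (the prompt wording, the script name, and the `main` base branch are just illustrative, not a fixed recipe):

```python
# review_prompt.py (illustrative): wrap the current git diff in a prompt
# you can paste into the second ("reviewer") AI tab.
import subprocess

def build_review_prompt(base: str = "main") -> str:
    diff = subprocess.run(
        ["git", "diff", base],
        capture_output=True, text=True, check=True,
    ).stdout
    return (
        "You are reviewing a diff, not writing code.\n"
        "Look for regressions: changed call signatures with callers that were\n"
        "not updated, changed default values, missing arguments, and subtle\n"
        "behavior changes. List each suspected issue with its file and line.\n\n"
        "--- DIFF ---\n" + diff
    )

if __name__ == "__main__":
    # e.g. `python review_prompt.py | pbcopy`, then paste into the reviewer tab
    print(build_review_prompt())
```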
The second AI catches a lot of things I would otherwise miss when moving fast. Things like “you changed this call signature but didn’t update one caller” or “this default value subtly changed behavior”.
What surprised me is how much faster cross-stack work gets. Stuff that used to stall because it crossed boundaries (Swift → Obj-C → JS, or backend → frontend) becomes straightforward because the AI can reason across all of it at once.
I’m intentionally strict about “surgical edits”. I don’t let the AI rewrite files unless that’s explicitly the task. I ask for exact lines to add or change. That keeps diffs small and reviewable.
This is very different from autocomplete-style tools. Those are great for local edits, but they still keep you as the integrator across files. This approach flips that: you stay the architect and reviewer, the AI does the integration work, and a second AI sanity-checks it.
Costs me about $40/month total. The real cost is discipline: always providing context, always reviewing diffs, and never pasting code you don’t understand.
I’m sharing this because it’s been a genuine step-change for me, not a gimmick. Happy to answer questions about limits, failure modes, or where this breaks down.
Here is a wiki-style overview I put together for the developers on our team: https://community.intercoin.app/t/ai-assisted-development-playbook-how-we-ship-faster-without-breaking-things/2950
To be fair, most web/mobile frameworks expect you to do that.
Ideally, codebases would grow by adding data (e.g. a json describing endpoints, UIs, etc), not repetitive code.
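A made-up sketch of what that could look like (the endpoint fields and handler names are invented for illustration): new behavior gets added by appending to the data, not by writing more routing code.

```python
# Illustrative only: endpoints described as plain data, wired up by one small loop.
ENDPOINTS = [
    {"method": "GET",  "path": "/users",  "handler": "list_users"},
    {"method": "POST", "path": "/users",  "handler": "create_user"},
    {"method": "GET",  "path": "/health", "handler": "health_check"},
]

HANDLERS = {
    "list_users":   lambda params: {"users": []},
    "create_user":  lambda params: {"created": True},
    "health_check": lambda params: {"ok": True},
}

def dispatch(method: str, path: str, params: dict) -> dict:
    # The codebase "grows" by appending rows to ENDPOINTS, not by new code paths.
    for ep in ENDPOINTS:
        if ep["method"] == method and ep["path"] == path:
            return HANDLERS[ep["handler"]](params)
    raise KeyError(f"no endpoint for {method} {path}")

print(dispatch("GET", "/health", {}))  # {'ok': True}
```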
The problem with this configuration-based approach is that the actual code that executes now has to be able to change its behavior arbitrarily in response to new configuration, so the code (and the configuration format) ends up extremely abstract and hard to comprehend. In the real world, someone figures out that things get way easier if you just put a few programming-language concepts into the configuration format, and now you're back where you started, except with a much worse programming language (shoehorned into a configuration format) than the one you were using before.
Boilerplate may be cumbersome, but it effectively gives you a very large number of places to "hook into" the framework to make it do what you need. AI makes boilerplate much less painful to write.
Both worlds can be cleanly composed. For instance, for backend development, it's common to define an array (data) of middleware (code).
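A toy version of that shape (everything here is invented for illustration): the pipeline itself is plain data, a list you can inspect, reorder, or generate, while each entry is ordinary code.

```python
# Illustrative only: the pipeline is data (a plain list); each entry is code.
from typing import Callable, Dict

Request = Dict[str, object]
Handler = Callable[[Request], Request]

def add_request_id(next_handler: Handler) -> Handler:
    def wrapped(req: Request) -> Request:
        req.setdefault("request_id", "req-123")  # placeholder id
        return next_handler(req)
    return wrapped

def log_request(next_handler: Handler) -> Handler:
    def wrapped(req: Request) -> Request:
        print("handling", req.get("path"))
        return next_handler(req)
    return wrapped

MIDDLEWARE = [add_request_id, log_request]  # the "array of middleware"

def build_app(endpoint: Handler) -> Handler:
    # Compose right-to-left so the first list entry runs first.
    app = endpoint
    for mw in reversed(MIDDLEWARE):
        app = mw(app)
    return app

app = build_app(lambda req: {**req, "status": 200})
print(app({"path": "/users"}))
```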
At a smaller scale, this is already a reality in the Clojure ecosystem: most SQL is data (the honeysql library), and most HTML is data (the Hiccup library).
Be very careful with this approach. There are many ways it can go completely wrong. I've seen a codebase like this, and it was a disaster to debug, because you can't set breakpoints in data.
It may not look compact or elegant, but I'd rather see debuggable, comprehensible boilerplate, even if it's repetitive, than a mess.
- Commodity work, such as CRUD, integrations, infra plumbing, standard patterns.
- Local novelty, i.e. a new feature for your product (but not new to the world).
- Frontier novelty, as in, genuinely new algorithms/research-like work.
The overwhelming majority of software development is in the first two categories. Even people who think they are doing new and groundbreaking stuff are almost certainly doing variations of things that have been done in other contexts.
https://github.com/Qbix/Platform-History-v1
https://github.com/Qbix/Platform-History-v2
And you can see the latest code here:
Documentation can be created a lot faster, including for normies:
https://community.qbix.com/t/membership-plans-and-discounts/...
My favorite part of AI is red-teaming and finding bugs. Just copy-paste diffs and ask it to look for regressions. Press it over and over until it can't find any.
Here is a speedrun from a few days ago:
E.g. I have an AWS agent that does all my devops for me. It isn’t doing anything insane, but when I need to investigate something or make a Terraform change, I send it off and go do something else. I do similar things 20-100 times a day now: go write this script, do this report, update this documentation, etc.
I think if you are a high-level SWE it takes a lot of effort to get to “better than doing it myself”. If you have a more generalist role, AI is easily a 10x productivity booster if you are knowledgeable about using it.
It's hard to tell where this 20-50x increase is.
Nobody's taken me up on this offer yet. [0]
I'd happily demonstrate this kind of workflow on my day job if not for company trade-secrets.
That's as legacy as it gets: a 20+ year old code base with several "strata" of different technologies and approaches.
Claude Opus handily navigates around it, and produces working bug fixes with minimal guidance.
I'm not going to claim it's 20x or 50x yet, and there's still navigation and babysitting involved, but it's definitely capable of working on complex problems. I just don't trust it to run in YOLO mode.
The key thing is, you need domain knowledge. You need to know where to correct it, and which direction to point it in.
It's not magic, and it will have bad ideas. The key is picking out the good ideas from the bad.
You mentioned "navigation and babysitting", could you share what that looks like in practice? Do you have to spend time reconstructing context or correcting Claude's misunderstandings? Do you still need to interrupt colleagues for some tacit knowledge, or has that changed?
I went from copy/pasting out of the ChatGPT web UI to autocomplete tools, but the real feeling of "shit, this is going to really change how I code" came with Claude Code.
So, it would take roughly two months to complete a project at the scale of SQLite or TeX.
I imagine he would have a few choice words of the usual sort, but would he recognise that seemingly anyone can write their own OS like he did?
Personally I'm sceptical because few people can even fit a high-level overview of such a project in their heads, much less the entire codebase.
Citation: https://stackoverflow.com/a/6188624
I really appreciate that he is up-front about "Yes. Vibe coding has lots of dangerous problems that you must learn to control if you are to go whole-hog like this."
Has anyone read his Vibe Coding book? The Amazon reviews make it sound like it's heavy on inspiration but light on techniques.
It has lots of potential. Let me, though, have my own... doubts about it. Thanks for sharing anyway.
This was done in about 3 hours for instance: https://github.com/Qbix/Platform/tree/refactor/DbQuery/platf...
You can see the speed for yourself. Here is my first speedrun livestreamed: https://www.youtube.com/watch?v=Yg6UFyIPYNY
What's the rationale behind writing PHP as if it were JS? Unless I am mistaken, it's like someone just did a transliteration from JS to PHP without even converting the JSDoc to PHPDoc.
And are there any tests for the code?
I think in 2026 the automation will reach testing, closing the loop. At that point, with no humans in the loop, software development will become extremely fast.
Or just periodically throw it all away and start from scratch?
What if something becomes successful (has users) so that you can't just throw it away?
I can't speak for the OP but the worst software developer I ever worked with was myself from 1 year ago. Provided what "cruft" I'm generating meets my current code standards, it's unlikely to be any worse than anything else past me has done.
That said... I jumped to a few random moments in your video and had an "oh my god" reaction because you really were not kidding when you said that you were pasting code.
I'm pretty much begging you to install and use Cursor. Whatever boost you're getting from your current workflow, you will see triple through use of their Agent/Plan/Debug modes, especially when using Opus 4.5. I promise you: it's a before electricity vs after electricity scenario. I'm actually excited for you.
A lot of folks will tell you to use Claude Code. I personally find that it doesn't make sense for the sorts of projects I work on; I would 100% start with Cursor either way.
It’s not up to the skeptics to prove this tech doesn’t work; it’s up to the proponents to show that it does, and with an effect size as clear-cut as cigarettes causing lung cancer.
There are a tremendous number of LLM productivity stans on HN, but the plural of anecdote is not data.
Certainly these tools are useful, but the extent to which they are useful today is not nearly as open and shut as you and others would claim. I’d say that these tools make me 5% more productive on a code base I know well.
I’m totally open to opposing evidence that isn’t just anecdote
Not busting my quota is simply not my top priority. I'm on their $200/month plan and I have it locked to a $1000/month overage limit, though the most I've ever gone through using it every day, all day is about $700. That probably sounds like a lot if you're optimizing for a $20/month token budget, but it's budgeted for. That $10-12k/year is excellent value for the silly amount of functionality that I've been able to create.
Sonnet is a really good LLM, and you can build great things with it. However, if you're using this for serious work, IMO you probably want to use the most productive tools available.
Opus 4.1 was, to be real, punishingly expensive. It made me sweat. Thank goodness that Opus 4.5 is somehow both much better and much cheaper.
Edit: I see you answered this in another response, thanks.
I don't have any interest in yucking anyone's yum, but for me, I find working in an IDE to be vastly more productive than trying to remember dozens of vim and tmux shortcuts.
I haven't personally tried the CC extension because like you, I concluded that it sounds like a single-company Cursor with way fewer points of integration into the IDE.
I hate bikeshedding and rarely do I switch tooling unless something is demonstrably better; preferably 10x better. For me, the Cursor IDE experience is easily 10x better than copying and pasting from ChatGPT, which is why I created this thread in the first place.
Would you be willing to go into more detail about that claim?
The framing of your question, as though I might possibly be hallucinating my own situation, might be correlated with your lack of a reply.
CC seems best suited to situations where one or both of the following are true:
- presence of CI infrastructure
- the ability for the agent to run/test outputs from the run loop
If you're primarily working on embedded hardware, human-in-the-loop is not optional. In real terms, I am the CI infrastructure.
Also, working on hardware means that I am often discussing the circuit with the LLM in a much more collaborative way than what most folks seem to do with CC requirements. There are MCP servers for KiCAD but they don't seem to do more than integrate with BOM management. The LLMs understand EE better than many engineers do, but they can only understand my circuit (and the decisions embedded in it) as well as I can explain/screencap it to them.
The SDK and tooling for the MCUs also just makes an IDE with extensions a much more ergonomic fit than trying to do everything through CLI command switches.
Do you ship 20x more PRs? Did you solve 20x more bugs? Did you add 20x more features? Did you provide 20x more ARR, more value, etc.?
I architect it and go through many iterations. The machine makes mistakes; when I test, I have to come back and work through the issues. I often correct the machine about stuff it doesn't know or missed due to its training.
And ultimately I'm responsible for the code quality; I'm still in the loop all the time. But rather than writing everything by hand, following documentation and making mistakes, I have the machine do the code generation and edits for a lot of the code. There are still mistakes that need to be corrected until everything works, but the loop is a lot faster.
For example, I was able to port our MySQL adapter to Postgres AND SQLite, something I had been putting off for years, in about 3-5 hours total, including testing, bugfixes, and massive refactoring. And it's still not in the main branch, because there is more testing I want done before it's merged: https://github.com/Qbix/Platform/tree/refactor/DbQuery/platf...
Here is my first speedrun: https://www.youtube.com/watch?v=Yg6UFyIPYNY
You write the program as source code.
Prompting an LLM to cobble together lines from other people's work is not writing a program.
And yet, I don't see a problem with saying directors made their movies. Sure, it was the work of a lot of talented individuals contributing collectively to produce the final product, and most of those individuals probably contributed more physical "creation" to the film than the director did. But the director is a film maker. So I wouldn't be so confident asserting that someone who coordinates and architects an application by way of various automation tools isn't still a programmer or "writing software"
His language is LLM prompts. If he can check them into git and get reasonably consistent results if he ran the prompts multiple times, just like we expect from our JavaScript or C or assembly or machine code, I don't see the problem.
I knew a guy who could patch a running program by flipping switches on the front panel of a computer. He didn't argue my C language output 'is not writing a program'...
You're joking, right? There's nothing "reasonably consistent" about LLMs. You can input the same prompt with the same context, and get wildly different results every time. This is from a single prompt. The idea that you can get anything close to consistent results across a sequence of prompts is delusional.
You can try prompt “hacks” like STRONGLY EMPHASIZING correct behaviour (or threatening to murder kittens like in the old days), but the tool will eventually disregard an instruction, and then “apologize” profusely for it.
Comparing this to what a compiler does is absurd.[1]
Sometimes it feels like users of these tools are in entirely separate universes given the wildly different perspectives we have.
[1]: Spare me the examples of obscure compiler inconsistencies. These are leagues apart in every possible way.
Anyone experiencing this problem as well?
I try to think of what this would look like at my company, and I can't even really conceive of what this dream scenario is supposed to be. We have maybe 6-10 legitimately revenue-earning products with a meaningful user base, and it took about a decade to get there. There is no reasonable world in which we could have done that in 10 weeks instead, which would be roughly 1/50th the time. It typically takes at least that long just to get a contract completed once a prospective customer decides they even want to make a purchase, and writing code faster won't speed that process up.

Can we push features 50x faster? No, we can't, because they come as a response to feature requests from users, which means we need to wait to have users who make such requests, and you can't just compress 10 years of that happening into 10 weeks. That's to say nothing of the fact that what we work on now is a response to market and ecosystem conditions now, not conditions as they were 10 years ago. If we had somehow built what we're building now back then instead, we'd have just been working on the wrong things.
Think about what it would mean to produce cars 50x faster than using current processes. What good would that even do? The current processes already produce all the cars the world needs. Making them 50x faster wouldn't give you a larger customer base. You'd just be making things no one needs and then throwing them away. The only sensible version of this is doing the same thing at roughly the same speed but at 1/50th the cost. I don't doubt that faster code generation can cut cost but not to 1/50th. Too much of the cost in creating and running a company has nothing at all to do with output.
Show us the financial statements from the company you started in 2019 and from your company today. I would be absolutely thrilled to see somebody concretely show they earn the same revenue for 1/50th the cost, or 50x the revenue for the same cost. The fact that you push 50x the number of commits or lines of code to GitHub means nothing to me.
I might start using a second LLM to review the diffs. Something like Gemini 3 Fast. Sounds good.
But I don't want to give up a fancy IDE to use browser tabs.
So I think I will ask the second LLM to review the `git diff`.
Not to be a dick, but only one? I won't brag about how many I do have running, but it's more than one.
The reality is probably simpler: you've automated a good deal of busywork that you would never have done otherwise.
Hope you are getting 50x more value.
:) Good luck, you will need it :)
Surely we've reached peak hype now and it will start to get better? Surely...
Use my abstract factory factories and inversion of control containers. With Haskell your entire solution is just a 20-line mapreduce in a monad transformer stack over IO. In J, it's 20 characters.
I don't see how AI differs. Rather, the last study of significance found that devs were gaslighting themselves into believing they were more productive, when the data actually bore the opposite conclusion [0].