I think about this a lot; I'm keen to hear what others' perceptions are. For me, the short answer: about 2x (i.e. 100% faster than pre-LLMs). Long answer:
When I thoroughly understand the domain (i.e. the business logic and real-world problem I'm solving), and am familiar with the tech stack, I'm roughly 10x faster for the same or better code quality.
When I don't understand the domain, prompts will be ambiguous or inadequate, the LLM will guess, and it will do a month's work in a day, but I'll spend the next 3 weeks refactoring and realising how trash the code was, due to how trash the prompt was. All in all, it's probably still faster than pre-AI, but it can produce a demoralising psychological effect where you think something's nearly completed, only to spend weeks debugging it, refactoring it, and often tossing it away and starting over.
In an unfamiliar tech stack, I can't always spot obvious mistakes (mistakes caused by the AI or the prompt), so it's less productive and more risky.
10-15% of the productivity improvement is due to improvements in the dev environment. I open ~/dotfiles with cursor and tell it a problem I have or ask for a specific improvement. It usually modifies .zshrc, .vimrc or similar (and iterates as necessary if the first attempt didn't work). Due to how fast this is (e.g. 5 minutes), I've made about 20 little tweaks that previously didn't justify the time. They definitely make me happier as well as a bit more productive.
But overall, taking everything into account, I'd say I'm about 2x as productive as before LLMs.
For example, writing UI components may be easier in some languages than writing abstract algorithms, writing standardized solutions (i.e. textbook algorithms, features, etc.) is easier than writing customized algorithms, and so on.
Also, writing code can be very fast if you don't unit test it, especially for CRUD apps. In fact, most of my coding time was spent writing unit tests.
I really hoped AI could write those tests for me, to lock the specs of the design completely down. But currently it's the opposite: I have to write tests for AI-generated code. So my overall experience can be described as the tyranny of reading other people's code, times 10.
Rephrasing the question: By what percentage has AI changed your input quality?
The answer would be around -50%. This is mostly due to the vast number of search results that are AI-generated, provide very low-density information, and fail to convey the actual key learning points. It means you have to scan through 100% more text to finally get the information you need to solve the issue. I think this is a low estimate, actually.
I was very, very keen on using this tech when it first emerged and was almost addicted to it.
The feeling I was getting was akin to scrolling one of them feed-generating apps with cat videos, or eating fast food: quick, empty joy and excitement without any lasting, fulfilling effect.
After months, or maybe a year, of using LLMs, I realized that I am neither faster at delivering the final desired quality of product, nor satisfied with where I am professionally. I was losing my human skills and forgetting how to get joy out of my work (and I enjoy making them computers work). I noticed that I grew negligent of the results of my work, felt disconnected from and disconcerted by it, and that was an alarming sign.
Anybody who has worked anywhere long enough knows that unhappy people, robbed of joy, produce worse results.
That said, I realized loud and clear that writing code and developing systems is something I want to experience the joy of personally. I want my brain to struggle, sweat, and click through most parts of it, and the parts of the work that I don't enjoy doing and can shamelessly offload to an LLM are, in fact, quite minimal.
On top of that, while using the LLMs (and boy was I using all of it! Sub-agents, skills, tasks! Hour-long planning-prompting exercises!), I still noticed that when it came to "write code" tasks, LLMs were only better and faster than me at delivering quality work when the task at hand was not my primary skill, something at which I'd be below average (in my case, web development or app development, any front-end work). And given that I was employed to exercise my main skill rather than secondary skills, it was almost never the case that LLMs boosted my productivity; they required long stretches of baby-sitting before I'd almost certainly give up and do all of the work myself.
Admittedly, LLMs are outstanding study partners that can give pointers on where to start and what structure to stick to when learning about a new project, technology, or problem domain, or generate flash cards based on materials one wants to study, and that's a use of LLMs that I'd probably not give up. My speed of learning new things with some LLM assistance is boosted greatly, so from this perspective, one could say that LLMs make me a better developer after all.
If we talk about my whole team's output, I'd say the impact on code production is around 80-100%, but the impact on velocity is between 10% and -25%. So many bugs in production, security holes, and so many poor model definitions making it to the production DB, only for me and the other true senior to have to fix them.
We are seniors, we read your PRs and try our best to do it thoroughly, but with AI multiplying your code output and writing quite convincing solutions, it's way harder. So please: if an AI has written the code in your PR, verify the tests are not superficial, verify everything works, and think for yourself about what the model is used for and whether it can be improved before release. Re-verify the tests (especially if the AI had issues writing/passing them). And do it once more. Please (hopefully one of my coworkers will read this).
I see this a lot with sloppy devs.
For boilerplate stuff like generating tests against a well-defined API, in a compiled language, maybe 2-3x. Far less in languages and frameworks like Ruby/Rails where the distance between generating the code and figuring out if it’s even valid or not is large.
Mechanical refactors that are hard to express via e.g. regex but easy in natural language (see the sketch after these points): maybe 5x.
HTML and CSS, where I know exactly what I want and can clearly articulate it: 2-5x.
For anything architecture-y, off the beaten path, or where generating a substantial amount of code is required: near 0%. Often in the negatives.
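To make the "mechanical refactor" case above concrete, here's the kind of change I mean (a made-up Python example, not from any real codebase): rewriting eager string concatenation in logging calls into lazy %-style arguments. A regex can't reliably handle the arbitrary expressions inside the concatenation, but "rewrite every log call to pass its arguments lazily" is trivial to state in natural language.

    import logging

    logger = logging.getLogger(__name__)
    batch = [1, 2, 3]      # illustrative data
    user_name = "alice"

    # Before: eager string building inside the logging call.
    # A regex struggles here because the concatenated expressions can be arbitrarily nested.
    logger.info("processed " + str(len(batch)) + " records for user " + user_name)

    # After: the same call with lazy %-style arguments -- a repetitive, judgment-free
    # rewrite that an agent can apply consistently across a whole codebase.
    logger.info("processed %d records for user %s", len(batch), user_name)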
What I really use LLMs for is to uncover what I should not do, and this is quite a strong win.
Overall, they are generally very bad at giving me anything useful for the things I build.
In the last year I’ve shipped a couple of small OSS tools that I almost certainly would not have finished without AI‑assisted “vibe coding”. Everything I build now flows through AI, but in a slightly different way than just chatting with an LLM. I rarely use standalone ChatGPT/Gemini/Claude; almost all of it happens inside GitHub with Copilot and agents wired into my repos.
The big shift was treating GitHub as the interface for almost all of my work, not just code. I have repos for things like hiring, application review, financial reviews, and other operational workflows. There are “master” repos with folders and sub‑folders, and each folder has specific context plus instructions that the AI agent should follow when operating in that scope, essentially global rules and sub‑rules for each area of work.
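As a rough illustration, one of these "master" repos might look something like the tree below. This is a made-up layout, and the AGENTS.md file name is just one common convention for agent instructions, not necessarily what we use verbatim.

    hiring/                      # a "master" repo for one area of work
    ├── AGENTS.md                # global rules any agent must follow in this repo
    ├── screening/
    │   ├── AGENTS.md            # sub-rules: how to summarize and score applications
    │   └── applications/
    └── interviews/
        ├── AGENTS.md            # sub-rules: how to draft plans and write up feedback
        └── notes/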
Because of that structure, AI speeds up more than just the typing of code. Idea → spec → implementation → iteration all compress into much tighter loops. Tasks that would have taken me weeks end‑to‑end are now usually a couple of days, sometimes hours. Subjectively that’s where the 10–20x feeling comes from, even though it’s hard to measure precisely.
On the team side we’ve largely replaced traditional stand‑ups with AI‑mediated updates. KPIs and goals live in these repos, and progress/achievements are logged and summarized via AI, which makes updates more quantitative and easier to search back through. It’s starting to feel less like “AI helps me with code” and more like “AI is the main operating system for how our team works.”
Happy to share more about the repo/folder structure or what has/hasn’t worked if anyone’s curious.
Can you expand on this? What was a traditional stand-up before and what does an AI-mediated one look like?
Have you? Are you making tons of money? Have you achieved 20x what you did in all previous years?
Take a step back and realize what you're claiming here.
I have worked in the field for too long, but this year, simply thanks to LLMs, I have actually managed to get 4 usable hobby projects done (as far as I need them to go, anyway: personal tools that I use and publish but don't actively maintain unless I need some new feature), and I have been quite productive with a stack I do not normally use at our startup. In most previous years I finished 0-1 hobby projects.
The ramp-up period for the new stack was much shorter, and while I still write some code myself, most of it at least starts as LLM output which I review and adjust to what I really want. It is a bit less intellectually satisfying but a lot more efficient way to work for me. And ultimately, for work at least, I care more about good-enough results.
And yes, that doesn't scale to all problem domains or problem sizes, but in some areas even a 20x speedup would be a huge understatement.
Basically, it amounts to being able to give detailed instructions to a junior dev (who can type incredibly fast) and having them carry out your instructions.
If you don't know the code base, and thus can't provide detailed instructions, this junior dev can (using their incredible typing speed) quickly run off the rails. In this case, as you don't know the code base, you wouldn't know it's off the rails. So you're S.O.L.
They work faster, but more often make wrong assumptions without asking. LLMs don't ask the stupid questions a junior might, but those questions are essential to getting it right.
But I still haven't dialed in exactly what is too complicated for the LLM to handle (and that goalpost seems to still be moving, but more slowly now). Because it is almost always very close, I often end up trying to fix the prompt a few times before giving up and just doing it from scratch myself. I think in total the productivity gain for me is probably a lot less than 100%, but more than 0%.
Need to integrate Stripe with the Clerk API in my Astro project? Claude's all over that. 300% faster. I think of it like this: if there were a package that did exactly what I wanted, I'd use that package. There just happens not to be one; but Claude excels at package-like code.
But as soon as I need to write any unique code – the code that makes my app my app – I find it's perhaps a touch faster in the moment, but the long-term result isn't faster.
Because now I don't understand my code, right? How could I. I didn't write it. So as soon as something goes wrong, or I want to add a feature, either I exacerbate this problem by getting Claude to do it, or I have to finally put in the work that I should have put in the first time.
Or I have to spend about the same amount of time creating a CLAUDE.md that I would have if I'd just figured out the code myself. Except now the thing I learned is how to tell a machine how to do something that I actually enjoy doing myself. So I never learn; on the contrary, I feel dumber. Which seems a bit weird.
And if I choose the lazy option here and keep deferring my knowledge to Claude, now I'm charging customers for a thing that I 'vibe coded'. And frankly if you're doing that I don't know how you sleep at night.
But for unique solutions you will get pretty random results, and worse, you are not building understanding and domain knowledge of your own program.
Claude Code sounds cool until it makes 3 changes at once, 2 of which you are unsure whether they are required or whether they won't break something else. I like it for scripts, data transformations, and self-contained small programs where I can easily verify correctness.
This, yes. What I do now is use Claude but expressly tell it not to edit my code, just show me, because I want to learn. I'm not a very experienced dev, so often it'll show me a pattern that I'm unfamiliar with.
I'll use that new knowledge, but then go and type out the code myself. This is slower, in the moment. But I am convinced that the long-term results are better (for me).
- LLMs are absolutely abysmal at PyTorch. They can handle basic MLP workflows, but that's more or less it. 0% efficiency gained.
- LLMs are great at short autocompletes, especially when the code is predictable. The typing itself is very efficient. Using vim-like shortcuts is now the slower way to write code.
- LLMs are great at writing snippets for tech I am not using that often. Formatting dates, authorizing GDrive, writing advanced regex, etc. I could do it manually, but I would have to check docs, now I can have it done in seconds.
- LLMs are great at writing boilerplate code, e.g. setting up argparse, printing the results in tables, etc. (a sketch of what I mean follows this list). I think I am saving hours per month on these.
- Nowadays I often let LLMs build custom HTML visualization/annotation tools. This is something I would never do before due to time constraints, and the utility is crazy good. It allows my team to better understand the data we are working with.
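For a concrete idea of the boilerplate I mean, here is a throwaway sketch (hypothetical names, not code from a real project) of the kind of argparse setup I now let an LLM write:

    import argparse

    def main() -> None:
        # Argument-parsing boilerplate: tedious to type, trivial to review.
        parser = argparse.ArgumentParser(description="Summarize a results file.")
        parser.add_argument("input", help="path to the input CSV")
        parser.add_argument("--top", type=int, default=10, help="number of rows to print")
        parser.add_argument("--verbose", action="store_true", help="print extra detail")
        args = parser.parse_args()

        if args.verbose:
            print(f"reading {args.input}, showing top {args.top} rows")

    if __name__ == "__main__":
        main()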
When it comes to programming in languages and frameworks I'm familiar with, there is virtually no increase in terms of speed (I may use it for double checks), however, it may still help me discover concepts I didn't know.
When it comes to areas I'm not familiar with:
- most of the time, the increase is substantial, for example when I need targeted knowledge (e.g. finding a few APIs in giant libraries), or when I need to understand an existing solution
- in some cases, I waste a lot of time when the LLM hallucinates a solution that doesn't make sense
- in some other cases, I do jobs that I otherwise wouldn't have done at all
I stress two aspects:
1. it's crucial IMO to treat LLMs as a learning tool before a productivity one, that is, to still learn from their output rather than just calling it a day once "it works"
2. days of later fixing can save hours of upfront checking. Or the reverse, whatever one prefers :)
Personally, I don't trust self-reports for a second anyway. They are bound to be wrong.
At work, many of my colleagues are too busy to collaborate and brainstorm with me, while I enjoy that a lot and have plenty of energy for it.
They are classic 9to5 and that's fine, but I like software development and talking about it.
So instead of only doing it by myself, or tiring out my colleagues, I collaborate with AI.
Each situation has a different bottleneck, and it's almost never how fast you can write lines of code.
You would need: all engineers aligned on AI use. Investment in automating your unit tests, integration tests, end-to-end tests, code quality controls, documentation quality controls, generated API docs, security scans, deployments, feature environments, well-designed internal libraries, feature flags, reviews, everything infrastructure, and standards for all of these, and so on and so forth. You need to lose your culture of meetings to fix miscommunication. You need to centralize product planning and communication, and stop having 100 different tools at the same time (Jira, Email, Confluence, Slack, Teams, GitHub, BitBucket, GitLab, SharePoint, ...) where you keep snapshots of what you wanted to do at some point in time. You need a high-trust culture. You need to understand that mistakes will happen more frequently: you probably don't have production incidents often, because you deploy once a month, but when you go fast, even with a low failure rate, mistakes happen more often, and you'll need to be prepared for that too.

Unfortunately, most organizations are missing 99% of the above. Organizations like to have layers of communication scattered across all kinds of tools (because hey, X tool fixes my problem), they need 2-hour meetings so everyone is aligned on where the button goes and whether the button has to be green or blue, and 10 engineers need to be present in the room too, so 20 engineering hours. Then they go to production once a month.
So if you have solved all that then the bottleneck becomes lines of code per minute, and you could rebuild most products in a few days.
For me personally, it ranges from 10x to 1x. On my own projects, and on projects where the development experience is really really great, easily 10x. We would never have brought that much live in these short timespans without AI assisted software development. In large businesses where 20 people need to stare at a Jira board to decide on the most basic things and give feedback through Confluence comments and emails.... Yeah the bottleneck is not how fast you can write lines of code.
I'd say on average, probably around the +100% mark, mostly because a lot of the work I'm doing currently consists of simpler tasks - "please add another button to do X". The backend logic is simple and easy to security-check.
Where I'm really not sure on productivity is when I get it to help generate tests that involve functionality across multiple domains of the application - possibly +0%.
I am working on this: https://github.com/ludos1978/ludos-vscode-markdown-kanban
For this kind of no-expectations, for-fun development, I find AI makes it much easier to develop and test hypotheses. For other styles it's different, especially if the stakes are higher.
For example, I do not know rust but I've been using AI to make https://git.sr.ht/~kerrick/ratatui_ruby at a really rapid pace.
I'm not even entirely sure it's a net positive at this point but it feels like it.
I would say probably between 20% to 50% more productive.
Despite being a skeptic, I'm somewhat intrigued by the idea of agents chipping away at problems and improving code, but I just can't imagine anyone using this for anything serious given how hard it fails at trivial stuff like this. Given that the MS guy is talking a big game about planning to rewrite significant parts of Windows in Rust using AI, and is not talking about having rewritten significant parts of Windows in Rust using AI, I remain skeptical of anyone saying AI is doing heavy lifting for them.
Some people find it useful, some people don’t, and unless what you’re using it for matches what they’re using it for (which you’re not asking) none of the results you get give you any insight into what you should expect for your use case.
Oh well, whatever. Here's my $0.02: on a large code base that takes up to 30 minutes to do a local type check in TypeScript, the net benefit of AI is neutral or negative, because the agent can't loop effectively and check its own results.
AI-scaffolded results are largely irrelevant, don't use our internal design-system components or tokens for the UI, and are generally useless.
Objectively measured ticket completion rates are not meaningfully impacted by the use of AI.
Out-of-date documentation leads agents to build incorrect solutions using outdated and deprecated techniques and services.
This is true across multiple tools and multiple models, including SOTA ones.
1x
It is not more productive.
This reflects my personal experience over the last 8 months of intense (and company-mandated) AI usage at work.
At home, for small personal projects, I would say it’s closer to the 2x you describe, maybe as much as 3x for building and iterating on rich web UI using react.
For personal projects (more polyglot, but Rust, JS, Python, and random shell scripts are bigger and more important here) it's been more mixed-to-positive, and this is (I think?) in part because I have the luxury of writing off things I'm _not actually_ interested in doing. Maintaining CMake files sucks, and the free tier of Cursor does a good enough job of it. I have a few small plugins/extensions for things like Blender, and again, I don't know enough to do a good job there, and the benefit of making something extremely specific to what I need without actually knowing what's going on under the hood works fine: I can just verify the results, and that's good enough. But then, conversely, it's made it _wayyyy_ harder to pick and verify third-party libraries for the things I do care about. I'll look something up and it'll either be 100% AI vibe-coded and not good enough to sneeze at, or it'll be fine but the documentation is 100% AI-generated, and likewise, I would rather just have the version of the library from before AI ever existed.
More and more, I'm convinced LLM agents are only fit for purpose for things that don't need to be good or consistent, but that there is actually a viable niche of things that don't need to be good that they can nicely slot into. That's still not worth $20/month to me, though. And it's absolutely ruining the online commons in a way that makes it hard to feel good about.
(My understanding of Claude Code is that it's a non-interactive agent, which is worse for what I have in mind. Iteration and _changing my mind_ are a big part of my process, so even if I let the computer do its own thing for an hour and work on something else, that's less productive than spending even 10 minutes of focused time on the same thing.)
At work, also 2x to 4x.
The numbers would be even higher (4x to 8x) if I didn't spend half the time correcting slop and carefully steering the AI toward the desired solution when it gets sidetracked. But then again, I was also guilty of those things so maybe it's an even score?
Perhaps it's partly psychological in that using it forces me to think through the problems I'm trying to solve differently than before. Perhaps I'm just a mediocre dev and the AI is bringing me up to "slightly above average," but a win is a win and I'll take it.
- It sometimes generates decent examples of Linux kernel API usage for modules, which saves a lot of time digging through the limited documentation. But most of the time it will mix deprecated and new versions of the API, and it will very likely make some very suboptimal and/or buggy choices.
- For embedded C, it won't be able to work within the constraints of a very specific project (e.g. one using an obscure or custom HAL), but for generic C or RTOS applications it can generate decent to half-decent suggestions.
- I've almost never seen it generate decent Verilog.
But business logic that needs to be trusted? Nooooo.
So very, very rarely 1000% faster. Most of the time 10% slower as I have to find and fix issues.
I'll still say it's a net improvement, but not by much. So let's say 10%.
Introductory questions about a widely used language with great documentation and tons of tutorials are made for LLMs.
With writing code for legacy code bases that don't have 3000 copies of the same tutorial in the first 3 pages of search results... they help less.
Note that most people claiming > 2x improvement are doing new code from scratch.
- Writing new code it's probably 3x or so[1].
- Writing automated tests for reproducible bugs, it's probably 2x or so.
- Fixing those bugs I try every so often but it still seems to be a net negative even for Opus 4.5, so call it 0.95x because I mostly just do it myself.
- Figuring out how to reproduce an undesired behavior that was observed in the wild in a controlled environment is still net negative - call it 0.8x because I keep being tempted by this siren song[2]
- Code review it's hard to say, I definitely am able to give _better_ reviews now than I was able to before, but I don't think I spend significantly less time on them. Call it 1.2x.
- Taking some high-level feature request and figuring which parts of the feature request already exist and are likely to work, which parts should be built, which parts we tried to build 5+ years ago and abandoned due to either issues with the implementation or issues with the idea that only became apparent after we observed actual users using it, and which parts are in tension with other parts of the system: net negative. 0.95x, just from trying again every so often.
- Writing new one-off utility tools for myself and my team: 10x-100x. LLMs are amazing. I can say "I want to see a Gantt chart style breakdown of when jobs in a gitlab pipeline start and finish each step of execution, here's the network log, here's a link to the gitlab api docs, write me a bookmarklet I can click on when I'm viewing a pipeline" and go get coffee and come back and have a bookmarklet[3].
Unfortunately for me, a significant fraction of my tasks are of the form "hey so this weird bug showed up in feature X, and the last employee to work on feature X left 6 years ago, can you figure out what's going on and fix it" or "we want to change Y functionality, what's the level of risk and effort".
-----
[1] This number would be higher, but pre-LLMs I invested quite a bit of effort into tooling to make repetitive boilerplate tasks faster, so that e.g. creating the skeleton of a unit or functional test for a module was 5 keystrokes. There's a large speedup in the tasks that are almost boilerplate, but not quite worth it for me to write my own tooling, counterbalanced by a significant slowdown if some but not all tasks had existing tooling that I have muscle memory for but the LLM agent doesn't.
[2] This feels like the sort of thing that the models should be good at. After all, if I fed in the observed behavior, the relevant logs, and the relevant files, even Sonnet 3.7 was capable of identifying the problem most of the time. The issue is that by the time I've figured out what happened at that level of detail, I usually already know what the issue was.
[3] Ok, it actually took a coffee break plus 3 rounds of debugging over about 30 minutes. Still, it's a very useful little tool and one I probably wouldn't have spent the time building in the before times.
I work on a code base that is easily over 1 million lines. It has had dozens of developers work on it over the last 15 to 20 years. Trying to follow the conventions for just the portion of the code base that I work on is a pain. I've been working on it for about seven years and I still have to ask questions.
So I would say that I work on a code base with a high level of drudgery. Having an all-knowing AI companion has taken an awful lot of the stress out of every aspect.
Even the very best developer I’ve ever worked with can’t match me when I’m using AI to augment my work. For most development tasks, being the best of the best no longer matters. But in a strange way, you still need the exact same analytical mindset because now it’s all about prompts. And it definitely does not negate the need for a developer.
Writing your own code is essentially just an exercise in nostalgia at this point, like someone who prefers to pick seeds out of cotton themselves instead of using a cotton gin.
Or perhaps instead of using voice dictation to write this post, I would write a letter and mail it to Hacker News so that they can publish my comment to the site. That’s how backwards writing code is quickly becoming.
I've implemented a few trivial things that I wouldn't have done before if I'd had to type them up by myself.
But the big things that actually take time? Collecting requirements, troubleshooting, planning, design, architecture, research, reading docs? If having basic literacy is 100, then I'd say having LLMs adds about 0.1, if that.
They can be useful for one-shotting code when you already know exactly what it should be, because if you already have the mental model of what the code should be, you can read it 10x faster. The other useful thing is complex, multi-dimensional search: you can explain a process and it'll walk you through the code base. Useful if you already have knowledge of the code base and need to refresh your memory.
In general, now, I'd consider LLMs extremely harmful. They can introduce very high-interest debt to your code base and quickly bring it to collections. The usage must be carefully considered and deliberated.
Of course this is undone by all the AI slop my boss passes by us now.
I mean, if everybody were truly 10x more productive (and at the same output quality as before, obviously), this would mean that your company is now developing software 10 times faster than 2 years ago. And there's just no way that this is true across our industry. So something is off.
Caveats:
- Yes, I get that you can now vibe code a pet project on the weekend when you were just too tired before, but that's not really the question here, is it?
- Yes, if your whole job is just to write boilerplate code, and there's no collaboration with others involved, no business logic involved, etc., ok, maybe you are 10x more productive.