Yeah, this happens to me too. Does it say something about the tool?
It's not like we're talking about Luddites who refuse to adopt the technology, but rather a group that is very open to using it. And yet sometimes, we "forget".
I very rarely regret forgetting. I feel a combination of (a) it's good practice, I don't want my skills to wither, and (b) I don't think the AI would've been that much faster, considering the cost of thinking up the prompt and that I was probably in flow.
A lot of current AI tools are toys. Fun to play around with, but as soon as you have real-world tasks, you just do them your usual way that gets the job done.
Yes, much of the time and esp. for tests. I've been writing code for 35 years. It takes a while to break old habits!
Also, if you like doing certain tasks, then delegating them is like telling someone else to eat an ice cream instead of eating it yourself.
I've seen this complaint a lot, and I honestly don't get it. I have a feeling it helps LLMs write better code. And removing comments can be done in the reading pass, somewhat forcing you to go through the code line by line and "accept" the code that way. In the grand scheme of things, if this were the only downside to using LLM-based coding agents, I think we've come a long way.
LLMs tend to write comments answering "what?", sometimes to a silly extent. What I found helpful when using Claude 3.7 was to add this rule in Cursor. The fake XML tag helped decrease the number of times it strays from my instructions.
<mandatory_code_instruction>
YOU ARE FORBIDDEN FROM ADDING ANY COMMENTS OR DOCSTRINGS. The only code accepted will be self-documenting code.
</mandatory_code_instruction>
If there's a section of code where a comment answering "why?" is needed, this rule doesn't seem to interfere when I explicitly ask it to add one there.
I tend to have to limit the code I share and ask more pointed, targeted questions in order to lead the AI to a non-catastrophic result.
What you want is to ask for a list of changes and then apply them. That's what aider, codex, etc. all do.
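For illustration, here's a minimal sketch of the idea in Python, assuming an aider-style SEARCH/REPLACE format (this is illustrative only, not any tool's actual parser):

    # Apply a model-proposed list of edits to files. The model is asked to
    # emit (file, search, replace) triples; applying them is then a plain
    # string operation you can review before committing.
    from pathlib import Path

    def apply_edit(path: str, search: str, replace: str) -> None:
        """Replace the first exact occurrence of `search` with `replace`."""
        text = Path(path).read_text()
        if search not in text:
            raise ValueError(f"search block not found in {path}")
        Path(path).write_text(text.replace(search, replace, 1))

    # Hypothetical edit list, as the model might propose it:
    edits = [
        ("greeting.py",
         'def greet():\n    print("hi")',
         'def greet(name):\n    print(f"hi {name}")'),
    ]
    for path, search, replace in edits:
        apply_edit(path, search, replace)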
I made a tool to apply human-readable changes back to files, which you might find useful: https://github.com/asadm/vibemode
aider has this feature too.
>The product manager he sits next to has shipped 130 PRs in the last 12 months.
In a serious organization, non technical people should not be shipping any sort of code. They should be doing the highest leverage things possible to help the business, and if that seems to be coding, there are grave issues in the company.
It also gives investors more confidence to shower them with money when needed, as non-tech people are also doing AI coding and they are super agile!
When the Microsoft CEO claims that 80% of code is written by AI, there is 50% doubt, but when someone adds, "yeah, and I have done 150 PRs", it suddenly feels more concrete and real.
I wrote about this years before we started doing AI-backed coding: https://ghiculescu.substack.com/p/opening-the-codebase-up-to..., so some of the details are no longer correct, but the philosophy is the same.
Just as you want developers building domain knowledge for all the benefits it brings, you want a role like product owner to be developing their understanding of delivering software.
Sometimes the highest leverage thing to be done is making teams of people work better together. One aspect of achieving that can be better understanding of what each role does.
Serious organisations sound awful.
I have a Claude console account, if you can call it that? It always takes me 3 times to get the correct email address because it does not work with passkeys or anything that lets me store credentials. I just added the api key to OpenWebUI. It’s nice and cheaper than a subscription for me even though I use it all day.
But I’m still confused. I just now clicked on “build with Claude”, it takes me to that page where I put in the wrong email address 3 times. And then you can buy credits.
Think of it as an LLM that automagically pulls in context from your working directory and can directly make changes to files.
So rather than pasting code and a prompt into ChatGPT and then copy and pasting the results back into your editor, you tell Claude what you want and it does it for you.
It’s a convenient, powerful, and expensive wrapper.
My hesitation to adopt stems from the times the Claude.ai web UI silently breaks the code. Since I can visually verify it, I iterate until it seems reasonable syntactically and logically, and then paste it back.
With autonomous changes to lines of code, I'm slightly nervous it could break too many parts concurrently -- hence my hesitation to use it. Any best practices would be appreciated.
How is your usage so low! Every time I do anything with Claude Code I spend a couple of bucks; for a day of coding it's about $20. Is there a way to save on tokens on a mid-sized Python project, or are people just using it less?
I use aider.chat with Claude 3.5 haiku / 3.7 sonnet, cram the context window, and my typical day is under $5.
One thing that can help for lengthy conversations is caching your prompts (which aider supports, but I'm sure Claude Code does, too?)
Obviously, Anthropic has an incentive to get people to use more tokens (e.g. by encouraging you to spend tokens on "thinking"). It's one reason to prefer a vendor-neutral solution like aider.
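To make prompt caching concrete, here's a rough sketch against the Anthropic API, assuming the anthropic Python SDK (the model alias and file name are illustrative):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Hypothetical: the large, stable chunk of context (repo map, file contents).
    big_repo_context = open("repo_context.txt").read()

    # Marking the stable prefix as cacheable lets repeated turns reuse it at a
    # reduced rate instead of re-billing the full input tokens every time.
    response = client.messages.create(
        model="claude-3-7-sonnet-latest",
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": big_repo_context,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Refactor foo() to take a timeout."}],
    )
    print(response.content[0].text)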
A lot of the time (when it works) I think it's easily worth the money, but I would quickly break their $100 a month budget.
I'd be curious to hear more about this, whether from the author or from someone who does something similar. When the author says "background", does that literally mean JIRA tickets are being assigned to the agent, and it's spitting back full PRs? Is this setup practical?
(Also, your life will get better when you delete Jira!)
I want something simple that I have full control over, if only to understand how they work. So I made a minimal coding agent (with edit capability) that is fully functional using only seven tools: read, write, diff, browse, command, ask, and think.
As an example, I can just disable the `ask` tool to have it go fully autonomous on certain tasks. Or ask it to `think` for refactoring.
Have a look at https://github.com/aperoc/toolkami to see if it might be useful for you.
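As a rough illustration of the shape (not the actual toolkami code; the `diff` and `browse` tools are omitted for brevity):

    # Sketch of a minimal tool-dispatch loop; llm_step stands in for a call
    # to the model that picks one tool per step, or returns None when done.
    import subprocess
    from pathlib import Path

    def read(path): return Path(path).read_text()
    def write(path, text): Path(path).write_text(text); return "ok"
    def command(cmd): return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
    def ask(question): return input(f"{question} > ")  # human in the loop
    def think(notes): return notes                     # scratchpad, echoed back into context

    TOOLS = {"read": read, "write": write, "command": command, "ask": ask, "think": think}

    def run_agent(llm_step, autonomous=False):
        tools = dict(TOOLS)
        if autonomous:
            tools.pop("ask")  # disabling `ask` removes the human from the loop
        history = []
        while (step := llm_step(history)) is not None:
            name, kwargs = step
            history.append((name, kwargs, tools[name](**kwargs)))
        return history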
This is why I don’t touch the shit—it’s fucking snake oil.
Good devops practices make AI coding easier!
Good devops practices make coding easier!
We’ve started more aggressively linting our code because a) it makes the AI better and b) we made the AI do the tedious work of fixing our existing lint violations.
Now I only wish for a Product Manager model that can render the code and provide feedback on UI issues. Using Cursor and Gemini, we were able to get an impressively polished UI, but it needed a lot of guidance.
> I haven’t yet come across an agent that can write beautiful code.
Yes, the AI doesn't mind hundreds of lines of if statements; as long as it works, it's happy. It's another thing that needs several rounds of feedback and adjustments to become human-friendly. I guess you could argue that human-friendly code will soon be a thing of the past, so maybe there's no point fixing that part.
I think improving the feedback loops and reducing the frequency of "obvious" issues would do a lot to increase the one-shot quality and raise the productivity gains even further.
When you let them pump out code without any intervention, there comes a point where they start introducing bugs faster than they get fixed, and things don't get better.
fwiw we interviewed the Claude Code team (https://www.latent.space/p/claude-code) and they said that even within Anthropic (where Claude is free; we got into this a bit), the usage is $6/day, so about $200/month. Not bad! Especially because it goes down when you under-use.
Since writing this, a tangentially related thing we've added is a GitHub Action that runs on any PR that includes a (Rails) database migration and reviews it, comparing it to our docs on how to write good migrations.
Claude helped write the action so it was super easy to set up.
It works particularly well for migrations because all the context is in the PR. We haven't had as much luck with reviewing general PRs where the reason for a change being good or bad could be outside the diff, and where there aren't as clearly defined rules for what should be avoided.
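For anyone curious, the core of such an action can boil down to something like this sketch (not their actual setup; the paths, model, and prompt are made up):

    # Review the PR's migration diff against written guidelines, then print
    # the review (a real action would post it as a PR comment).
    import subprocess
    import anthropic

    diff = subprocess.run(
        ["git", "diff", "origin/main...HEAD", "--", "db/migrate/"],
        capture_output=True, text=True,
    ).stdout

    guidelines = open("docs/migrations.md").read()  # hypothetical docs path

    client = anthropic.Anthropic()
    review = client.messages.create(
        model="claude-3-7-sonnet-latest",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Review this Rails migration against our guidelines.\n\n"
                       f"Guidelines:\n{guidelines}\n\nDiff:\n{diff}",
        }],
    )
    print(review.content[0].text)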
For Claude specifically, people who take more time to write long detailed prompts tend to get much better outcomes. Including me, since I made the effort to get better at prompt writing.
They also have a mode (--watch-files) that allows you to talk to a running aider instance from inside vim, but I haven't used it much yet.
I can't go back to a regular IDE after being able to tab my way through most boilerplate changes, but anytime I have Cursor do something relatively complex it generates a bunch of stuff I don't want. If I use Claude chat, the barrier of manually auditing anything that gets copied over stays in place.
I also have pretty low faith in a fully useful version of Cursor anytime soon.
I built an open source CLI coding agent that is essentially this[1]. It combines Claude/Gemini/OpenAI models in a single agent, using the best/most cost effective model for different steps in the workflow and different context sizes. The models are configurable so you can try out different combinations.
It uses OpenRouter for the API layer to simplify use of APIs from multiple providers, though I'm also working on direct integration of model provider API keys.
It doesn't have a Neovim plugin, but I'd imagine it would be one of the easier IDEs to integrate with given that it's also terminal-based. I will look into it—also would be happy to accept a PR if someone wants to take a crack at it.
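For context, OpenRouter exposes an OpenAI-compatible endpoint, so the provider-mixing part can look roughly like this (the model IDs and the size-based routing rule are illustrative, not the tool's actual logic):

    # One OpenAI-compatible client, multiple providers routed by OpenRouter.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="sk-or-...",  # your OpenRouter key
    )

    def pick_model(context_tokens: int) -> str:
        # Cheap default; switch to a long-context model for big inputs.
        if context_tokens > 150_000:
            return "google/gemini-2.5-pro-preview"
        return "anthropic/claude-3.7-sonnet"

    resp = client.chat.completions.create(
        model=pick_model(context_tokens=2_000),
        messages=[{"role": "user", "content": "Plan the refactor of foo()."}],
    )
    print(resp.choices[0].message.content)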
o3 in codex is pretty close sometimes. I prefer to use it for planning/review but it far exceeds my expectations (and sometimes my own abilities) quite regularly.
>The product manager he sits next to has shipped 130 PRs in the last 12 months.
This is actually horrifying, lol. I haven't even considered product guys going ham on the codebase.
Noncoders are about to learn about the code maintenance cycle
But we're unlocking:
A) more dev capacity by having non-devs do simple tasks
B) a much tighter feedback loop between "designer wants a thing" and "thing exists in product"
C) more time for devs like me to focus on deeper, more involved work
Presumably an LLM can actually maintain better contextual awareness of code and variables than, say, cold-loaded syntax highlighting.
LLMs can be better than most humans at AppleScript
they just need a fuck ton of it in context