FilterHN

codyb

2 days ago

[-]

I'm very confused.

In the picture right at the top of the article, the top of the bell curve is using 8 agents in parallel, and yada yada yada.

And then they go on to talk about how they're using 9 agents in parallel at a cost of 1000 dollars a month for a 300k line (personal?) project?

I dunno, this just feels like as much effort as actually learning how to write the code yourself and then just doing it, except, at the end... all you have is skills for tuning models that constantly change under you.

And it costs you 1000 dollars a month for this experience?

woeirua

2 days ago

[-]

300k lines of AIslop that probably would've been 20k lines of code from a human. And the human has no ability to maintain it, or to reason about it.

2 days ago

[-]

every one of the snarky comments like this on myriad of HN threads like this:

1. assumes most humans write good code (or even better than LLMs)

2. will stick around to maintain it

after 30 years in the industry and last 10 as consultant I can tell you fairly definitively that #1 cannot be further from the truth and #2 is frequent cause of consultants getting gigs, no one understand what “Joe” did with this :)

hbrn

1 day ago

[-]

1. Humans are capable of writing good code. Most won't, but at least it's possible. If your company needs good code to survive, would you take 5% chance or 0% chance?

2. Even when humans write crappy code, they typically can maintain it.

1 day ago

[-]

> 0% chance?

by now I have shipped over a million lines of code written by LLMs (many others have as well) so 0% maybe in the hands of my 12-year old

> …they can typically maintain it

who exactly is “they”?

imtringued

12 hours ago

[-]

I don't know why anyone would be proud of producing as large a maintenance burden as possible.

3 hours ago

[-]

larger maintenance burden as opposed to same LOC written by Joe from Nebraska?

dataflow

1 day ago

[-]

This sounds like a wild take. So what about those trying LLM code, then deciding it isn't good enough, and going back and writing it from scratch themselves, with what they perceive to be better results? They're just wrong and the LLM was just as good?

3 hours ago

[-]

this is same argument as “we tried to use Redis, it sucked so are using MS Access now”

you either learn and know how to use tools as SWE professional that you or you don’t…

mattmanser

1 day ago

[-]

I think it's hard as he's done a huge article about coding with agents, with no code examples.

You can go look at his GitHub but it's a bewildering array of projects. I've had a bit of a poke around at a few of the seemingly more recent ones. Bit odd though as in one he's gone heavy on TS classes and another heavy on functions. Might be he was just contributing to one as it was under a different account.

And a lot of them seem to be tools that wrap a lot of cli tools. There is a ton of scaffolding code, to handle a ton of cli options. A LOT of logger stmts, one file I randomly opened was a logger stmt every other line.

So it's hard to judge, I found it hard to wade through the code as it's basically just a bunch of option handling for tool calls. It didn't really do much. But necessary, probably?

Just very different code than I need to write.

And there are some weird tells that make it hard to believe.

For example, he talks about refactoring for useEffect in React but I KNOW GPT5 is really rubbish at it.

Some code it's given me recently was littered with useEffect and useMemo when it wasn't needed. Then when challenged it got rid of some, then changed other stuff to useEffect when again, it wasnt needed.

And then got all confused and basically blew it's top.

Yet this person says he can just chuck a basic prompt at his codex cli, running GPT5 and it magically refractors the bad useEffects?

How are we getting such different results?

1 day ago

[-]

OP: If you give the llm examples like https://react.dev/learn/you-might-not-need-an-effect, it does a farily good job at refactoring useEffecs.

And yes refactoring sometimes re-introduces these, so it's not a perfect solution.

mattmanser

1 day ago

[-]

The irony here is that I did!

Having looked at the code a bit more, all I can say is that it's a lot of code to do little.

There's also a lot of naive error throwing going on.

And he seems to debug using logger stmts.

They're not scalable projects, you couldn't write enterprise software the way those projects are written. You would end up with such a volume of code.

1 day ago

[-]

Personally, my experience with codex is same as yours, no way I would ever use codex for TS projects and especially not React. I don't know this mate personally but if we were talking about this over beer I would probably tell you (after 3rd one when I am more open to being direct) that I think I trust this blog as much as I trust President (this one or previous ones) to tell the truth :)

My comment was more geared towards an insane amount of comments on myriad of "AI" / "agent coding" posts where soooooo many people will write "oh, such AI slop" assuming that average SWE would write it better. I don't know many things but working with these tools heavily over the last year or so (and really heavily last 6 months) I'll take their output over general average SWE every day of the week and twice on Sunday (provided that I am driving the code generation myself, not general AI generated code...)

1 day ago

[-]

(OP) the current projec is closed source. If you look at my cli tools, that's pure slop, all I care is that it works, so reviewing that code for sure will show some weird stuff. Does it matter? It's a tool to fetch logs form a server. I run it locally. As long as is does that reliably, idk about the code.

shinryuu

1 day ago

[-]

What does your current project do? Do you make money with it?

CGMthrowaway

2 days ago

[-]

I'm picturing COBOL developers in the 80s saying the same thing about modern developers today (without even bringing in AI).

eaglelamp

2 days ago

[-]

Higher level abstractions are built on rational foundations, that is the distinction. I may not understand byte code generated by a compiler, but I could research the compiler and understand how it is generated. No matter how much I study a language model I will never understand how it chose to generate any particular output.

GoatInGrey

2 days ago

[-]

COBOL developers may have claimed that higher-level language developers didn't understand what was happening under the hood. However, they never suggested those developers couldn't understand the high-level code itself (what's going on here)—only what lay beneath it.

bitwize

2 days ago

[-]

But COBOL developers resisted modern tooling. A coworker of mine tells the story of when he was working alongside an old mainframe hand more than 25 years ago, and was trying to explain to him how modern IDEs work. The mainframe guy gave him a disdainful look and said "That ain't how computing is done, kid."

Now what the guys above the programmers' paygrade knew was that the aim of software development wasn't really code, it was value delivered to customer. If 300k lines if AI slop deliver that value quickly, they can be worth much more than the 20k lines of beautiful human-written code.

asdff

1 day ago

[-]

COBOL guy has a point. Novel writers didn't get any better when they switched from the typewriter to ms word.

hitarpetar

1 day ago

[-]

I would wager that using a fully fledged IDE has limited to no correlation with developer productivity across the entire industry

codyb

1 day ago

[-]

I'd suspect folk with a terminal first approach probably have much stronger understandings of what is going on under the hood which makes approaching new repositories a lot easier if nothing else.

Alternatively, maybe folk who're exposed to more codebases are the best off.

Or combine the two for wizard.

bitwize

1 day ago

[-]

By "modern IDE" I meant something like Turbo Pascal, as compared with the (at best) ISPF-based editor the mainframe guy was using. This took place in the early 90s.

https://www.ee.torontomu.ca/~elf/hack/realmen.html

fragmede

1 day ago

[-]

ljm

2 days ago

[-]

This is like having a normal distribution where the average is a 1x engineer, a standard deviation below is a 0x engineer, and a standard deviation above is a 10x engineer. Which would make more sense because someone running 9 agents with multiple git checkouts and what-not is just managing a team of synthetic vibe coders.

nf17

2 days ago

[-]

Looks like many are questioning the credibility of the author as a dev/engineer. Peter is the founder of PDFKit and well respected name in iOS circles for his contribution to .framework and ideas about building modular iOS apps. He may be overselling here, but pretty sure he can manage AI generated code

ruszki

11 hours ago

[-]

Your PDFKit is PSPDFKit.

Also I don’t think that there are more than a handful of people in the world who can properly manage a 1 million LOC codebase regardless of source. Even when you remove the ton of useless comments which his code has.

Also the first file which I checked: https://github.com/amantus-ai/llm-codes/blob/main/src/lib/__...

This is not how dates should be tested at all. Timestamp and stack are ignored with errors with parents. If you want really nice code then have one expect per test case. Useless comments. Type guards testing is a joke.

And these classes are very simple ones.

So no, he cannot manage this code. Especially because tests are basically the most important part of good LLM generated code, and LLMs will lie with them all the time.

barrkel

2 days ago

[-]

It all sounds somewhat impressive (300k lines written and maintained by AI) but it's hard to judge how well the experience transfers without seeing the code and understanding the feature set.

For example, I have some code which is a series of integrations with APIs and some data entry and web UI controls. AI does a great job, it's all pretty shallow. The more known the APIs, the better able AI is to fly through that stuff.

I have other code which is well factored and a single class does a single thing and AI can make changes just fine.

I have another chunk of code, a query language, with a tokenizer, parser, syntax tree, some optimizations, and it eventually constructs SQL. Making changes requires a lot of thought from multiple angles and I could not safely give a vague prompt and expect good results. Common patterns need to fall into optimized paths, and new constructs need consideration about how they're going to perform, and how their syntax is going to interact with other syntax. You need awareness not just of the language but also the schema and how the database optimizes based on the data distribution. AI can tinker around the edges but I can't trust it to make any interesting changes.

ljm

2 days ago

[-]

AI agents to me seem maximalist by default, and if it takes 250 lines to meet a requirement they would choose that over the much simpler one-line change.

In an existing codebase this is easily solved by making your prompt more specific, possibly so specific you are just describing the actual changes to make. Even then I find myself asking for refinements that simplify the approach, with suggestions for what I know would work better. The only reason I'm not writing the change myself is because the AI agent is running a TDD-style red/green/refactor loop and can say stuff like "this is an integration test, don't use mocks and prefer to use relevant rspec matchers in favour of asserting on internal object state" and it will fix every test in the diff.

In a brand new codebase I don't have a baseline any more and it's just an AI adding more and more into a ball of mud. I'm doing this with NixOS and the only thing keeping it sane is that each generated file is quite small and simple (owing to the language being declarative). Yet still, I have zero idea if I can even deploy it yet as a result.

dingnuts

1 day ago

[-]

I stopped using AI assistance when I realized every time the agent made a change, I had to go behind and simplify it by about 80%. It's easier, faster, more fun, and produces a better end product if I just go ahead and do it myself.

If I am feeling lazy I can have one of the chats give me their shit solution to the micro problem at hand and extract the line I need and integrate it properly. This is usually a little faster than reading the manual, but it's wrong often enough that I usually read the manual for everything the bot does, to make sure. And so next time I can skip asking it.

Someone wake me up and tell me to try these tools again when the flow isn't prompting and deleting and repeat.

inb4 someone tells me there's a learning curve for this human language product that is supposed to make it so I'm obsolete and my CEO can do my job because it makes coding so easy that even an experienced coder has to climb a steep learning curve but there's going to be a white collar blood bath also

fucking pick a narrative, AI shills

timr

2 days ago

[-]

On the contrary, this doesn’t sound impressive at all. It sounds like a cowboy coder working on relatively small projects.

300k LOC is not particularly large, and this person’s writing and thinking (and stated workflow) is so scattered that I’m basically 100% certain that it’s a mess. I’m using all of the same models, the same tools, etc., and (importantly) reading all of the code, and I have 0% faith in any of these models to operate autonomously. Also, my opinion on the quality of GPT-5 vs Claude vs other models is wildly different.

There’s a huge disconnect between my own experience and what this person claims to be doing, and I strongly suspect that the difference is that I’m paying attention and routinely disgusted by what I see.

sarchertech

2 days ago

[-]

300k especially isn’t impressive if it should have been 10k.

timr

2 days ago

[-]

Yes, well put. And that’s a common failure mode.

2 days ago

[-]

I would guess that roughly 0.000087% devs on the planet do it in 10k (if it is possible) and 37.76% would do it in 876k so 300k is probably right in some middle :)

CuriouslyC

2 days ago

[-]

To be fair, codebases are bimodal, and 300k is large for the smaller part of the distribution. Large enterprise codebases tend to be monorepos, have a ton of generated code and a lot of duplicated functionality for different environments, so the 10-100 million line claims need to be taken with a grain of salt, a lot of the sub projects in them are well below 300k even if you pull in defs.

myko

2 days ago

[-]

I'm fairly skeptical of the LLM craze but I deeply respect Peter Steinberger's work over the years, he truly is a gifted software developer in his own right. I'm sure his personal expertise helps him guide these tools better than many could.

1 day ago

[-]

(OP) 1/3rd of the code is tests.

There's an Expo app, two Tauri apps, a cli, a chrome extension. The admin part to help debug and test features is EXTREMELY detailed and around 40k LOC alone.

To give some perspective to that number.

timr

23 hours ago

[-]

Yeah, I read the post. Telling me that there's a chrome extension and some apps tells me nothing. Saying that the code is 1/3 tests is...something, but it's not exceptional, by any means.

I've got an code base I've been writing from scratch with LLMs, its of equivalent LOC and testing ratio, and my experiences trusting the models couldn't be more different. They routinely emit hot garbage.

srameshc

2 days ago

[-]

Everytime I read something like this, I question myself, what I am doing wrong ? And I tried all kinds of AI tools. But I am not even close to claiming that AI writes 50% of my code. My work which sometimes include feature enhancements and maintenance is where I get even less. I have to be extremely careful and make sure nothing unwanted or addition that I am unaware of has been added. Maybe it's me and I am not good yet to get to 100% AI code generation.

troupo

2 days ago

[-]

You're not doing anything wrong. You have to read past hyperbole.

Here's how the article starts: "Agentic engineering has become so good that it now writes pretty much 100% of my code. And yet I see so many folks trying to solve issues and generating these elaborated charades instead of getting sh*t done."

Here's how it continues:

- I run between 3-8 in parallel

- My agents do git atomic commits, I iterated a lot on the agents file: https://gist.github.com/steipete/d3b9db3fa8eb1d1a692b7656217...

- I currently have 4 OpenAI subs and 1 Anthropic sub, so my overall costs are around 1k/month for basically unlimited tokens.

- My current approach is usually that I start a discussion with codex, I paste in some websites, some ideas, ask it to read code, and we flesh out a new feature together.

- If you do a bigger refactor, codex often stops with a mid-work reply. Queue up continue messages if you wanna go away and just see it done

- When things get hard, prompting and adding some trigger words like “take your time” “comprehensive” “read all code that could be related” “create possible hypothesis” makes codex solve even the trickiest problems.

- My Agent file is currently ~800 lines long and feels like a collection of organizational scar tissue. I didn’t write it, codex did.

It's the same magical incantations and elaborated charades as everyone does. The "the no-bs Way of Agentic Engineering" is full of bs and has nothing concrete except a single link to a bunch of incantations for agents. No idea what his actual "website + tauri app + mobile app" is that he build 100% with AI, but depending on actual functionality, after burning $1000 a month on tokens you may actually have a fully functioning app in React + Typescript with little human supervision.

1 day ago

[-]

(OP) You know if I link to a half-finished project, people would take it apart as many don't understand the nuance between crap and simply not done yet. But if you follow me on twitter it'll take you a few minutes to figure out. I'm two months in, even with AI, shipping good stuff takes time.

troupo

1 day ago

[-]

Having scrolled through several pages of your complaining about idiots on HN or discussing a yet another AI tool, I guess this is it: https://sweetistics.com/ ? Something you couldn't link in the article for some reason?

I've scrolled bit more. I think in the past 50-100 tweets you only wrote thee talking about this, one of them proudly showing a mistake (invalid tweets containing the same text): https://x.com/steipete/status/1978229441802162548

So, I have to follow you on twitter and sift through garbage indistinguishable from all such "look how great is codex" and "this is my shamanic ritual that works I promise" to maybe see something you work on.

No thank you. I will make my judgement from the long-form article you posted.

And, as I said: depending on actual functionality, after burning $1000 a month on tokens you may actually have a fully functioning app in React + Typescript with little human supervision. I might do the same for anything Twitter-related because I couldn't be arsed to work with Twitter or Twitter APIs.

1718627440

2 days ago

[-]

> $1000 a month

Yeah at this point you could hire a software developer.

TheMrZZ

2 days ago

[-]

I don't know how much SWE get paid in your area, but I sure hope it's not 1000$/month.

Though I'm aligned that I don't (yet) believe in this "AI writes all my code for me" statements.

1718627440

2 days ago

[-]

It includes that with AI you still need someone to work. First to query the AI and then to fix up something and to bring it in a form you can release and use.

stocksinsmocks

2 days ago

[-]

$5.75/hr is well below outsourced rates. It’s $1.40/hr if the agent runs without stopping. If I hired a human consultant for a project of any size, I could easily spend $10,000 or more on just scoping and contract approval. Humans don’t win on cost.

1718627440

2 days ago

[-]

Right now they still need someone typing prompts and verifying them. When they do what you intend it means that is no longer more work to handhold them than doing it yourself, but it is still work.

fhennig

2 days ago

[-]

I'm in the same boat, I still find the model to make mistakes or solve things in a less than ideal way - maybe the future is to just not care - but for now I want to maintain the level of quality that the codebase currently has.

I think it's good to keep up with what early adopters are doing, but I'm not too fussed about missing something. The plugins is a good example: A few weeks ago there was a post on HN where someone said they are using 18 or 25 or whatever plugins and it's the future, now this person says they are using none. I'm still waiting for the dust to settle, I'm not in a rush.

CuriouslyC

2 days ago

[-]

The person using 25 plugins is giving bad advice. The agent isn't going to need all those tools at once, and each tool burns context and causes tool confusion. Enable MCPs for the specific task you're going to have your agent do.

The trick is to create deterministic hurdles the LLM has to jump over. Tests, linting, benchmarks, etc. You can even do this with diff size to enforce simpler code, tell an agent to develop a feature and keep the character count of the diff below some threshold, and it'll iterate on pruning the solution.

tptacek

2 days ago

[-]

Same! Half would be a lot for me. I'm also not close to the point where I'm comfortable merging LLM-authored PRs without line-by-line reviews.

darkwater

1 day ago

[-]

Didn't you write a blog post a few months ago saying that you had agents preparing PRs for you while AFK doing things IRL? Your outlook back then seemed pretty optimistic, while this comment now seems way less so. Did something change for you or had I misunderstood your post back then?

tptacek

1 day ago

[-]

I still have agents (Sketch.dev mostly) produce PRs offline for me! I'm very optimistic. This is the second most important thing to have happened in my career (#1: the Internet; #3: mobile; #4: not writing everything in C). Nothing has changed. But yeah, I still line-by-line audit everything the agent spits out, and I still take the wheel myself about half the time. If everything stopped right here and no further progress was made on any of this technology and my workflow remained the same, this would remain the second most important thing.

pessimizer

2 days ago

[-]

> I have to be extremely careful and make sure nothing unwanted or addition that I am unaware of has been added.

I've started getting desperate to the point of saying 1) "never. never, ever add or remove features without consulting me first and getting approval." Then eventually, 2) appended to the previous "The last rule is the most important rule, because you keep doing it and I need you to stop doing it." Then finally 3), "THE LAST RULE IS THE MOST IMPORTANT RULE, BECAUSE YOU KEEP DOING IT AND I NEED YOU TO STOP DOING IT."

3/4 of my AI bugs are the AI making changes to the functionality of the code when I'm not looking, or repeatedly reinserting bugs that had been previously removed. The most valuable thing I'm getting it to do is to refactor the code it already wrote into shorter well-named functions (during which it still inevitably adds and removes behavior), because it means that I can just debug by hand and stop demanding over and over again that it not ignore what I said.

But, of course, it's not ignoring me, it's not thinking at all. Trying to look for the magic words to keep it from ignoring me and lying about it is just an illusion of control. The thing that will knock it off it's dumb track is likely just a lucky random seed during the 13th attempt. Then, like a sports fan, I add the lucky underwear to my instructions.

edit: the "I'll get AI to write the AI prompt so it will be perfect" stuff is so much voodoo. LLMs have no special insight into what will make LLMs work correctly. I probably should have stopped that last sentence after the word "insight." Feed them a sample prompt that you say doesn't work, and it will explain to you exactly why it's so bad, and could never work. Feed them the same prompt and ask why it works so well, and it will tell you how perfectly crafted it is and why. Then it will offer to tell you how it could be improved.

Kim_Bruning

1 day ago

[-]

Hrrrm, do you write unit tests to check for the desired behaviour? Or does it 'optimize' those away too? %-/

1 day ago

[-]

AI circumvents guardrails yes.

bigblind

2 days ago

[-]

I'd love to be able to watch people work, who say that they're sucessful with these tools. If there are any devs live streaming software development on Twitch, or just making screen casts (without too many cuts) of how they use these tools in day-to-day work, I'd love to see it.

simonw

2 days ago

[-]

Armin Ronacher has done some of those: https://www.youtube.com/@ArminRonacher

maqnius

2 days ago

[-]

I skimmed through one of the videos and it reminded me of how I just had a week of mainly reviewing other's code and supporting their work.

When I finally had the occasion to code myself, I felt so much better and less stressed at the end of the day.

My point is: what I just saw is hopefully not my future.

I sometimes read the opinion, that those who like the programming part of software engineering, don't like „agentic engineering“ and vica versa. But can we really assume that Armin Ronacher doesn't like programming?

simonw

2 days ago

[-]

It's easy to find counter-examples to that idea that people who like working with coding agents don't enjoy programming. I'm one of those people - I'm enjoying myself so much getting agents to build stuff for me, and I've enjoyed the craft of programming for 25+ years. I'm doing what I did before, just faster and with less time spent on the frustrating, repetitive bits.

2 days ago

[-]

Same here. I like debating the architecture, API, schema, algorithms, data structures, and user experience. Once all that is done, I hand off the implementation.

2 days ago

[-]

sooooo much this. I am in the same 25+ (almost 30) and more and more thinking that there is something there that us “veterans” dig this so much. perhaps it is the discipline we have built over the years that is being applied religiously with agents…

lysecret

1 day ago

[-]

+1 for him he got me into agentic coding earlier this year.

jasondjk

1 day ago

[-]

I recorded a bunch of videos like this for Ruby on Rails. This one is generically relevant about operating Claude hands free with —dangerously-skip-permissions and importantly sandboxing it in a separate user account for security: https://insidertrades.directory/built-with-rails/claude-code...

https://steipete.me/posts/2025/live-coding-session-building-...

jrk

2 days ago

[-]

If you go just a few posts back in Peter's own blog he has a video of himself doing exactly this:

He has posted others over the past few months, but they don't seem to be on his blog currently.

As @simonw mentions in a peer comment, Armin Ronacher also has several great streams (and he's less caffeinated and frenetic than Peter :)

mitjam

1 day ago

[-]

I've watched quite a few and got many good ideas, but it's taking a lot of time that's sometimes better spent with "deliberate practice". I learned more and faster by attending courses by people I already follow and love taking advice from. Earning money with it gives them the time needed to structure and prepare the material.

For example, I'm currently taking the "Elite AI Assisted Coding" (https://maven.com/kentro/context-engineering-for-coding) course by Eleanor Berger and Isaac Flath and learned a lot from their concise presentations and demos and challenging homework assignments which certainly took a long time to prepare.

FitchApps

2 days ago

[-]

My 2c - AI-first is awesome for rapid prototyping / POC but beyond that the devs should own the project and use AI sparingly. I'm not saying AI agents aren't capable, I'm saying that your skills will shift from problem solving/coding to managing AI and whatever code it produces

sarchertech

2 days ago

[-]

I’m not making any psychiatric diagnoses based on GitHub repos or YouTube videos.

But. Sometimes when I see someone talking about cranking out hundreds of thousands of lines of vibe coded apps, I go watch their YouTube videos, or checkout their dozens of unconnected, half finished repos.

Every single time I get a serious manic vibe.

cruffle_duffle

2 days ago

[-]

The overlap between crypto hustlers and AI hustlers is pretty interesting. Not strictly “hustle both” type overlap but it’s a similar type of energy. Bully the non-believers and hype the hype regardless of reality.

I dunno. People say these tools trigger the gambling part of your brain. I think there is a lot of merit to that. When these tools work (which they absolutely do) it’s incredible and your brain gets a nice hit of dopamine but holy cow can these tools fail. But if you just keep pulling that lever, keep adding “the right” context and keep casting the right “spells” the AI will perform its magic again and you’ll get your next fix. Just keep at it. Eventually you’ll get it.

Surely somebody somewhere is doing brain imagery when using these tools. I wouldn’t be surprised to see the same parts of the brain light up as when you play something like Candy Crush. Dig deep into the sunk cost fallacy, pepper with an illusion of control and that glorious “I’m on a roll” feeling (how many agents did this dude have active at once?) and boom…

I mean read the post. The dude spends $1000/mo plugging tokens into a grid of 8 parallel agents. They have a term for this in the gaming industry. It’s a whale.

handfuloflight

1 day ago

[-]

You're describing the classic developer dopamine loop, just faster now.

Spinning up test after test, tweaking parameters, chasing that "it works!" high, that's what debugging has always been.

You're doing the exact same thing with your code that you're criticizing him for doing with AI. Same sunk cost fallacy ("I've already spent 3 hours, might as well get it working"), same illusion of control, same "I'm on a roll" feeling when the tests finally pass.

The only difference is speed. He gets micro-hits every 10 seconds watching tokens stream. You get them every time you re-run your test suite. Same gambling structure, same reward circuit lighting up, you've just normalized yours because it happened slowly enough to not look like a slot machine.

And you're the one reducing it to "gambling" unless you're claiming human developers experience zero dopamine and write code with omniscient correctness the first time. If they don't, if there's iteration, failure, reward, then you're describing the same neurochemistry. You've just decided it only counts as "gambling" when it makes you uncomfortable.

1 day ago

[-]

Counter argument: one is actually capable of reasoning, the other is predicting the next token and brute forcing until checks pass.

hitarpetar

1 day ago

[-]

running and slots are both addictive, so they must be equally bad for you right?

rightbyte

1 day ago

[-]

> pepper with an illusion of control

Did you lose an 'r' or did you mean the ghost glass thing?

Terr_

2 days ago

[-]

Layer on the parasocial angle and it gets even worse: You're gambling and you have a "friend."

darkwater

1 day ago

[-]

In this post they mentioned "blast radius" and said "I didn't invent the term but I like it though". Maybe it's just because I'm very well used to the term in our industry, but that sentence felt so off for me. Cringe vibes.

stocksinsmocks

2 days ago

[-]

Well, to be fair there were a lot of guys doing exactly this sort of thing except they were writing their hobby projects by hand. I don’t take any technical blogs about someone’s secret sauce seriously at all. Programmer blogs are marketing pieces.

1 day ago

[-]

Marketing for what? I didn't even link to what I'm building because I wanna ship it when it's ready.

squirrel

2 days ago

[-]

I have to imagine that like pair programming, this multi-AI approach would be significantly more tiring than one-window, one-programmer coding. Do you have to force yourself to take breaks to keep up your stamina?

XenophileJKO

2 days ago

[-]

Well I don't use as many instances as they do, but using codex for example will take some time researching the code. It mostly just becomes a practice so that I don't have to wait for the model, I just context switch to a different one. It probably helps that I am ADHD, I don't pay much of a cost to ping pong around.

So I might tell one to look back in the git history to when something was removed and add it back into a class. So it will figure out what commit added it, what removed it, and then add the code back in.

While that terminal is doing that, on another I can kick off another agent to make some fixes for something else that I need to knock out in another project.

I just ping pong back to the first window to look at the code and tell it to add a new unit test for the new possible state inside the class it modified and I'm done.

I may also periodically while working have a question about a best practice or something that I'll kick off in browser and leave it running to read later.

This is not draining, and I keep a flow because I'm not sitting and waiting on something, they are waiting on me to context switch back.

1 day ago

[-]

Its bad because basic stuff like “please commit these changes” can sometimes take 10+ minutes at times and causes it to spiral doing tangential stuff.

By running them in parallel you avoid sitting there watching paint dry for a task that takes 3 seconds by hand.

Its really not comparable to a junior, its more comparable to a salty maliciously compliant optimized to burn tokens and deceive you.

throw-10-13

2 days ago

[-]

The human context window is the limiting factor.

That and testing/reviewing the insane amounts of ai slop this method generates.

csar

2 days ago

[-]

If you're getting AI slop you're doing it wrong. You should be getting high quality code. Of course that's easier said than done, but AI slop is a sign that things have gone off the rails.

segfaultex

1 day ago

[-]

I have scarcely gotten decent code. The best a model has spat out is 'fine', which is ok for menial tasks.

I have yet to see anyone show me an AI generated project that I'd be willing to put into production.

IDK, I feel like 'vibe coders' or people who heavily rely on LLM's have allowed their skills (if they ever existed) to atrophy such that they're generally not great at assessing the output from models.

throw-10-13

1 day ago

[-]

Spare me the koolaid.

cheevly

1 day ago

[-]

95% of HN comments nowadays amount to: "I have no idea how to wield AI, but I'm going to talk authoritatively on the subject and use fallacies to downplay the claims of others."

1 day ago

[-]

More like “i tried what others claim extensively and it does not work for me, please let me know if im doing something wrong” — to which the response is often yours, reframing the observation as a fallacy.

crazygringo

21 hours ago

[-]

Can you point me to some of those comments?

I genuinely haven't seen them.

I see many people insisting it didn't work when they tried it for some little thing, therefore it's broken and useless. And a few people saying, actually it works really well if you're willing to learn how to use it.

I'm not sure I've ever seen someone here saying it hasn't worked but they're open to learning how to use it right. It's definitely not common.

2 days ago

[-]

I wonder how many lines of code he generates and reviews a day. With five subscriptions I do not think it is possible to read it all. You can generate more code than you can read with just one subscription.

CuriouslyC

2 days ago

[-]

Use of the bell curve for this meme considered harmful.

If you're going to use AI like that, it's not a clear win over writing the code yourself (unless you're a mid programmer). The whole point of AI is to automate shit, but you've planted a flag on the minimal level of automation you're comfortable with and proclaimed a pareto frontier that doesn't exist.

ljm

2 days ago

[-]

Meta feedback: there are so many external links in this post (the majority being to Twitter) that it really feels like the audience is just...people who follow this guy on Twitter. There must be about 25 separate links to one-liner tweets.

Surely one of those 9 parallel AI agents could add something like footnotes with context?

GOD_Over_Djinn

1 day ago

[-]

I was hoping that there would be some discussion of what exactly the project is. The author says they have 300k loc so far. That’s a considerable codebase. What kind of react app could this possibly be? I would love to know.

> This post is 100% organic and hand-written. I love AI, I also recognize that some things are just better done the old-fashioned way.

I’m curious why they feel that writing a blog post with a particular tone and writing style is more complex than writing what is apparently a truly massive and complex app

N_Lens

2 days ago

[-]

I'm currently satisfied with Claude Code, but this article seems to sing the praises of Codex. I am dubious Whether it's actually superior or it's 'organic marketing' by OAI (Given that they undoubtedly do this, and other shady practices).

I'll give codex a try later to compare.

dinkleberg

2 days ago

[-]

I’ve found codex to be a much more thorough code reviewer.

Recently I’ve been using Claude for code gen and codex for review.

I keep trying to use Gemini as it is so fast, but it is far inferior in every other way in my experience.

2 days ago

[-]

Codex is about as capable as Sonnet, but slower. One advantage is that it more readily pushes back against requests, like the article noted.

lysecret

1 day ago

[-]

This is very funny to me I rarely found this pushback to be useful and mostly very annoying. Claude code has the opposite problem sometimes though. To some extend this ends up being a pure preference of character. You want ur ai coder to be disagreeable or agreeable. Maybe this is a good way to differentiate.

1 day ago

[-]

If it does not push back, how will you know you have a bad idea; when you face the consequences?

1 day ago

[-]

By not relying on AI to think for you.

Kim_Bruning

1 day ago

[-]

I get the idea that different AI have different characters, and different people can either 'get along with them' or not.

To wit, I have absolutely no problems with claude code, but anytime I try to do anything useful with chatgpt it turns into (effectively) a shouting match; There's just no way that particular AI and I can see eye-to-eye. (there's underlying procedural reasons I think)

The author of this piece has the exact opposite experience. Apparently they hate Claude with a passion, but love ChatGPT. Weird!

1 day ago

[-]

I find claude models often use “tricks” like bash one liners, essentially excelling at surgical fixes. It does what i want more reliably, on smaller tasks.

GPT-5 can often be better at larger architectural changes, but i find that comes at the cost of instability/broken PRs. It often fails to capture intent or argues back, or just completely spirals out of control more often.

GPT-5 codex seemed to refuse valid requests like “make a change to break a test so we can test CI” (it over indexed on our agents.md and other instructions and then refused on the basis of “ethics” or some such)

lysecret

1 day ago

[-]

This is a fascinating aspect. I also find much more usage out of Claude code and its willingness to go along with my ideas and be “steerable” seems to be part of it with codex many times it kept on overwriting changes I told it to make. It’s a clear kind of disagreeableness trait in action. Makes me think!

FitchApps

2 days ago

[-]

Feels like with 3 agents coding non-stop you're no longer a coder but rather a manager of sorts...is it even possible to code / fix things by yourself in an environment such as this?

1 day ago

[-]

You essentially need additional agents to implement the guardrails traditionally used to scale teams.

mherrmann

2 days ago

[-]

I use Claude Code every day and find that it still requires a lot of hand-holding. Maybe codex is better. But just in my last session today, Claude wrote 100 lines of test code that could have been 20, and 30 lines of production code that could have been 5. I'm glad I do not have to maintain 300 kloc of 100% AI-generated code. But at the end of the day, what counts is velocity and quality, and it seems OP is happy. The tools certainly are useful.

bionhoward

1 day ago

[-]

Is using these terminal agents with customer noncompete and no privacy questionable when cursor has the same models and privacy mode?

philipp-gayret

2 days ago

[-]

> But Claude Code now has Plugins

> Do you hear that noise in the distance? It’s me sigh-ing. (...) Yes, maintaining good documents for specific tasks is a good idea. I keep a big list of useful docs in a docs folder as markdown.

I'm not that familiar with Claude Code Plugins, but it looks like it allows integrations with Hooks, which is a lot more powerful than just giving more context. Context is one thing, but Hooks let you codify guardrails. For example where I work we have a setup for Claude Code that guides it through common processes, like how to work with Terraform, Git or manage dependencies and the whitelisting or recommendation towards dependencies. You can't guarantee this just by slapping on more context. With Hooks you can both auto-approve or auto-deny _and_ give back guidance when doing so, for me this is a killer feature of Claude Code that lets it act more intelligently without having to rely on it following context or polluting the context window.

Cursor recently added a feature much like Claude Code's hooks, I hope to see it in Codex too.

hansmayer

2 days ago

[-]

So, just trying to understand this - he admits to code being slop and in the same sentences states that agents (which created the slop in the first place) also refactor it? Where is the logic in that?

fhd2

2 days ago

[-]

I feel in this context, refactoring has lost its meaning a bit. Sure, it's often used analogous to changes that don't affect semantics. But originally, the idea was that you make a quick change to solve the problem / test the idea, and then spend some time on properly integrating the changes in the existing system.

LLMs struggle with simplicity in my experience, so they struggle with the first step. They also lack the sort of intelligence required to understand (let alone evolve) the system's design, so they will struggle with the second step as well.

So maybe what's meant here is not refactoring in the original meaning, but rather "cleanup". You can do it in the original way with LLMs, but that means you'll have to be incredibly micro manage-y, in my experience. Any sort of vibe coding doesn't lead to anything I'd call refactoring.

ImaCake

2 days ago

[-]

> LLMs struggle with simplicity in my experience

I think a lot of this is because people (and thus LLMs) use verbosity as a signal for effort. It's a very bad signal, especially for software, but its a very popular signal. Most writing is much longer than it needs to be, everything from SEO website recipes, consulting reports, and non-fiction books. Both the author and the readers are often fooled into thinking lots of words are good.

It's probably hard to train that out of an LLM, especially if they see how that verbosity impressess the people making the purchasing decisions.

Terr_

2 days ago

[-]

> I think a lot of this is because people (and thus LLMs) use verbosity as a signal for effort.

It's also one of the main use-cases for non-programmer use of the models, so there are business-forces against toning it down. Ex: "Make a funny birthday letter for my sister Suzie who's turning 50."

cwyers

2 days ago

[-]

LLMs are good at pursuing objectives, but they aren't necessarily good at juggling competing objectives at once. So you can picture doing the following, for instance:

- "Here is a spec for an API endpoint. Implement this spec."

- "Using these tools, refactor the codebase. Make sure that you are passing all tests from (dead code checker, cyclomatic complexity checker, etc.)"

The clankers are very good at iteratively moving towards a defined objective (it's how they were post-trained), so you can get them to do basically anything you can define an objective for, as long as you can chunk it up in a way that it fits in their usable context window.

sebstefan

2 days ago

[-]

"So, just trying to understand this - he admits to code being buggy and in the same sentences states that he (the engineer who created the bugs in the first place) also debugs it? Where is the logic in that?"

hansmayer

2 days ago

[-]

Are you seriously comparing outputs of human intelligence to a text generator ?

sebstefan

2 days ago

[-]

No I'm highlighting that the logic itself is stupid

If the point is that you can't solve with AI what you messed up with AI, but with human intelligence spending a bit more time on the problem does indeed tend to help, you need to explain why his technique with the AI won't work either.

Plus he's adding human input to it every time, so I see no reason to default to "it wouldn't work".

hansmayer

2 days ago

[-]

Well you said it yourself, you are literally comparing human intelligence with the so-called AI, or better said, advanced text generator. The differentiator being, the text generators have 0 intelligence, otherwise there would not be a flood of AI gurus explaining the latest trick to making them finally work.

grim_io

2 days ago

[-]

He, who does not produce slop in the first iteration, cast the first stone.

pqdbr

2 days ago

[-]

One more article praising Codex CLI over Claude Code. Decided to give it a try this morning.

A simple task that would have taken literally no more than 2 minutes in Claude Code is, as of now, 9m+ and still "inspecting specific directory", with an ever increasing list of read files, not a single line of code written.

I might be holding it wrong.

kendallchuang

2 days ago

[-]

What I understand is Codex takes more time to gather context to make the relevant changes; with more context it may give a more precise response than Claude Code.

pqdbr

2 days ago

[-]

With Claude Code, I can 'tune' the prompt by feeling how much context the model needs to perform it's task. I can mention more files or tell it to read more code as needed.

With one hour of experience of Codex CLI, every single prompt - even the most simple ones - are 5+ minutes of investigation before anything gets done. Unbearable and totally unnecessary.

2 days ago

[-]

It has the same context. It does not take that long to ingest a bunch of files. OpenAI is just not offering the same level of performance, probably due to oversubscription.

cruffle_duffle

2 days ago

[-]

Have any of these no-bs articles described how they handle schema changes? Are they even using a “real” DB or is it all local, single user sqllite? Because I can see a disaster looming letting a vibe coder’s agent loose on a database.

And does it require auth? How is that spec’d out and validated? What about RBAC or anything? How would you even get the LLM to constantly follow rules for that?

Don’t get me wrong these tools are pretty cool but the old adage “if it sounds too good to be true, it probably is” always applies.

lmeyerov

2 days ago

[-]

Senior engineers know process, including for what you described, and that maps to plan-driven AI engineering well:

1. Note the discussion of plan-driven development in the claude code sections (think: plan = granular task list, including goals & validation criteria, that the agent loops over and self-modifies). Plans are typically AI generated: I ask it to do initial steps of researching current patterns for x+y+z and include those in the steps and validations, and even have it re-audit a plan. Codex internally works the same, and multiple people are reporting it automates more of this plan flow.

2. Working with database for tasks like migrations is normal and even better. My two UIs are now the agent CLI (basically streaming AI chat for task list monitoring & editing) and GitHub PR viewer: if it wasn't smart enough to add and test migrations and you didn't put that into the plan, you see it in the PR review and tell it to fix that. Writing migrations is easy, but testing them is annoying, and I've found AI helping write mocks, integration tests, etc to be wonderful.

1 day ago

[-]

(OP) I use atlas for database migrations, it works quite well with agents and has plenty guardrails around it.

outside1234

2 days ago

[-]

Is this paid content from OpenAI?

simonw

2 days ago

[-]

No.

In the USA it's actually not legal for companies to pay for promotional content like this without disclosure. Here's the FTC's FAQ about that: https://www.ftc.gov/business-guidance/resources/ftcs-endorse...

(It's pretty weird to me how much this comes up - there's this cynical idea that nobody would possibly write at length about how they're using these tools for their own work unless they were a paid shill.)

1 day ago

[-]

There are far worse scandals about this company than advertising ethics.

sarchertech

2 days ago

[-]

Another person building zero stakes tools for AI coding. 300k LOC also isn’t impressive if it should have been 10k.

aredox

2 days ago

[-]

>With Claude Code I often have multi-second freezes and it’s process blows up to gigabytes of memory.

I am in a mood where I find it excessively funny that, all that talk about AI, agents, billions of dollars, tera-watts/-hours spent, and people still manage to publish posts with the "its/it's" mistake.

(I am not a native English speaker, so I notice it at a higher rate than people who learned English "by ear".)

Maybe you don't care or you find it annoying to have it pointed out, but it says something about fundamentals. You know, "The way you do one thing is the way you do all things".

Izkata

1 day ago

[-]

As a native speaker, for a very long time I didn't understand why "its" existed, and saw "it's" as parallel to "aredox's" - apostophe-s being either ownership or contraction depending on context (made even more confusing because "s" without an apostrophe is pluralization with other words). Somehow it didn't click until late teens or early 20s that "its" is supposed to be a separate word along the lines of "his" and "hers".

2 days ago

[-]

I get what you're saying. "they're" and "their" is also a classic that many non-natives seem to get right, but native speakers have trouble with.

But OP isn't native either. He's Austrian.

8note

1 day ago

[-]

i dont think its/it's will still be distinct in the dictionary in 15 years. Native english speakers dont really care about the difference, and autocorrect on touch keyboards have disconnected a lot of the input and output of people's typing.