Eight more months of agents (crawshaw.io)
102 points | 1 day ago | 21 comments | HN
dmos62
13 minutes ago
[-]
> the best software for an agent is whatever is best for a programmer

My conclusion as well. It feels paradoxical, maybe because on some level I still think of an LLM as some weird gadget, not a coworker. Context ephemerality is more or less the only veritable difference from a human programmer, I'd say. And, even then, context introduction with LLMs is a speedrun of how you'd do it with new human members of a project. Awesome times we live in.

reply
dagss
12 hours ago
[-]

    But if you try some penny-saving cheap model like Sonnet [..bad things..]. [Better] pay through the nose for Opus.
After blowing $800 of my bootstrap startup funds on Cursor with Opus for myself in a very productive January, I figured I had to try to change things up... so this month I'm jumping between Claude Code and Cursor, sometimes writing the plans and having the conversation in Cursor and dumping the implementation plan into Claude Code.

Opus in Cursor is just so much more responsive and easy to talk to, compared to Opus in Claude Code.

Cursor has this "Auto" mode which feels like it has very liberal limits (amortized cost, I guess) that I'm also trying to use more, but I don't really like flipping a coin and, if it lands on heads, wasting half an hour discovering the LLM made a mess and then trying again while forcing the model.

Perhaps in March I'll bite the bullet and take this author's advice.

reply
written-beyond
11 hours ago
[-]
Just use Codex 5.3 in the Codex CLI; the $20/mo plan is basically limitless, at least for me, and I keep reasoning effort high.

You can enjoy it while it lasts, OpenAI is being very liberal with their limits because of CC eating their lunch rn.

reply
baq
7 minutes ago
[-]
+1, codex 5.2 was really good and 5.3 seems to be better at everything; caveat - I had little time to test it.
reply
rmonvfer
10 hours ago
[-]
Yeah, I can’t recommend gpt-5.3-codex enough, it’s great! I’ve been using it with the new macOS app and I’m impressed. I’ve always been a Claude Code guy, but I find myself using Codex more and more. Opus is still much nicer at explaining issues and walking me through implementations, but Codex is faster (even with xhigh effort) and gets the job done 95% of the time.

I was spending unholy amounts of money and tokens (subsidized cloud credits tho) forcing Opus for everything but I’m very happy with this new setup. I’ve also experimented with OpenCode and their Zen subscription to test Kimi K2.5 and similar models, and they also seem like a very good alternative for some tasks.

What I cannot stand tho is using Sonnet directly (it’s fine as a subagent); I’ve found it hard to control and it doesn’t follow detailed instructions.

reply
jorl17
9 hours ago
[-]
Out of curiosity, what’s your flow? Do you have codex write plans to markdown files? Just chat? What languages or frameworks do you use?

I’m an avid cursor user (with opus), and have been trying alternatives recently. Codex has been an immense letdown. I think I was too spoiled by cursor’s UX and internal planning prompt.

It’s incredibly slow, produces terribly verbose and over-complicated code (unless I use high or xhigh, which are even slower), and misses a lot of details. This is with Python/Django and a React frontend.

For the first time I felt like I could relate to those people who say it doesn’t make them faster, because they have to keep fixing the agent’s slop. Never felt that with Opus 4.5 and 4.6 in Cursor.

reply
written-beyond
1 hour ago
[-]
Codex cli is a very performant cli though, better than any other cli code assistant I've used.

I mean, does it matter what code it's producing? If it renders and functions, just use it. I think it's better to take the L on verbose code and optimize the really ugly bits by hand in a few minutes than to be kneecapped every 5 hours by limits and constant pleas to shift to Sonnet.

reply
drsalt
2 hours ago
[-]
you've always been a Claude Code guy? this has existed less than a year.
reply
redanddead
23 minutes ago
[-]
I was born clutching a Claude Code shell, you peasant.

The first sentence out of my mouth was a system prompt

reply
rl3
2 hours ago
[-]
To be fair that still feels like an eternity somehow.

Perhaps AI time is the inverse of Valve time.

reply
dakolli
2 hours ago
[-]
I promise you you're just going to continue to light money on fire. Don't fall for this token madness: the bigger your project gets, the less capable the LLM gets and the more you spend per request on average. This is literally all marketing tricks by inference providers. Save your money and code it yourself, or use very inexpensive LLM methods if you must.

I think we are going to start hearing stories of people going into thousands in CC debt because they were essentially gambling with token usage thinking they would hit some startup jackpot.

reply
dagss
48 minutes ago
[-]
Compared to the salary I lose by not taking a consulting gig for half a year, these $800 aren't all that much. (I guess depending on the definition of bootstrap, mine might not be one, as I support myself with saved consulting income.)

Startup is a gamble with or without the LLM costs.

I have been coding for 20 years, I have a good feel for how much time I would have spent without LLM assistance. And if LLMs vanish from the face of the earth tomorrow, I still saved myself that time.

reply
otabdeveloper4
1 hour ago
[-]
You can rent a GPU server and run your own Qwen models.

It's 90 percent the same thing as Claude but with flat-rate costs.

reply
happytoexplain
12 hours ago
[-]
I don't trust the idea of "not getting", "not understanding", or "being out of touch" with anti-LLM (or pro-LLM) sentiment. There is nothing complicated about this divide. The pros and cons are both as plain as anything has ever been. You can disagree - even strongly - with either side. You can't "not understand".
reply
slfnflctd
12 hours ago
[-]
> There is nothing complicated about this divide [...] You can't "not understand"

I beg to differ. There are a whole lot of folks with astonishingly incomplete understanding about all the facts here who are going to continue to make things very, very complicated. Disagreement is meaningless when the relevant parties are not working from the same assumption of basic knowledge.

reply
illusive4080
8 hours ago
[-]
Bolstering your point, check out the comments in this thread: https://www.reddit.com/r/rust/comments/1qy9dcs/who_has_compl...

There’s a lot of unwillingness to even attempt to try the tools.

reply
cruffle_duffle
2 hours ago
[-]
Those people are absolutely going to get left in the dust. In the hands of a skilled dev, these things are massive force multipliers.
reply
skydhash
17 minutes ago
[-]
It doesn’t matter how fast you run if it’s not the correct direction.
reply
baq
5 minutes ago
[-]
Good LLM wielders run in widening circles and get to the goal faster than good old school programmers running in a straight line
reply
derefr
11 hours ago
[-]
The negative impacts of generative AI are most sharply being felt by "creatives" (artists, writers, musicians, etc), and the consumers in those markets. If the OP here is 1. a programmer 2. who works solely with other programmers and 3. who is "on the grind", mostly just consuming non-fiction blog-post content related to software development these days, rather than paying much attention to what's currently happening to the world of movies/music/literature/etc... then it'd be pretty easy for them to not be exposed very much to anti-LLM sentiment, since that sentiment is entirely occurring in these other fields that might have no relevance to their (professional or personal) life.

"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."

The difference between the perspectives you find in the creative professions vs in software dev doesn't come down to "not getting" or "not understanding"; it really is a question of relative exposure to these pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.

(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)

reply
wasmainiac
1 hour ago
[-]
> "Anti-LLM sentiment" within software development is nearly non-existent.

I see it all the time in professional and personal circles. For one, you are shifting the goalposts on what counts as "anti-LLM"; for two, people are talking about the negative social, political, and environmental impacts.

What is your source here?

reply
remich
11 hours ago
[-]
I think this is an interesting point; my one area of disagreement is the claim that there is no "anti-LLM sentiment" in the programming community. Sure, plenty of folks expressing skepticism or disagreement are doing so from a genuine place, but just from reading this site and a few email newsletters I get, I can say that there is a non-trivial percentage of the programming world that is adamantly opposed to LLMs/AI. When I see comments from people in that subset, it's quite clear that they aren't approaching it from a place of skepticism, where they could be convinced given appropriate evidence or experiences.
reply
bitwize
1 hour ago
[-]
But there's a difference. Being opposed to AI-generated art/music/writing is valid because humans still contribute something extraordinarily meaningful when they do it themselves. There's no market for AI-generated music, and AI-generated art and writing tends to get called out right away when it's detected. People want the human expression in human-generated art, and the AI stuff is a weak placeholder at best.

For software the situation is different. Being opposed to LLM-generated software is just batshit crazy at this point. The value that LLMs provide to the process makes learning to use them, objectively, an absolute must; otherwise you are simply wasting time and money. Eric S. Raymond put it something like "If you call yourself a software engineer, you have no excuse not to be using these tools. Get your thumb out of your ass and learn."

reply
skydhash
5 minutes ago
[-]
Ok, I’ll bite. What’s there to learn that you can tie directly to an increase of productivity?

I can say "learn how to use vim's makeprg feature so that you can jump directly to errors reported by the build tool" and it's very clear where the ROI is. But all the AI hypers are selling is hope, prayers, and rituals.

reply
overgard
11 hours ago
[-]
> "Anti-LLM sentiment" within software development is nearly non-existent.

Strong disagree right there. I remember talking to a (developer) coworker a few months ago who seemed like the biggest AI proponent on our team. When we were one-on-one during a lunch though, he revealed that he really doesn't like AI that much at all, he's just afraid to speak up against it. I'm in a few Discord channels with a lot of highly skilled (senior and principal programmers) who mostly work in game development (or adjacent), and most of them either mock LLMs or have a lot of derision for it. Hacker News is kind of a weird pro-AI bubble, most other places are not nearly as keen on this stuff.

reply
happytoexplain
10 hours ago
[-]
>"Anti-LLM sentiment" within software development is nearly non-existent

This is certainly untrue. I want to say "obviously", which means that maybe I am misunderstanding you. Below are some examples of negative sentiments programmers have - can you explain why you are not counting these?

NOTE: I am not presenting these as an "LLMs are bad" argument. My own feelings go both ways. There is a lot that's great about LLMs, and I don't necessarily agree with every word I've written below - some of it is just my paraphrasing of what other people say. I'm only listing examples of what drives existing anti-LLM sentiment in programmers.

1. Job loss, loss of income, or threat thereof

These two are exacerbated by the pace of change, since so many people already spent their lives and money establishing themselves in the career and can't realistically pivot without becoming miserable - this is the same story for every large, fast change - though arguably this one is very large and very fast even by those standards. Lots of tech leadership is focusing even more than they already were on cheap contractors, and/or pushing employees for unrealistic productivity increases. I.e. it's exacerbating the "fast > good" problem, and a lot of leadership is also overestimating how far it reduces the barrier to creating things, as opposed to mostly just speeding up a person's existing capabilities. Some leadership is also using the apparent loss of job security as leverage beyond salary suppression (even less proportion of remote work allowed, more surveillance, worse office conditions, etc).

2. Happiness loss (in regards to the job itself, not all the other stuff in this list)

This is regarding people who enjoy writing/designing programs but don't enjoy directing LLMs; or who don't enjoy debugging the types of mistakes LLMs tend to make, as opposed to the types of mistakes that human devs tend to make. For these people, it's like their job was forcibly changed to a different, almost unrelated job, which can be miserable depending on why you were good at - or why you enjoyed - the old job.

3. Uncertainty/skepticism

I'm pushing back on your dismissal of this one as "not anti-LLM sentiment" - the comparison doesn't make sense. If I was forced to only review junior dev code instead of ever writing my own code or reviewing experienced dev code, I would be unhappy. And I love teaching juniors! And even if we ignore the subset of cases where it doesn't do a good job or assume it will soon be senior-level for every use case, this still overlaps with the above problem: The mistakes it makes are not like the mistakes a human makes. For some people, it's more unnatural/stressful to keep your eyes peeled for the kinds of mistakes it makes. For these people, it's a shift away from objective, detail-oriented, controlled, concrete thinking; away from the feeling of making something with your hands; and toward a more wishy-washy creation experience that can create a feeling of lack of control.

4. Expertise loss

A lot of positive outcomes with LLMs come from being already experienced. Some argue this will be eroded - both for new devs and existing experienced devs.

5. The training data ownership/morality angle

reply
bitwize
1 hour ago
[-]
Facebook's algorithm has picked up on the idea that "I like art". It has subsequently given me more examples of (human-created) art in my feed. Art from comic book artists, art from manga-style creators, "weird art", even a make-up artist who painted a scene from Where the Wild Things Are on her face.

I like this. What's more, while AI-generated art has a characteristic sameyness to it, the human-produced art stands out in its originality. It has character and soul. Even if it's bad! AI slop has made the human-created stuff seem even more striking by comparison. The market for human art isn't going anywhere, just like the audience for human-played chess went nowhere after Deep Blue. I think people will pay a premium for it, just to distinguish themselves from the slop. The same is true of writing and especially music. I know of no one who likes listening to AI-generated music. Even Sabrina Carpenter would raise less objection.

The same, I'm afraid, cannot be said for software—because there is little value for human expression in the code itself. Code is—almost entirely—strictly utilitarian. So we are now at an inflection point where LLMs can generate and validate code that's nearly as good as, if not better than, what we can produce on our own. And to not make use of them is about as silly as Mel Kaye still punching in instruction opcodes in hex into the RPC-4000, while his colleagues make use of these fancy new things called "compilers". They're off building unimaginably more complex software than they could before, but hey, he gets his pick of locations on the rotating memory drum!

I'm one of the nonexistent anti-LLMers when it comes to software. I hate talking to a clanker, whose training data set I don't even have access to let alone the ability to understand how my input affects its output, just to do what I do normally with the neural net I've carried around in my skull and trained extensively for this very purpose. I like working directly with code. Code is not just a product for me; it is a medium of thought and expression. It is a formalized notation of a process that I can use to understand and shape that process.

But with the right agentic loops, LLMs can just do more, faster. There's really no point in resisting. The marginal value of what I do has just dropped to zero.

reply
joefourier
11 hours ago
[-]
The author is correct in that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as good. I find that IDE-based agentic programming actually encourages you to read and understand your codebase as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent, as opposed to through the CLI where you usually just give it an entire file by typing the name, and often you just pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal purely because of the convenience of the UX.

But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same thing: indexing your code in the background, filtering the context, etc, but there's much less attention and it does feel like the models are stagnating.

I find that very unfortunate. Compare the two workflows:

With a normal coding agent, you write your prompt, then you have to wait at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code, and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model and retry your prompt, spending more and more tokens the longer your chat history grows.

But with LLM-powered auto-complete, when you want, say, a function to do X, you write your comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts, and you're not forced to task switch and you don't lose your concentration, that great sense of "flow".

Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.
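
To make the comparison concrete, here is an illustrative (entirely hypothetical) example of the autocomplete flow: you type the comment and the signature yourself, and the completion model proposes the body, which you accept or reject a line at a time.

    import re
    from collections import Counter

    # Return the n most common words in `text`, lowercased, ignoring punctuation.
    # (You write this comment and the signature; the model proposes the body,
    # which you accept line by line.)
    def most_common_words(text: str, n: int = 10) -> list[tuple[str, int]]:
        words = re.findall(r"[a-z']+", text.lower())
        return Counter(words).most_common(n)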

reply
wavemode
1 hour ago
[-]
I agree with you wholeheartedly. It seems like a lot of the work on making AI autocomplete better (better indexing, context management, codebase awareness, etc) has stagnated in favor of full-on agentic development, which simply isn't suited for many kinds of tasks.
reply
coffeefirst
10 hours ago
[-]
The other thing about non-agent workflows is they’re much, much less compute intensive. This is going to matter.
reply
gip
1 hour ago
[-]
> In 2026, I don't use an IDE any more.

I don't think it is the best way to look at it. I think that now every team has the power to build and maintain an internal agent (tool + UX) to manage software products. I don't necessarily think that chat-only is enough except for small projects, so teams will build agents that give them access to the level of abstraction that works best.

It's a data point, but this weekend (i.e. in 2 days) I built a desktop + web agent that is able to help me reason about system design and code. Built with Codex, powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.

reply
sarchertech
45 minutes ago
[-]
I’m not saying this is definitely a bot. However, this is the 7th time I’ve read a post and thought it might be an OpenAI promotion bot, clicked on the username, and noticed that the account was created in 2011.

I have yet to do this and see any other year. Was there someone who bought a ton of accounts in 2011 to farm them out? A data breach? Was 2011 just a very big year for new users? (My own account is from 2011)

reply
gip
13 minutes ago
[-]
I'm not a bot. You are saying that because for some reason you resent people who have a good experience with Codex / OpenAI. Curious what that is - people hate the CEO or what?

I like Claude Code too btw.

The crazy thing here is that I wrote the initial comment myself!

reply
sarchertech
2 minutes ago
[-]
> It's a data point, but this weekend (i.e. in 2 days) I built a desktop + web agent that is able to help me reason about system design and code. Built with Codex, powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.

Assuming you’re not a bot. It’s nothing to do with you having a good experience, it’s the way you wrote about that experience that sounds like a product placement.

reply
redanddead
8 minutes ago
[-]
That's exactly what a bot would say
reply
redanddead
29 minutes ago
[-]
2011 just so happened to be 4 years before a very important year: 2015 — The founding of OpenAI. Unrelated note, have you tried Codex and the Codex SDK?
reply
loveparade
34 minutes ago
[-]
It's definitely a bot, just like probably around 10% of comments on HN at this point, and the majority of upvotes. And it's only increasing.

Calling it bot is a bit dismissive though. It's an agent!

reply
gip
3 minutes ago
[-]
Care to have a phone call tonight with the person you're calling a bot?

If so, send a DM on twitter to @edfixyz with your phone number and I will call you immediately. Or give me your twitter handle.

I'm tired of that BS - when people don't like what you write they call you a bot.

reply
redanddead
27 minutes ago
[-]
it is giving a very agentic vibe
reply
hoistbypetard
13 minutes ago
[-]
> To me that statement is as obvious as "water is wet".

Water is not wet. Water makes things wet. Perhaps the inaccuracy of that statement should be taken as a hint that the other statements that you hold on the same level are worthy of reconsideration.

reply
baq
10 minutes ago
[-]
The good old classic technically correct but completely beside the point observation.
reply
dmk
12 hours ago
[-]
The real insight buried in here is "build what programmers love and everyone will follow." If every user has an agent that can write code against your product, your API docs become your actual product. That's a massive shift.
reply
anthuswilliams
11 hours ago
[-]
I'm very much looking forward to this shift. It is SO MUCH more pro-consumer than the existing SaaS model. Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation. It feels like every time I ask for programmatic access to SaaS tools in order to simplify a workflow, I get stuck in endless meetings with product managers trying to "understand my use case", even for products explicitly marketed to programmers.

Using agents that interact with APIs represents people being able to own their user experience more. Why not craft a frontend that behaves exactly the way YOU want it to, tailor-made for YOUR work, abstracting the set of products you are using and focusing only on the actual relevant bits of the work you are doing? Maybe a downside might be that there is more explicit metering of use in these products instead of the per-user licensing that is common today. But the upside is there is so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.

reply
dang
12 hours ago
[-]
Related. Others?

How I program with agents - https://news.ycombinator.com/item?id=44221655 - June 2025 (295 comments)

reply
monus
10 hours ago
[-]
> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.

Not a plug but really that’s exactly why we’re building sandboxes for agents with local laptop quality. Starting with remote xcode+sim sandboxes for iOS, high mem sandbox with Android Emulator on GPU accel for Android.

No machine allocation but composable sandboxes that make up a developer persona’s laptop.

If interested, a quick demo here https://www.loom.com/share/c0c618ed756d46d39f0e20c7feec996d

muvaf[at]limrun[dot]com

reply
hasperdi
12 hours ago
[-]
> It sounds like someone saying power tools should be outlawed in carpentry.

I see this a lot here

reply
dagss
41 minutes ago
[-]
On HN lately? Haven't seen anything about outlawing. But I see a lot of "powertools don't work and make me slower"
reply
sp33der89
12 hours ago
[-]
All metaphors break down at a certain point, but power tools and generative AI/LLMs being compared feels like somebody is romanticizing the art of programming a bit too much.

Copyright law, education, and the sheer scale of what's changing because of LLMs are a few reasons off the top of my head why "power tools vs carpentry" is a bad analogy.

reply
wasmainiac
1 hour ago
[-]
Yes, because a tech-bro AI dream is hundreds of thousands of developers being let go and replaced with no-code tools.

Sure, replace me with AI, but I better get royalties on my public contributions. I like many other developers have kids and other responsibilities to pay for.

We did not share our work publicly to be replaced. The same way I did not lend my neighbour my car so he could run me over, that was implicit.

reply
senordevnyc
54 minutes ago
[-]
We’ve been doing this to other professions for half a century. Live by the sword, die by the sword.
reply
Keyframe
11 hours ago
[-]
if that someone is clumsy, had an active war going on against basic tools before, and wandered into carpentry from a completely different area, then power tools might be a bad idea.
reply
post-it
11 hours ago
[-]
> Agent harnesses have not improved much since then. There are things Sketch could do well six months ago that the most popular agents cannot do today.

I think this is a neglected area that will see a lot of development in the near future. I think that even if development on AI models stopped today - if no new model was ever trained again - there are still decades of innovation ahead of us in harnessing the models we already have.

Consider ChatGPT: the first release relied entirely on its training data to answer questions. Today, it typically does a few Google searches and summarizes the results. The model has improved, but so has the way we use it.

reply
tiny-automates
1 hour ago
[-]
agreed, and i'd go further - the harness is where evaluation actually happens, not in some separate benchmark suite. the model doesn't know if it succeeded at a web task. the harness has to verify DOM state, check that the right element was clicked, confirm the page transitioned correctly. right now most harnesses just check "did the model say it was done" which is why pass rates on benchmarks don't translate to production reliability. the interesting harness work is building verification into the loop itself, not as an afterthought.
reply
kevmo314
11 hours ago
[-]
Really? I hardly think it's neglected. The Claude Code harness is the only reason I come back to it. I've tried Claude via OpenCode or others and it doesn't work as well for me. If anything, I would even argue that prior to 4.6, the main reason Opus 4.5 felt like it improved over months was the harness.
reply
dirkc
12 hours ago
[-]
> Using anything other than the frontier models is actively harmful

If that is true, why should one invest in learning now rather than waiting for 8 months to learn whatever is the frontier model then?

reply
jonas21
12 hours ago
[-]
So that you can be using the current frontier model for the next 8 months instead of twiddling your thumbs waiting for the next one to come out?

I think you (and others) might be misunderstanding his statement a bit. He's not saying that using an old model is harmful in the sense that it outputs bad code -- he's saying it's harmful because some of the lessons you learn will be out of date and not apply to the latest models.

So yes, if you use current frontier models, you'll need to recalibrate and unlearn a few things when the next generation comes out. But in the meantime, you will have gotten 8 months (or however long it takes) of value out of the current generation.

reply
properbrew
11 hours ago
[-]
You also don't have to throw away everything you've learnt in those 8 months; there are things you'll subtly pick up that you can carry over into the next generation as well.
reply
ej88
12 hours ago
[-]
It's not like you need to take a course. The frontier models are the best, just using them and their harnesses and figuring out what works for your use case is the 'investing in learning'.
reply
recursive
12 hours ago
[-]
How could it be actively harmful if it wasn't harmful last month when it was the frontier model?
reply
fusslo
12 hours ago
[-]
snarky answer: so you can be that 'AI guy' at your office that everyone avoids in the snackroom
reply
senko
12 hours ago
[-]
Because you might want to use LLMs now. If not, it's definitely better to not chase the hype - ignore the whole shebang.

But if you do want to use LLMs for coding now, not using the best models just doesn't make sense.

reply
dsign
11 hours ago
[-]
Look, I'm very negative about this AI thing. I think there is a great chance it will lead to something terrible and we will all die, or worse. But on the other hand, we are all going to die anyway. Some of us, the lucky ones, will die of a heart attack and will learn of our imminent demise in the second it happens, or not at all. The rest of us will have it worse. It has always been like that, and it has only gotten more devastating since we started wearing clothes and stopped being eaten alive by a savanna crocodile or freezing to death during the first snowfall of winter.

But if AI keeps getting better at code, it will produce entire in-silico simulation workflows to test new drugs or even to design synthetic life (which, again, could make us all die, or worse). Yet there is a tiny, tiny chance we will use it to fix some of the darkest aspects of human existence. I will take that.

reply
xyzsparetimexyz
6 minutes ago
[-]
That's stupid. If you genuinely think that there's a great chance AI will kill us all, you wouldn't spin the wheel just for some small vague chance that it doesn't and something good (what exactly, nobody knows) will happen
reply
emmawirt
12 hours ago
[-]
Curious what you mean by "agent harness" here... are you distinguishing between true autonomous agents (model decides next step) vs workflows that use LLMs at specific nodes? I've found the latter dramatically more reliable for anything beyond prototyping, which makes me wonder if the "model improvement" is partly better prompting and scaffolding.
reply
crawshaw
11 hours ago
[-]
Hi, author here. I mean the piece of code that calls the model and executes the tool calls. My colleague Philip calls it “9 lines of code”: https://sketch.dev/blog/agent-loop

We have built two of them now, and clearly the state of the art here can be improved. But it is hard to push too much on this while the models keep improving.
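
The shape is roughly this (a minimal sketch in Python, not Sketch's actual code; call_model and run_tool are hypothetical placeholders for whatever model API and tool executor you wire in):

    # Minimal agent loop (sketch): call the model, execute any tool calls it
    # requests, feed the results back, and repeat until it stops asking for
    # tools. call_model() and run_tool() are hypothetical placeholders.
    def agent_loop(prompt, tools):
        messages = [{"role": "user", "content": prompt}]
        while True:
            reply = call_model(messages, tools)        # one model turn
            messages.append(reply)
            calls = reply.get("tool_calls") or []
            if not calls:                              # no tool requests: done
                return reply["content"]
            for call in calls:                         # run each requested tool
                messages.append({"role": "tool",
                                 "tool_call_id": call["id"],
                                 "content": run_tool(call)})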

reply
tiny-automates
1 hour ago
[-]
the harness being "9 lines of code" is deceptive in the same way a web server is "just accept connections and serve files."

the hard part isn't the loop itself — it's everything around failure recovery.

when a browser agent misclicks, loads a page that renders differently than expected, or hits a CAPTCHA mid-flow, the 9-line loop just retries blindly. the real harness innovation is going to be in structured state checkpointing so the agent can backtrack to the last known-good state instead of restarting the whole task. that's where the gap between "works in a demo" and "works on the 50th run" lives.

reply
rahimnathwani
11 hours ago
[-]
An agent harness is what enables the user to seamlessly interact with both a model and tool calls. Claude Code is an agent harness.

  ┌────────────────────────────┐
  │           User             │
  └──────────────┬─────────────┘
                 │
                 ▼
  ┌────────────────────────────┐
  │       Agent Harness        │
  │   (software interface)     │
  └──────┬──────────────┬──────┘
         │              │
         ▼              ▼
  ┌────────────┐ ┌────────────┐
  │   Models   │ │   Tools    │
  └────────────┘ └────────────┘
Here's an example of a harness with less code: https://github.com/badlogic/pi-mono/blob/fdcd9ab783104285764...
reply
63
11 hours ago
[-]
I have no problem with experienced senior devs using agents to write good code faster. What I have a problem with is inexperienced "vibecoders" who don't care to learn and instead use agents to write awful buggy code that will make the product harder to build on even for the agents. It used to be that lack of a basic understanding of the system was a barrier for people, but now it's not, so we're flooded with code written by imperfect models conducted by people who don't know good from bad.
reply
bdangubic
11 hours ago
[-]
the number of experienced, senior programmers who are in the "anti-LLM" camp, though, is still fairly staggering.
reply
estimator7292
10 hours ago
[-]
I mean when the tag line is "this will replace senior engineers and you, the senior engineer, must be forced to use it"

Then yeah, it makes sense.

reply
girvo
4 minutes ago
[-]
Yeah, I’m baffled why people are surprised that senior+ engineers, who are being told in one breath that they will be replaced by this tool and in the next that they MUST use this tool to make it better at replacing them, aren’t happy about it or don’t want to use it willingly.

I also find it wild how we’re sleepwalking into this, but I’m also part of the problem and using these things too.

reply
codebolt
11 hours ago
[-]
Where are you encountering all this slop code? At my work we use LLMs heavily and I don't see this issue. Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.
reply
post-it
11 hours ago
[-]
> Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.

That's why. I was using Claude the other day to greenfield a side project and it wanted to do some important logic on the frontend that would have allowed unauthenticated users to write into my database.

It was easy to spot for me, because I've been writing software for years, and it only took a single prompt to fix. But a vibe coder wouldn't have caught it and hackers would've pwned their webapp.

reply
giancarlostoro
11 hours ago
[-]
You can also ask Claude to review all the code for security issues and code smells; you'd be surprised what it finds. We all write insecure code on our first pass through if we're too focused on getting the proof of concept worked out. Security isn't always the very first thing coded; maybe it's the very next thing, maybe it comes 10 changes later.
reply
blibble
11 hours ago
[-]
> We all write insecure code in our first pass through

no, we don't

reply
giancarlostoro
6 hours ago
[-]
Yes we do. You don't just start a brand new web project and spit out CORS rules, authentication schemes, roles, etc. in one sitting, do you? Are you an AI?
reply
girvo
2 minutes ago
[-]
Yes, I really do, because this has been a solved problem for a while. Also, it's necessary to get right because retrofitting it later is a pain.
reply
blibble
6 hours ago
[-]
> are you an AI?

no, I'm a competent engineer

maybe you've not worked with any

reply
bowsamic
11 hours ago
[-]
The issue isn't when the programmers start using it. It's when the project managers start using it and think that they're producing something similar to the programmers
reply
remich
10 hours ago
[-]
We're in a transition phase, but this will shake out in the near future. In the non-professional space, poorly built vibecoded apps simply won't last, for any number of reasons. When it comes to professional devs, this is a problem that is solved by a combination of tooling, process, and management:

(1) Tooling to enable better evaluation of generated code and its adherence to conventions and norms.

(2) Process to impose requirements on the creation/exposure of PRDs/prompts/traces.

(3) Management to guide devs in the use of the above and to implement concrete rewards and consequences.

Some organizations will be exposed as being deficient in some or all of these areas, and they will struggle. Better organizations will adapt.

reply
etamponi
9 hours ago
[-]
The unfortunate reality is that (1) and (2) are what many, many engineers would like to do, but management is going EXACTLY in the opposite direction: go faster! Go faster! Why are you spending time on these things?
reply
jeffrallen
1 hour ago
[-]
Listen to this guy. I've been using his code for a long time, and it works. I am a happy customer of his service, and it works. I listen to his advice and it works.
reply
uludag
11 hours ago
[-]
> I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing.

It might be just me but this reads as very tone deaf. From my perspective, CEOs are foaming at the mouth to make as many developers redundant as possible, not being shy about this desire. (I don't see this at all as inevitable, but tech leaders have made their position clear.)

Like, imagine the smugness of some 18th century "CEO" telling an artisan, despite the fact that he'll be consigned to working in horrific conditions at a factory, to not worry and think of all the mass-produced consumer goods he may enjoy one day.

It's not at all a stretch of the imagination that current tech workers may be in a very precarious situation. All the slopware in the world wouldn't console them.

reply
overgard
10 hours ago
[-]
I bought Steve Yegge's "Vibe Coding" book. I think I'm about 1/4th of the way through it or so. One thing that surprised me is there's this naivete on display that workers are going to be the ones to reap the benefits of this. Like, Steve was using an example of being able to direct the agent while doing leisure activities (never mind that Steve is more of an executive/thought leader in this company, and, prior to LLMs, seemed to be out of the business of writing code). That's a nice snapshot of a reality that isn't going to persist.

While the idea of programmers working two hours a day and spending the rest of it with their family seems sunny, that's absolutely not how business is going to treat it.

Thought experiment... CEO has a team of 8 engineers. They do some experiments with AI, and they discover that their engineers are 2x more effective on average. What does the CEO do?

a) Change the workday to 4 hours so that all the engineers have better work/life balance, since the same amount of work is being done.

b) Fire half the engineers, make the 4 remaining guys pick up the slack, rinse and repeat until there's one guy left?

Like, come on. There's pushback on this stuff not because the technology is bad (although it's overhyped), but because no sane person trusts our current economic system to provide anything resembling humane treatment of workers. The super rich are perfectly fine seeing half the population become unemployed, as far as I can tell, as long as their stock numbers go up.

reply
yojat661
53 minutes ago
[-]
You missed option c: keep all 8 engineers so the team can pump out features faster, all still working 8-hour days. The CEO will probably be forced to do it to keep up with the competition.
reply
georgemcbay
1 hour ago
[-]
Haven't read that book, but agree that if anyone thinks the workers are likely to capture the value of this productivity shift, they haven't been paying attention to reality.

Though at the same time I also think a lot of the CEO-types (at least in the pure software world) who believe they are going to capture the value of this productivity shift are also in for a rude awakening, because if AI doesn't stall out, it's only a matter of time from when their engineers are replaceable to when their company doesn't need to exist at all anymore.

reply
Herring
11 hours ago
[-]
> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.

The jury's still out on that one, because climate change is an existential risk.

reply
gradus_ad
11 hours ago
[-]
Existential? Maybe to beachfront property owners
reply
xyzsparetimexyz
4 minutes ago
[-]
Idiot
reply
Herring
11 hours ago
[-]
Everybody gangsta until the permafrost starts leaking massive amounts of methane.
reply
esafak
10 hours ago
[-]
Did you know that 10% of the world's population lives in coastal zones at low elevations?
reply
panny
1 hour ago
[-]
I see a lot of people here saying things like:

>ah, they're so dumb, they don't get it, the anti-LLM people

This is one of the reasons I see AI failing in the short term. If I call you an idiot, are you more or less likely to be open-minded and try what I'm selling? AI isn't making money; 95% of companies are failing with AI:

https://fortune.com/2025/08/18/mit-report-95-percent-generat...

I mean, your AIs might be a lot more powerful if they were generating money, but that's not happening. I guess being condescending to the 95% of potential buyers isn't really working out.

reply
Krei-se
11 hours ago
[-]
The author has a github.
reply
almostdeadguy
12 hours ago
[-]
In the past couple days I've become less skeptical of the capabilities of LLMs and now more alarmed by them, contra the author. I think if we as a society continue to accept the development of LLMs and the control of them by the major AI companies there will be massively negative repercussions. And I don't mean repercussions like "a rogue AI will destroy humanity" per se, but these things will potentially cause massive social upheaval, a large amount of negative impacts on mental health and cognition, etc. I think if you see LLMs as powerful but not dangerous you are not being honest.
reply
NitpickLawyer
11 hours ago
[-]
There are some good things here:

First, we currently have 4 frontier labs, and a bunch of 2nd tier ones following. The fact that we don't have just oAI or just Anthropic or just Google is good in the general sense, I would say. The 4 labs racing each other and trading SotA status for ~a few weeks is good for the end consumer. They keep each other honest and keep the prices down. Imagine if Anthropic could charge $60/MTok or oAI could charge $120/MTok for their GPT-4-style models. They can't, in good part because of the competition.

Second, there's a bunch of labs / companies that have released and are continuing to release open models. That's as close to "intelligence on tap" as you can get. And those models are ~6-12 months behind the SotA models, depending on your use case. Even though the labs have largely different incentives to do so, a lot of them are still releasing open models. Hopefully that continues to hold. So not all control will be in the hands of big tech, even if the "best" will still be theirs. At some point "good enough" is fine.

There's also the thing about geopolitics being involved in this. So far we've seen the EU jumping the gun on regulation, and we're kinda sorta paying for it. Everyone is still confused about what can or cannot be done in the EU. The US seems to be waiting to see what happens, and China will do whatever they do. The worst thing that can happen is that at some point the big players (Anthropic is the main driver) push for regulatory capture. That would really suck. Thankfully atm there's this lingering thinking that "if we do it, the others won't so we'll be on the back foot". Hopefully this holds, at least until the "good enough" from above is out :)

reply
almostdeadguy
10 hours ago
[-]
I'm not just concerned about control by one company, I'm concerned by control for the profit motive, and probably concerned about the wisdom of using these things for anything except extremely limited use cases (breakthrough scientific research, etc.). I think tech people have a bad tendency of viewing this through the lens of platform wars type stakes, and there are much bigger problems with AI. The fact that an alarming number of ex-and-current Anthropic people I've met think the world is going to end is something we should take heed of!

The AI labs started down this path using the Manhattan Project as a metaphor and guess what? It's a good metaphor and we should embrace most of the wider implications of that (though I'd love to avoid all the MAD/cold war bullshit this time).

reply
gadflyinyoureye
11 hours ago
[-]
I see them as powerful and dangerous. The goal for decades now is to reduce the human population to 500 million. All human technology was pushed to this end, covertly. If we suddenly have a technology that renders white collar workers useless, we will get to that number faster than expected.
reply
Applejinx
11 hours ago
[-]
I don't believe that is true, but if it WAS true that human technology was covertly pushed to this end: there are people out there who are demanding that this technology come up with social manipulations (using language) to reduce the human population to a SPECIFIC 500 million.

Or less.

And I don't think it's collar color they're going to be checking against.

So I guess I'm saying I agree that this is powerful and dangerous. These are language models, so they're more effective against humans and their languages. And self-preservation, empathy, humanity do not play a role as there is nobody in there to be offended at the notion of intentionally killing more than 9/10 of humanity… for some definitions of humanity, ones I'm sympathetic to.

reply