As a real world example, I was told to evaluate Claude Code and ChatGPT codex at my current job since my boss had heard about them and wanted to know what it would mean for our operations. Our main environment is a C# and Typescript monorepo with 2 products being developed, and even with a pretty extensive test suite and a nearly 100 line "AGENTS.md" file, all models I tried basically fail or try to shortcut nearly every task I give it, even when using "plan mode" to give it time to come up with a plan before starting. To be fair, I was able to get it to work pretty well after giving it extremely detailed instructions and monitoring the "thinking" output and stopping it when I see something wrong there to correct it, but at that point I felt silly for spending all that effort just driving the bot instead of doing it myself.
It almost feels like this is some "open secret" which we're all pretending isn't the case too, since if it were really as good as a lot of people are saying there should be a massive increase in the number of high quality projects/products being developed. I don't mean to sound dismissive, but I really do feel like I'm going crazy here.
- driving the LLM instead of doing it yourself. - sometimes I just can't get the activation energy and the LLM is always ready to go so it gives me a kickstart
- doing things you normally don't know. I learned a lot of command like tools and trucks by seeing what Claude does. Doing short scripts for stuff is super useful. Of course, the catch here is if you don't know stuff you can't drive it very well. So you need to use the things in isolation.
- exploring alternative solutions. Stuff that by definition you don't know. Of course, some will not work, but it widens your horizon
- exploring unfamiliar codebases. It can ingest huge amounts of data so exploration will be faster. (But less comprehensive than if you do it yourself fully)
- maintaining change consistency. This I think it's just better than humans. If you have stuff you need to change at 2 or 3 places, you will probably forget. LLM's are better at keeping consistency at details (but not at big picture stuff, interestingly.)
I'd previously encountered tools that seemed interesting, but as soon as I tried getting it to run I found myself going down an infinite debugging hole. With an LLM I can usually explain my system's constraints and the best models will give me a working setup from which I can begin iterating. The funny part is that most of these tools are usually AI related in some way, but getting a functional environment often felt impossible unless you had really modern hardware.
There is a counter issue though, realizing mid session that the model won’t be able to deliver that last 10%, and now you have to either grok a dump of half finished code or start from scratch.
They are still very useful, locally.
If (and it's a big if) the LLM gives you something that kinda, sorta, works, it may be an easier task to keep that working, and make it work better, while you refactor it, than it would have been to write it from scratch.
That is going to depend a lot on the skillset and motivation of the programmer, as well as the quality of the initial code dump, but...
There's a lot to be said for working code. After all, how many prototypes get shipped?
I use Claude Code a decent amount, and I actually find that sometimes this can be the opposite for me. Sometimes it is actually missing other areas that the change will impact and causing things to break. Sometimes when I go to test it I need to correct it and point out it missed something or I notice when in the planning phase that it is missing something.
However I do find if you use a more powerful opus model when planning, it does consider things fully a lot better than it used to. This is actually one area I have been seeing some very good improvements as the models and tooling improves.
In fact, I actually hope that these AI tools keep getting better at the point you mention, as humans also have a "context limit". There are only so many small details I can remember about the codebase so it is good if AI can "remember" or check these things.
I guess a lot of the AI can also depend on your codebase itself, how you prompt it, and what kind of agents file you have. If you have a robust set of tests for your application you can very easily have AI tools check their work to ensure things aren't being broken and quickly fix it before even completing the task. If you don't have any testing more could be missed. So I guess it's just like a human in some sense. If you have a crappy codebase for the AI to work with, the AI may also sometimes create sloppy work.
I think it makes sense? Unlike small details which are certain to be explicitly part of the training data, "big picture stuff" feels like it would mostly be captured only indirectly.
It's possible some of it is due to codebase size or tech stack, but I really think there might be more of a human learning curve going on here than a lot of people want to admit.
I think I am firmly in the average of people who are getting decent use out of these tools. I'm not writing specialized tools to create agents of agents with incredibly detailed instructions on how each should act. I haven't even gotten around to installing a Playwright mcp (probably my next step).
But I've:
- created project directories with soft links to several of my employer's repos, and been able to answer several cross-project and cross-team questions within minutes, that normally would have required "Spike/Disco" Jira tickets for teams to investigate
- interviewed codebases along with product requirements to come up with very detailed Jira AC, and then,.. just for the heck of it, had the agent then use that AC to implement the actual PR. My team still code-reviewed it but agreed it saved time
- in side projects, have shipped several really valuable (to me) features that would have been too hard to consider otherwise, like... generating pdf book manuscripts for my branching-fiction creating writing club, and launching a whole new website that has been mired in a half-done state for years
Really my only tricks are the basics: AGENTS.md, brainstorm with the agent, continually ask it to write markdown specs for any cohesive idea, and then pick one at a time to implement in commit-sized or PR-sized chunks. GPT-5.2 xhigh is a marvel at this stuff.
My codebases are scala, pekko, typescript/react, and lilypond - yeah, the best models even understand lilypond now so I can give it a leadsheet and have it arrange for me two-hand jazz piano exercises.
I generally think that if people can't reach the above level of success at this point in time, they need to think more about how to communicate better with the models. There's a real "you get out of it what you put into it" aspect to using these tools.
Can I get it to finish by asking it over and over to code review its PR or some other such generic prompt to weed out the skips and scaffolding? Also yes.
Basically these things just need a supervisor looking at the requirements, test results, and evaluating the code in a loop. Sometimes that's a human, it can also absolutely be an LLM. Having a second LLM with limited context asking questions to the worker LLM works. Moreso when the outer loop has code driving it and not just a prompt.
For example I'm working on some virtualization things where I want a machine to be provisioned with a few options of linux distros and BSDs. In one prompt I asked for this list to be provisioned so a certain test of ssh would complete, it worked on it for several hours and now we're doing the code review loop. At first it gave up on the BSDs and I had to poke it to actually finish with an idea it had already had, now I'm asking it to find bugs and it's highlighting many mediocre code decisions it has made. I haven't even tested it so I'm not sure if it's lying about anything working yet.
I have to break large tasks into smaller tasks, and limit the context and scope.
This is the thing that both Superpowers and Ralph [0] do well when they're orchestrating; the plans are broken down enough so that the actual coding agent instance doesn't get overwhelmed and lost.
It'll be interesting to see what Claude Code's new 1m token limit does to this. I'm not sure if the "stupid zone" is due to approaching token limits, or to inherent growth in complexity in the context.
[0] these are the two that I've experimented with, there are others.
Tell it to analyze your architecture, security, documentation, etc. etc. etc. Install claude to do review on github pull requests and prompt it to review each one with all of these things.
Just keep expanding your imagination about what you can ask it to do, think of it more like designing an organization and pinning down the important things and providing code review and guard rails where it needs it and letting it work where it doesn't.
I can’t say it’s led to shipping “high quality projects”, but it has let me accomplish things I just wouldn’t have had time for previously.
I’ve been wanting to develop a plastic -> silicone -> plaster -> clay mold making process for years, but it’s complex and mold making is both art and science. It would have been hundreds of hours before, with maybe 12 hours of Claude code I’m almost there (some nagging issues… maybe another hour).
And I had written some home automation stuff back with Python 2.x a decade ago; it was never worth the time to refamiliarize myself with in order to update, which led to periodic annoyances. 20 minutes, and it’s updated to all the latest Python 3.x and modern modules.
For me at least, the difference between weeks and days, days and hours, and hours and minutes has allowed me to do things I just couldn’t justify investing time in before. Which makes me happy!
So maybe some folks are “pretending”, or maybe the benefits just aren’t where you’re expecting to see them?
I don't know if I'd trust an LLM to teach an o-scope.
For AI I've been using Cecli which is cli and can actually run the compile step then fix any errors it finds - in addition to using Context7 MCP for syntax.
Not quite 10x yet but productivity has improved for me many times over. It's just how you use the tools available
But on bigger stuff, it bogs down and sometimes I feel like I’m going nowhere. But it gets done eventually, and I have better structured, better documented code. Not because it would be better structured and documented if I left it to its ow devices, but rather it is the best way to get performance out of LLM assistance in code.
The difference now is twofold: First, things like documentation are now -effortless-. Second, the good advice you learned about meticulously writing maintainable code no longer slows you down, now it speeds you up.
Can you elaborate a little bit on how you get the LLM to produce maintainable code? Any tricks other than better prompting?
These things have alerted been true, but now they also enable AI development so instead of accumulating technical debt for expedience sake, we get paid an efficiency subsidy in productivity for doing it right. ( or rather for herding the gerbils to do it right)
My mold project is around 10k lines of code, still small.
But I don’t actually care about whether LLMs are good or bad or whatever. All I care is that I am am completing things that I wasn’t able to even start before. Doesn’t really matter to me if that doesn’t count for some reason.
That’s so nebulous and likely just plain wrong. I have some experience with silicone molds and casting silicone and other materials. I have no idea how you’d accurately estimate it would take hundreds of hours. But the mostly likely reason you’ve had results is that you just did it.
This sounds very very much like confirmation bias. “I started drinking pine needle tea and then 5 days later my cold got better!”
I use AI, it’s useful for lots of things, but this kind of anecdote is terrible evidence.
I’m willing to believe that I’m just especially clueless and this is not a meaningful project to an expert. But hey, I’m printing plastic negatives to make silicone positives to make plaster negatives to slip cast, which is what I actually do care about.
You’re just talking about taking a positive 3d model and automatically creating a mold for it that you 3d print?
If so I wouldn’t want that to be algorithmic because that’s never going to work in the general case. There are just too many edge cases that you have to manually handle. Might as well just create the mold in your CAD program.
For example a lot of pro-OpenAI astroturfing really wanted you to know that 5.3 scored better than opus on terminal-bench 2.0 this week, and a lot of Anthropic astroturfing likes to claim that all your issues with it will simply go away as soon as you switch to a $200/month plan (like you can't try Opus in the cheaper one and realise it's definitely not 10x better).
so yeah, it wouldn't surprise me if it was well over most. I don't actually claim that it is over half here, I've run across quite a few of these kinds of people in real life as well. but it wouldn't surprise me.
Also all this stuff about Claude having feelings directed at midwits is hilarious
I KNOW a common issue people run into is they forget to handle rate limits, but I also know more JavaScript than Python and have limited time, so before I'd write:
``` # NOTE: Make sure to handle the rate limit! This is just an example. See example.com/docs/javascript/rate-limit-example for a js example doing this. ```
Unsurprisingly, more than half of customers would just ignore the comment, forget to handle the rate limit, and then write in a few months later. With Claude, I just write "Create a customer demo in Python that handles rate limits. Use example.com/docs/javascript/rate-limit-example as a reference," and it gets me 95% of the way there.
There are probably 100 other small examples like this where I had the "vibe" to know where the customer might trip over, but not the time to plug up all the little documentation example holes myself. Ideally, yes, hiring a full-time person to handle plugging up these holes would be great, but if you're resource constrained paying Anthropic for tokens is a much faster/cheaper solution in the short term.
They seem to fall apart (for me, at least) when the projects get larger or have multiple people working on them.
They're also super helpful for analytics projects (I'm a data person) as generally the needed context is much smaller (and because I know exactly how to approach these problems, it's that typing the code/handling API changes takes a bunch of time).
In this author's case, they currently work for a company that .. wait for it .. less than 2 weeks ago launched some "AI image generation built for teams" product. (Also, oddly, the author lists himself as the 'Technical Director' at the company, working there for 5-6 years, but the company's Team page doesn't list him as an employee).
Since last few months, I have seen a notable difference in the quality and extent of projects these students have been able to accomplish. Every project and website they show looks polished, most of those could be a full startup MVP pre AI days.
The bar has clearly been raised way high, very fast with AI.
Once we got them into a technical screening, most fell apart writing code. Our problem was simple: using your preferred programming language, model a shopping cart object that has the ability to add and remove items from the cart and track the cart total.
We were shocked by how incapable most candidates were in writing simple code without their IDEs tab completion capability. We even told them to use whatever resources they normally used.
The whole experience left us a little surprised.
For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands off experience that building from scratch can be now.
But the reason you don’t see a flood of great products is that the managerial layer has no idea what to do with massively increased productivity (velocity). Ask even a Google what they’d do with doubly effective engineers and the standard answer is to lay half of them off.
The headline gain is speed. Almost no-one's talking about quality - they're moving too fast to notice the lack.
That they are so good at the things I like to do the least and still terrible at the things at which I excel. That's just gravy.
But I guess this is in line with how most engineers transition to management sometime in their 30s.
usually when someone hypes it up it's things like, "i have it text my gf good morning every day!!", or "it analyzed every single document on my computer and wrote me a poem!!"
The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.
The crazy pills are thinking that HN is in any way representative of anything about what's going on in our broader society. Those projects are out there, why do you assume you'll be told about it? That someone's going to write an exposé/blog post on themselves about how they had AI build a thing and now they're raking in the dollars and oh, buy my course on learning how to vibecode? The people selling those courses aren't the ones shipping software!
I don't doubt that an LLM would theoretically be capable of doing these sorts of things, nor did I intend to give off that sentiment, rather I was more evaluating if it was as practical as some people seem to be making the case for. For example, a C compiler is very impressive, but its clear from the blog post[0] that this required a massive amount of effort setting things up and constant monitoring and working around limitations of Claude Code and whatnot, not to mention $20,000. That doesn't seem at all practical, and I wonder if Nicholas Carlini (the author of the Anthropic post) would have had more success using Claude Code alongside his own abilities for significantly cheaper. While it might seem like moving the goalpost, I don't think it's the same thing to compare what I was saying with the fact that a multi billion dollar corporation whose entire business model relies on it can vibe code a C compiler with $20,000 worth of tokens.
> The problem is people have egos, myself included. Not in the inflated sense, but in the "I built a thing a now the Internet is shitting on me and I feel bad" sense.
Yes, this is actually a good point. I do feel like there's a self report bias at play here when it comes to this too. For example, someone might feel like they're more productive, but their output is roughly the same as what it was pre-LLM tooling. This is kind of where I'm at right now with this whole thing.
> The "open secret" is that shipping stuff is hard. Who hasn't bought a domain name for a side project that didn't go anywhere. If there's anybody out there, raise your hand! So there's another filtering effect.
My hand is definitely up here, shipping is very hard! I would also agree that it's an "open secret", especially given that "buying a domain name for a side project that never goes anywhere" is such a universal experience.
I think both things can be true though. It can be true that these tools are definitely a step up from traditional IDE-style tooling, while also being true that they are not nearly as good as some would have you believe. I appreciate the insight, thanks for replying.
[0]: https://www.anthropic.com/engineering/building-c-compiler
You're thinking like an individual, not a corporation. $20,000 is a lot of money for me to go and pay the bill for as an individual. That's a car for most of America! However, if I'm earning $20,000/year at my job, that's peanuts. Thus Mr. Carlini (whom surely makes vastly more than $20,000/year) being able to do, what previously would have taken a team of people to do, is nothing short of astounding. I don't know how well the compiler stacks up against, say clang or gcc, the real question is how much did it cost Intel to make v0.1 of icc.
> For example, someone might feel like they're more productive, but their output is roughly the same as what it was pre-LLM tooling.
There is just no comparison. It's not about how much faster it is, it's about could I have attempted this project before? Yes. Would I have attempted it? Probably not! The start up cost for a project was just so high that I've a list of things that I'd love to attempt but never made the time for. With AI, I'm slowly knocking things off that list (most of them don't actually go anywhere, but there's an itch to scratch, as a hobby).
> not nearly as good as some would have you believe.
Hallucinations from LLMs are interesting as a concept, but they can hardly be blamed for it as they learned to ability from humans. (Some) humans love to blow smoke up your ass in pursuit of the all mighty dollar. LLMs have their limitations. There's some prognostication about the future, but I'm interested in what they can do today.
Thank you for the thoughtful response!
Also, there is nothing complex in a C compiler. As students we built these things as toy projects at uni, without any knowledge of software development practices.
Yet, to bring an example for something that's more than a toy project: 1 person coded this video editor with AI help: https://github.com/Sportinger/MasterSelects
> The reality: 3 weeks in, ~50 hours of coding, and I'm mass-producing features faster than I can stabilize them. Things break. A lot. But when it works, it works.
Even if it's not straight astroturfing I think people are wowed and excited and not analyzing it with a clear head
So, I've very little to publicly show for all my obnoxious LLM advocacy. I wonder if any others are in the same boat?
This is the challenge I also face, it's not always obvious when a change I want will be properly understood by the LLM. Sometimes it one shots it, then others I go back and forth until I could have just done it myself. If we have to get super detailed in our descriptions, at what point are we just writing in some ad-hoc "programming language" that then transpiles to the actual program?
Given time AI will lead to incredible productivity. In the meantime, use as appropriate.
I then ask it to do the same thing in java, and it spends a half hour trying to do the same job and gets caught in some bit of trivia around how to convert html escape characters, for instance, s.replace("<", "<").replace(">", ">").replace("\"").replace("""); as an example and endlessly compiles and fails over and over again, never able to figure out what it has done wrong, nor decides to give up on the minutia and continue with the more important parts.
There's been a lot of talk about it for the past few years but we're just not seeing impacts. Oh sure, management talk it up a lot, but where's the corresponding increase in feature delivery? Software stability? Gross profit? EBITDA?
Give me something measurable and I'll consider it.
A giant monorepo would be a bad fit for an LLM IMO.
I'm mostly a freeloader, so how could I judge people who put in the tokens equivalent to 15 years worth of electricity (incl heating and hot water) bills for my home in a C compiler?
Well, I can see that Anthropic is still an AI company, not a software company, they're granting us access to their most valuable resource that almost doesn't require humans, for a very reasonable fee, allowing us to profit instead of them. They're philanthropists.
It does also seem to me that there is a lot of variance in skills for prompting/using AI in general (I say this as someone who is not particularly good as far as I’m aware – I’m not trying to keep tips secret from you). And there is also a lot of variance in the ability for an AI to solve problem of equal difficulty for a human.
What makes the difference is that agents can create these instructions themselves and monitor themselves and revert actions that didn't follow instructions. You didn't fet there because you achieved satisfactory results with semi-manual solutions. But people who abhor manual are getting there already.
I used this line for a long time, but you could just as easily say the same thing for a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones and so long as the scope is constrained it generally just works nowadays
It's the appearance of productivity, not actual productivity.
Which I think is what people gather from him, but somehow think he's hiding it or pretending is not the case? Which I find strange, given how openly he's talked about it.
As for his productivity going down over time, I think that's a combination of his videos getting bigger scopes and production values, and also he moving some of his time into some not so publicly visible ventures. E.g., he was one of the founders of Standard, which eventually became the Nebula streaming service (though he left quite a while ago now).
Well the person you're responding to didn't say anything like that. They're saying he's unqualified.
> The systems and habits are the ways he found to essentially trick himself into working.
And do they work? If he's failing or fooling himself then a big chunk of his podcasting is wasting everyone's time.
> videos getting bigger scopes and production values
I looked at a video from last year and one from eight years ago and they're pretty similar in production value. Lengths seem similar over time too.
> moving some of his time into some not so publicly visible ventures
I can see he's done three members-only videos in the last two years, in addition to four and a half public videos. Is there anything else?
When they said "It's the appearance of productivity, not actual productivity.", that does very much sound to me like an accusation that he is pretending or trying to deceive you into thinking he's a super productive person.
> And do they work? If he's failing or fooling himself then a big chunk of his podcasting is wasting everyone's time.
I'm afraid I'm not close enough to Mr Grey to be able to confidently say one way or another. Everything seems to indicate that he is a fairly successful individual, as a YouTuber with a big following and founder of at least two companies that seems to be going pretty well. So unless he is incredibly lucky and keeps failing upwards, if I had to guess, I'd say he has had at least some success in making himself work on stuff from time to time.
> I looked at a video from last year and one from eight years ago and they're pretty similar in production value. Lengths seem similar over time too
Really? I mean, let's look at some concrete examples. His latest video [1] features many unique drawings, extensive animations, even some 3d stuff with the rotating globes, and almost every scene has an actual drawn background layer.
Meanwhile, one of his biggest videos from 9 years ago [2] is pretty much just a slideshow, with no animations, and most of the video features a static generic white background.
The overarching style (i.e. stick figures, no elaborate textures) is the same, and I guess this is a partially a subjective point, but I think it's a bit crazy to say the visuals in these two videos are of similar quality.
For an example of stuff other than just the animation itself, he put out the Rock Paper Scissors video [3] two years ago, which had a pretty insane huge scope (though that might not be obvious at first glance)
> I can see he's done three members-only videos in the last two years, in addition to four and a half public videos. Is there anything else?
By definition, I'm not aware of stuff he's not made public. I just know that there is stuff that he chooses not to talk much about (he never once mentioned the Standard stuff on his podcast, for example). He also handles a good portion of the backend stuff for the Cortex Brand line of products (I think managing/planning logistics/inventory?). I'm not a member of his channel or his Patreon so I can't tell you how much he invests in exclusive videos, or if there is some other work he discloses over those channels that he doesn't in others.
[1] https://youtu.be/HSRmfNDk87s?si=lORnzazCWoe2X4Xa
That's not his most recent video, it's a fix of a 2022 video. And the channel still had pretty good output 3-4 years ago.
I compared the nickels video instead, to the worst ID system in America, and they seemed to be similar levels of embellished slideshow.
> By definition, I'm not aware of stuff he's not made public.
I thought you meant paid access stuff and it's easy to see a list of those. If you're suggesting secret videos then uh maybe but that's kind of a weird assumption.
And whatever happened with standard was too long ago to be the problem here.
> He also handles a good portion of the backend stuff for the Cortex Brand line of products (I think managing/planning logistics/inventory?).
That might be the answer but it seems like a waste of his productivity potential.
That's fair, I didn't notice that.
> I compared the nickels video instead, to the worst ID system in America, and they seemed to be similar levels of embellished slideshow.
He still has videos that are simpler. But back then he had nothing that came even close to those big productions he releases from time to time.
> I thought you meant paid access stuff and it's easy to see a list of those. If you're suggesting secret videos then uh maybe but that's kind of a weird assumption.
I'm suggesting he may work on stuff other than videos. Like non-general public facing/non personality driven businesses. Like Cortex Brand, and the Standard stuff before it. He obviously talked a lot about the Cortex Brand stuff, but he kept Standard on the down low. I don't cite Standard as a reason that he is not putting out videos right now, I cite Standard as evidence he isn't necessarily shouting from the rooftops every time he creates a business. So it stands to reason that he may have had other similarly "secret" ventures over the years.
> That might be the answer but it seems like a waste of his productivity potential.
I don't consume their products (they seem nice but they're far too expensive for my third world salary), so selfishly I'd also prefer if he focused more of his time on the videos. But that's an entirely different conversation from "he just pretends to be productive and actually gets next to nothing done".
interesting.
how much planning do you put into your project without AI anyway?
Pretty much all the teams I've been involved in:
- never did any analysis planning, and just yolo it along the way in their PR - every PR is an island, with tunnel vision - fast forward 2 years. and we have to throw it out and start again.
So why are you thinking you're going to get anything different with LLMs?
And plan mode isn't just a single conversation that you then flip to do mode...
you're supposed to create detailed plans and research that you then use to make the LLM refer back to and align with.
This was the point of the Ralph Loop
Tried to move some excel generation logic from epplus to closedxml library.
ClosedXml has basically the same API so the conversion was successful. Not a one-shot but relatively easy with a few manual edits.
But closedxml has no batch operations (like apply style to the entire column): the api is there but internal implementation is on cell after cell basis. So if you have 10k rows and 50 columns every style update is a slow operation.
Naturally, told all about this to codex 5.3 max thinking level. The fucker still succumbed to range updates here and there.
Told it explicitly to make a style cache and reuse styles on cells on same y axis.
5-6 attempts — fucker still tried ranges here and there. Because that is what is usually done.
Not here yet. Maybe in a year. Maybe never.
Yeah I have the same problem where it always uses smart quotes which messes up my compile. 8 told ChatGPT not to use them but it keeps doing it.
That being said, its great at generating boilerplate code or in my case, doing something like 'make a react component here please that does this small thing, and is aligned with the style in the rest of the file'. Good for when I need to work with code bases or technologies that are not my daily. Also a great research assistant.
But I guess being a 'better google' or a 'glorified spellchecker' doesn't get that hype money.
It also kinda feels gaslightish and as I've said in some controversial replies in other posts, its sort of eerily mass "psychosis" vibes just like during COVID.
All AI-IS-WONDERFUL stories are garbage-trash written by garbage people.
Fuck AI. Fuck HN AI promoters. Hopefully you all lose your jobs and fail in life.
Hardly before, now its almost three times a week. And never gets any questions on GPU amortization...
> This has truly freed up my productivity, letting me pursue so many ideas I couldn’t move forward on before
If you're writing in a blog post that AI has changed your life and let you build so many amazing projects, you should link to the projects. Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.
I've got 10+ years of coding experience, I am an AI advocate, but not vibe coding. AI is a great tool to help with the boring bits, using it to initialize files, help figure out various approaches, as a first pass code reviewer, helping with configuring, those things all work well.
But full-on replacing coders? It's not there yet. Will require an order of magnitude more improvement.
I am using them in projects with >100kloc, this is not my experience.
at the moment, I am babysitting for any kloc, but I am sure they will get better and better.
I am sure there are ways to get around this sort of wall, but I do think it's currently a thing.
I built a skribbl.io clone to use at work. We like to play eod on Friday as a happy hour and when we would play skribbl.io we would try to get screencaps of the stupid images we were drawing but sometimes we would forget. So I said I'd use claude to build our own skribbl.io that would save the images.
I was definitely surprised that claude threaded the needle on the task pretty easily, pretty much single shot. Then I continued adding features until I had near parity. Then I added the replay feature. After all that I looked at the codebase... pretty much a single big file. It worked though, so we played it for the time being.
I wanted to fix some bugs and add more features, so I checked out a branch and had an agent refactor first. I'd have a couple context/sessions open and I'd one just review, the other refactored, and sometimes I'd throw a third context/session in there that would just write and run tests.
The LLM will build things poorly if you let it, but it's easy to prompt it another way and even if you fail that and back yourself into a corner, it's easy to get the agents to refactor.
It's just like writing tests, the llms are great at writing shitty useless tests, but you can be specific with your prompt and in addition use another agent/context/session to review and find shitty tests and tell you why they're shitty or look for missing tests, basically keep doing a review, then feed the review into the agent writing the tests.
> Somehow 90% of these posts don't actually link to the amazing projects that their author is supposedly building with AI.
You are in the 90%.
When you create a blog post about it though, I do agree that showing the projects will greatly increase the value of your claims.
That said, I do catch it doing some of the stuff the OP mentioned— particularly leaving “backwards compatibility” stuff in place. But really, all of the stuff he mentions, I’ve experienced if I’ve given it an overly broad mandate.
You also need a reasonably modular architecture which isn't incredibly interdependent, because that's hard to reason about, even for humans.
You also need lots and lots (and LOTS) of unit tests to prevent regressions.
Then let me introduce you to a useful concept:
I've learned with LLM coded apps to break stuff into very small manageable chunks so they can work on the tiny piece and not get screwed by big context.
For the most part, this actually produces a cleaner codebase.
Surely it depends on the design. If you have 10 10kloc modular modules with good abstractions, and then a 10k shell gluing them together, you could build much bigger things, no?
Then again the problem is that the public has learned nothing from the theranos and WeWorks and even more of a problem is that the vc funding works out for most of these hype trains even if they never develop a real business.
The incentives are fucked up. I’d not blame tech enthusiasts for being too enthusiastic
Might as well talk about how AI will invent sentient lizards which will replace our computers with chocolate cake.
Thinking usually happens inside your head.
What is your point?
If you’re trying to say that they should have kept their opinion to themselves, why don’t you do the same?
Edit: tone down the snark
Holy Spiderman what is your point? That if someone says something dumb I can never challenge them nor ask them to substantiate/commit?
> tone down the snark
It's amazing to me that the neutral observation "thinking happens in your head" is snarky. Have you ever heard the phrase "tone police"?
If the person who is liable for the system behavior cannot read/write code (as “all coders have been replaced”), does Anthropic et al become responsible for damages to end users for systems its tools/models build? I assume not.
How do you reconcile this? We have tools that help engineers design and build bridges, but I still wouldn’t want to drive on an “autonomously-generated bridge may contain errors. Use at own risk” because all human structural engineering experts have been replaced.
After asking this question many times in similar threads, I’ve received no substantial response except that “something” will probably resolve this, maybe AI will figure it out
The bridge scenario is simply addressed: Licensed Engineer has to approve designs. Permitting review process has to review designs. Not sure it matters who drafted them initially.
If the only point being made by “all coders are replaced” is that humans aren’t manually typing the code from their keyboard anymore, I don’t think there’s much interesting to argue there, typing the code was never the hard part.
If you spend a couple of years with an LLM really watching and understanding what it’s doing and learning from mistakes, then you can get up the ladder very quickly.
A "basic" understanding in critical domains is extremely dangerous and an LLM will often give you a false sense of security that things are going fine while overlooking potential massive security issues.
All I could think was, "good luck" and I certainly hope their app never processes anything important...
I don't feel like most providers keep a model for more than 2 years. GPT-4o got deprecated in 1.5 years. Are we expecting coding models to stay stable for longer time horizons?
Maybe they don't feel like sharing yet another half working Javascript Sudoku Solver or yet another half working AI tool no one will ever use?
Probably they feel amazed about what they accomplished but they feel the public won't feel the same.
That's the whole point of sharing with the rest of us. If they write for themselves, a private journal to track their progress, then there is no need to share what is actually been built. If they do though make grand claims to everybody then it would be more helpful for people who do read the piece to actually be able to see what has been produced. Maybe it's wonderful for the author but it's not the level of quality required for readers.
https://apps.apple.com/us/app/snortfolio/id6755617457
30kloc client and server combined. I built this as an experiment in building an app without reading any of the code. Even ops is done by claude code. It has some minor bugs but I’ve been using it for months and it gets the job done. It would not have existed at all if I had to write it by hand.
SHOW ME THE MONEY!!!
GPT-5.2 fixed my hanging WiFi driver: https://gist.github.com/lostmsu/a0cdd213676223fc7669726b3a24...
It's a magical moment when someone is able to AI code a solution to a problem that they couldn't fix on their own before.
It doesn't matter whether there are other people who could have fixed this without AI tools, what matters is they were able to get it fixed, and they didn't have to just accept it was broken until someone else fixed it.
Cue the folks saying "well you could DIE!!!" Not if I don't fix brakes, etc ...
Nobody is actually using AI for anything useful or THEY WOULDNT BE TALKING ABOUT IT. They’d be disrupting everything and making billions of dollars.
Instead this whole AI grift reads like “how to be a millionaire in 10 days” grifts by people that aren’t, in fact, millionaires.
Is it really to escape from "getting bogged down in the specifics" and being able to "focus on the higher-level, abstract work", to quote OP's words? I thought naively that engineering always has been about dealing with the specifics and the joy of problem solving. My guess is that the drive is toward power. Which is rather natural, if you think about it.
Science and the academic world
I have always failed to understand the obsessive dream of many engineers to become managers. It seems not to be merely about an increase in revenue.
Is it to escape from "getting bogged down in the specifics" and being able to "focus on the higher-level, abstract work", to quote OP's words? I thought naively that engineering has always been about dealing with the specifics and the joy of problem-solving. My guess is that the drive is towards power, which is rather natural, if you think about it.
Science and the academic world suffer a comparable plague.
And when you're in an existing company, stuck in thing X, knowing that it's obsolete, and the people doing the latest Y that's hot in the job market are in another department and jealously guard access to Y projects?
How about when you go to interview, and you not ONLY have to know Y, but the Leetcode from 15 years ago?
So maybe I've given you another alternative to 'it has to be power, there's no other rational reason to go into management'.
Here's a gentler one: if you want to build big things, involving many people, you need to be in management.
Do you enjoy brick laying and calculating angles around doorways? You're the engineer. Do you want to be the architect hiring engineers, working with project managers, and assessing the budget while worrying about approvals? They're different types of work, and it's not about 'power' like you are suggesting. Autonomy and decision-making power are more the 'power' engineers often don't get (unless they are lucky, very very smart or in a small startup-like environment).
I've gone back and forth across the lead and management lines many times now, and it is career limiting in many many ways. But it's too fulfilling to give up. And I swear there is magic in what small, expert groups are able to produce that laps large org on the regular.
Some research around British government workers found higher job satisfaction in units with hands-off managers. It resonates with my own career. I’m really excited and want to go to work when I’m on a small, autonomous team with little red tape and politics. Larger orgs simply can’t — or haven’t — ever offered me the same feeling; with some exceptions in Big 3 consulting if I was the expert on a case.
The worst manager is the micromanager - either because he's nervous about his job security, because he doesn't know how to delegate, or because he's been hands-on forever and can't let go.
Sadly, that meant I didn’t delegate enough, even if I let the team work on their own stuff. I’ve the same problem with the baby, I wanna do everything myself because of that stupid voice.
Reframed as “I’m just the better parent”, it sounds awful. Or “I’m just a better employee”. Maybe the micromanager (or the non-delegating manager) just can’t let go of that voice, that feeling. I’ll try to do better at the next job.
“Doing X is only part of the task. Getting baby/junior/employee to do X as independently as possible is also a critical requirement.”
This framing doesn’t allow you to think you’d do a better job because then the task is incomplete.
I don't see why it contradicts my little rant above. Of course I also prefer small, nimble teams with lots of autonomy, with individuals who thrive being delegated only extremely broad tasks. The only part where I think there's a difference is the constantly learning.
I love constantly learning. My issue isn't that. It's that I don't want to HAVE to constantly be practicing at home and on the weekend. I did this in my 20s and I can't/won't do this anymore. I just have no time or energy now as an Old.
For myself it is the hands-on work I find most fulfilling unfortunately. I have some sort of brain worm that makes me want to practice all the new things at home/weekend if work isn't letting me. I'm sure it'll burn me out at some point, but to paraphrase a famous creep: I keep getting older, my brainworm stays the same age.
Within my power I try to do that with my directs, making sure new interesting things are cycled in so their CVs become stronger. But me, personally, I've had really bad luck with this. I always had to study on the weekends for something that either isn't used in my company or someone else jealously guards because it's hot on the market.
> only to have it completely obsoleted a few years later
Not really. There aren’t as many fundamentally new ideas in modern tech as it may seem.Web servers have existed for more than 30 years and haven’t changed that much since then. Or e.g., React + Redux is pretty much the same thing as WinProc from WinAPI - invented some time in ~1990. Before Docker, there were Solaris Zones and FreeBSD jails. TCP/IP is 50 years old. And many, many other things we perceive as new.
Moreover, I think it’s worth looking back and learning some of the “old tech” for inspiration; there’s a wealth of deep and prescient ideas there. We still don’t have a full modern equivalent of Macromedia Flash, for example.
There are companies that are willing to consider general aptitude and transferable skills when hiring, but a vast majority compares candidates using checklists of technologies
I can't tell if this is sincere or parody, it is so insufferably wrong. Good troll. I almost bit.
Almost nothing goes obsolete in software; it just becomes unpopular. You can still write every website you see on the Internet with just jQuery. There are perfectly functional HTTP frameworks for Cobol.
These are inherently different levels of power. I'm not sure how your example is supposed to be the opposite when you compare someone laying bricks to someone making hiring and firing decisions about groups of people. Your scenario is fundamentally a power imbalance
I am scientist and worked from time to time as a research engineer merely to pay the bills, so I may see things differently. I always like doing lab / field work and first-hand data analysis. Many engineers I know would likely never stop tinkering and building stuff. It may be easier for a scientist than for an engineer to still get trilled, I don't know.
If only the world incentivized ICs with depth of knowledge to stay in those roles for the long haul instead of chopping off our knowledge of specificity at the apex of their depth of knowledge. So many managers have no talent, no depth of knowledge and a passable ability to manage people.
I concur: Perl taught me to mentally parse and build (complex) regexes, a highly transferable skill. The Lisp course I was taught in the late 80s, certainly helped me grok Clojure and find it a pretty natural fit. I think this is a very common trope.
No, you don’t. You need some kind of decision making and communication process but a separate management is not necessary.
Do you know what stank ranking even is and where it comes from? If you have to rate your group from 1 to 5, each individual, and you rate them all 4s and 5s, they crack down and force you to select a 2 and a 3 and only have one 5. Now, would you prefer a CFO, CTO or even a project manager be the one to do it? It's a weird comment.
Again, as an older manager today, I can see myself in my 20s in the resistance and stubbornness to 'how corporations work' espoused in comments like yours. I sympathize, but I warn you against being naive and ideological, because unfortunately human groups be human groups, and organizations for better or for worse behave in predictable patterns. You might as well know as much as possible so you can deal with it better.
That sure beats having it completely obsoleted a few weeks later, which sometimes feels like the situation with AI
It's a skill that takes practice -coordinating disparate people and groups, creating communication where you notice they're not talking to each other, creating or fixing processes that annoy or cause chaos if they're not there, encouraging people, being a therapist, seeing what's not there and pushing a vision while you get the group to go along, protecting people from management above and pressures around, etc are mostly skills that you learn.
Sometimes no one will give you feedback so you have to figure it out yourself (unless you're lucky to get a mentor), so you just have to throw yourself in and give yourself grace to fail and succeed over time.
The only skill of these I think is possibly genetic or innate, is being able to see the big picture and make strategical decisions. A lot of tech people skew cognitively in narrow areas, and have trouble conceptualizing the world beyond.
One challenge here is the ubiquitous 'managers just approve vacations and waste space' sentiment on here and in some places. These people are a chore to manage (and sometimes are better not being present in your group).
Real managers deal with coaching, ownership, feelings, politics, communication, consensus building, etc. The people who are good at it like setting other people up to win.
> I’d think there would be very, very few people who are actively seeking people drama
Theoretically as a manager you get the bump up the power dynamic ladder (and probably pay ladder) because you are taking on the responsibility of "people drama". Being a good manager is antithetical to treating living, breathing human beings as NPCs in a game.
In engineering the only teams that win are the teams that ship code. Dealing with coaching, ownership, feelings, politics, etc, should all arrive at the same outcome: ship code.
Shipping code is not the end goal of engineering. In fact, more code you ship more liability you have. Main goal is solving problems.
> Engineering is the practice of [...] solve problems within technology, increase efficiency and productivity, and improve systems. [0]
So far, the code Claude has generated looks fairly decent and stylistically not too different from what I would write myself.
That's the source of your difficulty. Research wu-wei.
Often too it's the architecture that can cause a grand idea to crash and burn—experienced devs should be moving toward solving those problems.
Like I’ve been in situations as an IC where poor leadership from above has literally caused less efficient and more painful day-to-day work. I always hoped I could sway those decisions from my position as an IC, but reality rarely aligned with that hope.
I actually love the details, but I just don’t get too deep into them these days as I don’t want to micro-manage.
I do find I have more say in things my team deals with now that I’m a manager.
I'm almost certain some of those I manage do think of me in this way. I try to explain my thought process and decision making to those I manage, and I am always looking for genuine critical feedback. I also own up to it when I've made a bad call. Overall, my anonymous ratings are pretty high, and my teams have seen exceedingly low turnover so I take those as good signs.
Now I'm back to being an IC and I just do the job. Want me to change this variable name so its more readable, in your opinion? No problem. I shall change const foo to const bar.
That can extend to arbitrary absurdity. You are probably not growing your own food, mining your own ore, forging your own tools, etc etc etc.
It's all just a matter of where you rely on external tools/abstractions to do parts of the work you don't want to do yourself.
It's frontier exploration that brings me joy. If a clanker can do something, then it's a solved problem. I use all the tools at my disposal to push the frontier of problems solved. Wasting my time re-inventing the wheel brings me the opposite of joy.
But I'm acutely conscious that in the 5+ years that I've been a senior developer, my ability to come up with useful ideas has significantly outstripped the time I have to realize those ideas (and from experience, the same is often true of academics).
At work, I have the choice between remaining hands-on and limiting what I can get done, or acting more like a manager, and having the opportunity to get more done, but only by letting other people do it, in ways that might not reflect my vision. It's pretty frustrating, to be honest.
For side projects, it's worse. Most of them just can't be done, because I don't even have the choice.
Not really for me. Programming is an effort type job. The more effort you put in the more you get out. True in other professions sure but multiplied with dev work. When became a dad everything changed. Solve hard problem or spend time with kid. I couldn't juggle the two. So i made a choice and fortunately had an opportunity to move into management.
Anyway full circle now I'm back to being a dev and this go around couldn't be easier with our ai agents. Point is I went into management because I was forced, not at all for power.
You want to write a book about people's deepest motivations. Formative experiences, relationships, desires. Society, expectations, disappointment. Characters need to meet and talk at certain times. The plot needs to make sense.
You bring it to your editor. He finds you forgot to capitalise a proper noun. You also missed an Oxford comma. You used "their" instead of "they're".
He sends you back. You didn't get any feedback about whether it makes sense that the characters did what they did.
You are in hell, you won't hear anything about the structure until you fix your commas.
Eventually someone invents an automatic editor. It fixes all the little grammar and spelling and punctuation issues for you.
Now you can bring the script to an editor who tells you the character needs more development.
You are making progress.
Your only issue is the Luddites who reckon you aren't a real author, because you tend to fail their LeetGrammar tests, calling you a vibe author.
I empathize and understand that not everyone has this experience.
I think it's that there is only that much demand for solving really complex problems, and doing the same thing over and over is boring, so management is the only way forward for many people
I was recently looking for mentors to work with him and advance his skills, targeting college aged kids / young 20s..
It was surprising to me how many people I came across in this field at this young age that are trying to focus on the "higher level" game planning aspects and not so much on the lower level implementation specifics.
https://www.youtube.com/playlist?list=PLnuhp3Xd9PYTt6svyQPyR...
https://guide.handmadehero.org/hmcon/
For me it's the other way around. Engineering was always a means to an end - I just want to build products. It was a creative artform more than a scientific endeavour.
You can't do that from a high level abstract position. You actually need to stand at the coal face and think about it from time to time.
This article encodes an entitled laziness that's destructive to personal skill and quality work.
A few years ago, when Agile was still the hot thing and companies had an Agile "facilitor" or manager for each dev team, the common career path I heard when talking to those people was: "I worked as a java/cobol/etc in the past, but it just didn't click with me. I'm more of a peoples person, you know, so project management is where I really do my best work!".
Yeah, right...
What type of code? What types of tools? What sort of configuration? What messaging app? What projects?
It answers none of these questions.
I'm now using pi (the thing openclaw is built on) and within a few days i build a tmux plugin and semaphore plugin^1, and it has automated the way _I_ used to use Claude.
The things I disagree with OP is: The usefulness of persistent memory beyond a single line in AGENTS.md "If the user says 'next time' update your AGENTS.md", the use of long-running loops, or the idea that everything can be resolved via chat - might be true for simple projects, but any original work needs me to design the 'right' approach ~5% of the time.
That's not a lot, but AI lets you create load-bearing tech-debt within hours, at which point you're stuck with a lot of shit and you dont know how far it got smeared.
And the fact this post has 300+ comments, just like countless LLM-generated articles we get here pretty much daily... I guess proves the point in a way?
I have an idea, agent turns it into a draft, depending on idea vagueness/complexity combination of: Looking for alternative, plan the change, look for alternative, split up into smaller drafts to drive separately, execute change (spec, code, tests), review change.
Usually its just: Draft, plan, exec, commit. The steps are flexible enough. Usually each step is a different agent, sometimes not. On complex builds or big changes, a planning agent itself might spawn subagents to avoid bloating its own context.
The progress is stored in: ./dev/{draft/<n>.md , wip/<n>/, fin/<n>/ }.
My `lead` pi has a separate AGENTS.md with how to organize the above sequence, and some notes on how to prompt, keep things small, etc. Note that its skill `tmux-coding-agrents` calls other pi instances (optionally set to codex). I've moved off the claude cli entirely.
I used to spend time telling claude not to forget updating the specs or building its tests because context bloat made them forget AGENTS.md, or to read certain files before it should execute a plan. The lead agent does this just fine now, and every time i see it make a mistake i say: "Next time do X" and it automatically updates its own or the worker agents AGENTS.md.
Because my lead agent context is all about managing this process it doesn't forget steps while its off chasing some bug.
Also, I build (but did not publish) a pi plugin that attempts to use other accounts on usage limits.
Most surprising moment I had, was my lead spawning a subagent, spawning a subagent, which spawned a tmux-bash build with very little prompting, and it was the right thing to do to prevent each agent from context bloat.
- the smaller context and default-off for thinking make it extremely fast (thinking is a dead-end bitter lesson imo)
- the source is available; and organized specifically to make it easy for agents to write and test extensions. Just launch pi in the pi-mono directory, and it will turn your idea into an extension in no time.
This is an AI generated post likely created by going to chatgpt.com and typing in "write a blogpost hyping up [thing] as the next technological revolution", like most tech blog content seems to be now. None of those things ever existed, the AI made them up to fulfill the request.
To add to this, OpenClaw is incapable of doing anything meaningful. The context management is horrible, the bot constantly forgets basic instructions, and often misconfigures itself to the point of crashing.
edit: love the downvotes. I guess HN really is Reddit now. You can make any accusation without evidence and people are supposed to just believe it. If you call it out you get downvoted.
Besides, if there are enough red flags that make it indistinguishable from actual AI slop, then chances are it's not worth reading anyway and nothing of value was lost by a false positive.
>Over the past year, I’ve been actively using Claude Code for development. Many people believed AI could already assist with programming—seemingly replacing programmers—but I never felt it brought any revolutionary change to the way I work.
Funny, because just last month, HN was drowning in blog posts saying Claude Code is what enables them to step away from the desk, is definitely going to replace programmers, and lets people code "all through chatting on [their] phone" (being able to code from your phone while sitting on the bus seems to be the magic threshold that makes all the datacenters worth it).
But we are also easier to impress: only we understand how difficult it is to one-shot code a working app.
AI psychosis sets in when AI captures enough of your perception to alter your reality. Programmers might be "smarter", but we give AI a bigger set of tools to capture our perception with.
And then there's the fact that we want to be fooled.
(Not necessarily this specific post).
It's like we all fell under the spell of a terminal endlessly printing output as some kind of measurement of progress.
I just give the link to those posts to my AI to read it, if it's not worth a human writing it, it's not worth a human reading it.
The only software I've seen designed and implemented by OpenClaw is moltbook. And I think it is hard to come up with a bigger pile of crap than Moltbook.
If somebody can build something decent with OpenClaw, that would help add some credibility to the OpenClaw story.
They are not able to comprehend that for anything more complicated than that, the code might compile, but the logical errors and failure to implement the specs start piling up.
Grok 4 Fast told me its own internal system prompt has rules against autonomous operation, so that might have something to do with it. I am having decent results with it though.
I've noticed Claude shutting down conversations on the same subject. It says "you may continue this conversation with an older model." ChatGPT also got extremely uncomfortable talking about it, and refuses to build anything in that direction, which I find amusing since its ancestor GPT-4 built a self-modifying Python programmer in 2023.
(Also, OpenClaw meets GPT-5's definition for "extremely dangerous AI software" due to the self-modification factor. But I think that applies trivially to any agent, so...)
For me the pain point has always been with non-IT people/companies. They are way more accustomed with phone or even in person appointments. They in general have way more of a say than me, the customer.
Can Openclaw make and take phone calls for me to make appointments? Can Openclaw do chores for me? Can Openclaw meet with contractors for me? None of them it can do. It can make notes for me (useless as most notes are useless). It can scrap websites for me (not very interesting as why would I want to collect so much knowledge?). It can probably automate anything that already has an endpoint or whatever, but I don’t mind write code for my own projects. I always failed to understand why anyone would want to let AI write most of the code of their PERSONAL project — unless they want to sell them quickly.
I’m just a frustrated old man I guess.
[0] https://vapi.ai/
> I’m just a frustrated old man I guess.
I think this is a great summary of the failure of vision that a lot of tech people are having right now.
> automate anything that already has an endpoint or whatever
Facebook used to have API's, Reddit used to have API's, amazon used to have API's
They are gone.
Enshitification and dark patterns have taken over.
"Hey open claw, cancel service xxx" where XXX is something that is 17 steps and purposely hard to cancel so they keep your money.
What's going to happen when your AI tool can go to a website and strip the ad's off and return you just the text? What happens when it can build a customized news feed that looks less like Facebook and more like HN? Aren't we just gaining back function we lost with the death of RSS?
Consumers are mad about the hype of AI but the moment that it can cut through the bullshit we keep putting in their way it's going to wreck business MODELS, and the choice will be adapt or die. Start asking your "AI" tools to do all the basic, tedious bullshit tasks that are low risk (you have a ton of them) and if it gets 1/4 of them done your going to free up a ton of your own time.
[1] https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
I tried using LLMs to help debug at different points, but they went in circles on bad ideas, even when I gave them what turned out to be a correct clue.
Root cause turned out to be that IPv6 wasn't enabled for Docker networking, but was enabled for the websites DNS. So people who connected over IPv6 were getting their IPs all converted to the same internal Docker IP before being handed to the per-IP throttling algorithm.
I spotted that there were no IPv6 IPs in the logs, but the LLMs missed that the key pattern was the absence of something expected, instead drawing wrong conclusions.
So no, I'm not about to turn OpenClaw loose on building anything at all complex.
> OpenClaw gave me the chance to become that super manager [...] A manager shouldn’t get bogged down in the specifics—they should focus on the higher-level, abstract work
These two propositions seem to be highly incompatible
Honestly I'd rather die
1. It has a lot of files that it loads into it's context for each conversation, and it consistently updates them. Plus it stores and can reference each conversation. So there's a sense of continuity over time.
2. It connects to messaging services and other accounts of yours, so again it feels continuous. You can use it on your desktop and then pick up your phone and send it an iMessage.
3. It hooks into a lot of things, so it feels like it has more agency. You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"
It feels more like a smart assistant that's always around than an app you open to ask questions to.
However, it's worth stressing how terrible the software actually is. Not a single thing I attempted to do worked correctly, important issues (like the discord integration having huge message delays and sometimes dropping messages) get closed because "sorry we have too many issues", and I really got the impression that the whole thing is just a vibe coded pile of garbage. And I don't like to be that critical about an open source project like this, but I think considering the level of hype and the dramatic claims that humans shouldn't be writing code anymore, I think it's worth being clear about.
Ended up deleting it and setting up something much simpler. I installed a little discord relay called kimaki, and that lets me interact with instances of opencode over discord when I want to. I also spent some time setting up persistent files and made sure the llm can update them, although only when I ask it to in this case. That's covered enough of what I liked from OpenClaw to satisfy me.
if one of my friends sent me an obviously AI-written email, I think that I would cease to be friends with them...
Ah, so it's a device for irritating Steve, got it.
Isn’t the “what he thinks about it” part the hardest? Like, that’s what I want to phrase myself - the part of the conversation I’d like to get their opinion on and what exactly my actual request is. Or are people really doing the meme of sending AI text back and forth to each other with none the wiser?
For personal communication between friends it would be horrible. Authenticity has to be one of the things I value most about the people I know. Didn't mean to imply from that example that I did or would communicate that way.
https://github.com/a-n-d-a-i/ULTRON
Well, it's a work in progress, but I have self-upgrading and self-restarting working, and it's already more reliable than Claw ;)
I used the Claude Code SDK (Agents SDK) originally, but then realized I can get the same result by just calling `claude -p the_telegram_message`
The magic sauce being the --continue flag, of course. Bit less useful otherwise.
I haven't figured out how to interrupt it or see what it's doing yet though.
Well, that and skills to download more skills. It’s a lot faster and easier to extend OC than CC via prompts. It also has cron and other take-initiative features.
I had it hack up a poller for new Gitea notifications (for @ mentions and the like) that wakes up the main bot when something happens, so I have it interacting with a self hosted Gitea. There wasn’t even a Gitea skill for it, it just constructs API requests “manually” each time it needs to do something on it. I guess it knows the Gitea API already. It knew how to make a launchd plist and keep the poller running, without me asking it to do that. It’s a little more oriented toward getting things going and running than CC, which mostly just wants to make commits.
> Generally, I believe [Rabbit] R1 has the potential to change the world. This is a thought that seldom comes to my mind, as I have seen numerous new technologies and inventions. However, R1 is different; it’s not just another device to please a certain niche. It’s meticulously designed to serve one significant goal for all people: to improve lifestyle in the digital world.
I don't know about this; or at least, in my experience, is not a what happens with good managers.
I guess best managers just develop the hunch and know when to do this and when to ask engineers for smallest details to potentially develop different solutions. You have to be technical enough to do this
And me ruining my day fighting with a million hooks, specs and custom linters micromanaging Claude Code in the pursuit of beautiful code.
That would be really helpful.
Why isn't Claude doing all that for me, while I code? Why the obsession that we must use code generation, while other gabage activities would free me to do what I'm, on paper, paid to do?
It's less sexy of course, it doesn't have the promise of removing me in the end. But the reason, in the present state, is that IT admins would never accept for an llm to handle permissions, rotations, management would never accept an llm to report status or provide estimate. This is all "serious" work where we can't have all the errors llm create.
Dev isn't that bad, devs can clean slop and customers can deal with bugs.
Good luck hoping that none from the big money would try to stand between you and someone giving you a service (uber, airbnb, etsy, etc) and get rent from that.
Claude, fix the toilet.
I'm not running OpenClaw, but I've given Claude its own email address and built a polling loop to check email & wake Claude up when I've sent it something. I'm finding a huge improvement from that. Working via email seems to change the Claude dynamic, it feels more like collaborating with a co-worker or freelancer. I can email Claude when I'm out of the house and away from my computer, and it has locked down access to use various tools so it can build some things in reply to my emails.
I've been looking into building out voice memos or an Eleven Labs setup as well, so I can talk to Claude while I'm out exercising, washing dishes etc. Voice memos will be relatively easy but I haven't yet got my head around how to integrate Eleven Labs and work with my local data & tools (I don't want a Claude that's running on Eleven Labs servers).
What made it so popular I think is that it made it easy to attach it to whatever "channel" you're comfortable with. The mac app comes with dictation, but unsure the amount of setup to get tts back.
I feel like there's this "secret" hiding behind all these AI tools, that actually it's all very complicated and takes a lot of effort to make work, but the tools we're given hides it all. It's nice that we benefit from its simplicity of use. But hiding complexity leads to unexpected problems, and I'm not sure we've seen any of those yet - other than the massive, gaping security hole.
So, OpenClaw has changed his life: It has accelerated the AI psychosis.
Regardless of how you isolate the OpenClaw instance (Mac Mini, VPS, whatever) - if it’s allowed to browse the web for answers then there’s the very real risk of prompt injection inserting malicious code into the project.
If you are personally reviewing every line of code that it generates you can mitigate that, but I’d wager none of these “super manager” users are doing that.
I saw on The Verve that they partnered with the company that repeatedly disclosed security vulnerabilities to try to make skills more secure though which is interesting: https://openclaw.ai/blog/virustotal-partnership
I’m guessing most of that malware was really obvious, people just weren’t looking, so it’s probably found a lot. But I also suspect it’s essentially impossible to actually reliably find malware in LLM skills by using an LLM.
A Reddit post with white invisible text can hijack your agent to do what an attacker wants. Even a decade or 2 back, SQL injection attacks used to require a lot of proficiency on the attacker and prevention strategies from a backend engineer. Compare that with the weak security of so called AI agents that can be hijacked with random white text on an email or pdf or reddit comment
It cannot. This is the security equivalent of telling it to not make mistakes.
> Restrict downstream tool usage and permissions for each agentic use case
Reasonable, but you have to actually do this and not screw it up.
> Harden the system according to state of the art security
"Draw the rest of the owl"
You're better off treating the system as fundamentally unsecurable, because it is. The only real solution is to never give it untrusted data or access to anything you care about. Which yes, makes it pretty useless.
I have OPA and set policies on each tool I provide at the gateway level. It makes this stuff way easier.
It does not. Security theater like that only makes you feel safer and therefore complacent.
As the old saying goes, "Don't worry, men! They can't possibly hit us from this dist--"
If you wanna yolo, it's fine. Accept that it's insecure and unsecurable and yolo from there.
I never want to be one wayward email away from an AI tool dumping my company's entire slack history into a public github issue.
And 99% those AI-created "amazing projects" are going to be dead or meaningless in due time, rather sooner than later. Wasted energy and water, not to mention the author's lifetime.
The "supervisor" workflow mentioned by others in this thread (using one agent to manage multiple worker agents) is exactly where the industry is heading. It turns the human from a "vibe coder" into an architect who manages state and requirements while the agents handle the implementation "beads".
If you're hitting the "stupid zone" on larger tasks, try breaking the plan into smaller, specific markdown specs first. OpenClaw's ability to "interview" a codebase and then implement from those specs in commit-sized chunks is a game changer for non-trivial monorepos.
> Rabbit R1 - The Upgraded Replacement for Smart Phones
Kinda hard to take anything here seriously.
- Because the seasoned developers have something entirely different to say https://www.xda-developers.com/please-stop-using-openclaw/
- Also please stop spamming HN with this stuff
https://github.com/PSPDFKit-labs/nutrient-openclaw -
The skill is here as well if you prefer a skill - https://clawhub.ai/jdrhyne/nutrient-openclaw
I let it run in a VM on my desktop and I can check on its progress and provide feedback any time. Only took a few iterations of telling it to tweak its workflow to land on something very productive. Doesn't work for everything but it covers a lot of my work.
This has been a significant aspect of ai use as well. As a result a feel a little less friction with myself, less that I am letting things slip by because, well, because I still want a nice balance to work, life, leisure, etc. I don’t want to overstate things, it’s not a cure all for any of these things, but it helps a lot.
Don't compare your day 1 with some one's day 100
The free versions are toys
Also, Codex isn't a model, so you don't even understand the basics.
And you spent "several hours" on it? I wish I could pick up useful skills by flailing around for a few hours. You'll need to put more effort into learning how to use CLI agents effectively.
Start with understanding what Codex is, what models it has available, and which one is the most recent and most capable for your usage.
So, it appears that we have come a long way bubbling up through abstraction layers: assembly code -> high-level languages -> scripting -> prompting -> openclaw.
> Generally, I believe (Rabbit) R1 has the potential to change the world.
There is a pattern here.
If you delegate these tasks to OpenClaw, I am not really sure the result is exactly what you want to achieve and it works like you want it to.
> Then OpenClaw came along, and everything changed.
> After a few rounds of practice, I found that I could completely step away from the programming environment and handle an entire project’s development, testing, deployment, launch, and usage—all through chatting on my phone.
So, with Claude Code, you're stuck typing in a chat box. Now, with OpenClaw, you can type in a chat box on your phone? This is exciting and revolutionary.
What I really wonder, is who the heck is upvoting this slop on hackernews?
Articles like these should be flagged, and typically would be, but they sometimes appear mysteriously flag-proof.
So many wealthy players invested the outcome, and the technology for astroturfing (LLMs) can ironically be used to boost itself and further its own development
I haven't been able to find a good use for myself yet. Almost everything I use an LLM for has some kind of hard human-in-the-loop factor that is as of yet inescapable -- but I also don't really use LLMs for things like "sort my email.". mostly entirely coding.
Some are learning the hard way why you shouldn't do that having to hire freelancer developers the fix their entire code.
I spoke with a friend who is also in IT, the company he works for is full on into AI, everything is done or managed by AI, they only hit the button. Dude was describing their infrastructure and projects like if AI was a God.
Those are gonna be the first ones to fall, because they aren't using AI to improve their work, they are using AI to completely take over, full access to projects, full access to infrastructures, you name it.
Once we get to a spot where the AI can check its work and iterate, the loop is closed. But we are a long way off from that atm. Even for the web. I mean, have you tried the Playwright MCP server? Aside from being the slowest tool calls I have ever seen, the agent struggles mightily to figure out the simplest of navigation and interaction.
Yes yes Unit tests, but functional is the be all end all and until it can iterate and create its own functional test suite, I just don’t get it.
What am I missing?
oh man this is fantastic
Something tells me they never even downloaded OpenClaw before writing this blog post. It’s probably an aspirational vision board type post their life coach told them to write because they kept talking about OepnClaw during their sessions, and the life coach got tired of their BS.
No desire to be a hater or ignore the possibility of any tech but…yeah…transformative that was not
It's a racket never ends.
It is a constant lure products and tools have to create the feeling of sensemaking. People want (pejorative) tools that show visualizations or summaries, without thinking about the particular visual/summary artifact is useful, actionable or accurate!
They (or their devs) are not at fault that some people honestly believe you can't be as productive or consistent without a "thought garden" or whatever.
It only becomes problematic if the “good” thing also indulges in the hubris of influencers because they view it as good marketing. Like when an egg farm leans in “orange yolk”
Maybe it's unfair to judge an author's current opinion by their past opinion - but since the piece is ultimately an opinion based on their own experience I'm going to take it along a giant pile of salt that the author's standards for the output of AI tools are vastly different than mine.
The last time I talked to someone about OpenClaw and how it is helping them, they told me it tells them what their calendar has for them today or auto-tweets for them (i.e., non-human spam). The first is as simple as checking your calendar, and the second is blatant spam.
Anyone found some good use cases beyond a better interface for AI code assistance?
This should be the opening for every post about the various "innovations" in the space.
Preferably with a subsequent line about the manual process that was worth putting the extra effort into prior to the shiny new thing.
I really can imagine a better UX then opening my calendar in one-click and manual scanning.
Another frequent theme is "tell me the weather." One again, Google home (alexa or whatever) handles it while I'm still in bed and let's me go longer without staring at a screen.
The spam use-case is probably the best use-case I've seen, as in it truly saves time for an equal or better result, but that means being cool with being a spammer.
Their example use case was for it to read and summarize our Slack alerts channel to let us know if we had any issues by tagging people directly... the Slack channel is populated by our monitoring tools that also page the on-call dev for the week.
The kicker... this guy was the on-call dev that week and had just been ignoring the Slack channel, emails and notifications he was getting!
I'm not running openclaw itself. I am building a simpler version that I trust and understand a lot more but ostensibly it's just another always on Claude code wrapper.
I can't come up with any other explanation for why there seems to be so many people claiming that AI is changing their life and workflow, as if they have a whole team of junior engineers at their disposal, and yet have really not that much to show for it.
They're so white collar-pilled that they're in utter bliss experiencing a simulation of the peak white collar experience, being a mid-level manager in meetings all day telling others what to do, with nothing tangible coming out of it.
To be specific, for the past year I've been having numerous long conversations about all the books I've read. I talk about what I liked, didn't like, the ideas and and plots I found compelling or lame, talks about the characters, the writing styles of authors, the contemporary social context the authors might have been addressing, etc. Every aspect of the books I can think off. Then I ask it for recommendations, I tell it given my interests and preferences, suggest new books with literary merit.
ChatGPT just knocks this out of the park, amazing suggestions every time, I've never had so much fun reading than in the past year. It's like having the world's best read and most patient librarian at your personal disposal.
My experience with plain Claude Code is that I can step back and get an overview of what I'm doing, since I tend to hyperfocus on problems, preventing me from having a simultaneous overview.
It does feel like being a project manager (a role I've partially filled before) having your agency in autopilot, which is still more control than having team members do their thing.
So while it may feel very empowering to be the CEO of your own computer, the question is if it has any CEO-like effect on your work.
Taking it back to Claude Code and feeling like a manager, it certainly does have a real effect for me.
I won't dispute that running a bunch of agents in sync won't give you an extension of that effect.
The real test is: Do you invoice accordingly?
I'm waiting for the grift!
Well... no. But I do really like it. It's just an always-on Claude you can chat with in Telegram, that tries to keep context, that has access to a ton of stuff, and it can schedule wakeup times for itself.
Yesterday, I saw a demo of a product similar to OpenClaw. It can organize your files and directories and works really great (until it doesn't, of course). But don't worry, you surely have a backup and need to test the restore function anyway. /s
Edit:
So far, I haven’t found a practical use case for this. To become truly useful, it would need access to certain resources or data that I’m not comfortable sharing with it.
Yes I think it is
And one sided media does as weil. Or do you expect Fox News to publish an unbiased report just next?
Even so, I still believe the Rabbit has its merits. This does not conflict with my view that OpenClaw is what is truly useful to me.
> R1 is definitely an upgraded replacement for smartphones. It’s versatile and fulfills all everyday requirements, with an interaction style akin to talking to a human.
You seemed pretty certain about how the product worked!
We're allowed to have opinions about promises that turn out not to be true.
If the rabbit had been what it claimed it would be, it would have been an obvious upgrade for me, at least.
I just want a voice-first interface.
The most charitable thing you can say about this is they're naive, ignorant of the history of vapourware 'demoed' at trade shows.
> Today, Rabbit R1 has been released, and I view it as a milestone in the evolution of our digital organ.
You viewed it as a “milestone in the evolution of our digital organ” without you let alone anyone having even tested it?
Yet you say ”That article was written when the Rabbit R1 presentation video was first released, I saw it and immediately reflect my thoughts on my blog.”?
It's the endgame.
Even then, the architecture will be horrible unless you chat _a lot_ about it upfront. At some point, it’s easier to just look in the terminal.
Click bait at its peak.
Poe's law strikes... I can't tell if this is satire.
https://reorx.com/blog/rabbit-r1-the-upgraded-replacement-fo...
I hope at some point there will be a medical research into this hysteria.
getting sick of this fluff stuff
Agents work but still mostly produce slop.
Press [Space] to skip
There's not a single real example, and it even has all the em-dashes intact.
I quite like it just from the simple perspective that its a local LLM provider that's available to chat with in tons of apps I already use (e.g. Discord); its a good reduction in the number of parties who are privy to these conversations. I'm not sure if there's another system out there that's so plug-and-play, with so many options for conversation (Discord, Telegram, text, self-hosted web ui, etc).
But the tool calling is vastly overblown. It takes forever to get them set up, and that's to get them barely working. Bluebubbles has always been an ish app whose reverse engineering of the iMessage protocol is more likely to break on every macOS upgrade than do what you want it to do; and OpenClaw's iMessage integration is built on it. I've not yet gotten a Spotify skill to work (though I'm not sure what I'd do with it when I have one); the models just run in circles saying "it should be set up, ope its not, spotify_player sucks, lets try spt, wait that isn't working, lets try ncspot, why isn't this working". The "gog" tool is interesting, its a CLI-based tool for accessing data in your google account, it works alright, though OpenClaw's icon for the tool in their repository is a game controller icon; I suspect a mistaken, likely vibed, reference to the unrelated GOG/Good Ol' Games PC game store. What a mess. I could go on.
The cheaper models critically struggle to grep the full array of tools they have available to them. Kimi K2.5 exhibits this behavior where it will reiterate that it does not have access to my calendar, but usually if I ask it four or five times in a row, eventually it will claim it "discovered" the gog/Google Calendar tool in a hidden sub-directory (what?). Even with more intelligent models, like Opus or 5.2/5.3, the tools oftentimes need to be invoked with highly specific verbiage; "what's on my calendar" might work if you're lucky, but "use gog to fetch my calendar and display today's events" usually works.
I oftentimes just don't see the point. I can click the Gmail or Google Calendar app on my phone and get what I need out of those apps in less-than 6 seconds; it would take longer for me to dictate the exact phrasing to get what I need out of OpenClaw, let alone type it. I can see some argument for cross-operating on data between two apps, but getting that to work without paying Anthropic fifty cents for every query is even rarer. When I need an LLM to operate on my Obsidian notes, I can just use Claude Code or OpenCode... why do I need OpenClaw?
(I am genuinely open minded here; but articles like this just dance around high-minded abstract ideas of "im a super ai manager im so productive" without giving concrete examples. My suspicion is that the people who write these things were previously deeply unproductive people, and now AI has enabled them to achieve a mere fraction of the productivity that most of us already had.)
(And that's being generous. I think there's also a lot of grifters out there. I'll have to fire a stray at Cloudflare for this one: They've published a "get OpenClaw working on Cloudflare" repo where, if you set it up, would straight up cost you $50-$60, maybe $100/month; and they lie [1] about the cost in their own documentation. And you're paying that in addition to the LLM cost. Very bad look from a company I admire.)
[1] https://github.com/cloudflare/moltworker/issues/76#issuecomm...
For the impatient, here's a transcript summary (from Gemini):
The speaker describes creating a "virtual employee" (dubbed a "replicant") running on a local server with unrestricted, authenticated access to a real productivity stack—including Gmail, Notion, Slack, and WhatsApp. Tasked with podcast production, the agent autonomously researched guests, "vibe coded" its own custom CRM to manage data, sent email invitations, and maintained a work log on a shared calendar. The experiment highlights the agent's ability to build its own internal tools to solve problems and interact with humans via email and LinkedIn without being detected as AI.
He ultimately concludes that for some roles, OpenClaw can do 90%+ of the work autonomously. Jason controversially mentions buying Macs to run Kimi 2.5 locally so they can save on costs. Others argue that hosting an open model on inference optimized hardware in the cloud is a better option, but doing so requires sharing potentially sensitive data.Did Jason ever mentioned this in the episode, can you ask gemini?
(At some point he seems to have gone from professionally-wrong-about-everything blogger to magical-podcast-thought-leader. I have no idea how this happened.)