* People using it as a tool, aware of its limitations and treating it basically as an intern/boring-task executor (whether it's some code boilerplate, or pooping out/shortening some corporate email), or as a tool to give themselves a summary of a topic they can then dig into deeper.
* People outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results, and are not interested in learning more about the topic or honing their skills in it.
The second group is the one that thinks talking to a chatbot will replace a senior developer.
And this may be fine in certain cases.
I'm learning German and my listening comprehension is marginal. I took a practice test, and one of the exercises was listening to 15-30 seconds of audio followed by questions. I did terribly, but it seemed like a good way to practice. I used Claude Code to create a small app that generates short audio dialogs (via ElevenLabs) and a set of questions. I ran the results by my German teacher and he was impressed.
I'm aware of the limitations: sometimes the audio isn't great (it tends to mess up phone numbers), it can only be a small part of my work learning German, etc.
The key part: I could have coded it, but I have other, more important projects. I don't care that I didn't learn about the code. What I care about is that I'm improving my German.
> Group 1: intern/boring task executor
Yup, that makes sense: I'm in group 1.
> Group 2: "outsourcing thinking and entire skillset to it - they usually have very little clue in the topic, are interested only in results"
Also me (in this case), as I'm outsourcing the software development part and just want the final app.
Soo... I've probably thought too much about the originally proposed groups. I'm not sure they are as clear-cut as the original comment suggests.
The word "thinking" can be a bit nebulous in these conversations, and "critical thinking" is perhaps even more ambiguously defined, so before we discuss it, we need to define it. I'll go with the Merriam-Webster definition: the act or practice of thinking critically (as by applying reason and questioning assumptions) in order to solve problems, evaluate information, discern biases, etc.
LLMs seem to be able to mimic this, particularly to those who have no clue what it means when we call an LLM a "stochastic parrot" or some equally esoteric term. At first I was baffled that anyone really thought LLMs could somehow apply reason or discern their own biases, but I had to take a step back and look at how public perception was shaped to see what these people were seeing. LLMs, generative AI, ML, etc. are all extremely complex things. Couple that with the pervasive notion that thinking is hard, and you have a massive pool of consumers who are only too happy to offload some of that thinking onto something they may not fully understand, but which they were promised would do what they wanted: make their daily lives a bit easier.
We always get snagged by things that promise us convenience or offer to help us do less work. It's pretty human to desire both of those things, but it's proving to be an Achilles' heel for many. How we characterize AI determines our expectations of it. So do you think of it as a bag of tools you can use to complete tasks? Or is it the whole factory assembly line, where you push a few buttons and a pseudo-finished product comes out the other side?
From my perspective, the distinction is more on the supply side: we have two generations of AI tools. The first generation was simply talking to a chatbot in a web UI, and it still has its uses: you chat and build up a context with it, it relies heavily on its training data, and maybe it reads one file.
The second generation leans into RAG and agentic capabilities (if you can glob and grep or otherwise run a search, congrats, you have v1 of your RAG strategy). This is where Gemini actually scans all the docs in our Google Workspace and produces a proposal similar to ones we've written before. (Do we even need document templates anymore?) Or where you start a new programming project and Claude can write all the boilerplate, deploy it, and set up a barebones test suite within a couple of minutes. There's no doubt that these types of tools give us new capabilities and in some cases save a lot more time than just babbling into chatgpt.com.
I think this accounts for a lot of the differences in reported productivity among sane users. I was way less enthusiastic about AI productivity gains before I discovered the "gen 2" applications.
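A rough sketch of that "v1" strategy, assuming a local folder of Markdown docs and naive keyword matching in place of embeddings or a real search index: glob the files, grep each line for the query terms, and paste the best hits into the prompt. `retrieve` and `build_prompt` are hypothetical names for illustration, not any tool's API.

```python
import glob
import os
import re


def retrieve(root: str, query: str, pattern: str = "**/*.md", top_k: int = 3) -> list[tuple[str, int, str]]:
    """Glob files under root, grep each line for query terms, return the top_k (path, line_no, line) hits."""
    terms = {t.lower() for t in re.findall(r"\w+", query)}
    hits = []
    for path in glob.glob(os.path.join(root, pattern), recursive=True):
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for line_no, line in enumerate(fh, start=1):
                lowered = line.lower()
                # Score a line by how many distinct query terms it contains.
                score = sum(1 for t in terms if t in lowered)
                if score:
                    hits.append((score, path, line_no, line.strip()))
    # Highest score first; ties broken by path and line number for stable output.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return [(p, n, text) for _, p, n, text in hits[:top_k]]


def build_prompt(question: str, snippets: list[tuple[str, int, str]]) -> str:
    """Stuff the retrieved snippets into the prompt as context for the model."""
    context = "\n".join(f"[{os.path.basename(p)}:{n}] {text}" for p, n, text in snippets)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

Swapping the keyword scorer for embeddings or a proper search engine is the usual next step; the retrieve-then-stuff-the-prompt shape stays the same.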
* people who use it instead of search engines.
* people who use it as a doctor/therapist/confidant. Not for research, but as a practitioner.
There are others:
* people who use it instead of man pages or documentation.
* people who use it for short scripts in a language they only "sorta kinda" understand.
And the first group thinks that these tools will enable them to replace a whole team of developers.
I think AI is just allowing everyone to speed-run the innovator's dilemma. Anyone can create a small version of anything, while big orgs will struggle to move quickly, as before.
The interesting bit is going to be whether we see AI being used to mature those small systems into big, complex ones that account for the edge cases, meet all the requirements, scale as needed, etc. That's hard for humans to do, particularly while still moving. I've not seen any of this from AI yet outside of either a) very directed, small changes to large complex systems, or b) plugins/extensions/etc. along a well-defined set of rails.
When I needed to bash out a quick HashiCorp Packer build file without prior experience beyond a bit of Vault and Terraform, local AI was a godsend at getting me 80% of the way there in seconds. I could read it, edit it, test it, and move much faster than Packer’s own thin “getting started” guide allowed. The net result: from zero prior knowledge to a hardened OS image and a repeatable pipeline in under a week.
On the flip side, asking a chatbot about my GPOs? Or trusting it to change network firewalls and segmentation rules? Letting it run wild in the existing house of cards at the core of most enterprises? Absolutely hell no the fuck not. The longer something exists, the more likely a chatbot is to fuck it up by simple virtue of how they’re trained (pattern matching and prediction) versus how infrastructure ages (the older it is or the more often it changes, the less likely it is to be predictable), and I don’t see that changing with LLMs.
LLMs really are a game changer for my personal sales pitch of being a single dinosaur army for IT in small to medium-sized enterprises.
Honestly, the absolute revolution for me would be if someone managed to make an LLM say "sorry, I don't know enough about this topic". One time I made a typo in the name of a project I wanted some info on, and it outright invented commands and usages out of thin air (which were also different from the project I was actually looking for, so it didn't just "correct the typo")...
This is essentially what I'm doing too, though I expect in a different country. I'm finding it incredibly difficult to successfully speak to people. How are you making headway? I'm very curious how you're messaging your use of AI to clients/prospective clients in a way that doesn't just come across as "I farm out work to an AI and yolo".
Edit - if you don't mind sharing, of course.
"I could make that in a weekend"
"The first 80% of a project takes 80% of the time, the remaining 20% takes the other 80% of the time"
That is a good point and true to some extent. But IME with AI, both the initial speedup and the eventual slowdown are accelerated vs. a human.
I've been thinking that one reason is that while AI coding generates code far faster (on a greenfield project I estimate about 50x), it also generates tech debt at a hyperastonishing rate.
It used to be that tech debt started to catch up with teams after a few years, but with AI-coded software, only a few months in, the tech debt is already so massive that it slows progress down.
I also find that I can keep the tech debt in check by using the bot only as a junior engineer: I specify precisely the architecture and the design, down to object and function definitions, and I only let the bot write one function at a time.
That is much slower, but also much more sustainable. I'd estimate my productivity gains are "only" 2x to 3x (instead of ~50x) but tech debt accumulates no faster than a purely human-coded project.
This is based on various projects that are only about a year in, so time will tell how it evolves longer term.
Published APIs cannot be changed without causing friction on the client's end, which may not be under our control. Even if the API is properly versioned, users will be unhappy if they are asked to adopt a completely changed version of the API on a regular basis.
Data that was created according to a previous version of the data model continues to exist in various places and may not be easy to migrate.
User interfaces cannot be radically changed too frequently without confusing the hell out of human users.
I haven't tried that yet, so not sure.
Once upon a time I was at a company where the PRD specified that the product needs to have a toggle to enable a certain feature temporarily. Engineering implemented it literally, it worked perfectly. But it was vital to be able to disable the feature, which should've been obvious to anyone. Since the PRD didn't mention that, it was not implemented.
In that case, it was done as a protest. But AI is kind of like that, although out of sheer dumbness.
The story is meant to say that with AI it is imperative to be extremely prescriptive about everything, or things will go haywire. So a full rewrite will probably only work well if you manage to have very tight test coverage for absolutely everything, which is pretty hard.
Sometimes the start of a greenfield project has a lot of questions along the lines of "what graph plotting library are we going to use? we don't want two competing libraries in the same codebase so we should check it meets all our future needs"
LLMs can select a library and produce a basic implementation while a human is still reading reddit posts arguing about the distinction between 'graphs' and 'charts'.
The solutions also help me combat my natural tendency to over-engineer.
It’s also fun getting ChatGPT to quiz me on topics.
I also like to generate greenfield codebases from scratch.
I have been meaning to put up a blog ...
Essentially, there's a delta between what the human does and what the computer produces. In a classic compiler setting, this is a known, stable quantity throughout the life cycle of development.
However, in the world of AI coding this distance increases.
There are various barriers, with labels like "code debt", that the line can cross. There are three mitigations right now: start the lines closer together (a PRD is the current en vogue method), push out the frontier of how many shits someone gives (the TDD agent method), or try to bend the curve so it doesn't fly out so much (the coworker/colleague method).
Unfortunately, I'm just a one-man show, so having been ahead and having working models to explain this brings no rewards, because, you know, good software is hard...
I've explained this in person at SF events so often (probably 40-50 times) that someone reading this might actually have heard it from me...
If that's the case, hi, here it is again.
But last week I had two days with no real work to do, so I created CLI tools to help with organisation and cleanup. I think AI boosted my productivity by at least 200%, if not 500%.
Still, that’s a 4x productivity gain overall, so I’m not complaining for $20 a month. It’s especially good at managing the complicated aspects of C, so I can focus on the bigger picture rather than the symbol contortions.
Yup. My biggest issue with designing software is usually designing the system architecture/infra. I am very opposed to just shoving everything onto AWS and calling it a day: you don't learn anything from that, cloud performance stinks for many things, and I don't want to get random 30k bills because I accidentally left some instance of something running.
AI sucks at determining what kind of infrastructure would be great for scenario X, because the cloud is the go-to solution for the lazy dev. I tried to get it to recommend a way to self-host stuff, but that's just a general security hazard.
That’s what I’ve been doing lately, and it really helps get a clean architecture at the end.
On the other hand, you have a non-technical executive who's got his head round Claude Code and can run e.g. Python locally.
I recently helped one almost one-shot converting a 30-sheet, mind-numbingly complicated Excel financial model to Python with Claude Code.
Once the model is in Python, you effectively have a data science team in your pocket with Claude Code. You can easily run Monte Carlo simulations, pull in external data sources as inputs, build web dashboards, and have Claude Code work with you to really interrogate weaknesses in your model (or business). It's a pretty magical experience watching someone realise they have so much power at their fingertips, without having to grind away for hours/days in Excel.
almost makes me physically sick. I have a reasonably intense math background, corrupted by application to geophysics and implementing real-world numerical applications.
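For readers who haven't seen one, a toy Monte Carlo over a ported model looks something like this. Everything here is an assumption for illustration: a hypothetical five-year project with an upfront cost, normally distributed revenue growth and a uniformly distributed discount rate, run through a plain NPV function. A real port would sample the inputs of the actual model instead.

```python
import random
import statistics


def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value, with cash_flows[0] occurring at t=0 (undiscounted)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def simulate_npv(n_runs: int = 10_000, seed: int = 42) -> list[float]:
    """Monte Carlo over a toy model with uncertain revenue growth and discount rate.

    All inputs below are hypothetical placeholders, not values from any real model.
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = []
    for _ in range(n_runs):
        growth = rng.gauss(0.05, 0.03)   # annual growth: mean 5%, std dev 3%
        rate = rng.uniform(0.07, 0.12)   # discount rate between 7% and 12%
        revenue = 100.0
        flows = [-300.0]                 # upfront investment at t=0
        for _year in range(5):
            revenue *= 1 + growth
            flows.append(revenue)
        results.append(npv(rate, flows))
    return results


runs = simulate_npv()
qs = statistics.quantiles(runs, n=20)  # 19 cut points: 5th, 10th, ..., 95th percentile
print(f"mean NPV {statistics.mean(runs):.1f}, 5th pct {qs[0]:.1f}, 95th pct {qs[-1]:.1f}")
print(f"P(NPV < 0) = {sum(r < 0 for r in runs) / len(runs):.1%}")
```

Seeding a dedicated `random.Random` instance keeps results reproducible between runs, which helps when someone asks why a dashboard number moved.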
To be fair, this statement alone:
* 30 sheet mind numbingly complicated Excel financial model
makes my skin crawl and invokes a flight reflex.
Still, I'll concede that a Claude Code conversion to Python of a 30-sheet Excel financial model is unlikely to be significantly worse than the original.
If a data science team modeled something incorrectly in their simulation, who's gonna catch it? Usually nobody. At least not until it's too late. Will you say "this doesn't look plausible" about the output? Or maybe you'll be too worried about getting chided for "not being data driven" enough.
If an exec tells an intern or temp to vibecode that thing instead, then you definitely won't have any checkpoints in the process to make sure the human-language prompt describing the process was properly turned into the right simulation. But unlike in coding, you don't have a user-facing product that someone can click around in, or send requests to, and verify. Is there a test suite for the giant Excel doc? I'm assuming no, maybe I'm wrong.
It feels like it's going to be very hard for anyone working in areas with less black-and-white verifiability or correctness, like that sort of financial modeling.
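One cheap partial answer to "is there a test suite?" is a golden-value regression test: record what the spreadsheet outputs for a handful of inputs, then assert the port reproduces them within a tolerance. A minimal sketch, assuming a hypothetical port of Excel's `PMT` as the function under test. The two golden values are what the standard annuity formula gives (and what Excel's PMT displays) for those inputs; in a real port you would record them from the workbook itself.

```python
import math


def pmt(rate: float, nper: int, pv: float) -> float:
    """Hypothetical port of Excel's PMT(rate, nper, pv) with fv=0 and payments at period end.

    Follows Excel's sign convention: a positive loan amount yields a negative payment.
    """
    if rate == 0:
        # Edge case: the annuity formula below would divide by zero at rate 0.
        return -pv / nper
    return -pv * rate / (1 - (1 + rate) ** -nper)


# Golden cases: (inputs, value the spreadsheet shows for those inputs, rounded to cents).
GOLDEN = [
    ((0.05 / 12, 360, 200_000), -1073.64),  # 30-year loan at 5% a year, paid monthly
    ((0.0, 12, 1_200), -100.00),            # zero-interest case
]


def golden_failures(fn, cases, tol: float = 0.005):
    """Return (inputs, expected, actual) for every case where fn disagrees with the recorded value."""
    failures = []
    for args, expected in cases:
        actual = fn(*args)
        if not math.isclose(actual, expected, abs_tol=tol):
            failures.append((args, expected, actual))
    return failures


assert golden_failures(pmt, GOLDEN) == []
```

The zero-rate row is there on purpose: the textbook annuity formula divides by zero at rate 0, so a port that skips that branch fails only on that input.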
https://www.newscientist.com/article/dn23448-how-to-stop-exc...
Any statistic someone throws at me, and I mean any, I will try to dig into. And if I'm able to, I will usually find that something is very wrong somewhere. As in, the underlying data is usually just wrong, invalidating the whole thing, or the data is reasonably sound but the person doing the analysis is making incorrect assumptions about parts of it and then drawing incorrect conclusions.
Can't tell you how many times I've seen product managers making decisions based on a few hundred analytics events, trying to glean insight where there is none.
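To make the small-sample point concrete: a conversion rate measured on a few hundred events carries a wide confidence interval. A minimal sketch using the Wilson score interval (one standard choice for a binomial proportion), with hypothetical numbers: 300 events per variant, 12 vs 18 conversions. Despite the 50% relative lift, the two intervals overlap substantially.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives roughly 95% coverage."""
    if n == 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical A/B numbers: 300 events per variant, 4% vs 6% conversion.
for name, conversions in [("A", 12), ("B", 18)]:
    low, high = wilson_interval(conversions, 300)
    print(f"{name}: {conversions / 300:.1%} (95% CI {low:.1%} to {high:.1%})")
```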
What are you optimizing all that code for, it works, doesn't it? Don't let perfect be the enemy of good. If it works 80%, that's enough, just push it. What is technical debt?
I think 1) holds (as my experience matches your cynicism :), but I have a feeling that data minded people tend to overestimate the importance of 2)...
What can also help with entrepreneurship is having a bias for action. So even if your insights are wrong, if you act and keep acting, you will partially shape reality to your will and partially bend to its will.
So there are certain forces that can compensate for your lack of rigor.
The best companies have both of those things by their side.
In my experience, many of the statistics these people use don't matter to the success of a business; they are vanity metrics. But people use statistics, and especially the wrong statistics, to push their agenda. Regardless, it's important to fix the statistics.
There are often more errors. Sometimes the actual results are wildly different from what a model expects .. but the data treatment has been bug-hunted until it does what was expected .. and then attention fades away.
Back in my data scientist days, I used to push for testing and verification of models. Got told off for slowing the team down. If the model works well enough to bring money in, and the managers who make the final calls don't understand the implications of it being wrong, that's how it goes in the majority of cases.
The local statistics office here recently presented salary statistics claiming that teachers' salaries had unexpectedly increased by 50%. All the press releases went out, and it was only questions raised by the public that forced the statistics office to review and correct the data.
A huge test for me was to have people review my analyses and poke holes. You feel good when your last 50 reports didn’t have a single thing anyone could point out.
I’ve been seeing a lot of people try to build analyses with AI who haven’t been burned by the “just because it sounds correct doesn’t mean it’s right” dilemma and who haven’t realized what it takes before you can stamp your name on an analysis.
The Excel sheet will have been tuned over the years by people who knew exactly what it was doing and fixed countless bugs along the way.
The Claude Code copy will be a simulacrum that may behave the same way with some inputs, but is likely to get many of the edge cases wrong, and when you're talking about 30 sheets of Excel, there will be many, many of these sharp edges.
IMHO, earned through years of bleeding eyeballs, the first will be riddled with subtle edge cases curiously patched and fettled such that it'll limp through to the desired goal .. mostly.
The automated AI assisted transcoding will be ... interesting.
Now, back in the day, IBM designed and built an "executive data terminal". It wasn't really a computer terminal in the sense that you and I understand it. Rather, it was a video and two-way-audio feed to a room with a team of underlings, which an executive could ask for business data and analyses, which could be called up on a computer display (also routed to the executive's office). This allowed the executive to ask questions so he (it was the 1960s, it was almost invariably a he) could make informed decisions, and the team of underlings to call up data or crunch numbers on the computer and show the results on the display.
So because executives are used to having things done for them, I can totally see AI being used by executives to replace the "team of underlings" in this setup—in principle. The fact is that were I in that CEO's chair, I'd be thinking twice before trusting anything an LLM tells me, and double-checking those results—perhaps with my team of underlings.
Discussed on Hacker News: https://news.ycombinator.com/item?id=42405462 IEEE article: https://spectrum.ieee.org/ibm-demo
You're too modest. You'd be thinking once.
However when the parrot is hidden in a shiny box made up to look like a regular, relatively trustworthy program...
I'm sure Claude Code will happily one-shot that conversion. It's also virtually guaranteed to have messed up vital parts of the original logic in the process.
Anyway, please try it if you find it unbelievable. FWIW, I didn't expect it to work as well as it did. Opus 4.5 is pretty amazing at long-running tasks like this.
Maybe you did one or the other, but “nearly one-shotted” doesn’t tend to mean that.
Claude Code more than occasionally likes to make weird assumptions, and it’s well known that it hallucinates quite a bit more near the context length, and that compaction only partially helps this issue.
Sure, maybe that’s just building something that’s bug-for-bug compatible, but it’s something Claude can work with.
I have no idea why it had so much trouble with this generally easy task. Bizarre.
Tell me if I am wrong, but surely Claude cannot even access execution coverage.
I have, early in my career, gone knee-deep into Excel macros and worked on C# automation that would create an Excel sheet, run Excel macros on it, and then save it without the macros.
In the entire process, I saw dozens of date/time mistakes in VBA code, but no tests that would have caught them...
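One classic example of the date/time edge cases that slip through untested: Excel's 1900 date system deliberately treats 1900 as a leap year (a compatibility carry-over from Lotus 1-2-3), so serial 60 is the non-existent 29 February 1900, and serials before it are offset by one day relative to those after it. A sketch of a conversion helper plus the kind of tests that catch a naive port; `excel_serial_to_datetime` is a hypothetical name, and the 1904 date system used by some Mac workbooks is ignored.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial: float) -> datetime:
    """Convert an Excel 1900-date-system serial number to a datetime.

    Serial 1 is 1900-01-01 and serial 60 is Excel's fictitious 1900-02-29, so
    serials below 60 count from a different epoch than serials from 61 on.
    """
    if serial < 1:
        raise ValueError("serials below 1 are not valid dates in the 1900 system")
    if 60 <= serial < 61:
        raise ValueError("serial 60 is Excel's fictitious 1900-02-29")
    epoch = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    # timedelta accepts fractional days, so the time-of-day fraction carries through.
    return epoch + timedelta(days=serial)


# Tests that pin down both sides of the 1900 leap-year quirk.
assert excel_serial_to_datetime(1) == datetime(1900, 1, 1)
assert excel_serial_to_datetime(59) == datetime(1900, 2, 28)
assert excel_serial_to_datetime(61) == datetime(1900, 3, 1)
assert excel_serial_to_datetime(45658.5) == datetime(2025, 1, 1, 12)
```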
When shit hits the fan and execs need answers yesterday, will they jump to using the LLM to probabilistically make modifications to the system, or will they admit it was a mistake and pull Excel back up to deterministically make modifications the way they know how?
"1 or 2 plan mode prompts" to fully describe a 30-sheet complicated doc suggests a massively higher level of granularity than Opus initial plans on existing codebases give me or a less-than-expected level of Excel craziness.
And the tooling harnesses have been telling the models to add testing to things they make for months now, so why's that impressive or surprising?
I was impressed because the prompt didn't ask it to do that. It doesn't normally add tests for me without asking, YMMV.
Did it build a test suite for the Excel side? A fuzzer or such?
It's the cross-concern interactions that still get me.
80% of what I think about these days when writing software is how to test more exhaustively without build times being absolute shit (and not necessarily actually being exhaustive anyway).
It's like a CPU that's almost 100% reliable... in that it fails only once every 1 million clock cycles.
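Rough arithmetic behind the analogy, assuming a 3 GHz clock (that number is mine, not the commenter's): one failure per million cycles means thousands of failures every second, and the chance of even a million clean cycles in a row is only about 37%.

```python
import math

P_FAIL_PER_CYCLE = 1e-6  # "fails only once every 1 million clock cycles"
CLOCK_HZ = 3e9           # assumed 3 GHz clock; illustrative, not from the comment


def failures_per_second(p: float = P_FAIL_PER_CYCLE, hz: float = CLOCK_HZ) -> float:
    """Expected failures per second: per-cycle failure probability times cycles per second."""
    return p * hz


def p_no_failure(n_cycles: float, p: float = P_FAIL_PER_CYCLE) -> float:
    """Probability of n_cycles consecutive clean cycles, (1 - p) ** n, computed in log space."""
    return math.exp(n_cycles * math.log1p(-p))


print(f"{failures_per_second():.0f} expected failures per second")
print(f"P(a million cycles with no failure) = {p_no_failure(1e6):.3f}")
print(f"P(one full second with no failure) = {p_no_failure(CLOCK_HZ):.3g}")
```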
The largest independent derivatives broker in Australia collapsed after it was discovered the board were using astrology and magicians to gamble with all the clients' money.
https://www.abc.net.au/news/2016-09-16/stockbroker-used-psyc...
All the previous human-driven crashes didn't change anything about capital owners' approach to money, so why would an AI-driven crash change things?
It used to be that we'd fix the copy-paste bugs in the excel sheet when we converted it to a proper model, good to know that we'll now preserve them forever.
It is a beautiful experience to realize wtf you don’t know and how far over their skis so many will get trusting AI. The idea of deploying a Rust project at my level of ability with an AI at the helm is terrifying.
In my experience, a lot of Excel models aren’t really tested, just checked a bit and then deemed correct.
Also (I appreciate the authors message here but..)
"Excel on the finance side is remarkably limiting when you start getting used to the power of a full programming ecosystem like Python"
With the addition of LAMBDA, Excel formulae are Turing complete. No more need for VBA in a (mostly) functional environment.
Also on this: Claude for Excel needs a lot of work (as does any tool working with financial models). If you have ever used them in anger, I don't think you'll be relying on them with your non-technical finance manager for a while...
Back then, employees were secretly installing Excel macros and Dropbox just to get work done faster. Now they’re quietly running Claude Code in the terminal because the official Copilot can’t even format a CSV properly.
CISOs are terrified right now and that’s understandable. Non-technical people with root access and agents that write code are a security nightmare. But trying to ban this outright will only push your most effective employees to places where they’re allowed to "fly".
“The real leaps are being made organically by employees, not from a top down [desktop PC] strategy. Where I see the real productivity gains are small teams deciding to try and build a [Lotus 1-2-3] assisted workflow for a process, and as they are the ones that know that process inside out they can get very good results - unlike a [mainframe] software engineering team who have absolutely zero experience doing the process that they are helping automate.”
The embedded “power users” show the way, then the CIO-friendly packaged software follows much later.
Microsoft has spent 30 years designing the most contrived XML-based format for Excel/Word/Powerpoint documents, so that it cannot be parsed except by very complicated bespoke applications with hundreds of developers involved.
Now, it's impossible to export any of those documents into plain text that an LLM can understand, and Microsoft Copilot literally doesn't work no matter how much money they throw at it. My company is now migrating Word documents to Markdown because they're seeing how powerful AI is.
This is karmic justice imo.
I even tried telling Copilot to convert each sheet to a CSV on one attempt THEN do calculations. It just ignored it and failed miserably, ironically outputting me a list of files that it should have made, along with the broken python script. I found this very amusing.
I've read that they're supposed to be great with XML as it's so structured, better than JSON, but haven't actually found that to be the case.
I had interns use C++ to unzip, parse, and repackage a standardized Visio doc as JSON. I had no say in the standard, but specific blocks meant specific things, etc. The project was successful. The XML was parseable... at least for our needs. The overall project died a swift death, and this tidbit will probably be forgotten forever in the depths of the repo hierarchy.
I think the results would be pretty shocking, mostly because the integrations to source services are abject messes.
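The same unzip-and-parse approach works for the Excel files discussed above: an .xlsx is a zip archive of XML parts, so the standard library alone can pull cell values out as plain text. A minimal sketch that reads one worksheet, assuming the default part names `xl/worksheets/sheet1.xml` and `xl/sharedStrings.xml` (real workbooks map sheet names to parts via `xl/workbook.xml` and its relationships, which this skips); `read_sheet_cells` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET
import zipfile

# SpreadsheetML main namespace used by worksheet and sharedStrings parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def read_sheet_cells(xlsx, sheet_part: str = "xl/worksheets/sheet1.xml") -> dict[str, str]:
    """Return {cell_ref: value} for one worksheet part of an .xlsx (a path or file-like object).

    Shared strings (t="s") are resolved; other cells return the raw <v> text, which for
    formula cells is the last cached result. Inline strings, styles and dates are not handled.
    """
    with zipfile.ZipFile(xlsx) as zf:
        shared = []
        if "xl/sharedStrings.xml" in zf.namelist():
            root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split into rich-text runs; join all <t> nodes.
                shared.append("".join(t.text or "" for t in si.iter(f"{{{NS['m']}}}t")))
        sheet = ET.fromstring(zf.read(sheet_part))
        cells = {}
        for c in sheet.iterfind(".//m:sheetData/m:row/m:c", NS):
            v = c.find("m:v", NS)
            if v is None or v.text is None:
                continue  # empty or inline-string cell
            cells[c.get("r")] = shared[int(v.text)] if c.get("t") == "s" else v.text
        return cells
```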
"With 45 percent of enterprise employees now using generative AI tools, 77 percent of these AI users have been copying and pasting data into their chatbot queries, the LayerX study says. A bit more than a fifth (22 percent) of these copy and paste operations include PII/PCI."
I very much doubt that tinkering with a non-repeatable, probabilistic process is how most non-technical users will routinely use software.
I can imagine power users taking this approach to _create_ or extend productivity tools for themselves and others, just like they have been doing with Excel for decades. It will not _replace_ productivity tools for most non-technical users.
It seems way too soon to really narrow down any kind of trends after a few months. Most people aren't breathlessly following the next twitter trend, give it at least a year. Nobody is really going to be left behind if they pick up agents now instead of 3 months ago.
I've seen great improvements with just two MCP servers: Context7 and Playwright. The first is great in planning sessions and leads to better usage of new-ish libraries, and the second gives the model a feedback loop. The advantage is that they work with pretty much any coding agent harness you use, so whatever worked with Cursor will work with Claude Code or OpenCode or whatever else.
If you have found a model that accurately predicts the stock market, you don't write a blog post about how brilliant you are, you keep it quiet and hope no one finds out while you rake in profits.
I still can't figure out quite what motivates these "AI evangelist" types (unlike crypto evangelists who clearly create value for themselves when they create credibility), but if you really have a dramatically better way to solve problems, you don't need to waste your breath trying to convince people. The validity of your method will be obvious over time.
I was just interviewing with a company building a foundation model for supposedly world-changing coding assistants... but they still can't ship their product or find enough devs willing to relocate to SF. You would think that if you actually had a game-changing coding assistant, your number one advantage would be that you don't need to spend anything on devs and can ship 10x as fast as your competition.
> First, you have the "power users", who are all in on adopting new AI technology - Claude Code, MCPs, skills, etc. Surprisingly, these people are often not very technical.
It's not surprising to me at all that these people aren't very technical. For technical people code has never been the bottleneck. AI does reduce my time writing code but as a senior dev, writing code is a very small part of the problems I'm solving.
I've never had to argue with anyone that using a calculator is a superior method of solving simple computational math problems than doing it by hand, or that using a stand mixer is more efficient than using a wooden spoon. If there was a competing bakery arguing that the wooden spoon was better, I wouldn't waste my time arguing about the stand mixer, I would just sell more pastry than them and worry about counting my money.
I'd hazard a guess and say "money"
Perhaps the wildest thing to me is how you'll have senior leaders in a company talking about innovation, but their middle managers actively undermine change out of fear of liability. So many enterprise IT employees are really just trying to avoid punishment that their organization cannot try new things without substantial top-down efforts to accept risk.
This is like saying prison bars are harmful. It depends which side you are on.
May we see the "agentic" replacement for Word, please?
Maybe it's not a big deal, or maybe it's a compliance model with severe financial penalties for non-compliance. I just personally don't like these tradeoffs being left implicit.
Seems like Nadella is having his Ballmer moment.
Still with a small market share. They only figured out how to extort the maximum amount of money from a smaller user base, and app developers, really anyone they can.
Putting that first would have saved the bother of putting the second and third.
Slightly overstated. Tiny teams aren't outcompeting because of AI, they're outcompeting because they aren't bogged down by decades of technical debt and bureaucracy. At Amazon, it will take you months of design, approvals, and implementation to ship a small feature. A one-man startup can just ship it. There is still a real question that has to be answered: how do you safely let your company ship AI-generated code at scale without causing catastrophic failures? Nobody has solved this yet.
Ultimately, it's the same way you ship human-generated code at scale without causing catastrophic failure: by only investing trust in critical systems to people who are trustworthy and have skin in the game.
There are two possibilities right now: either AI continues to get better, to the point where AI tools become so capable that completely non-technical stakeholders can trust them with truly business-critical decision making, or the industry develops a full understanding of their capabilities and is able to dial in a correct amount of responsibility to engineers (accounting for whatever additional capability AI can provide). Personally, I think (hope?) we're going to land in the latter situation, where individual engineers can comfortably ship and maintain about as much as an entire team could in years past.
As you said, part of the difficulty is years of technical debt and bureaucracy. At larger companies, there is a *lot* of knowledge about how and why things work that doesn't get explicitly encoded anywhere. There could be a service processing batch jobs against a database whose URL is only accessible via service discovery, and the service's runtime config lives in a database somewhere, and the only person who knows about it left the company five years ago, and their former manager knows about it but transferred to a different team in the meantime, but if it falls over, it's going to cause a high-severity issue affecting seven teams, and the new manager barely knows it exists. This is a contrived example, but it goes to what you're saying: just being able to write code faster doesn't solve these kinds of problems.
It's very simple. You treat AI as junior and review its code.
But that awesomely complex method has one disadvantage: having to do it means you can't brag about the 300% performance improvement your team got from just committing AI code to the master branch without looking.
I think there's a parallel here between people finding great success with coding agents vs. people swearing it's shit. But when prodded it turns out that some are working on good code bases while others work on shit code bases. It's probably the same with large corpos. Depending on the culture, you might get such convoluted processes and so much "assumed" internal knowledge that agents simply won't work ootb.
It took a lot of convincing, but I finally got her to start using ChatGPT to help her write SQL and walk her through setting up some SaaS accounting software formulas.
It worked so well that now she's trying to find more applications at work. Claude Code is too scary for her, though. It will need to be in some web UI before she feels comfortable giving it a try.
One tidbit I’d disagree with is that only those using the bleeding edge AI tools are reaping the benefits. There seem to be a lot of highly specialized tools and a lot of specific configurations (and mystical incantations) to get them to work, and those are constantly changing and being updated. The bleeding edge is a dangerous place to be if you value your time (and sanity).
Personally, as someone working on moderately to highly complex software (live inference of industrial IoT data), I can’t really open a merge / pull request for my colleagues to review unless I understand 100% of what I’ve pushed and can explain it to them as well.
My killer app for AI would just be a CLI that gets me to a commit based on moderately technical input:
“Add this configuration variable for this entry point; split this class into two classes, one for each of the responsibilities that are currently crammed together; update the unit tests to reflect these changes, including splitting the tests for the old class into two different test classes; etc”
But all the hype at the bleeding edge is around abstracting away the entire coding process until you don’t even understand what code is being generated? Hard to see it as anything but a pipe dream. AI is useful, but it’s not a panacea: you can’t fire it and replace it when it fucks up.
The less you understood about code to start with, the quicker you achieve this goal... and the less prepared you are for the consequences.
That's the type of input I give to Claude / Codex. Works for me.
Granted, I'm way behind the curve, but isn't this how actual engineers (not influencers) are using it? I heavily micromanage the implementation because my manager still expects me to know the code.
I use this amazingly niche and hipster approach of giving the agent its own account, which through inconceivably complex arcane tweaking and configuration can lock down what it can and can't do.
---
Can somebody for the love of god tell me why articles keep insisting this is so difficult?
...is how I imagine that conversation goes.
I am learning software development without having it generate code for me, preferring to have it explain each thing line by line. But it’s not only for learning development: I can also query it for historical information and have it point me to the source of that information (so I can read the primary sources as much as possible).
It allows me to customize the things I want to learn at my own pace, while also allowing me to diverge for a moment from the learning material. I have found it invaluable… and so far, Gemini has been pretty good at this (probably owing to the integration of Google search into Gemini).
It lets me cut through the SEO crap that has plagued search engines in recent years.
What will be the expected work output for the average future worker?
I guess it's like asking for people's vim configs, but hey, there are at least a few popular posts mainly around git/vim/terminal configs.
## Important Instructions
- update todo.md as items are completed
**Commit to git after making code changes.** Check `git status` first - only commit if there are actual changes:
```bash
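# If not in a git repository, initialize it first:
git rev-parse --is-inside-work-tree >/dev/null 2>&1 || git init
# Only commit if `git status` shows actual changes:
if [ -n "$(git status --porcelain)" ]; then
  # Be surgical - add only the changes you just made.
  git add <FILES_UPDATED>
  git commit -m "Description of changes"
fi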
```
This lets me have bite-sized git commits that I can marshal later, rather than having to wrangle git myself.
Small companies are more agile and innovative, while corporations often just shuffle papers around. Wow, what a bold claim, never seen before in the entire history of economics.
I think it sums up how thoroughly they've been disrupted, at least for coding AIs (independent of like-for-like quality concerns rightly mentioned elsewhere in this thread re: Excel/Python).
I understand ChatGPT can do like a million other things, but so can Claude. Microsoft deliberately using competitors internally is the thing that their customers should pay attention to. Time to transform "Nobody gets fired for buying Microsoft" into "Nobody gets fired for buying what Microsoft buy", for those inclined.
I’m happily vibe coding at work, but yeah, the article is right. MS has enterprise market share by default, not by merit. It's a stunning contrast between what’s possible and what’s happening in big corps.
And if the copilot button does nothing but open a chat window without any real integration with the app, what the hell is the point of that when there's already a copilot button in the windows taskbar?
Ah yes, Monte Carlo simulations, regular part of a finance team's objectives.
I can select exactly where I want changes and have targeted element removal in Photoshop. If I submit the image and try to describe my desired changes textually, I get less easily-controllable output. (And I might still get scrambled text, for instance, in parts of the image that it didn't even need to touch.)
I think this sort of task-specific specialization will have a long future. It's hard to imagine pure text once again being the dominant information-transfer method for 90% of the things we do with computers after 40 years of building specialized non-text interfaces.
I was a bit surprised that it still produced gibberish text on posters in the background, in an unaffected part of the image that at first glance didn't change at all. So even just a GUI's masking ability ("anything outside this region should not be touched") would be a godsend.
I've been trying to create a quick-and-dirty marketing promo via an LLM to visualise how a product will fit into people's lives - it is incredibly painful to 'hope and pray' that refining the prompt via text will make slight adjustments come through.
The models are good enough if you are half-decent at prompting and have some patience. But given the amount invested, I would argue they are pretty disappointing. I've had to chunk the marketing promo into almost a frame-by-frame play to make it somewhat work.