3 months ago, Anthropic and Simon claimed that Skills were the next big thing and going to completely change the game. So far, from my exploration, I don't see any good examples out there, nor is there a big growing/active community of users.
Today, we are talking about Cowork. My prediction is that 3 months from now, there will be yet another new Anthropic positioning, followed up with a detailed blog from Simon, followed by HN discussing possibilities. Rinse and Repeat.
This is something I have experienced first hand participating in the Vim/Emacs/Ricing communities. The newbie spends hours installing and tuning workflows with the mental justification of long-term savings, only to throw it all away in a few weeks when they see a new, shinier thing. I have been there and done that. For many, many years.
The mature user configures and installs 1 or 2 shiny new things, possibly spending several hours even. Then he goes back to work. 6 months later, he reviews his workflow and decides what has worked well, what hasn't and looks for the new shiny things in the market. Because, you need to use your tools in anger, in the ups and downs, to truly evaluate them in various real scenarios. Scenarios that won't show up until serious use.
My point is that Anthropic is incentivized in continuously moving goalposts. Simon is incentivized in writing new blogs every other day. But none of that is healthy for you and me.
I fully anticipate there being a fair amount of thrashing on what exactly the right wrapper is around both of those concepts. I think the hard thing is to discriminate the learned constants (vim/emacs) from the attempts to re-jiggle or extend them (plugins, etc); it's actually useful to get reviews of these experiments exactly so you don't have to install all of them to find out whether they add anything.
(On skills, I think that the reason why there "aren't good examples out there" is because most people just have a stack of impromptu local setups. It takes a bit of work to extract those and throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.
The deal with skills and other piles of markdown is that they don't look, even from a short distance, like you can construct a business model for them, so I think they may well end up in the world of genuine open source sharing, which is a much smaller, but saner, place.)
To be fair, Cowork and similar things are just trying to take the agentic workflows and tools that developers are already accessing (eg most of us have already been working with files in Cursor/CC/Codex for a long time now, it's nothing new) and making them friendly for others.
It’s not quite at the same level but it reminds me of YouTubers who get products from companies for free for a “review” and then they say “no money exchanged hands”. The incentives are implicit wink-wink and everyone knows it except the audience.
This is a very detailed, particular prompt. The type of prompt a programmer would think of as they were trying to break down a task into something that can be implemented. It is so programmer-brained that I come away not convinced that a typical user would be able to write it.
This isn’t an AI skepticism post - the fact that it handles the prompt well is very impressive. But I’m skeptical that the target user is thinking clearly enough to prompt this well.
There may be a future AI-based system that can retain so much context it can kind of just "get what you mean" when you say off-the-cuff things, but I believe that a user that can think, speak, and write clearly will still have a skill advantage over one that does not.
I can speak faster than I type, and the flow state is much smoother when you can just dump a stream of consciousness into the context window in a matter of seconds. And the quality of the model is insane for something that runs locally, on reasonable hardware no less.
Swearing at an LLM is also much more fun when done verbally.
I've heard it well described as a k-type curve. Individuals that already know things will use this tool to learn and do many more things. Individuals that don't know a whole lot aren't going to learn or do a whole lot with this tool.
Imagine what we could accomplish if we had a way of writing very precise language that is easy for a machine to interpret!
I expect we'll see an enormous quantity of "cool prompts to try in Cowork" content show up over the next few months, which makes sense - regular non-programmers will benefit enormously from cookbooks and prompting guides and other tools to help them figure out what they can ask this thing.
"are any of my recent blog drafts unpublished and nearly ready to go?"
I use Substack as a free email provider for the email newsletter copy of my blog - which saves me hundreds of dollars a month in email fees.
Just because Claude Cowork is for "other" kinds of work, not just software engineering, doesn't in any way change that. It's not like other kinds of knowledge work aren't being done by intelligent professionals who invest time into learning how to use complicated software and systems.
That is to say, I don't know who the "target user" of this is, but it is a $100/month subscription, so it's presumably someone who is a pretty serious AI user.
Over time, target users will learn to think and communicate this way, as this is what the tools will demand of them.
Also, you may only need someone to write the meta prompt that spits out this kind of thing given some problem: you say “I want to find the easiest blog posts to finish in my drafts but some are already published”, get a more detailed prompt out of it, read it and set things going.
The 1980's version of simonw was explaining to people how to use Excel, too.
(though 40 years later, things are still pretty bad on the Excel front, hah)
Cowork seems like a great application of that principle.
If Microsoft, in creating their next gen agentic OS, wants to replace Windows with the Linux kernel, Claude Code, and bash shell (turning Windows into a distribution of sorts), more power to them. However, I doubt this is the direction they'll go.
I get a kick out of the fact that Microsoft has been preciously clinging to the "Copilot" branding and here comes Claude saying "Cowork? Good enough for us!".
-
Taking a step back, I really would love to see a broader perspective -- an account of someone who is not tech savvy at all. Someone who works a basic desk job that requires basic competency in Microsoft Word. I'm so deep into the bubble of AI-adjacent people that I haven't taken stock of how this would or could empower those who are under-skilled.
We've taken it as truth that those who benefit most from AI are high-skilled augmenters, but do others see some lift from it? I'd love it if Anthropic tried to strap some barely-performing administrative assistants into these harnesses to see if there's a net benefit. For all I know, there could be a `rm -rf` catastrophe every other hour.
[0]: https://nymag.com/intelligencer/article/how-claude-code-cowo...
I don't actually think there are many of those people out there. And those that there are, are on their way out. There are basically none of those people entering the workforce. There are tons of people with that sort of computer literacy but they aren't working on computers.
* Bookkeeper & planning approval within city government
* Doctor/dentist/optometry receptionist & scheduler (both at independent offices and at major hospitals)
* Front desk staff at almost every company with a physical front desk
* University administrative staff (there can be a lot more of these people than you'd think)
* DMV workers
* Probably lots of teachers
Those jobs all will use other software as well, but a lot of their job is making and filling forms on a computer, where they are likely needing to use MS Word fairly often to write things up.
Word dominates in the corporate space.
She has to go through about 100 resumes for a position at her college. Each resume is essentially a form the candidate filled out and lists their detailed academic scores from high school > PhD, their work experience, research and publications.
Based on the declared data, candidates are scored by the system
Now this is India and there's a decent amount of fraud, so an individual has to manually check the claimed experience/scores/publications against reality
A candidate might claim to have relevant experience, but the college might be unaccredited, or the claimed salary might be way too low for a relevant academic position. Or they might claim to have published in XYZ journal, but the journal itself might be a fraudulent pay-to-publish thing
Going through 100+ resumes, each 4 pages long is a nightmare of a task. And boring too.
--
So I asked Claude Code to figure out the problem. I gave it a PDF with the scoring guidelines, a sample resume, and asked it to figure out the problem
Without me telling it, it figured out a plan that involved checking a college's accreditation and rating (the govt maintains a rating for all colleges), the claimed salary vs actual median salary for that position (too low is a red flag), and whether the claimed publication is in either the SCOPUS index or a govt approved publications index
(I emphasize govt approved because this is in a govt backed institution)
Then I gave it access to a folder with all the 100 resumes.
In less than 30 minutes, it evaluated all candidates and added the evaluation to a CSV file. I asked it to make it more readable, so it made a HTML page with data from all the candidates and red/green/yellow flags about their work-experience, publications, and employment
It made a prioritized list of the most promising candidates based on this data
My wife double checked because she still "doesn't trust AI", but her verification matched Claude's conclusions almost 100%
This was a 3 day, grinding task done in 30 minutes. And all I did was type into a terminal for 20 minutes
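For anyone wondering what the three checks described above (accreditation, claimed salary vs median, publication index) look like as code, here is a rough sketch. Everything in it is made up for illustration: the `Candidate` fields, the `flag_candidate` helper, the 0.5 salary ratio and the sample data are assumptions, not what Claude actually generated.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical fields; the real resume form has many more.
    name: str
    college: str
    claimed_salary: float
    journal: str


def flag_candidate(
    candidate: Candidate,
    accredited_colleges: set[str],
    median_salary: float,
    approved_journals: set[str],
    low_salary_ratio: float = 0.5,  # assumed threshold, tune to taste
) -> dict[str, str]:
    """Return a red/yellow/green flag for each of the three checks."""
    flags = {}
    # Experience at an unaccredited college is a red flag.
    flags["college"] = "green" if candidate.college in accredited_colleges else "red"
    # A salary far below the median suggests the position was not what is claimed.
    too_low = candidate.claimed_salary < low_salary_ratio * median_salary
    flags["salary"] = "yellow" if too_low else "green"
    # Publications only count if the journal is in an approved index.
    flags["publication"] = "green" if candidate.journal in approved_journals else "red"
    return flags


# Illustrative data only.
sample = Candidate("A. Example", "X College", 30_000, "Journal Q")
result = flag_candidate(
    sample,
    accredited_colleges={"X College"},
    median_salary=100_000,
    approved_journals={"Journal Z"},
)
```

The hard part in practice is the lookup data (accreditation lists, salary medians, journal indexes), not the flagging logic, which is why a human spot check of the output still makes sense.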
To say nothing of the flagrantly immoral and likely illegal data privacy violations, of course.
She's right not to trust it for something like this. The "almost 100%" is the problem (also consider that you're sending personal data to Anthropic without permission), especially where it might mean discarding someone's resume, which could have a significant impact on a person's life.
Humans are terrible at that kind of long term focus, make clerical errors, etc.
Well, and learning how to do that in 20 minutes
I know the counterargument is people are already putting in company data via ChatGPT. However, that is a conscious decision. This may happen without people even recognizing that they are “spilling the beans”.
> Claude Cowork exfiltrates files https://news.ycombinator.com/item?id=46622328
This is a good example of the Normalization of Deviance in AI by the way.
See my Claude Pirate research from last October for details:
https://embracethered.com/blog/posts/2025/claude-abusing-net...
Even if critics are right that these models are inherently insecure, the market will likely settle for 'optically patched.' If the efficiency gains are there, companies will just accept the residual risk.
It is probably a good lesson on how far confidence can get you in life. People are often highly biased by the presentation of the thing.
Dismissing the opportunity to learn because the person offering you knowledge is enthusiastic about his area of expertise is probably shortsighted.
I’m not intending to be dismissive, just noticing a pattern and advocating a bit of skepticism.
Hearing a programming legend weigh in on the latest programming tool seems entirely reasonable.
That's pretty negative! https://simonwillison.net/series/prompt-injection/
There's a whole section in the linked piece about how Cowork doesn't do enough here, including:
> I do not think it is fair to tell regular non-programmer users to watch out for “suspicious actions that may indicate prompt injection”
The people you should be skeptical of are the random Xitter handles who post about a robotic phlebotomist and say "THE FUTURE IS ALREADY HERE".
Example: The decade+ of people worshipping at Musk’s feet only for him to reveal himself as a malignant narcissist.
The failure modes are just too rough for most people to think about until it's too late.
Over time this has become more sophisticated. I've created custom commands to incorporate training tips from YouTube videos (via YT-DLP and WhisperX) and PDFs of exercise plans or books that I've purchased. I've used or created MCP servers to give it access to data from my smart watch and smart scale. It has a few database-like YAML files for scoring things like exercise weight ranges and historical fitness metrics. At some point we'll probably start publishing the workouts online somewhere where I can view and complete them electronically, although I'm not feeling a big rush on that. I can work on this at my own pace and it's never been anything but fun.
I think there's a whole category of personal apps that are essentially AI + a folder with files in it. They are designed and maintained by you, can be exactly what you want (or at least can prompt), and don't need to be published or shared with anyone else. But to create them you needed to be comfortable at the command line. I actually had a chat with Claude about this, asking if there was a similar workflow for non-CLI types. Claude Cowork seems like it. I'll be curious to see what kinds of things non-technical users get up to with it, at least once it's more widely available.
I’m not sure what the plan for integrating extensions is here but they definitely will be wanted.
There are so many office workers who just shuffle data between systems. Not sure about the error rate, though it is not like the error rate is going to be worse a decade from now.
This would have taken ages to do by hand in iMovie, and probably just as long to look up the needed parameters in ffmpeg, but Claude Code got it right on the first try, and worked with me to fine-tune the motion detection threshold.
Most recent example: I wanted to try out GLM-image when it dropped the other day, but didn't feel like spending an hour dealing with multifile, multidirectory HuggingFace downloads and all the usual Python dependency headaches. So I made an empty directory, ran Claude Code, and told it "Please download all files from https://huggingface.co/zai-org/GLM-Image/tree/main into this directory and test the model." An hour later, I tabbed back to the console window and there was the sample image file.
Looking at the transcript, sure enough, it ran into all the usual headaches and hassles... but the difference is I didn't have to deal with them.
Note that I didn't tell it "Use uv to test the model" -- I just YOLOed it with my system Python installation. If I later find that it broke something else, oh, well... that sounds like a job for Claude, too.
Another thing that's nice about these CLI tools is that they hide the differences between terminals pretty effectively. I ran this particular task in a Windows DOS box, but could just as easily have used PowerShell, a Mac, or a Linux terminal. The idea of not having to care what OS I'm running is an alluring one, given the steady enshittification of Windows.
More broadly, my observation is that the type of tools that developers use are naturally suited to be scripted. Because developers do that all the time. We work with command line prompts, lots of tools that can be scripted via the command line, and scripting languages that work in that environment.
Tools like Claude Code and Codex are extremely simple for that reason. It's a simple feedback loop that in pseudo code reads like "while criteria not met, figure out what tools to run, run those, add output to context and re-assess if criteria were met". You don't need to hard code anything about the tools. A handful of tools (read file, run command, etc.) is all that is needed. You can get some very sophisticated feedback loops going that effectively counter the traditional limitations of LLMs (hallucinating stuff, poor instruction following, assertively claiming something is done when it isn't, etc.). A simple test suite and the condition that the tests must pass (while disallowing obvious hacks like disabling all the tests) can be enough to make agents grind away at a problem until it is solved.
In a business context, this is not true yet. Most business users use a variety of tools that aren't very scriptable and require fiddling with complex UIs. Worse, a lot of those tools are proprietary and hacking them requires access that you typically don't get or that is very limited. Given that, a life hack is to translate business workflows into developer tool workflows and then use agentic coding tools. Claude can't use MS Word for you. But it can probably work on MS Word files via open source libraries and tools. So, step zero is to "mount a directory" and then use command line tools to manipulate what's inside. You bypass the tool boundary by swapping out business tools with developer tools. Anything behind a SaaS web UI is a bit out of scope unfortunately. You get bogged down in a complex maze of authentication and permission issues, fiddly APIs with poor documentation. That's why most of the connectors for e.g. ChatGPT are a bad joke in how limited they are.
Simple example. Codex/Claude Code, etc. are probably fairly useless doing anything complicated with say Squarespace, a WordPress website, etc. But if you use a static site builder, you can make these tools do fairly complicated things. I've been working for the last two weeks on our Hugo website to do some major modernization, restructuring, content generation, translations, etc. All via prompting Codex. I'm working on SEO, Lighthouse performance, adding complex new components to the website, reusing content from old pages to create new ones, checking consistency between translations, ensuring consistent use of certain language, etc. All by prompting Codex. "Add a logo for company X", "make sure page foo has a translation consistent with my translation guide", etc.
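As a small illustration of treating a Word file as plain data: a .docx is a ZIP archive whose main text lives in `word/document.xml`, so even the Python standard library can read it. This is only a sketch. `docx_paragraphs` and `_demo_docx` are hypothetical helpers, and the demo archive holds just the one XML part, so Word itself would not open it. For real work you would more likely reach for a library such as python-docx.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the text of each paragraph in a .docx file given as bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return paragraphs


def _demo_docx() -> bytes:
    """Build a stripped-down archive with only the document part (demo only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Quarterly report</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Revenue </w:t></w:r><w:r><w:t>went up.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    return buffer.getvalue()


paragraphs = docx_paragraphs(_demo_docx())
```

Once the content is out of the proprietary UI and in a directory, the usual agent toolbox (read file, run command) applies to it like any other file.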
I got a lot more productive with this setup after I added a simple npm run verify test suite with a simple AGENTS.md instruction that the verify script has to pass after any change. If you watch what codex does there's a pattern of trial and error until the verification script passes. Usually it doesn't get it right in one go. But it gets there without my intervention. It's not a very sophisticated test suite but it tests a few of the basics (e.g. tailwind styling survives the build and is in the live site, important shit doesn't 404, hugo doesn't error, etc.). I have about 10 simple smoke tests like that.
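For a flavour of what such smoke tests can look like, here is a sketch in Python rather than my actual npm script. `verify_site`, the page list and the CSS marker are illustrative assumptions; it only checks that required pages exist in the build output and that some built stylesheet still contains an expected class.

```python
import tempfile
from pathlib import Path


def verify_site(public_dir: Path, required_pages: list[str], css_marker: str) -> list[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    failures = []
    # Important pages must exist in the build output (i.e. would not 404).
    for page in required_pages:
        if not (public_dir / page).is_file():
            failures.append(f"missing page: {page}")
    # Styling must survive the build: some CSS file contains the marker class.
    css_files = list(public_dir.rglob("*.css"))
    if not any(css_marker in f.read_text(encoding="utf-8") for f in css_files):
        failures.append(f"no built CSS contains {css_marker!r}")
    return failures


# Demo against a throwaway directory standing in for the build output.
with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp)
    (site / "index.html").write_text("<h1>Home</h1>", encoding="utf-8")
    (site / "css").mkdir()
    (site / "css" / "main.css").write_text(".text-center{text-align:center}", encoding="utf-8")
    passing = verify_site(site, ["index.html"], ".text-center")
    failing = verify_site(site, ["index.html", "about/index.html"], ".missing-class")
```

In a real setup the script would exit non-zero whenever the failure list is non-empty, which is what lets the agent treat it as a hard pass/fail gate.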
I think we'll see a big shift in the business world towards more AI friendly tooling because smart business users will be flocking towards tools that work with AI tools in a hurry as they discover that they can shave weeks/days of grinding those tools manually by switching. This is a process that's likely to take very long because people don't like to change their tool habits. But the notion of what is the right tool for the right job is shifting. If it's not AI friendly, it's the wrong tool probably.
Long term, I expect UIs and dealing with permissions in a sane way will be easier to deal with for AI tools. But meanwhile, we don't actually have to wait for all that. You can hack your way to success if you are a bit smart with your tool choices.
But seriously, other tasks I've encountered recently that I wish I could delegate to an AI:
- Posting my junk to Craigslist, determining a fair price, negotiating with a buyer (pickup only!)
- Scheduling showings to find an apartment, wherein the listing agents are spread over multiple platforms, proprietary websites, or phone contacts
- Job applications -- not forging a resume, but compiling candidate positions with reasoning, and the tedious part where you have to re-enter your whole resume into their proprietary application pipeline app
What strikes me as the basic similarity across these types of things is that they are essentially data-entry jobs which interact with third-party interfaces, have CRM-like follow-up requirements, and require "good judgement" (reading reviews, identifying scams, etc).