FilterHN

2 months ago

The first thing I did here was grep for "Skills", and there were no hits. Simon's posts are well upvoted here and Anthropic/Claude is a bit of an HN darling, but I think they are playing the hype game a bit too well here.

3 months ago, Anthropic and Simon claimed that Skills were the next big thing and going to completely change the game. So far, from my exploration, I don't see any good examples out there, nor is there a big growing/active community of users.

Today, we are talking about Cowork. My prediction is that 3 months from now, there will be yet another new Anthropic positioning, followed up with a detailed blog from Simon, followed by HN discussing possibilities. Rinse and Repeat.

This is something I have experienced first hand participating in the Vim/Emacs/Ricing communities. The newbie spends hours installing and tuning workflows with the mental justification of long-term savings, only to throw it all away in a few weeks when they see a new, shinier thing. I have been there and done that. For many, many years.

The mature user configures and installs 1 or 2 shiny new things, possibly spending several hours even. Then he goes back to work. 6 months later, he reviews his workflow and decides what has worked well, what hasn't and looks for the new shiny things in the market. Because, you need to use your tools in anger, in the ups and downs, to truly evaluate them in various real scenarios. Scenarios that won't show up until serious use.

My point is that Anthropic is incentivized to continuously move the goalposts. Simon is incentivized to write new blog posts every other day. But none of that is healthy for you and me.

2 months ago

I think I made a good call on Skills.

They were only announced in October and they've already been ported to Codex and Gemini CLI and VS Code agents and ChatGPT itself (albeit still not publicly acknowledged there by OpenAI). They're also used in Cowork and are part of the internals in Fly's new Sprites. They're doing extremely well for an idea that's only three months old!

This particular post on Cowork isn't some of my best work - it was a first impression I posted within a couple of hours of release (I didn't have preview access to Cowork) just to try and explain what the thing was to people who don't have a $100+/month Claude Max subscription.

I don't think it's "unhealthy" for me to post things like this though! Did you see better coverage of Cowork than mine on day one?

brailsafe

2 months ago

> But none of that is healthy for you and me.

I read that as it's not healthy to constantly follow the day one posts about every iteration of brand new technology in order to try and see how to incorporate it into your workflow in a rapidly evolving manner.

It's not an attack on your article or your habits; it's an accurate indictment of chronically consuming probably short-lived hype instead of practicing craft and the use of hardened tools, much like watching certain programmers on YouTube to know about the latest frontend library instead of just working on something with versatile, generalizable, industry-relevant tools.

glemion43

2 months ago

Not sure where you heard this.

It can be stupid advice.

The AI field moves fast; what's wrong with being an early adopter and experimenting with it?

Someone has to do it.

And tbh skills are an easy to use concept to make Claude faster and the context smaller.

rjtavares

2 months ago

You made the right call. Skills were added to Antigravity and I immediately started creating and using them. I never used custom MCP servers, but skills were immediately obvious to me.

An example: I made a report_polisher skill that cleans some markdown formatting, checks image links, and then uses pandoc to convert it to HTML. I asked the tool itself to create the skill, then I just tweaked it.
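For readers wondering what the code half of such a skill might look like, here is a minimal sketch in Python. It is not the commenter's actual skill: the function names, the cleanup rules (strip trailing whitespace, collapse blank lines) and the image check (local files only) are all assumptions made for illustration. The only external dependency is the pandoc binary, called as `pandoc input.md -o output.html`.

```python
import re
import subprocess
from pathlib import Path

# Matches the target of a markdown image link such as ![alt](path/to/img.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def clean_markdown(text: str) -> str:
    """Strip trailing whitespace and collapse runs of blank lines to one."""
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = []
    for line in lines:
        if line == "" and cleaned and cleaned[-1] == "":
            continue  # skip the second and later blank lines in a run
        cleaned.append(line)
    return "\n".join(cleaned).strip("\n") + "\n"


def missing_local_images(text: str, base_dir: Path) -> list[str]:
    """Return image targets that are not URLs and do not exist under base_dir."""
    targets = IMAGE_LINK.findall(text)
    return [
        t for t in targets
        if not t.startswith(("http://", "https://"))
        and not (base_dir / t).exists()
    ]


def to_html(md_path: Path, html_path: Path) -> None:
    """Convert markdown to HTML with pandoc (the binary must be on PATH)."""
    subprocess.run(["pandoc", str(md_path), "-o", str(html_path)], check=True)
```

In a setup like this the deterministic steps live in code, and the model only decides when to run them, which is the point made in the reply further down about formatting and converting being "all code".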

franktankbank

2 months ago

How is the fidelity of something like this? It seems like it would randomly fuck it up once in a blue moon. Is that not the case? For your use case I don't understand why you would want an AI involved at all.

rjtavares

2 months ago

Skills may have code attached to them, so in this case the formatting and converting is all code.

The value of skills is that they are attached to the context of an LLM for few tokens, and the LLM activates one when it judges it relevant (and brings it into context). It's a cheaper alternative to having a huge CLAUDE.md (or equivalent) file.
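To make the "few tokens until activated" idea concrete, here is a toy sketch in Python. It is an illustration under assumptions, not how any particular agent implements skills: the `Skill` class, `build_index` and `activate` are hypothetical names, and the skill bodies are placeholder text. The two skill names are taken from this thread.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str  # short summary, always visible to the model
    body: str         # full instructions, loaded only on activation


def build_index(skills: list[Skill]) -> str:
    """One line per skill: the only part that is always in context."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills)


def activate(skills: list[Skill], name: str) -> str:
    """Return the full body of one skill, to be appended to the context."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)


# Placeholder bodies stand in for long, detailed instructions.
skills = [
    Skill("report_polisher", "Clean a markdown report and convert it to HTML.",
          "Step-by-step instructions... " * 50),
    Skill("uv-tdd", "Run a TDD loop on a Python project via uv.",
          "Step-by-step instructions... " * 50),
]

index = build_index(skills)
everything = "\n".join(s.body for s in skills)
print(len(index), "characters always in context")
print(len(everything), "characters if every skill were inlined")
```

The comparison at the end is the argument against one huge CLAUDE.md in miniature: the index grows by one line per skill, while the inlined version grows by the full body of each.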

2 months ago

Spot on. The value of skills is using a much smaller context to improve the quality of the output.

This is plainly obvious to anyone who understands how these LLMs work.

franktankbank

2 months ago

Fascinating, thank you.

2 months ago

Please do open-source your skill and blog about it. Also, would like to hear from your experience after a few months of use. Like - how many times did you use the skill, did you run into some problems later (due to some unexpected thing in the markdown), did the skill generalize - or do you have to make tweaks for particular inputs.

2 months ago

@brailsafe has accurately captured where I am coming from.

I want more blogs/discussion from the community about the existing tools.

In 3/6 months, how many skills have you written? How many times have you used each skill? Did you have to edit skills later due to unseen corner cases, or did they generalize? Are skills being used predominantly at the individual level, or are entire teams/orgs able to use a skill as is? What are the use cases that skills are not good at? What are the shortcomings?

(You being the metaphorical HN reader here of course.)

HN has always been a place of greater technical depth than other internet sites and I would like to see more of this sort of thing on the front page along with day one calls.

2 months ago

I'd love to see answers to those questions too, especially the team ones.

Personally the skills that I have found most useful that I've written myself are these:

- uv-tdd - run a TDD loop on a Python project via uv: https://github.com/simonw/skills/blob/main/uv-tdd/SKILL.md

- setup-to-pyproject - migrate a setup.py Python project to pyproject.toml in the way I prefer https://github.com/simonw/skills/blob/main/setup-to-pyprojec...

- datasette-plugins: everything an LLM needs to write a new Datasette plugin https://github.com/datasette/skill/blob/main/SKILL.md

2 months ago

You did make a good call on skills.

Anything that lets us compose smaller tasks into larger ones effectively is helpful. That’s because self-attention (ie context) is still a huge limiting factor.

As someone who uses these tools a lot, and who sits on the bleeding edge every day, I agree with you.

CuriouslyC

2 months ago

MCP got a ton of use out of the gate. People were fawning over it for the first few months, and we can see how well that hype survived contact with hardcore engineers.

https://www.youtube.com/watch?v=_6C9nMvQsGU

qingcharles

2 months ago

There was a funny YouTube video which came out immediately after the release where Claude rm -rf'd all the dude's files o_O

YOLO? More like LOLOL

ljm

2 months ago

The fact you could see in the video what was going on for several minutes before the guy noticed it…

malka1986

2 months ago

you have to be crazy to run claude in YOLO mode outside of a constrained environment such as a docker container

https://embracethered.com/blog/posts/2025/the-normalization-...

holbrad

2 months ago

There's an awful lot of people running these tools in YOLO mode.

cliftonc

2 months ago

I really disagree, skills are really quite useful and there is a lot of usage + community - e.g. take a look at https://github.com/obra/superpowers which I know is used by a lot of people to smooth out their workflow with Claude with great results (not forced spec driven development just better context use + better results). Just this week I used skills to help encapsulate a way to document legacy services ahead of a rewrite (given that my experience now is that rewriting becomes a valid path vs refactoring in many instances): https://github.com/cliftonc/unwind.

2 months ago

I looked at superpowers, but it felt way too generic. Thanks for sharing unwind. More discussion/blogs about these kinds of skills is what I am looking for. I would encourage you to write a blog post on unwind, explaining in detail how it has helped you. Even better if you do it after 3 months of use, explaining the journey/evolution of the skill.

dannyobrien

2 months ago

I'm happy to bet that skills -- or "a set of instructions in markdown that get sucked into your context under certain conditions" -- will stick around. Similarly, I think that Claude Code/Cowork -- or "interactive prompt using shell commands on a local filesystem" -- will also stick around.

I fully anticipate there being a fair amount of thrashing on what exactly the right wrapper is around both of those concepts. I think the hard thing is to discriminate the learned constants (vim/emacs) from the attempts to re-jiggle or extend them (plugins, etc); it's actually useful to get reviews of these experiments exactly so you don't have to install all of them to find out whether they add anything.

(On skills, I think that the reason why there "aren't good examples out there" is because most people just have a stack of impromptu local setups. It takes a bit of work to extract those to throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.

The deal with skills and other piles of markdown is that they don't look, even from a short distance, like you can construct a business model for them, so I think they may well end up in the world of genuine open source sharing, which is a much smaller, but saner, place.)

throwup238

2 months ago

> (On skills, I think that the reason why there "aren't good examples out there" is because most people just have a stack of impromptu local setups. It takes a bit of work to extract those to throw them out into the public, and right now it's difficult to see that kind of activity over lots of very-excitable hyping, as you rightly describe.

Very much this. All of my skills/subagents are highly tailored to my codebases and workflows, usually by asking Claude Code to write them and resuming the conversation any time I see some behavior I don't like. All the skills I've seen on Github are way too generic to be of any use.

2 months ago

I thought skills were supposed to be sharable, but (a) ones that are being shared openly are too generic and not useful, (b) people are writing super specific skills and not sharing them.

Would strongly encourage you to open-source/write blog posts on some concrete examples from your experience to bridge this gap.

gabriel-uribe

2 months ago

Yep, lots of bike shedding right now.

To be fair, Cowork and similar things are just trying to take the agentic workflows and tools that developers are already accessing (eg most of us have already been working with files in Cursor/CC/Codex for a long time now, it's nothing new) and making them friendly for others.

jameslk

2 months ago

> 3 months ago, Anthropic and Simon claimed that Skills were the next big thing and going to completely change the game. So far, from my exploration, I don't see any good examples out there, nor is there a big growing/active community of users.

Skills have become widely adopted since Anthropic's announcement. They've been implemented across major coding agents[0][1][2] and standardized as a spec[3]. I'm not sure what you mean by "next big thing" but they're certainly superior to MCP in some ways, being much easier to implement and reducing context usage by being discoverable, hence their rapid adoption.

I don't know if skills will necessarily stay relevant amongst evolution of the rest of the tooling and patterns. But that's more because of huge capital investment around everything touching AI, very active research, and actual improvements in the state of the art, rather than simply "new, shinier things" for the sake of it.

[0]. https://developers.openai.com/codex/skills/

[1]. https://antigravity.google/docs/skills

[2]. https://cursor.com/blog/dynamic-context-discovery

[3]. https://agentskills.io/home

jacobajit

2 months ago

Cowork actually uses skills under the hood that give it various knowledge work abilities, so that abstraction seems to be working well:

"in Cowork we’ve added an initial set of skills that improve Claude’s ability to create documents, presentations, and other files" https://claude.com/blog/cowork-research-preview

https://github.com/anthropics/skills

linsomniac

2 months ago

>I don't see any good [skills] examples out there

`/plugin marketplace add anthropics/skills`

2 days ago I built a skill to automate a manual workflow I was using: After Claude writes and commits some code, have Codex review that code and have Claude go back and address what Codex finds. I used this process to implement a fairly complete Docusign-like service, and it did a startlingly good job right out of the gate, the bugs were fairly shallow. In my manual review of the Codex findings, it seems to be producing good results.

Claude Code largely built that skill for me.

Implemented as a skill and I've been using it for the last 2 days to implement a "retrospective meeting runner" web app. Having it as a skill completely automates the code->review->rework step.
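The code->review->rework step described above can be sketched as a small loop. This is a hedged illustration in Python, not the commenter's skill: `review_loop` and the three callables are hypothetical, and the stand-in functions below replace what would really be calls out to Claude and Codex.

```python
from typing import Callable


def review_loop(
    write: Callable[[str], str],
    review: Callable[[str], list[str]],
    rework: Callable[[str, list[str]], str],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Write once, then alternate review and rework until the reviewer is quiet.

    Returns the final code and the number of rework rounds performed.
    """
    code = write(task)
    for round_number in range(1, max_rounds + 1):
        findings = review(code)
        if not findings:
            return code, round_number - 1
        code = rework(code, findings)
    return code, max_rounds


# Stand-ins so the sketch runs without any model or CLI.
def fake_write(task: str) -> str:
    return f"# TODO: {task}"


def fake_review(code: str) -> list[str]:
    return ["unfinished TODO"] if "TODO" in code else []


def fake_rework(code: str, findings: list[str]) -> str:
    return code.replace("# TODO:", "# done:")


final_code, reworks = review_loop(fake_write, fake_review, fake_rework, "add login")
print(final_code, reworks)
```

The `max_rounds` cap is a design choice for the sketch: without it, a reviewer that always finds something would keep the loop running forever.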

2 months ago

I looked at the official repo of skills, but I found those very generic and artificial.

I would encourage you to write up a blog post of your experience and share a version of the skill you have built. And then follow up with a blog post after 3 months with analysis like how well the skill generalized for your daily use, whether you had to make some changes, what didn't work etc. This is the sort of content we need more of here.

tin7in

2 months ago

I partially agree with you that things get abandoned by users when they are too complex, but I think skills are a big improvement compared to what we had before.

Skills + tool search tool (dynamic MCP loading) announced recently are way better than just using MCP tools. I see more adoption by the people around me compared to a few months ago.

CuriouslyC

2 months ago

Anthropic has great marketing. They get shit (and I do mean shit) to stick in a way that I don't think anyone else in the AI space could. MCP and skills were both obvious duds to people who understand the tech.

Simon is more influencer than engineer at this point, he's incentivized to ride waves to drive views, and I think the hand-waving "this will be amazing" posts have been good to him, even if they turn out to be completely wrong.

2 months ago

This comment isn’t going to age well. Just my $0.02.

CuriouslyC

2 months ago

Feel free to make an argument, I'm happy to discuss what I might be missing.

SkyPuncher

2 months ago

I'm not really sure I understand this critique. Skills and Cowork are not mutually exclusive. Cowork sits in a gap between Chat and Claude Code.

In regular Chat, I struggle to get the agent to consistently traverse certain workflows that I have. This is something that I can trivially do in Claude Code - but Claude Code wants to code (so I'm often fighting its tendencies).

Cowork seems like it's going to allow me to use the best parts of Claude Code, without being forced to output everything to code.

i-blis

2 months ago

Related argument (as to SKILL.md not being a big thing) in the following blog post: https://news.ycombinator.com/item?id=46644086

2 months ago

Nice - this is the kind of blog post we need more of. Getting into real experience.

johnisgood

2 months ago

Yeah, I noticed it a long time ago.

Quality > quantity.

ares623

2 months ago

Well said.

It’s not quite at the same level but it reminds me of YouTubers who get products from companies for free for a “review” and then they say “no money exchanged hands”. The incentives are implicit wink-wink and everyone knows it except the audience.

2 months ago

That's why I publish disclosures: https://simonwillison.net/about/#disclosures

In the case of Cowork I didn't even get preview access, I learned about it at the same moment as everyone else did. There was no incentive from Anthropic to write about it at all (and I expect they may have preferred me not to bang on about prompt injection risks or point out the bugs in their artifacts implementation.)

Honestly, constantly having to fend off accusations of being a shill is pretty tiring.

2 months ago

Ignore idiots and trolls. Do your thing. Don’t worry about the noise.

n8cpdx

2 months ago

> Look at my drafts that were started within the last three months and then check that I didn’t publish them on simonwillison.net using a search against content on that site and then suggest the ones that are most close to being ready

This is a very detailed, particular prompt. The type of prompt a programmer would think of as they were trying to break down a task into something that can be implemented. It is so programmer-brained that I come away not convinced that a typical user would be able to write it.

This isn’t an AI skepticism post - the fact that it handles the prompt well is very impressive. But I’m skeptical that the target user is thinking clearly enough to prompt this well.

headcanon

2 months ago

Since LLMs were introduced, I've been of the belief that this technology actually makes writing a *more* important skill to develop, not less. So far that belief has held. No matter how advanced the model gets, you'll get better results if you can clarify your thoughts well in written language.

There may be a future AI-based system that can retain so much context it can kind of just "get what you mean" when you say off-the-cuff things, but I believe that a user that can think, speak, and write clearly will still have a skill advantage over one that does not.

sothatsit

2 months ago

FWIW, I've heard many people say that with voice dictation they ramble to LLMs and by speaking more words can convey their meaning well, even if their writing quality is low. I don't do this regularly, but when I have tried it, it seemed to work just as well as my purposefully-written prompts. I can imagine a non-technical person rambling enough that the AI gets what they mean.

headcanon

2 months ago

That's a fair counterpoint, and it has helped translate my random thoughts into more coherent text. I also haven't taken advantage of dictation much at all, so maybe I'll give it a try. I still think the baseline skill that writing gives you translates to an LLM-use skill, which is thinking clearly and knowing how to structure your thoughts. Maybe folks can get that skill in other ways (oration, art, etc.). I don't need to give it essays, but I do need to give it clear instructions. Every time it spins off and does something I don't want, it's because I didn't clarify my thoughts correctly.

arcanemachiner

2 months ago

Setting up SpeechNote with Kokoro is one of the best things I've ever done.

I can speak faster than I type, and the flow state is much smoother when you can just dump a stream of consciousness into the context window in a matter of seconds. And the quality of the model is insane for something that runs locally, on reasonable hardware no less.

Swearing at an LLM is also much more fun when done verbally.

dworks

2 months ago

The prompt the user enters is actually not the prompt. Most agents will have an additional background step to use the user's prompt to generate the actual, detailed instructions, which is then used as the actual prompt for code generation. That's how the ability to build a website from "create a website that looks like twitter" is achieved.

patja

2 months ago

My 85 year-old father could probably resolve 90% of his personal technology problems using an LLM. But for the same reason every phone call on these subjects ends with me saying "can it wait until I come over for lunch next week to take a look?", an LLM isn't a viable solution when he can't adequately describe the problem and its context.

Workaccount2

2 months ago

I showed my father how to use the live camera mode with Gemini and it's been a boon for him

imiric

2 months ago

> No matter how advanced the model gets, you'll get better results if you can clarify your thoughts well in written language.

Imagine what we could accomplish if we had a way of writing very precise language that is easy for a machine to interpret!

TeMPOraL

2 months ago

Yeah, we've already seen that over the past few decades. It's both a limitation and a benefit, but until recently it was the only thing we had (well that, and just hiring another person to act as an LLM for us). LLMs are an upgrade.

frumiousirc

2 months ago

> No matter how advanced the model gets, you'll get better results if you can clarify your thoughts well in written language.

This definitely agrees with my experience. But a corollary is that written human language is a very cumbersome way to encode some complex concepts. More and more I give up on LLM-assisted programming because it is easier to express my desires in code than to use English to describe what forms I want to see in the produced code. Perhaps once LLMs get something akin to judgement and wisdom I can express my desires in the terms I can use with other experienced humans and take for granted certain obvious quality aspects I want in the results.

SecretDreams

2 months ago

> So far that belief has held. No matter how advanced the model gets, you'll get better results if you can clarify your thoughts well in written language.

I've heard it well described as a k-type curve. Individuals that already know things will use this tool to learn and do many more things. Individuals that don't know a whole lot aren't going to learn or do a whole lot with this tool.

tomjen3

2 months ago

It is absolutely true, with the interesting caveat that the basics (spelling, grammar) don’t matter. Clarity and detail of your ideas do.

2 months ago

I agree 100% - it's a very programmer-coded prompt. It was pretty much the first thing I thought to try.

I expect we'll see an enormous quantity of "cool prompts to try in Cowork" content show up over the next few months, which makes sense - regular non-programmers will benefit enormously from cookbooks and prompting guides and other tools to help them figure out what they can ask this thing.

slewis

2 months ago

Can you try a simpler less programmery version?

"are any of my recent blog drafts unpublished and nearly ready to go?"

mbesto

2 months ago

This is essentially the "future of work"TM - those who can define prompts will be poised best for the future.

oulipo2

2 months ago

Why choose to publish on Substack, which is owned by a techno-fascist? https://www.theatlantic.com/ideas/archive/2023/11/substack-e...

2 months ago

I don't publish on Substack, I publish on my own site: https://simonwillison.net

I use Substack as a free email provider for the email newsletter copy of my blog - which saves me hundreds of dollars a month in email fees.

whattheheckheck

2 months ago

What internet company isn't run by those types?

alvah

2 months ago

"Why publish on Substack, a venture‑backed tech platform whose leadership has chosen a permissive moderation policy?" fixed it for you.

sharkjacobs

2 months ago

It takes a certain amount of expertise to use LLMs effectively. And I know that some people claim otherwise but they simply aren't worth listening to.

Just because Claude Cowork is for "other" kinds of work, not just software engineering, doesn't in any way change that. It's not like other kinds of knowledge work aren't being done by intelligent professionals who invest time into learning how to use complicated software and systems

That is to say, I don't know who the "target user" of this is, but it is a $100/month subscription, so it's presumably someone who is a pretty serious AI user.

mightybyte

2 months ago

This is why I think (at least given the current state of AI code generators) that senior engineers will benefit more from AI than less experienced engineers. I don't know exactly what the chart of experience (on the x-axis) and amount of productivity gain from AI (on the y-axis) will look like, but I'm pretty sure it will be roughly (given suitable error bars around the input) a monotonically increasing function.

onion2k

2 months ago

The notion that people who aren't developers couldn't figure out how to use this tool well, or be trained to, is a little too negative. Programmers aren't special snowflakes. Everyone with a brain is capable of describing a problem and breaking the solution into steps.

The popularity of LLMs proves this. That's how most people use them - building up a detailed prompt in steps, and learning how to put more detail in to get the result you want.

IanCal

2 months ago

One part I like about LLMs is that they can smooth over the rough edges in programming. Lots of people can build pretty complicated spreadsheets, can break down a problem into clear discrete tasks, or can at least look at a set of steps, validate that it solves the issue they have, and more easily update it. Those people don’t necessarily know json isn’t a person, how to install python or how to iterate over these things. I can’t give directions in Spanish, but it’s not because I don’t know how to get to the library; it’s just that I can’t translate precisely.

Also, you may only need someone to write the meta prompt that spits out this kind of thing. Given some problem like “I want to find the easiest blog posts to finish in my drafts but some are already published”, you get a more detailed prompt out of it, read it and set things going.

perfmode

2 months ago

> But I’m skeptical that the target user is thinking clearly enough to prompt this well.

Over time, target users will learn to think and communicate this way. As this is what tools will demand of them.

redox99

2 months ago

Eh, most people never learned how to google.

fudged71

2 months ago

Select star from blog posts where... :)

iambateman

2 months ago

But that's how literally all software adoption curves work...

The 1980's version of simonw was explaining to people how to use Excel, too.

(though 40 years later, things are still pretty bad on the Excel front, hah)

Barbing

2 months ago

Author site: https://simonwillison.net/2026/Jan/12/claude-cowork/

ahussain

2 months ago

I enjoyed hearing Claude Code creator Boris Cherny talk about "latent demand"[0], which is when users start using your product for something it was not intended for. When that happens, it's a great signal that you should go build that into a full product.

Cowork seems like a great application of that principle.

[0] https://www.youtube.com/watch?v=AmdLVWMdjOk

2 months ago

This is my Substack newsletter which bundles several posts together into a weekly-ish email - the original post for this one was https://simonwillison.net/2026/Jan/12/claude-cowork/

emp17344

2 months ago

I don’t think I’ve ever seen this guy say anything negative about an AI product, which makes me skeptical of his insights here.

asadotzler

2 months ago

He's a proponent, but that doesn't mean his analysis isn't useful. It's clear and mostly accurate, and when he gets something wrong he makes it right. Does he do all that with rose-tinted glasses? Probably, but my experience reading him is that he's sharp, thoughtful, and entirely reasonable.

Dismissing the opportunity to learn because the person offering you knowledge is enthusiastic about his area of expertise is probably shortsighted.

Cornbilly

2 months ago

I don't think they were being dismissive. They just said they were skeptical, which is generally a good thing. It's certainly better than the goofy hero worship I constantly see on HN.

emp17344

2 months ago

How is this Simon’s area of expertise? I know he’s a programming legend, but I’ve never heard anything to indicate he’s a machine learning expert.

I’m not intending to be dismissive, just noticing a pattern and advocating a bit of skepticism.

wanderingstan

2 months ago

This is more akin to a race car driver giving a review of, for example, a new type of electric car. It doesn’t matter that the driver is not a domain expert in electric motors and regenerative braking; what matters is he knows how to operate these machines in their use case at the limits.

Hearing a programming legend weigh in on the latest programming tool seems entirely reasonable.

2 months ago

Being an expert in machine learning turns out to not be particularly relevant to being an expert in the applications of LLMs to real-world problems. I'm certainly not an expert in the former but I do think I have credibility in the latter.

2 months ago

Have you seen my writing on prompt injection (a term that I coined)?

That's pretty negative! https://simonwillison.net/series/prompt-injection/

There's a whole section in the linked piece about how Cowork doesn't do enough here, including:

> I do not think it is fair to tell regular non-programmer users to watch out for “suspicious actions that may indicate prompt injection”

what

2 months ago

I don’t think you coined “prompt injection”…

https://simonwillison.net/2022/Sep/12/prompt-injection/

2 months ago

> This isn’t just an interesting academic trick: it’s a form of security exploit. I propose that the obvious name for this should be prompt injection.

I've written about it 142 times since then: https://simonwillison.net/tags/prompt-injection/

I'm credited for coining the term on Wikipedia and in several academic papers.

I don't claim to have discovered the vulnerability - I credited that to Riley Goodside, but we later learned it was independently discovered and first reported to OpenAI by Jonathan Cefalu of Preamble, see https://www.preamble.com/prompt-injection-a-critical-vulnera...

mbesto

2 months ago

I would argue the EXACT opposite. His analysis is very constructive.

The people you should be skeptical of are the random Xitter handles who post about a robotic phlebotomist and say "THE FUTURE IS ALREADY HERE".

Cornbilly

2 months ago

No. It’s also good to be skeptical of people that seem like they are operating in good faith.

Example: The decade+ of people worshipping at Musk’s feet only for him to reveal himself as a malignant narcissist.

vacuity

2 months ago

The deciding factor on whether to believe someone should not be whether they believe themselves, but whether they are worth believing. A murderer who acts in good faith is still a murderer.

webdevver

2 months ago

I'd imagine someone like Simon picks his AI products carefully enough that he doesn't waste his time on duds.

pglevy

2 months ago

He literally brings up a concern he calls the "lethal trifecta" when it's even remotely relevant.

mvdtnz

2 months ago

You're right to be skeptical. He makes a living as a hype merchant.

2 months ago

It's a pretty terrible way to make a living, to be honest. If I was in this for the money I'd trade my reputation as an independent voice for a six figure salary at an AI company.

laborcontract

2 months ago

This is a nice technical account that we're used to seeing from Simon.

I get a kick out of the fact that Microsoft has been preciously clinging to the "Copilot" branding and here comes Claude saying "Cowork? Good enough for us!".

Taking a step back, I really would love to see a broader perspective -- an account of someone who is not tech savvy at all. Someone who works a basic desk job that requires basic competency of microsoft word. I'm so deep into the bubble of AI-adjacent people that I haven't taken stock of how this would or could empower those who are under-skilled.

We've taken it as truth that those who benefit most from AI are high-skilled augmenters, but do others see some lift from it? I'd love if anthropic tried to strap some barely-performing administrative assistants into these harnesses and see if there's a net benefit. For all I know, it's not inconceivable that there be a `rm -rf` catastrophe every other hour.

sanderjd

2 months ago

[-]

This predates Cowork, but I have recently started to see "non-technical" journalists taking Claude Code seriously. For instance, Joe Weisenthal has been writing about this, e.g.: https://nitter.net/thestalwart/status/2010512842705735948.

nlawalker

2 months ago

[-]

The Atlantic just did a dedicated article the other day. Gift link: https://www.theatlantic.com/technology/2026/01/claude-code-a...

Analemma_

2 months ago

[-]

New York Magazine, not a technical publication by any means, also had an article about Claude Code/Cowork yesterday: [0]. Kinda punches a hole in the argument you sometimes see around here that "ChatGPT is the only brand consumers know, so OpenAI will definitely win."

[0]: https://nymag.com/intelligencer/article/how-claude-code-cowo...

sanderjd

2 months ago

[-]

I think it was reasonable to think, a couple years ago, that it would probably turn out that way, but yeah, not anymore.

It honestly feels really refreshing to me, for there to be genuine competition in a new technology.

nonethewiser

2 months ago

[-]

>Someone who works a basic desk job that requires basic competency of microsoft word.

I don't actually think there are many of those people out there. And those that are, are on their way out. There are basically none of those people entering the workforce. There are tons of people with that sort of computer literacy, but they aren't working on computers.

californical

2 months ago

[-]

Eh, I can think of some examples for sure, I think there are still a lot of people like this.

* Bookkeeper & planning approval within city government

* Doctor/dentist/optometry receptionist & scheduler (both at independent offices and at major hospitals)

* Front desk staff at almost every company with a physical front desk

* University administrative staff (there can be a lot more of these people than you'd think)

* DMV workers

* Probably lots of teachers

Those jobs all will use other software as well, but a lot of their job is making and filling forms on a computer, where they are likely needing to use MS Word fairly often to write things up.

mrdependable

2 months ago

[-]

A lot of these have to do with other people's data. Are we feeding these machines social security numbers and other PII?

TeMPOraL

2 months ago

[-]

It's not any practical problem if it's not used for training and product improvement. It's also not a legal problem if contracts have such provisions and are compatible with laws in relevant jurisdictions.

californical

2 months ago

[-]

I hope not, but… yes, it’s probably happening regularly everywhere it’s not explicitly regulated.

InitialLastName

2 months ago

[-]

Even where it is explicitly regulated, it's probably being done unthinkingly.

BeetleB

2 months ago

[-]

Oh what a bubble you live in.

Word dominates in the corporate space.

nonethewiser

2 months ago

[-]

>Word dominates in the corporate space.

That doesn't rebut anything I said.

How about young people entering the workforce who primarily work on computers but are mostly computer illiterate?

It definitely exists. But it's shrinking. There are tons of computer illiterate people, less so but even amongst young people, but they aren't primarily working on computers. There is still a sizable chunk over 40 but those days are numbered.

dumbmrblah

2 months ago

[-]

I worry this is gonna cause even more sensitive/privileged data exfiltration than is currently happening. And most “normies” won't even notice.

I know the counterargument is people are already putting in company data via ChatGPT. However, that is a conscious decision. This may happen without people even recognizing that they are “spilling the beans”.

dpoloncsak

2 months ago

[-]

This hit the front page yesterday so you may have seen it, but figured I'd post it for posterity's sake

> Claude Cowork exfiltrates files https://news.ycombinator.com/item?id=46622328

HardCodedBias

2 months ago

[-]

I think you're right, but the issue goes deeper. If the productivity gains are real, the incentive to bypass security becomes overwhelming. We are going to see a massive conflict where compliance tries to clamp down, but eventually loses to 'getting work done.'

Even if critics are right that these models are inherently insecure, the market will likely settle for 'optically patched.' If the efficiency gains are there, companies will just accept the residual risk.

wunderwuzzi23

2 months ago

[-]

Claude (generally, even non Cowork mode) is vulnerable to exfil via their APIs, and Anthropic's response was that you should click the stop button if exfiltration occurs.

This is a good example of the Normalization of Deviance in AI by the way.

See my Claude Pirate research from last October for details:

https://embracethered.com/blog/posts/2025/claude-abusing-net...

spaceman_2020

2 months ago

[-]

I just used Claude Code to do something that would have taken my wife 3+ days

She has to go through about 100 resumes for a position at her college. Each resume is essentially a form the candidate filled out and lists their detailed academic scores from high school > PhD, their work experience, research and publications.

Based on the declared data, candidates are scored by the system

Now this is India and there's a decent amount of fraud, so an individual has to manually check the claimed experience/scores/publications against reality

A candidate might claim to have relevant experience, but the college might be unaccredited, or the claimed salary might be way too low for a relevant academic position. Or they might claim to have published in XYZ journal, but the journal itself might be a fraudulent pay-to-publish thing

Going through 100+ resumes, each 4 pages long is a nightmare of a task. And boring too.

So I asked Claude Code to help. I gave it a PDF with the scoring guidelines and a sample resume, and asked it to figure out how to approach the problem

Without me telling it, it figured out a plan that involved checking a college's accreditation and rating (the govt maintains a rating for all colleges), the claimed salary vs actual median salary for that position (too low is a red flag), and whether the claimed publication is in either the SCOPUS index or a govt approved publications index

(I emphasize govt approved because this is in a govt backed institution)

Then I gave it access to a folder with all the 100 resumes.

In less than 30 minutes, it evaluated all candidates and added the evaluation to a CSV file. I asked it to make it more readable, so it made a HTML page with data from all the candidates and red/green/yellow flags about their work-experience, publications, and employment

It made a prioritized list of the most promising candidates based on this data

My wife double checked because she still "doesn't trust AI", but all her verification almost 100% matched Claude's conclusions

This was a 3 day, grinding task done in 30 minutes. And all I did was type into a terminal for 20 minutes
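For anyone curious what those three checks look like when written down, here is a rough sketch. It is not what Claude actually generated; the reference sets, the `Claim` fields, and the 50% salary cutoff are all made up for illustration, and a real run would load the official accreditation list, salary figures, and journal indexes instead.

```python
from dataclasses import dataclass

# Hypothetical reference data, invented for this example only.
ACCREDITED_COLLEGES = {"Example State College"}
APPROVED_JOURNALS = {"Journal of Examples"}
MEDIAN_SALARY = {"assistant professor": 60000}


@dataclass
class Claim:
    college: str
    position: str
    salary: int
    journal: str


def flag_claim(claim: Claim, low_salary_ratio: float = 0.5) -> dict:
    """Return a red/yellow/green flag per check for one candidate's claims."""
    flags = {}
    # Check 1: is the claimed college on the accredited list?
    flags["college"] = "green" if claim.college in ACCREDITED_COLLEGES else "red"
    # Check 2: is the claimed salary plausible against the median for the role?
    median = MEDIAN_SALARY.get(claim.position.lower())
    if median is None:
        flags["salary"] = "yellow"  # no reference figure, needs a human look
    elif claim.salary < low_salary_ratio * median:
        flags["salary"] = "red"  # far below the median is treated as suspicious
    else:
        flags["salary"] = "green"
    # Check 3: is the claimed journal in an approved index?
    flags["publication"] = "green" if claim.journal in APPROVED_JOURNALS else "red"
    return flags
```

The flags only prioritize what a human should look at first; they don't replace the manual verification.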

anonymous908213

2 months ago

[-]

Applying for a job is an action that can change the entire course of your life. A job is something somebody spends ~30% of their time at, and every individual job contributes to a career trajectory and determines the opportunities one will have available throughout their life. Not only from the candidate's perspective, making the correct hires is something that will determine the course of a company's future; I would argue that there is literally nothing more important in running a business than hiring the right people. You would think, given these considerations, maybe the person who is paid money to review these applications should actually review them with the thoroughness that is warranted by such a task. Did the university approve of the work being outsourced to a chatbot instead of the person they were paying the salary to do the work?

To say nothing of the flagrantly immoral and likely illegal data privacy violations, of course.

hecanjog

2 months ago

[-]

> My wife double checked because she still "doesn't trust AI", but all her verification almost 100% matched Claude's conclusions

She's right not to trust it for something like this. The "almost 100%" is the problem (also consider that you're sending personal data to Anthropic without permission), especially for something like this where it might mean discarding someone's resume, which could have a significant impact on a person's life.

james_marks

2 months ago

[-]

What human has better than “almost 100%” on a dull task they have to grind at for 3 days?

Humans are terrible at that kind of long term focus, make clerical errors, etc.

yosito

2 months ago

[-]

I'm very skeptical of using AI in this way. I've given Claude access to calendars and travel plans and asked it to do similar analytical tasks cross referencing documents that would take days for me to do manually. Since it was about my own plans and life that I knew well, it was possible for me to spot subtle errors that seemed correct at the surface level but actually weren't the conclusions I would make. I've attempted these types of tasks 10-20 times with similar experiences each time. In the end, it's made me very skeptical, like your wife. I don't trust any AI output without a thorough review. Hallucinations are still a frequent problem.

cheema33

2 months ago

[-]

This is good work. When a task is of critical importance, I give two different LLMs the same task. And then ask them to review each other's output and validate all claims. I do this with Codex and Claude Code. It is very rare for them not to find some valid fault in the other LLM's solution. And they are generally good about admitting mistakes and then creating a single unified solution that addresses identified issues. This result is better and ready for human review.
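In case the shape of that loop is useful, here's a minimal sketch. The two "models" are just functions from prompt text to response text (hypothetical stand-ins; hooking them up to Codex or Claude Code is not shown), and the review prompt wording is my own placeholder.

```python
from typing import Callable

# A "model" is any function from prompt text to response text.
Model = Callable[[str], str]

REVIEW_PROMPT = (
    "Task:\n{task}\n\n"
    "Review this answer and validate every claim:\n{answer}"
)


def cross_review(task: str, model_a: Model, model_b: Model) -> dict:
    """Have two models answer the same task, then critique each other."""
    answer_a = model_a(task)
    answer_b = model_b(task)
    return {
        "answer_a": answer_a,
        "answer_b": answer_b,
        # Each model reviews the other model's answer, never its own.
        "review_of_a_by_b": model_b(REVIEW_PROMPT.format(task=task, answer=answer_a)),
        "review_of_b_by_a": model_a(REVIEW_PROMPT.format(task=task, answer=answer_b)),
    }
```

Merging the two answers into one unified solution would be a further call with both answers and both reviews in the prompt, followed by the human review.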

bandrami

2 months ago

[-]

Given the data exfil vulnerability a few stories down HN's front page I would be extremely hesitant to ask Claude to process a document someone else produced and sent to me

layer8

2 months ago

[-]

Doesn’t submitting the resumes to Anthropic violate India’s data protection laws?

zdragnar

2 months ago

[-]

Was the double-checking done in that 30 minutes? The fact that it wasn't 100% right means that the human in the loop was still important, so I'm just trying to understand the actual time saved.

adastra22

2 months ago

[-]

> but all her verification almost 100% matched Claude's conclusions

The "almost" here is very alarming.

jstummbillig

2 months ago

[-]

> And all I did was type into a terminal for 20 minutes

Well, and learning how to do that in 20 minutes

janalsncm

2 months ago

[-]

In general, I think when we are evaluating fuzzy things like this we should come up with specifications for what we would like to see before performing the eval. Not saying it happened here, but very often I see people impressed with “answer-shaped” answers rather than objectively assessing the actual quality. The latter is harder and requires specific expertise.

It is probably a good lesson on how far confidence can get you in life. People are often highly biased by the presentation of the thing.

theYipster

2 months ago

[-]

Leveraging Claude Code in a Linux shell to do all sorts of stuff has been an amazing superpower for me, and I think for many others. Cowork is a promising next step to democratize this superpower for others.

If Microsoft, in creating their next gen agentic OS, wants to replace Windows with the Linux kernel, Claude Code, and bash shell (turning Windows into a distribution of sorts), more power to them. However, I doubt this is the direction they'll go.

fassssst

2 months ago

[-]

The latest models work just as well with PowerShell as they do with bash, well at least that’s true for Codex.

OsrsNeedsf2P

2 months ago

[-]

I'll have to try out Codex. My experience with CC is that it _works_ in PowerShell on Windows, but it's an order of magnitude slower than on Linux and Mac

xmcqdpt2

2 months ago

[-]

I've gotten much more excited about our future AI overlords since it has led to all kinds of people at work asking me how to use tmux!

tech_tuna

2 months ago

[-]

I'm not convinced that the success and momentum of Claude Code will catch on with the general public. This feels like a one-trick pony that's been groomed and billed as a racehorse. Or put another way, Claude Cowork feels like Claude Code for people who don't code and are not interested in vibe coding.

We'll see.

mickdarling

2 months ago

[-]

I think Claude Cowork should come with a requirement or a very heavily structured wizard process to ensure the machine has something like a Time Machine backup or other backups that are done regularly, before it is used by folks.

The failure modes are just too rough for most people to think about until it's too late.

coderatlarge

2 months ago

[-]

i propose the following benchmark task that i think can serve as a baseline of whether these local automation systems can really save time:

starting with a bare ubuntu desktop system with plenty of RAM and CPU, setup three ubuntu VMs for secure development and networking skills learning (wireshark, protocol analysis, etc etc):

one ubuntu “virtual” desktop to simulate a working desktop that an end-user or developer would use. its networking should initially be completely isolated.

one ubuntu server to simulate a bastion machine. route all “virtual desktop” traffic through this “bastion”. it will serve as a tap.

one ubuntu server to serve as edge node. this one can share internet access with the host. route all bastion traffic through the edge node.

use this three vm setup to perform ordinary tasks in the “virtual desktop” and observe the resulting traffic in the “bastion”. verify that no other traffic is generated on or from the host outside of the expected path virtual desktop -> bastion -> edge.

i claim this is a minimal “network clean” development setup for anyone wanting to do security-conscious development.

extra credit: setup another isolated vm server to act as the package manager; ie mirror anything to be installed on the “virtual desktop” onto this package server and configure this server as the install point for apt on the “virtual desktop”.

i doubt an AI can set this up right now. (i’ve tried)

beauzero

2 months ago

[-]

This is some low hanging fruit that keeps getting driven by in order to speed up development. There is so so much potential here. If this can replace the RPA consulting industry I won't be unhappy. Let individuals do it themselves so they have time to work themselves into some other position or move up/take on more responsibility.

gchallen

2 months ago

[-]

I've built several bespoke "apps" that are essentially Claude Code + a folder with files in it. For example, I have Claude Coach, which designs ultimate frisbee workouts for me. We started with a few Markdown files—one with my goals, one with information about my schedule, another with information about the equipment and facilities I have access to, and so on. It would access those files and use them to create my weekly workout plans, which were also saved as files under the same folder.

Over time this has become more sophisticated. I've created custom commands to incorporate training tips from YouTube videos (via YT-DLP and WhisperX) and PDFs of exercise plans or books that I've purchased. I've used or created MCP servers to give it access to data from my smart watch and smart scale. It has a few database-like YAML files for scoring things like exercise weight ranges and historical fitness metrics. At some point we'll probably start publishing the workouts online somewhere where I can view and complete them electronically, although I'm not feeling a big rush on that. I can work on this at my own pace and it's never been anything but fun.

I think there's a whole category of personal apps that are essentially AI + a folder with files in it. They are designed and maintained by you, can be exactly what you want (or at least can prompt), and don't need to be published or shared with anyone else. But to create them you needed to be comfortable at the command line. I actually had a chat with Claude about this, asking if there was a similar workflow for non-CLI types. Claude Cowork seems like it. I'll be curious to see what kinds of things non-technical users get up to with it, at least once it's more widely available.
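To make the "AI + a folder with files in it" idea concrete, here is a tiny sketch of the folder half: collecting the Markdown notes into one block of context text. The function name and the `## filename` header format are my own choices for illustration, not something Claude Code itself does or requires.

```python
from pathlib import Path


def build_context(folder: str, pattern: str = "*.md") -> str:
    """Concatenate the notes in a folder into one block of context text."""
    parts = []
    # Sort by path so the output is stable from run to run.
    for path in sorted(Path(folder).glob(pattern)):
        body = path.read_text(encoding="utf-8").strip()
        parts.append(f"## {path.name}\n{body}")
    return "\n\n".join(parts)
```

With files like `goals.md` and `schedule.md` in the folder, the result is a single string with one labeled section per file, and generated plans can be written back into the same folder as more files.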

stosssik

2 months ago

[-]

This resonates a lot. And we’re working on something in the same space: a way to build MCP apps for non-technical people. If there are builders here who like experimenting, we’re looking for beta testers: -> https://manifest.build

vessenes

2 months ago

[-]

One rough edge for me: the cowork interface seems to have turned off “extensions” - my first ask was to read some emails and compare with some local documents and draft a document. It kept trying to use claude chrome to navigate to gmail.

I’m not sure what the plan for integrating extensions is here but they definitely will be wanted.

avidphantasm

2 months ago

[-]

I’m too dumb/lazy to run find and think for myself, so I’m happily digging my own grave. Yippee!!!

mrdependable

2 months ago

[-]

My imagination may be lacking, but what would you realistically use a tool like this for?

wanderingstan

2 months ago

[-]

For me, I recently wanted to assemble a “supercut” of my videos of attempts at learning to bunny-hop a bike. The tool was able to craft a python script that used ffmpeg to edit out the no-motion portions of the videos and stitch them together.

This would have taken ages to do by hand in iMovie, and probably just as long to look up the needed parameters in ffmpeg, but Claude Code got it right on the first try, and worked with me to fine-tune the motion detection threshold.
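For reference, one common way to do this kind of thing (not necessarily what the generated script did) is ffmpeg's `mpdecimate` filter, which drops frames that barely differ from the previous one, followed by the concat demuxer to stitch the clips. A sketch that only builds the command lines; the file names are placeholders, and the `hi`/`lo`/`frac` values shown are the filter defaults as I understand them from the ffmpeg docs:

```python
def decimate_cmd(src: str, dst: str, hi: int = 768, lo: int = 320,
                 frac: float = 0.33) -> list:
    """Build an ffmpeg command that drops near-duplicate (no-motion) frames."""
    # hi/lo/frac are the mpdecimate thresholds; adjusting them changes how
    # much frame-to-frame difference counts as "motion".
    # setpts rewrites timestamps so the kept frames play back to back.
    vf = f"mpdecimate=hi={hi}:lo={lo}:frac={frac},setpts=N/FRAME_RATE/TB"
    # -an drops audio, which would otherwise drift out of sync.
    return ["ffmpeg", "-i", src, "-vf", vf, "-an", dst]


def concat_cmd(list_file: str, dst: str) -> list:
    """Build an ffmpeg command that joins the clips named in a concat list file."""
    # list_file holds one line per clip, e.g.: file 'hop1_trimmed.mp4'
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", dst]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Fine-tuning the motion threshold then amounts to re-running `decimate_cmd` with different `hi`/`lo`/`frac` values and eyeballing the result.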

suddenlybananas

2 months ago

[-]

I googled "edit out no motion in video ffmpeg" and found a snippet from StackOverflow which did this in about 10 seconds.

roflyear

2 months ago

[-]

yeah but that doesn't require paying Anthropic $100/m

2 months ago

[-]

Good luck getting that StackOverflow snippet to "work with me to fine-tune the motion detection threshold".

suddenlybananas

2 months ago

[-]

The answer described the relevant parameters for the threshold actually and gave a range of suggested parameters.

fatherwavelet

2 months ago

[-]

From the release page, this seems like a pretty big deal in terms of office jobs at some point in the future: "Spreadsheets with formulas: Generate Excel files with working VLOOKUP, conditional formatting, and multiple tabs"

There are so many office workers who just shuffle data between systems. Not sure about the error rate though but it is not like the error rate is going to be worse a decade from now.

CamperBob2

2 months ago

[-]

I don't use a Mac so can't run Cowork, but the normal Claude CLI is pretty good at general automation tasks (as are Codex CLI and Gemini CLI).

Most recent example: I wanted to try out GLM-image when it dropped the other day, but didn't feel like spending an hour dealing with multifile, multidirectory HuggingFace downloads and all the usual Python dependency headaches. So I made an empty directory, ran Claude Code, and told it "Please download all files from https://huggingface.co/zai-org/GLM-Image/tree/main into this directory and test the model." An hour later, I tabbed back to the console window and there was the sample image file.

Looking at the transcript, sure enough, it ran into all the usual headaches and hassles... but the difference is I didn't have to deal with them.

Note that I didn't tell it "Use uv to test the model" -- I just YOLOed it with my system Python installation. If I later find that it broke something else, oh, well... that sounds like a job for Claude, too.

Another thing that's nice about these CLI tools is that they hide the differences between terminals pretty effectively. I ran this particular task in a Windows DOS box, but could just as easily have used PowerShell, a Mac, or a Linux terminal. The idea of not having to care what OS I'm running is an alluring one, given the steady enshittification of Windows.

rjtavares

2 months ago

[-]

I've been using Google's Antigravity (which has a similar UI) to do data analysis and making reports. Skills are really useful for that.

ilaksh

2 months ago

[-]

just in case anyone is interested, I will mention my MIT licensed project that is very useful with Claude https://github.com/runvnc/mindroot

what

2 months ago

[-]

Did this AI hype thot move to substack to try and monetize?

2 months ago

[-]

I don't monetize via Substack. I've been using it as a free email newsletter version of my blog for almost three years now - I wrote about how I do that here: https://simonwillison.net/2023/Apr/4/substack-observable/

mNovak

2 months ago

[-]

So when can an AI call up the cable company and negotiate a discount? Asking for a friend.

But seriously, other tasks I've encountered recently that I wish I could delegate to an AI:

- Posting my junk to Craigslist, determining a fair price, negotiating a buyer (pickup only!)

- Scheduling showings to find an apartment, wherein the listing agents are spread over multiple platforms, proprietary websites, or phone contacts

- Job applications -- not forging a resume, but compiling candidate positions with reasoning, and the tedious part where you have to re-enter your whole resume into their proprietary application pipeline app

What strikes me as basic similarities across these types of things, is that they are essentially data-entry jobs which interact with third-party interfaces, with CRM-like follow up requirements, and require "good judgement" (reading reviews, identifying scams, etc).

lufenialif2

2 months ago

[-]

Probably unlikely to occur if prompt injection remains possible. I’ll just have my counterparty AI prompt inject yours to negotiate a better deal on my behalf.