You might as well just write instructions in English in any old format, as long as it's comprehensible. Exactly as you'd do for human readers! Nothing has really changed about what constitutes good documentation. (Edit to add: my parochialism is showing there, it doesn't have to be English)
Is any of this standardization really needed? Who does it benefit, except the people who enjoy writing specs and establishing standards like this? If it really is a productivity win, it ought to be possible to run a comparison study and prove it. Even then, it might not be worthwhile in the longer run.
Of course any LLM can write any script based on a document, but that's not very deterministic.
A good example is Anthropic's PDF creator skill. It has the basic English instructions as well as actual Python code to generate PDFs.
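Something in that spirit (not Anthropic's actual code, just a sketch of the kind of helper such a skill might bundle, assuming reportlab is available):

```python
# Illustrative only: a helper script a PDF-creation skill could ship, so the
# agent runs deterministic code instead of emitting the PDF token by token.
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

def make_pdf(path: str, title: str, lines: list[str]) -> None:
    c = canvas.Canvas(path, pagesize=letter)
    c.setFont("Helvetica-Bold", 16)
    c.drawString(72, 720, title)
    c.setFont("Helvetica", 11)
    y = 690
    for line in lines:
        c.drawString(72, y, line)
        y -= 16
    c.save()

if __name__ == "__main__":
    make_pdf("out.pdf", "Report", ["Generated by the skill's helper script."])
```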
Codex + skills fine-tunes Qwen3-0.6B to +6 on HumanEval and beats the base score on the first run.
I reran the experiment from this week, but used Codex's new skills integration. Like Claude Code, Codex consumes the full skill into context and doesn't start with failing runs. Its first run beats the base score, and on the second run it beats Claude Code.
https://xcancel.com/ben_burtenshaw/status/200023306951767675...
That said, it's not a perfect comparison because of the Codex model mismatch between runs.
The author seems to be doing a lot of work on skills evaluation.
To be clear, I'm suggesting that any specific format for "skills.md" is a red herring, and all you need to do is provide the LLM with good clear documentation.
A useful comparison would be between: a) making a carefully organised .skills/ folder, b) putting the same info anywhere and just linking to it from your top-level doc, and c) dumping everything directly in the top-level doc.
My guess is that it's probably a good idea to break stuff out into separate sections, to avoid polluting the context with stuff you don't need; but the specific way you do that very likely isn't important at all. So (a) and (b) would perform about the same.
My guess is that the standardization is going to make its way into how the models are trained and Skills are eventually going to pull out ahead.
0: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Their reasoning about it is also flawed. E.g. "No decision point. With AGENTS.md, there's no moment where the agent must decide "should I look this up?" The information is already present." - but this is exactly the case for skills too. The difference is just where in the context the information is, and how it is structured.
Having looked at their article, ironically I think the reason it works is that they likely force more information into context by giving the agent less information to work with:
Instead of having a description, which might convince the agent a given skill isn't relevant, their index is basically a list of vague filenames, forcing the agent to guess, and potentially read the wrong thing.
This is basically exactly what skills were added to avoid. But it will break if the description isn't precise enough. And it's perfectly possible that current tooling isn't aggressive enough about pruning detail that might tempt the agent to ignore relevant files.
A +6 jump on a 0.6B model is actually more impressive than a +2 jump on a 100B model. It proves that 'intelligence' isn't just parameter count; it is context relevance. You are proving that a lightweight model with a cheat sheet beats a giant with amnesia. This is the death of the 'bigger is better' dogma.
Which is essentially the bitter lesson that Richard Sutton talks about?
Plus, as has been mentioned multiple times here, standard skills are a lot more about different harnesses being able to consistently load skills into the context window in a programmatic way. Not every AI workload is a local coding agent.
I am very interested in finding ways to combine skills + local models + MCP + aider-ish tools to avoid using commercial LLM providers.
Is this a path to follow? Or, something different?
https://xcancel.com/ben_burtenshaw
I wish they had arranged it around READMEs. I have a directory with my tasks and a README.md there - before Codex had skills, it already understood that it needed to read the README when it was dealing with tasks. The skills system is less directory-dependent, so it is a bit more universal - but I am not sure if this is really needed.
This is different from Swagger / OpenAPI how?
I get that cross-trained web front-end devs set a new low bar for professional amnesia and not-invented-here-ism, but maybe we could not do that yet another time?
What's different?
Now if a format dominates, it will be post-trained for, and then it is in fact better.
I can't speak to the exact split, or what is part of post-training versus pre-training at various labs, but I am exceedingly confident all labs post-train for effectiveness in specific domains.
I claimed that OpenAI overindexed on getting away with aggressive post-training on old pre-training checkpoints. Gemini / Anthropic correctly realized that new pre-training checkpoints need to happen to get the best gains in their latest model releases (which get post-trained too).
So yeah, I agree that it's all just documentation. I know there's been some evidence shown that skills work better, but my feeling is that in the long run it'll fall by the wayside, like prompt engineering, for a couple of reasons. First, many skills will just become unnecessary - models will be able to make slide decks or do frontend design without specific skills (Gemini's already excellent at design without anything beyond the base model, imho). Second, increased context windows and overall intelligence will obviate the need for the specific skills paradigm. You can just throw all the stuff you want Claude to know in your claude.md and call it a day.
To overly programmer-brain it, a slash command is just a skill with a null frontmatter. This means that it doesn't participate in progressive disclosure, aka Claude won't consider invoking it automatically.
I'd like a user-writable, LLM-readable, LLM-non-writable character/sequence. That would make it a lot easier to know at a glance that a command/file/directory/username/password wasn't going to end up in context and be used by a rogue agent.
It wouldn't be foolproof, since the model could probably find some other tool out there to generate it (e.g. "write me some Unicode Python"), but it's something I haven't heard of that sounds useful. If it could be made fool/tool-proof (fools and tools are so resourceful), that would be even better.
This standardization, basically, makes a list of docs easier to scan.
As a human, you have permanent memory. LLMs don't; they have to load everything into the context, and doing that only as necessary can help.
E.g. if you had anterograde amnesia, you'd want everything to be optimally organized, labeled, etc, right? Perhaps an app which keeps all information handy.
For example, if you've just joined a new team or a new project, wouldn't you like to have extensive, well-organised documentation to help get you started?
This reminds me of the "curb-cut effect", where accommodations for disabilities can be beneficial for everybody: https://front-end.social/@stephaniewalter/115841555015911839
Claude is programmed to stop reading after it gets through the skill’s description. That means we don’t consume more tokens in the context until Claude decides it will be useful. This makes a big difference in practice. Working in a large repo, it’s an obvious step change between me needing to tell Claude to go read a particular readme that I know solves the problem vs Claude just knowing it exists because it already read the description.
Sure, if your project happened to already have a perfect index file with a one-sentence description of each other documentation file, that could serve as a similar purpose (if Claude knew about it). It’s worthwhile to spread knowledge about how effective this pattern is. Also, Claude is probably trained to handle this format specifically.
Making your docs nice and modular, and having a high-level overview that tells you where to find more detailed info on specific topics, is definitely a good idea. We already know that when we're writing docs for human readers. The LLMs are already trained on a big corpus written by and for humans. There's no compelling reason why we need to do anything radically different to help them out. To the contrary, it's better not to do anything radically different, so that new LLM-assisted code and docs can be accessible to humans too.
Well-written docs already play nicely with LLM context.
I'm very curious to know the size & state of a codebase where skills are beneficial over just having good information hierarchy for your documentation.
Splitting the docs into neat modules is a good idea (for both human readers and current AIs) and will continue to be a good idea for a while at least. Getting pedantic about filenames, documentation schemas and so on is just bikeshedding.
https://claude.com/blog/context-management
> Context editing automatically clears stale tool calls and results from within the context window when approaching token limits.
> The memory tool enables Claude to store and consult information outside the context window through a file-based system.
But it looks like nobody has it as part of the inference loop yet: I guess it's hard to train (i.e. you need a training set which is a good match for how people use context in practice) and it makes inference more complicated. I guess higher-level context management is just easier to implement - and it's one of the things which "GPT wrapper" companies can do, so why bother?
So if you want to do this, the current workaround is basically to have a sub-agent carry out tasks you don't want to pollute the main context.
I have lots of workflows that get farmed out to sub-agents that then write reports to disk and produce a summary to the main agent, which will then selectively read parts of the report instead of having to process the full source material or even the whole report.
https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
It’s also related to attention — invoking a skill “now” means that the model has all the relevant information fresh in context, so you’ll get much better results.
What I’m doing myself is writing skills that invoke Python scripts that “inject” prompts. This way you can set up multi-turn workflows for e.g. codebase analysis, deep thinking, root cause analysis, etc.
Works very well.
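Roughly like this (a minimal sketch; the step names and prompt text are made up, the point is just that the skill tells the agent to run the script and follow whatever it prints):

```python
#!/usr/bin/env python3
"""Hypothetical helper a skill can invoke: prints the next prompt in a
multi-turn analysis workflow for the agent to follow."""
import sys

STEPS = {
    "survey": "List the modules involved and summarize each in one sentence.",
    "hypothesize": "Propose three candidate root causes, ranked by likelihood.",
    "verify": "For the top candidate, cite the exact code that supports it.",
}

step = sys.argv[1] if len(sys.argv) > 1 else "survey"
print(STEPS.get(step, STEPS["survey"]))
```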
The whole point of LLM-based code execution is, well, I can just type in any old language it understands and it ought to figure out what I mean!
A "skill" for searching a pdf could be :
* "You can search PDFs. The code is in /lib/pdf.py"
or it could be:
* "Here's a pile of libraries, figure out which you want to use for stuff"
or it could be:
* "Feel free to generate code (in any executable programming language) on the fly when you want to search a PDF."
or it could be:
* "Solve this problem <x>" and the LLM sees a pile of PDFs in the problem and decides to invent a parser.
or any other nearly infinite way of trying to get a non-deterministic LLM to do a thing you want it to do.
At some level, this is all the same. At least, it rounds to the same in a sort of kinda "Big O" order-of-magnitude comparison.
On the other hand, I also agree, but I can definitely see present value in trying to standardize it because humans want to see what is going on (see: JSON - it's highly desirable for programmers to be able to look at a string representation of data rather than send opaque binary over the wire, even though to a computer binary is gonna be a lot faster).
There is probably an argument, too, for optimization of context windows and tokens burned and all that kinda jazz. `O(n)` is the same as `O(10*n)` (where n is tokens burned, or $$$ per second, or context window size); the constant factor doesn't matter in theory but certainly does in practice when you're the one paying the bill or you fill up the context window and get nonsense.
So if this is a _thoughtful_ standard that takes that kinda stuff into account then, well, great! It gives a benchmark we can improve and iterate upon.
With some hypothetical super LLM that has a nearly infinite context window and a cost/tok of nearly zero and throughput nearing infinity, you can just say "solve my problem" and it will (eventually) do it. But for now, I can squint and see how this might be helpful.
Programs and data are the basis of deterministic results that are accessible to the LLM.
Embed an SQLite database with interesting information (bus schedules, dietary info, or a thousand other things), and a Python program run by the skill can access it.
For Claude at least, it does it in a VM and can be used from your phone.
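For the SQLite case, the skill's script can be as small as this (table and column names are made up for illustration, not a real dataset):

```python
#!/usr/bin/env python3
"""Illustrative lookup script a skill could ship next to an embedded
SQLite database (e.g. bus schedules)."""
import sqlite3
import sys

def next_departures(db_path: str, stop: str, limit: int = 5):
    con = sqlite3.connect(db_path)
    try:
        # Assumed schema: schedule(route TEXT, stop TEXT, departs_at TEXT)
        return con.execute(
            "SELECT route, departs_at FROM schedule "
            "WHERE stop = ? ORDER BY departs_at LIMIT ?",
            (stop, limit),
        ).fetchall()
    finally:
        con.close()

if __name__ == "__main__":
    for route, departs_at in next_departures("schedule.db", sys.argv[1]):
        print(f"{departs_at}  route {route}")
```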
Sure, skills are more convention than a standard right now. Skills lack versioning, distribution, updates, unique naming, selective network access. But they are incredibly useful and accessible.
You don't want to give an English description of how to compress LZMA and then let the AI do it token by token. Although that would be a pretty good arduous methodical benchmark task for an AI.
IMO it's great if a plugin wants to have its own conventions for how to name and where to put these files and their general structure. I get the sense it doesn't matter to agents much (talking mostly Claude here), and the way I use it, I essentially give it its own "skills" based on my own convention. It's very flexible and seems to work. I don't use the slash commands, I just script with prompts into the Claude CLI mostly, so if that's the only thing I gain from it, meh. I do see other comments speculating these skills work more efficiently, but I'm not sure I have seen any evidence for that? Like a sibling comment noted, I can just re-feed the skill knowledge back into the prompt.
I haven't done a formal study, so I can't prove it, but it seems like I get better output from agents if I tailor my English more towards the LLM way of "thinking".
Yeah, the WWW is really just text, but that doesn't mean you don't need HTTP + HTML and a browser/search engine. Skills are just that, but for agent capabilities.
Long term you're right though, agents will fetch this all themselves. And at some point they will not be our agents at all.
> Long term you're right though, agents will fetch this all themselves
It's not "long term", it's right now. If your docs are well-written and well-organised, agents can already use them. The most you might need to do is copy your README.md into CLAUDE.md.
- https://community.openai.com/t/skills-for-codex-experimental...
- https://developers.openai.com/codex/skills/
Having a super repo of everyone else's slop is backwards thinking; you are now in the era where creating written content and verifying its effectiveness is easier than ever.
For example, we have a skill to /create-new-endpoint. The skill contains a detailed checklist of all the boilerplate tasks that an engineer needs to do in addition to implementing the logic (e.g. update OpenAPI spec, add integration tests, endpoint boilerplate, etc.). The engineer manually invokes the skill from the CLI via slash commands, provides a JIRA ticket number, and engages in some brief design discussion. The LLM is consistently able to one-shot these tickets in a way that matches our existing application architecture.
Check the results.
I have not yet tested this at scale but give me six months.
.claude/skills
.codex/skills
.opencode/skills
.github/skills
Codex started this and OpenCode followed suit within the hour.
But I don't see why you need a strict standard for "an informal description of how to do a particular task". I say "informal" because it's necessarily written in prose -- if it were formal, it'd be a shell script.
Skills seem a bit early to standardize. We are so early in this, why do we want to handcuff our creativity so soon?
[1]: https://code.claude.com/docs/en/skills#control-who-invokes-a...
[2]: https://opencode.ai/docs/skills/#disable-the-skill-tool
[3]: https://developers.openai.com/codex/skills/#enable-or-disabl...
The problem I see now is that everyone wants to be the winner in a hype cycle and be the standards bringer. How many "standards" have we seen put out now? No one talks about MCP much anymore, langchain I haven't seen in more than a year, will we be talking about Skills in another year?
Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
What tools do they have access to, can I define this so it's dynamic? Do skills even have a concept for sub tools or sub agents? Why do I want to put references in a folder instead of a search engine? Does frontmatter even make sense, why not something closer to a package.json in a file next to it?
Does it even make sense to have skills in the repo? How do I use them across projects? How do we build an ecosystem and dependency management system for skills (which are themselves versioned)?
You are right. I have edited my post slightly.
> Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
You don't have to put scripts in skills. The script can be anywhere the agent can access. The skill just needs to tell the LLM how to run it.
> Does it even make sense to have skills in the repo? How do I use them across projects?
You don't have to put them in the repo. E.g. with Claude Code you can put project-specific skills in `.claude/skills` in the repo and system-wide skills in `~/.claude/skills`.
3. generalize: how do I store, maintain, and distribute skills shared by employees who work on multiple repos? Sounds like standard dependency management to me. It does to some of the people building collections / registries, too. Not sure if any of them account for versioning; I have not seen anything tied to lock files (though I'd avoid that by using MVS for dep selection).
It's one of my biggest pet peeves with a lot of these tools (now admittedly a lot of them have a config env var to override, but it'd be nice if they just did the right thing automatically).
I treat my skills the same way I would the tiny bash scripts and fish functions I wrote in days gone by to simplify my life by writing 2 words instead of 2 sentences. A tiny improvement that only makes sense for a programmer at heart.
Standards are good, but they slow development and experimentation.
> any time you want to search for a skill in `./codex`, search instead in `./claude`
and continue as you were.
.opencode/skill .opencode/skills
[1]: https://opencode.ai/docs/skills/#place-files
I wrote a rant about skills a while ago that's still relevant in some ways: https://sibylline.dev/articles/2025-10-20-claude-skills-cons...
It feels like people think they are something new and novel that there is something technical about them that one needs to learn.
"Skills" are just readmes on particular subjects. They can be for whatever purpose you want them to be. Any time you find that you need to repeatedly tell the agent about something, you can put it in a "skill".
You don't even have to follow the skill standard and use the standard folder and filenames. That's just so the agent can auto find and load them. You can name them whatever you want and put them wherever you want and just add them to context yourself when you need them.
The pattern that works: skills that represent complete, self-contained sequences - "do X, then Y, then Z, then verify" - with clear trigger conditions. The agent recognizes these as distinct modes of operation rather than optional reference material.
What doesn't work: skills as general guidelines or "best practices" documents. These get lost in context or ignored entirely because the agent has no clear signal for when to apply them.
The mental model shift: think of skills less like documentation and more like subroutines you'd explicitly invoke. If you wouldn't write a function for it, it probably shouldn't be a skill.
Example: A Python file is read or written, guidance is given back (once, with a long cooldown) to activate global and company-specific Python skills. Claude activates the skills and writes Python to our preference.
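A rough sketch of that nudge, written as a post-tool hook (the JSON-on-stdin payload shape and the field names are assumptions, not a documented schema; check your agent's hook docs):

```python
#!/usr/bin/env python3
"""Hypothetical post-tool hook: when a .py file is touched, remind the agent
(at most once per cooldown) to activate the Python skills."""
import json
import sys
import time
from pathlib import Path

COOLDOWN_S = 4 * 60 * 60                  # nudge at most every four hours
MARKER = Path("/tmp/python-skill-nudge")  # cheap cooldown tracker

payload = json.load(sys.stdin)            # assumed: hook receives tool info as JSON
path = payload.get("tool_input", {}).get("file_path", "")

if path.endswith(".py"):
    if not MARKER.exists() or time.time() - MARKER.stat().st_mtime > COOLDOWN_S:
        MARKER.touch()
        # Whatever the hook prints is surfaced back to the agent as guidance.
        print("Python file touched: activate the global and company-specific Python skills.")
```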
Another value add is that theoretically agents should trigger skills automatically based on context and their current task. In practice, at least in my experience, that is not happening reliably.
But on the other hand, in Claude Code, at least, the skill "foo" is accessible as /foo, as the generalisation of the old commands/ directory, so I tend to favour being explicit that way.
[1] https://skills.sh/vercel-labs/agent-skills/web-design-guidel...
[2] https://github.com/vercel-labs/agent-skills/blob/main/skills...
All of these SKILLS.md/AGENTS.md/COMMANDS.md are just simple prompts, maybe even some with context links.
And quite dangerous.
I think this is (mostly) a solvable problem. The current generation of SotA models wasn’t RLVR-trained on skills (they didn’t exist at that time) and probably gets slightly confused by the way the little descriptions are all packed into the same tool call schema. (At least that’s how it works with Claude Code.) The next generation will have likely been RLVRed on a lot of tasks where skills are available, and will use them much more reliably. Basically, wait until the next Opus release and you should hopefully see major improvements. (Of course, all this stuff is non-deterministic blah blah, but I think it’s reasonable to expect going from “misses the skill 30% of the time” to “misses it 2% of the time”.)
> In 56% of eval cases, the skill was never invoked. The agent had access to the documentation but didn't use it. Adding the skill produced no improvement over baseline.
> …
> Skills aren't useless. The AGENTS.md approach provides broad, horizontal improvements to how agents work with Next.js across all tasks. Skills work better for vertical, action-specific workflows that users explicitly trigger,
https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Probably the more skills you have, the more confused it might get. The more potentially conflicting instructions you give the harder it gets for an LLM to figure out what you actually want to happen.
If I catch it going off script, I often interrupt it and tell it what to do and update the relevant skill. Seems to work pretty good. Keeping things simple seems to work.
I saw someone's analysis a few days ago and they found that their agents were more accurate when just dumping the skill context directly into AGENTS.md
Edit: btw I’ve gone from genai value denier to skeptic to cautiously optimistic to fairly impressed in the span of a year. (I’m a user of Claude code)
Skills as a pattern let the agent scan a lightweight index of descriptions, then pull in the full instructions only when relevant. Whether that's a .skills/ folder or a README index pointing to separate docs doesn't matter much. What matters is the separation between "what capabilities exist" and "how to execute this specific one."
The standardization part is mostly useful for distribution — being able to install and share skills across projects without manually wiring them up. Same reason we standardize package formats even though you could just copy-paste code.
My tooling was previously adding in AI hints with CLAUDE.md, Cursor Rules, Windsurf Rules, AGENTS.md, etc., but I recently switched to using only AGENTS.md and SKILLS. I appreciate the standardization from this perspective.
My model for skills is similar to this, but I extended it to have explicit "use when" and "don't use when" examples and counter-examples. This helped the small model, which tended not to get the nuances of a free-form text description.
I have a feeling that otherwise it becomes too messy for agents to reliably handle a lot of complex stuff.
For example, I have OpenClaw automatically looking for trending papers, turning them into fun stories, and then sending me the text via Telegram so I can listen to it in the ElevenLabs app.
I'm not sure whether it's better to have the story-generating system behind an API or to code it as a skill — especially since OpenClaw already does a lot of other stuff for me.
My general design principle for agents is that the top-level context (i.e. claude.md, etc.) is primarily "information about information": a list of skills, MCPs, etc., a very general overview, and a limited amount of information that they always need to have with every request. Everything more specific is in a skill, which is mostly some very light-touch instructions for how to use the various tools we have (scripts, APIs and MCPs).
I have found that people very often add _way_ too much information to claude.md's and skills. Claude knows a lot of stuff already! Keep your information to things specific to whatever you are working on that it doesn't already know. If your internal processes and house style are super complicated to explain to Claude and it keeps making mistakes, you might want to adapt to Claude instead of the other way around. Claude itself makes this mistake! If you ask it to build a CLAUDE.md, it'll often fill it with extraneous stuff that it already knows. You should regularly trim it.
I prefer to completely invert this problem and provoke the model into surfacing whatever desired behavior & capability by having the environment push back on it over time.
You get way more interesting behavior from agents when you allow them to probe their environment for a few turns and feed them errors about how their actions are inappropriate. It doesn't take very long for the model to "lock on" to the expected behavior if you are detailed in your tool feedback. I can get high quality outcomes using blank system prompts with good tool feedback.
> You get way more interesting behavior from agents when you allow them to probe their environment for a few turns and feed them errors about how their actions are inappropriate. It doesn't take very long for the model to "lock on" to the expected behavior if you are detailed in your tool feedback. I can get high quality outcomes using blank system prompts with good tool feedback.
My primary way of developing skills (and previously Cursor rules) is to start blank, let the LLM explore, and correct it as we go until the problem is solved. I then ask it to generate a skill (or rule) that explains the process in a way it could refer back to in order to repeat it later. Next time something like that comes up, we use the skill. If any correction is needed, I tell it to update the skill.
That way we get to have it explore and get more context initially, and then essentially "cache" that summarized context on the process for another time.
Or knowledge that is in their training data, but where the majority of the training data doesn't follow the best practices? (e.g. the Web Content Accessibility Guidelines)
I think there is a fair point in those cases for having a bunch of markdown doc files detailing them.
- There is no reason you have to expose the skills through the file system. It's just as easy to add a tool call to load a skill. Just put a skill ID in the instruction metadata. Or have a `discover_skills` tool if you want to keep skills out of the instructions altogether.
- Another variation is to put a "skills selector" inference in front of your agent invocation. This inference would receive the current inquiry/transcript plus the skills metadata and return a list of potentially relevant skills. Same concept as tool selection; this can save context bandwidth when there is a large number of skills.
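A minimal sketch of the first variation (the tool names, metadata shape, and file layout are hypothetical, not part of any existing spec):

```python
# Sketch of "discover, then load": the agent calls a discovery tool instead
# of scanning a skills/ directory, and only the chosen skill's body ever
# enters the context.
from pathlib import Path

SKILLS = {
    "pdf-report": "Generate a formatted PDF report from structured data.",
    "db-lookup": "Query the embedded SQLite reference database.",
    "new-endpoint": "Checklist for adding an API endpoint with tests and docs.",
}

def discover_skills(query: str) -> list[dict]:
    """First tool call: return only IDs and one-line descriptions."""
    terms = query.lower().split()
    return [
        {"id": skill_id, "description": desc}
        for skill_id, desc in SKILLS.items()
        if any(t in desc.lower() for t in terms)
    ]

def load_skill(skill_id: str) -> str:
    """Second tool call: only now does the full skill body enter context."""
    return (Path("skills") / skill_id / "SKILL.md").read_text()
```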
Yes, treating the "front matter" of skill as "function definition" of tool calls as kind of an equivalence class.
This understanding helped me create an LLM-agnostic (and sandboxed) open-skills[1] way before this standardization was proposed.
1. Open-skills: https://github.com/instavm/open-skills
So for your example, yes, you might tell the agent "write a fantasy story" and you might have a "storytelling skill" that explains things like character arcs, tropes, etc. You might have a separate "fiction writing" skill that defines writing styles, editing, consistency, etc.
All of this stuff is just 'prompt management' tooling though and isn't super complicated. You could just paste the skill content into your context and go from there; this just provides a standardized spec for how to structure these on-demand context blocks.
LLM-powered agents are surprisingly human-like in their errors and misconceptions about less-than-ubiquitous or new tools. Skills are basically just small how-to files, sometimes combined with usage examples, helper scripts etc.
Whenever there's an agent best practice (skill) or 'pre-prompt' that you want to use all the time, turn it into a text expansion snippet so that it works no matter where you are.
As an example, I have a design 'pre-prompt' that dictates a bunch of steering for agents re: how to pick style components, typography, layout, etc. It's a few paragraphs long and I always send it alongside requests for design implementation to get way-better-than-average output.
I could turn it into a skill, but then I'd have to make sure whatever I'm using supported skills -- and install it every time or in a way that was universally seen on my system (no, symlinking doesn't really solve this).
So I use AutoHotkey (you might use Raycast, Espanso, etc.) to configure that every time I type '/dsn', it auto-expands into my pre-prompt snippet.
Now, no matter whether I'm using an agent on the web/cloud, in my terminal window, or in an IDE, I've memorized my most important 'pre-prompts' and they're a few seconds away.
It's anti-fragile steering by design. Call it universal skill injection.
I liked that idea of having something more CLI-agnostic.
Neo: Ju jitsu? I'm gonna learn Ju jitsu.
[Tank winks and loads the program] Neo: Holy shit!
I have been trying to build skills to do various things on our internal tools, and more often than not, when it doesn't work, it is as much a problem with _our tools_ as it is with the LLM. You can't do obvious things, the documentation sucks, APIs return opaque error messages. These are problems that humans can work around because of tribal knowledge, but LLMs absolutely cannot, and fixing it for LLMs also improves it for your human users, who probably have been quietly dealing with friction and bullshit without complaining -- or not dealing with it and going elsewhere.
If you are building a product today, the feature you are working on _is not done_ until Claude Code can use it. A skill and an MCP isn't a "nice to have", it is going to be as important as SEO and accessibility, with extremely similar work to do to enable it.
Your product might as well not exist in a few years if it isn't discoverable by agents and usable by agents.
This is an interesting take. I admit I've never thought this way.
You can have the perfect scraping skill, but if the target blocks your requests, you're stuck. The hard problems are downstream.
The design of https://www.skillcreator.ai/explore is more useful for me. At least I can search by category, framework and language, and I also see much more information about what a skill does at a glance. I don't know why Vercel really wanted to do it completely black and white - colors, used with taste, give useful context and information.
slop?
…including, apparently, the clueless enthusiasm for people to “share” skills.
MCP is also perfectly fine when you run your own MCP locally. It’s bad when you install some arbitrary MCP from some random person. It fails when you have too many installed.
Same for skills.
It’s only a matter of time (maybe it already exists?) until someone makes a “package manager” for skills that has all of the stupid of MCP.
MCP is giving the agents a bunch of functions/tools it can use to interact with some other piece of infrastructure or technology through abstraction. More like a toolbox full of screwdrivers and hammers for different purposes, or a high-level API interface that a program can use.
Skills are more similar to a stack of manuals/books in a library that teach an agent how to do something, without polluting the main context. For example a guide on how to use `git` on the CLI: the agent can read the manual when it needs to use `git`, but it doesn't need to have the knowledge of how to use `git` in its brain when it's not relevant.
A directory of skills... same thing
You can use MCP the same way as skills with a different interface. There are no rules on what goes into them.
They both need descriptions and instructions around them, and they both have to be presented and indexed to the agent dynamically, so we can tell it what it has access to without polluting the context.
See the Anthropic post on moving MCP servers to a search function. Once you have enough skills, you are going to require the same optimization.
I separate things in a different way:
1. What things do I force into context (agents.md, "tools" index, files)
2. What things can the agent discover (MCP, skills, search)
https://github.com/SimHacker/moollm/blob/main/designs/LEELA-...
I call this "speed of light" as opposed to "carrier pigeon". In my experiments I ran 33 game turns with 10 characters playing Fluxx — dialogue, game mechanics, emotional reactions — in a single context window. Try that with MCP and you're making hundreds of round-trips. Skills can compose and iterate at the speed of light, while MCP forces serialization and waiting for carrier pigeons.
https://github.com/SimHacker/moollm/tree/main/skills/speed-o...
Skills also compose. MOOLLM's cursor-mirror skill introspects Cursor's internals via a sister Python script — tool calls, context assembly, thinking blocks, chat history. Everything, for all time, even after Cursor's chat has summarized and forgotten: it's still all there and searchable!
https://github.com/SimHacker/moollm/tree/main/skills/cursor-...
MOOLLM's skill-snitch skill composes with cursor-mirror for security monitoring of untrusted skills. Like Little Snitch watches your network, skill-snitch watches skill behavior — comparing declared tools against observed runtime behavior.
https://github.com/SimHacker/moollm/tree/main/skills/skill-s...
You can even use skill-snitch like a virus scanner to review and monitor untrusted skills. I have more than 100 skills and had skill-snitch review each one including itself -- you can find them in the skill-snitch-report.md file of each skill in MOOLLM. Here is skill-snitch analyzing and reporting on itself, for example:
https://github.com/SimHacker/moollm/blob/main/skills/skill-s...
MOOLLM's thoughtful-commitment skill also composes with cursor-mirror to trace the reasoning behind git commits.
https://github.com/SimHacker/moollm/tree/main/skills/thought...
MCP is still valuable for connecting to external systems. But for reasoning, simulation, and skills calling skills? In-context beats tool-call round-trips by orders of magnitude.
Ok I'm glad I'm not the only one who wondered this. This seems like simplified MCP; so why not just have it be part of an MCP server?
So if you want others to read the output you'll have to de-slopify it, ideally make it shorter and remove the tells.
If I go by good faith these days and trust that someone didn't just upload LLM-hallucinated bullshit, I'd sadly just be reading slop all day and not learning anything, or, worse, even get deceived by hallucinations and make wrong assumptions. It's just a waste of someone's precious life time.
LLMs can read through slop all day, humans can not without getting extremely annoyed.
It doesn't look like slop at all to me. GP claimed that this was written by AI without evidence, which I assumed to be based in bias, based on GP's comment history: https://news.ycombinator.com/threads?id=jondwillis The complaint they have about the writing style is not the style that is emblematic of AI slop. And, considering the depth of analysis and breadth of connection, this is not something current AI is up to producing.
Are you also assuming the article was written by an AI?