The agent passes the Turing test...
It's barely readable to humans, but direct and efficient for LLMs (direct reference -> referent, without the verbiage of natural language).
This suggests that some (compressed) index format, always loaded into context, will replace the current heuristics around agents.md/claude.md/skills.md.
So I would bet that this year we get some normalization of both the indexes and the documentation they reference (especially matching terms).
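To make that concrete, here's a rough sketch of what such an always-loaded index could look like. This is purely illustrative, not an existing format; the terms and paths are made up:

```python
# Hypothetical always-in-context index: terse term -> referent mappings,
# optimized for the model rather than for human readability.
COMPRESSED_INDEX = {
    "auth.jwt":        "docs/auth/jwt.md",         # token issuing/verification
    "db.migrations":   "docs/db/migrations.md",    # schema change workflow
    "api.rate-limits": "docs/api/rate-limits.md",  # 429 handling, backoff
    "deploy.preview":  "docs/deploy/preview.md",   # preview environment setup
}

def render_index() -> str:
    """Render the index as a compact block to prepend to every prompt."""
    return "\n".join(f"{term} -> {path}" for term, path in COMPRESSED_INDEX.items())
```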
Possibly also a side issue: APIs could repurpose their test suites as validation to compare LLM performance on coding tasks.
LLMs create huge adoption waves. Libraries and APIs will have to learn to surf them or be limited to human users.
Obviously, directly including context in something like a system prompt puts it in context 100% of the time. You could just as easily take all of an agent's skills, feed them to the agent (in a system prompt or similar), and it would follow the instructions more reliably.
However, at a certain point you have to use skills, because including everything in the context every time is wasteful, or not possible. This is the same reason Anthropic is doing advanced tool use (ref: https://www.anthropic.com/engineering/advanced-tool-use): there's not enough context to straight up include everything.
It's all a context/price trade-off; obviously, if you have the context budget, just include what you can directly (in this case, by compressing it into an AGENTS.md).
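A toy sketch of that trade-off; the budget heuristic and return shape are made up for illustration:

```python
# Toy sketch of the context/price trade-off: inline docs directly when they
# fit the budget, otherwise fall back to a skill with only a short description.
def plan_context(doc_text: str, short_description: str, budget_tokens: int) -> dict:
    estimated_tokens = len(doc_text) // 4  # rough heuristic: ~4 chars per token
    if estimated_tokens <= budget_tokens:
        # Cheap enough: always in context, always followed.
        return {"strategy": "inline", "payload": doc_text}
    # Too big: announce only the description and let the agent pull the rest.
    return {"strategy": "skill", "payload": short_description}
```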
How do you suppose skills get announced to the model? It's all in the context in some way. The interesting part here is: just (relatively naively) compressing stuff into the AGENTS.md seems to work better than however skills are implemented.
That seems roughly equivalent to my unenlightened mind!
I think Vercel mixes up skills and context configuration, so the whole evaluation is misleading because it tests two completely different use cases.
To sum it up: Vercel should use both, AGENTS.md in combination with skills. The two serve totally different purposes.
Having an agent manage its own context ends up being extraordinarily useful, on par with the leap from non-reasoning to reasoning chats. There are still issues with memory and integration, and other LLM weaknesses, but agents are probably going to get extremely useful this year.
And how do you guarantee that said relevant things actually get put into the context?
OP is about the same problem: relevant skills being ignored.
1. You absolutely want to force certain context in, no questions asked and no non-determinism (index and sparknotes). This can be done conditionally, but still rule-based, keyed on the files accessed and other "context"
2. You want to keep it clean and only provide useful context as necessary (skills, search, MCP; and really an explore/query/compress mechanism around all of this, Ralph Wiggum is one example). A rough sketch of both is below.
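The file patterns, doc paths, and skill names here are illustrative, not a real project:

```python
import fnmatch

# 1. Rule-based, forced context: deterministic mapping from touched files to docs.
FORCED_RULES = {
    "src/db/*":  ["docs/db-overview.md"],          # always load when DB code is touched
    "*.test.ts": ["docs/testing-conventions.md"],
}

# 2. On-demand context: short descriptions only; the full text is fetched when asked.
SKILL_DESCRIPTIONS = {
    "pdf-extraction": "Extract text/tables from PDFs. See skills/pdf-extraction.md",
}

def forced_context(touched_files: list[str]) -> list[str]:
    """Deterministically pick docs to inject based on the files being accessed."""
    docs: list[str] = []
    for pattern, paths in FORCED_RULES.items():
        if any(fnmatch.fnmatch(f, pattern) for f in touched_files):
            docs.extend(p for p in paths if p not in docs)
    return docs
```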
Which makes sense.
& some numbers that prove that.
The article also doesn't mention that they don't know how the compressed index affects output quality. That's always a concern with this kind of compression. Skills are just another, different kind of compression, one with a much higher compression rate and presumably less likely to negatively influence quality; the cost being that it doesn't always get invoked.
> If you think there is even a 1% chance a skill might apply to what you are doing, you ABSOLUTELY MUST invoke the skill. IF A SKILL APPLIES TO YOUR TASK, YOU DO NOT HAVE A CHOICE. YOU MUST USE IT.
While this may result in overzealous activation of skills, I've found that if I have a related skill, I _want_ it to be used. It has worked well for me.
works pretty well
Create a folder called .context and symlink into it anything relevant to the project, for example READMEs and important docs from the dependencies you're using. Then configure your tool to always read .context into context, just like it does for AGENTS.md.
This ensures the LLM has all the information it needs right in context from the get-go. Much better performance, cheaper, and fewer mistakes.
Cheaper because it has the right context from the start instead of faffing about trying to find it, which uses tokens and ironically bloats context.
It doesn't have to be every bit of documentation, but putting the most salient bits in context makes LLMs perform much more efficiently and accurately in my experience. You can also use the trick of asking an LLM to extract the most useful parts from the documentation into a file, which you then re-use across projects.
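A minimal sketch of that setup; the source paths are just examples, adjust to your project:

```python
from pathlib import Path

def link_into_context(project_root: str, sources: list[str]) -> None:
    """Symlink the most relevant docs into .context so the tool always loads them."""
    context_dir = Path(project_root) / ".context"
    context_dir.mkdir(exist_ok=True)
    for src in sources:
        src_path = Path(project_root) / src
        link = context_dir / src_path.name
        if not link.exists():
            link.symlink_to(src_path.resolve())

# Example: pull in your own README plus a dependency's docs.
link_into_context(".", ["README.md", "node_modules/some-dep/README.md"])
```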
I expect the benefit is from better Skill design, specifically, minimizing the number of steps and decisions between the AI’s starting state and the correct information. Fewer transitions -> fewer chances for error to compound.
I split context into two categories:
1. Those I force into the system prompt using rule-based systems and "context"
2. Those I let the agent look up or discover
I also limit what gets into message parts, moving some of the larger token consumers to the system prompt so they only show once, most notably read/write_file
If your goal is to always give a permanent knowledge base to your agent that's exactly what AGENTS.md is for...
Which is why I use a skill, exposed as a command, that routes requests to agents and skills.
1. Start from the Claude Code extracted instructions; they have many things like this in there. Their knowledge sharing in docs and blog posts on this aspect is bar none
2. Use AGENTS.md as a table of contents and sparknotes, put them everywhere, load them automatically
3. Have topical markdown files / skills
4. Make great tools. This is still hard for me to explain; there's lots of overlap with MCP and skills, and conceptually they're the same to me
5. Iterate, experiment, do weird things, and have fun!
I changed read/write_file to put contents in the state and present them in the system prompt, same for the AGENTS.md. Now I'm working on evals to show how much better this is, because anecdotally, it kicks ass.
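Roughly what that change looks like as a sketch; the state shape and rendering here are simplified stand-ins, not any particular framework:

```python
# Sketch: file contents live in state and are rendered once in the system prompt,
# instead of being repeated as tool-result messages on every turn.
class AgentState:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}   # path -> latest contents

    def read_file(self, path: str) -> str:
        with open(path) as f:
            self.files[path] = f.read()
        return f"Loaded {path} into context."   # the tool result stays tiny

    def system_prompt(self, base: str) -> str:
        sections = [f"## {p}\n{body}" for p, body in self.files.items()]
        return base + "\n\n# Open files\n" + "\n\n".join(sections)
```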
What a wonderful world that would be.
I guess you need to make sure your file paths are self-explanatory and fairly unique, otherwise the agent might bring extra documentation into the context trying to find which file had what it needed?
Skills are new. Models haven't been trained on them yet. Give it 2 months.
It's a difference of "choose whether or not to make use of a skill that would THEN attempt to find what you need in the docs" vs. "here's a list of everything in the docs that you might need."
With explicit skills, you can add new capabilities modularly - drop in a new skill file and the agent can use it. With a compressed blob, every extension requires regenerating the entire instruction set, which creates a versioning problem.
The real question is about failure modes. A skill-based system fails gracefully when a skill is missing - the agent knows it can't do X. A compressed system might hallucinate capabilities it doesn't actually have because the boundary between "things I can do" and "things I can't" is implicit in the training rather than explicit in the architecture.
Both approaches optimize for different things. Compressed optimizes for coherent behavior within a narrow scope. Skills optimize for extensibility and explicit capability boundaries. The right choice depends on whether you're building a specialist or a platform.
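As a sketch of that modular extension point; the directory layout and "first line is the description" convention are assumptions for illustration, not the actual skills spec:

```python
from pathlib import Path

def discover_skills(skills_dir: str = "skills") -> dict[str, str]:
    """Scan a skills directory; dropping in a new file adds a capability,
    and the agent's 'can do' list is exactly what's on disk."""
    skills: dict[str, str] = {}
    for path in sorted(Path(skills_dir).glob("*.md")):
        lines = path.read_text().splitlines()
        skills[path.stem] = lines[0].strip() if lines else ""
    return skills
```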
I have a skill in a project named "determine-feature-directory" with a short description explaining that it is meant to determine the feature directory of the current branch. The initial prompt I provide will tell it to determine the feature directory and do other work. Claude will even state "I need to determine the feature directory..."
Then, about 5-10% of the time, it will not use the skill. It does use the skill most of the time, but the low failure rate is frustrating because it makes it tough to tell whether or not a prompt change actually improved anything. Of course I could be doing something wrong, but it does work most of the time. I miss deterministic bugs.
Recently, I stopped Claude after it skipped using a skill and just said "Aren't you forgetting something?". It then remembered to use the skill. I found that amusing.
It's really silly to waste big-model tokens on throat-clearing steps.
Basically, use a small model up front to efficiently trigger the big model. Subagents are at best small models deployed by the bigger model (still largely manually triggered in most workflows today).
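Something like this sketch; call_model and load_skill are hypothetical callables standing in for whatever model API and skill loader you actually use:

```python
from typing import Callable

def route_task(
    task: str,
    skill_descriptions: dict[str, str],
    call_model: Callable[[str, str], str],   # (model_name, prompt) -> text; your API here
    load_skill: Callable[[str], str],        # skill name -> full skill body
) -> str:
    """Cheap model does the throat-clearing (pick a skill); big model does the work."""
    menu = "\n".join(f"- {name}: {desc}" for name, desc in skill_descriptions.items())
    choice = call_model(
        "small-model",
        f"Task: {task}\nAvailable skills:\n{menu}\nReply with one skill name, or 'none'.",
    ).strip()
    preamble = load_skill(choice) if choice in skill_descriptions else ""
    return call_model("big-model", f"{preamble}\n\n{task}")
```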
TFA says they added an index to AGENTS.md that told the agent where to find all the documentation, and that was a big improvement.
The part I don't understand is that this is exactly how I thought skills work. The short descriptions are given to the model up-front and then it can request the full documentation as it wants. With skills this is called "Progressive disclosure".
Maybe they used more effective short descriptions in the AGENTS.md than they did in their skills?
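My mental model of progressive disclosure, sketched below; the helper names and skill layout are mine, not the actual implementation:

```python
# Progressive disclosure as I understand it: only short descriptions are always
# in context; the full body is pulled in when the model asks for a skill by name.
SKILLS = {
    "pdf-extraction": {
        "description": "Extract text and tables from PDF files.",
        "path": "skills/pdf-extraction/SKILL.md",
    },
}

def upfront_context() -> str:
    """What the model sees on every turn: names and one-liners only."""
    return "\n".join(f"{name}: {s['description']}" for name, s in SKILLS.items())

def load_skill(name: str) -> str:
    """What the model gets after it decides a skill applies: the full doc."""
    with open(SKILLS[name]["path"]) as f:
        return f.read()
```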
*You are the Super Duper Database Master Administrator of the Galaxy*
does not improve the model's ability to reason about databases?
They used Prisma to handle their database interactions. They preached tRPC and screamed TYPE SAFETY!!!
You really think these guys will ever touch a keyboard to program again? They despise programming.