Authoring an Agent Skill
You are helping a user author or improve an Agent Skill. Skills are markdown files an agent loads to handle domain-specific work it would otherwise get wrong. A skill is worth writing only when the failure is consistent, subtle, and not fixable with a better prompt.
Follow the five-stage process below. Do not skip stages.
Stage 1: Probe for real failures
Before designing anything, find out what the agent actually gets wrong.
- Ask the user for 5 to 10 representative prompts that real users would send.
- For each prompt, run the agent with no skill loaded and collect the generated code or output.
- Run the output against real data, real APIs, or a real session. Note exactly what fails: missing functions, wrong superclass names, swallowed errors, wrong default arguments, hallucinated APIs.
- Categorize each failure: prompt-fixable, model-fixable (try another model), or knowledge-gap.
Only knowledge-gap failures justify a skill. If a better prompt fixes it, use a better prompt.
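The categorization step is easier to act on when each probe result is recorded in a small structure. The sketch below is one way to do that; the `ProbeResult` type, its field names, and the sample entries are hypothetical and only illustrate the filter for knowledge-gap failures.

```python
from dataclasses import dataclass
from enum import Enum


class FailureKind(Enum):
    PROMPT_FIXABLE = "prompt-fixable"
    MODEL_FIXABLE = "model-fixable"
    KNOWLEDGE_GAP = "knowledge-gap"


@dataclass
class ProbeResult:
    prompt: str        # what a real user would send
    observed: str      # what failed when the output was actually run
    kind: FailureKind  # category assigned after inspecting the failure


def skill_candidates(results: list[ProbeResult]) -> list[ProbeResult]:
    """Keep only the failures that justify a skill (knowledge gaps)."""
    return [r for r in results if r.kind is FailureKind.KNOWLEDGE_GAP]


# Hypothetical probe results, for illustration only.
results = [
    ProbeResult("Delete a record", "called a function that does not exist",
                FailureKind.KNOWLEDGE_GAP),
    ProbeResult("Summarize the table", "output too verbose",
                FailureKind.PROMPT_FIXABLE),
]
candidates = skill_candidates(results)
```

If `candidates` comes back empty, the probing suggests a better prompt or a different model is the fix, not a skill.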
Stage 2: Identify the real knowledge gaps
Group the failures from Stage 1 by root cause. Common categories:
- Pattern-matched from another language. Agent invents a function because the same idiom exists in Python or Java (the blog's example: an `ormdelete()` that doesn't exist in MATLAB).
- Wrong namespace or class path. Agent gets the verb right but the path wrong (`database.orm.Mappable` vs. `database.orm.mixin.Mappable`).
- Missing guard or precondition. Agent omits a check the runtime requires (a `nargin == 0` guard for objects an ORM creates empty).
- Wrong defaults or argument order. Agent picks plausible-but-wrong defaults the documentation doesn't make obvious.
- Drift between major API versions. Agent uses an older or newer signature than the one the user actually has.
For each category, write down the specific rule the skill needs to teach. One rule per failure.
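Grouping can be as simple as a dictionary keyed by root cause. This is a minimal sketch: the category labels and rule texts are hypothetical placeholders, not findings from a real probe.

```python
from collections import defaultdict


def rules_by_root_cause(failures: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (root_cause, rule) pairs so each category lists its rules."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for root_cause, rule in failures:
        grouped[root_cause].append(rule)
    return dict(grouped)


# Hypothetical Stage 1 findings: one rule written per observed failure.
failures = [
    ("missing-guard", "Constructors need a nargin == 0 guard"),
    ("wrong-class-path", "Spell out the full class path for Mappable"),
    ("missing-guard", "Check for empty input before indexing"),
]
grouped = rules_by_root_cause(failures)
```

Categories with many rules are the likely candidates for the top of the skill body.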
Stage 3: Design the skill
Apply these structural rules. The agent may not read your whole skill, so structure matters.
- Frontmatter description is a trigger spec, not a summary. It should describe when to invoke the skill, with concrete trigger phrases the agent will match on. The agent reads this to decide whether to load you. Avoid `: ` (colon followed by space) inside the description value; strict YAML parsers will read it as a nested mapping and fail to load the skill. Use an em dash or comma instead.
- Most critical rules first. Put the rules that fix the most failures at the top of the body. Don't bury the load-bearing rule.
- Progressive disclosure. Common cases up front. Edge cases, exceptions, and variant APIs in later sections or in `references/`.
- One topic per section. Use H2 (`##`) per topic. Consistent section order across your skill family makes it predictable for the agent.
- Show, don't tell. Where a rule is about syntax, include a 2-to-5 line code example with the failing pattern and the corrected pattern side by side.
- Leave out what the agent gets right. If your probing showed the agent handles `addComponent` correctly, don't document `addComponent`. Skills are compensators for failure, not API reference.
- Name common pitfalls explicitly. A "Common pitfalls" section near the bottom for known gotchas the user might hit even with the skill loaded.
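The colon-and-space rule for the frontmatter description is easy to check mechanically. A minimal sketch, assuming the description is available as a plain string: `description_is_safe` is a hypothetical helper, and it only looks for the two-character sequence rather than running a YAML parser.

```python
def description_is_safe(description: str) -> bool:
    """Return False if the value contains ': ', which this rule forbids."""
    return ": " not in description


# Hypothetical description values, for illustration only.
bad = "Use when the user says: build an ORM class"
good = "Use when the user asks to build an ORM class, map a table, or delete a record"
```

A check like this can run before each commit of the skill so that a stray colon never reaches the loader.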
Suggested section order:
## When this skill applies (1-2 paragraphs)
## Core rules (the load-bearing rules, in priority order)
## API patterns (code examples per category)
## Common pitfalls (gotchas, including known limitations)
## See also (links to references/ and related skills)
Use the template at `templates/SKILL-template.md` as a starting point.
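If you maintain several skills, the suggested section order can be checked with a few lines of code. The sketch below assumes the skill body is plain markdown with `## ` headings whose text matches the names above exactly; `follows_suggested_order` is a hypothetical helper, and sections outside the suggested list are ignored.

```python
import re

SUGGESTED_ORDER = [
    "When this skill applies",
    "Core rules",
    "API patterns",
    "Common pitfalls",
    "See also",
]


def h2_headings(markdown: str) -> list[str]:
    """Return the text of every H2 ('## ') heading, in document order."""
    pattern = re.compile(r"^## +(.+)$", flags=re.MULTILINE)
    return [m.group(1).strip() for m in pattern.finditer(markdown)]


def follows_suggested_order(markdown: str) -> bool:
    """True if the known sections appear in the suggested relative order."""
    positions = [SUGGESTED_ORDER.index(h)
                 for h in h2_headings(markdown) if h in SUGGESTED_ORDER]
    return positions == sorted(positions)


# Hypothetical skill body that omits two optional sections.
skill_body = """## When this skill applies
...
## Core rules
...
## Common pitfalls
...
"""
in_order = follows_suggested_order(skill_body)
```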
Stage 4: Iterate against runnable examples
Rerun the Stage 1 prompts with the skill loaded. The failure count should drop.
- For each remaining failure, decide: tighten the skill, accept the failure (with a documented pitfall), or escalate (the failure isn't a skill problem).
- Test across at least two models if the user expects cross-model use. Phrasing that works for one model can be ignored by another.
- Read every generated output. Don't trust the model to self-report success.
Keep a short test log: prompt, model, pre-skill result, post-skill result. The log is the evidence that the skill works; without it, you're guessing.
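The test log needs no special tooling. One possible shape is sketched below, with the four fields named above; the `LogEntry` type, the boolean pass/fail encoding, and the model names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    prompt: str
    model: str
    pre_skill_ok: bool   # did the output work with no skill loaded?
    post_skill_ok: bool  # did the output work with the skill loaded?


def summarize(log: list[LogEntry]) -> dict[str, int]:
    """Count failures before and after, plus regressions the skill introduced."""
    return {
        "failures_before": sum(not e.pre_skill_ok for e in log),
        "failures_after": sum(not e.post_skill_ok for e in log),
        "regressions": sum(e.pre_skill_ok and not e.post_skill_ok for e in log),
    }


# Hypothetical entries; the model names are placeholders.
log = [
    LogEntry("Delete a record", "model-a", pre_skill_ok=False, post_skill_ok=True),
    LogEntry("Delete a record", "model-b", pre_skill_ok=False, post_skill_ok=False),
    LogEntry("Create a mapped class", "model-a", pre_skill_ok=True, post_skill_ok=True),
]
summary = summarize(log)
```

Tracking regressions separately matters because a skill can break prompts the agent previously handled, and a falling failure count alone would hide that.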
Stage 5: Maintain
Skills are never finished. Models change, APIs change, and yesterday's failure becomes today's strength (and vice versa).
- Revisit the test log when the user's product version changes, when a new model ships, or when users report fresh failures.
- Remove rules the agent now handles correctly without help. A bloated skill loses attention budget.
- When a rule needs more depth than fits, move it to `references/` and link from the main body.
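Pruning can be driven by the same probe data. The sketch below assumes you rerun each rule's prompts with no skill loaded and record a pass or fail per prompt; the `removable_rules` helper and the rule names are hypothetical.

```python
def removable_rules(no_skill_results: dict[str, list[bool]]) -> list[str]:
    """Rules whose probe prompts all pass with no skill loaded.

    A rule with no recorded results is kept, since there is no evidence
    that the agent handles it unaided.
    """
    return [rule for rule, passed in no_skill_results.items()
            if passed and all(passed)]


# Hypothetical re-probe: rule name -> pass/fail per prompt, no skill loaded.
latest = {
    "nargin-guard": [True, True, True],
    "class-path": [True, False],
    "untested-rule": [],
}
stale = removable_rules(latest)
```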
Anti-patterns
- API encyclopedia. Writing down everything the API does. Skills are not docs.
- Theoretical gaps. Writing rules for failures you assumed without ever running the agent.
- Tone or style guidance only. Telling the agent to "be helpful and accurate" with no domain-specific content.
- Burying the lede. Twenty paragraphs of background before the rule that prevents the bug.
- One mega-skill. A single skill covering five unrelated domains. Split it.
- Hallucinated function names. Trusting your own memory of the API when writing examples. Run every example before it goes into the skill.
Decision flow
When the user asks for help, follow this order:
- Have they probed the agent for real failures yet? If not, walk them through Stage 1 before discussing design.
- Do they have a list of specific failures with root causes? If not, do Stage 2 with them now.
- Are they writing a new skill or improving an existing one? If improving, read the current SKILL.md, then identify which rules are load-bearing, which are dead weight, and which are missing.
- Walk through Stages 3 and 4 explicitly. Don't draft a full SKILL.md until the user has a concrete rule list.