Claude Code Mastery
Build better agents, skills, and workflows with Claude Code. Battle-tested patterns from building Claude Code itself, used daily at Anthropic with hundreds of skills in production.
The Three Laws of Agent Design
Before anything else, internalize these:
- The filesystem is how agents think. Write to disk, grep, process. Don't stuff 100 items into context.
- Prompt caching is architecture, not optimization. Design your entire system around prefix stability.
- Give Claude a way to verify its work. This alone 2-3x the quality of output.
When Building an Agent
Action Space Design
Consult tool-design reference for the complete tool design framework.
Most agent failures are tool design problems, not model problems. Design tools by imagining yourself solving the problem:
- Paper = minimal (just text output). Limited but safe.
- Calculator = specific tools (custom tool_use). More capable but rigid.
- Computer = bash + filesystem. Most powerful, most flexible.
Start with bash. Claude Code started with 4 tools: read, write, edit, bash. That covers 80% of tasks. Add custom tools only when bash genuinely can't do the job.
Anti-pattern: 50 custom tools, one for each operation. This creates a "needle in haystack" problem — the model's reasoning degrades as tool count increases.
The right question: "What permissions and environment should I provide?" — not "What should I ask?"
The Agent Loop
Every agent follows three phases:
- Gather Context — Use agentic search (grep, find, bash) before semantic search. Let Claude build its own context through progressive disclosure.
- Take Action — Bash for flexible operations, custom tools for frequently-used primaries, MCP for external services.
- Verify Work — Explicit rules with feedback (linting), visual feedback (screenshots), LLM-as-judge for fuzzy evaluation.
Consult agent-loop reference for detailed patterns per phase.
Progressive Disclosure — Don't Load Everything Upfront
Agents get dumber when you give them too much information upfront.
At startup, load only skill names and descriptions. Let Claude discover details when needed. This applies to:
- Skills (name + trigger only, full content on demand)
- MCP tools (stubs with
defer_loading: true, full schema via ToolSearch) - Reference files (agent reads them when it decides to)
- Data (write to files, grep when needed)
When Writing Skills
Skill Categories
Consult skill-categories reference for all 9 categories with examples.
The key categories:
- Library & API Reference — How to use a library/CLI. Focus on gotchas, not obvious docs.
- Product Verification — How to test and verify output. Worth a week of engineering.
- Data Fetching & Analysis — Connect to data stacks with credentials and common queries.
- Business Process Automation — Repetitive workflows as one command.
- Code Scaffolding — Generate framework boilerplate with natural language requirements.
- Code Quality & Review — Enforce org standards. Run as hooks or in CI.
- CI/CD & Deployment — Fetch, push, deploy with testing and rollback.
- Runbooks — Symptom -> investigation -> structured report.
- Infrastructure Operations — Routine maintenance with safety guardrails.
Skill Structure
A skill is a folder, not a file:
skills/skill-name/
SKILL.md # Instructions + trigger conditions
references/ # Detailed docs Claude reads on demand
api.md # Function signatures, usage examples
gotchas.md # Known failure points (THE highest-signal content)
examples.md # Good and bad output samples
scripts/ # Helper bash/Node scripts
verify.sh # Verify output has required sections
fetch.sh # Pre-built data fetching
assets/ # Templates, configs
template.md # Output template to copy
config.json # Skill metadata, setup data
The Description Field Is a Trigger, Not a Summary
Bad: "Generates meeting summaries"
Good: "Use when a new meeting is detected or user asks to summarize, recap, or analyze a specific meeting. Relevant when user says 'what happened in', 'summarize the', 'meeting notes for', 'action items from'."
The description is what Claude scans to decide if this skill applies. Write it as triggering conditions.
Build a Gotchas Section
The highest-signal content in any skill is the Gotchas section.
Start small. Every time Claude fails at a task, add the failure mode:
## Gotchas
- API returns empty results for queries under 3 characters
- Date field is UTC, not local timezone
- Rate limit is 100/min, batch requests in groups of 50
- Transcript can be null even when notes exist
This is how skills get better over time. The gotchas section is a living document.
Don't Over-Specify Steps
Anti-pattern:
Step 1: Call API with query
Step 2: Take first 3 results
Step 3: Format as bullets
Step 4: Post to Slack
Correct pattern:
Search for relevant data, get details for the most relevant items,
and synthesize a clear answer. Reference sources by name and date.
Gotchas:
- Results capped at 50, paginate for exhaustive search
- Null fields are common, check before including
Tell Claude WHAT, not HOW. It's smart enough to figure out the steps.
Include Helper Scripts
Don't make Claude reconstruct boilerplate each time. Provide scripts it can compose:
# scripts/fetch-data.sh — Claude calls this via bash
#!/bin/bash
curl -s "$API_URL/search?q=$1" | jq '.results[:10]'
Claude spends tokens on composition and reasoning, not on rebuilding the fetch logic.
Store Execution History
data/standups.log # Append-only log of every standup generated
data/last-digest.json # State from last run
Claude reads its own history and can tell what's changed since the last run. Use ${CLAUDE_PLUGIN_DATA} for durable storage across skill upgrades.
When Structuring CLAUDE.md
A team should share a single CLAUDE.md, checked into git, updated multiple times a week. Anytime Claude does something incorrectly, add it.
Consult claude-md reference for the complete structure guide.
The Rules
- < 200 lines. One engineer had 847 lines and got worse results than a 100-line version. Claude ignores bloated context.
- Update on every mistake. End your prompt: "Update CLAUDE.md so this doesn't happen again."
- Treat it like code. Review changes, test behavior, track in git.
- Focus on what challenges Claude's defaults. Don't document what Claude already knows.
Recommended Sections
# Project Name
## What This Is
One paragraph. What, why, for whom.
## Tech Stack
Languages, frameworks, key dependencies.
## Build & Test
Commands to build, test, lint, deploy.
## Architecture
Key directories, entry points, data flow.
## Conventions
Naming, patterns, anti-patterns specific to this project.
## Gotchas
Things Claude gets wrong. Updated continuously.
When Optimizing Prompt Caching
At Anthropic, prompt caching is treated as critical infrastructure — alerts on cache hit rate, SEVs when they drop too low.
Consult prompt-caching reference for the complete technical guide.
The Ordering Rule
Static content first, dynamic content last:
1. System prompt + tool definitions (globally cached)
2. CLAUDE.md / project context (cached per project)
3. Session context / MEMORY.md (cached per session)
4. Conversation messages (new each turn)
The Five Commandments
- Never change tools mid-session. Adding/removing a tool invalidates cache for the ENTIRE conversation.
- **Ne