Agents Best Practices
Use this skill when the user asks how to build, improve, debug, or evaluate an agentic harness. This is a general-purpose agent architecture skill. Coding agents are only one subdomain; apply the same principles to research, finance, legal, support, operations, sales, healthcare, education, data analysis, procurement, and workflow automation agents.
Core stance
An agent harness is the control plane around a model. The model proposes actions; the harness validates, authorizes, executes, records, summarizes, and returns observations. Keep the loop simple and make the runtime rigorous.
Default architecture:
user/task
-> instruction and context builder
-> model call
-> tool/action proposal
-> schema validation
-> permission decision
-> execution or approval pause
-> structured observation
-> context update
-> repeat within budget or finish
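The flow above can be sketched in a few dozen lines of Python. This is a minimal illustration, not a reference implementation. Every name in it (ToolCall, Observation, Tool, handle_call, run_loop, the two demo tools) is hypothetical, the model call is replaced by a scripted function, and the permission rule (anything that is not read-only pauses for approval) is an assumed default.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Observation:
    tool: str
    status: str  # "ok", "invalid", "needs_approval", or "error"
    detail: Any = None

@dataclass
class Tool:
    fn: Callable[..., Any]
    arg_names: frozenset  # hypothetical stand-in for a real schema
    read_only: bool

def handle_call(call: ToolCall, tools: dict) -> Observation:
    """Validate, authorize, and execute one proposal; always return an observation."""
    tool = tools.get(call.name)
    if tool is None:
        return Observation(call.name, "invalid", "unknown tool")
    if set(call.args) != set(tool.arg_names):  # schema validation
        return Observation(call.name, "invalid", "unexpected or missing arguments")
    if not tool.read_only:  # permission decision: pause instead of executing
        return Observation(call.name, "needs_approval", call.args)
    try:
        return Observation(call.name, "ok", tool.fn(**call.args))
    except Exception as exc:  # failures go back to the model as observations
        return Observation(call.name, "error", str(exc))

def run_loop(task, propose, tools, max_steps=5):
    """propose(context) stands in for the model call: it returns a ToolCall or a final str."""
    context = [("task", task)]
    for _ in range(max_steps):
        proposal = propose(context)
        if isinstance(proposal, str):  # the model chose to finish
            return proposal, context
        context.append(("observation", handle_call(proposal, tools)))
    return None, context  # budget exhausted without a final answer

# Usage with a scripted stand-in for the model (assumption: no real provider call).
TOOLS = {
    "lookup": Tool(lambda key: {"a": 1}[key], frozenset({"key"}), read_only=True),
    "send_email": Tool(lambda to: "sent", frozenset({"to"}), read_only=False),
}

def scripted_model(context):
    script = [
        ToolCall("lookup", {"key": "a"}),
        ToolCall("send_email", {"to": "x@example.com"}),
        "done",
    ]
    return script[len(context) - 1]

answer, history = run_loop("demo", scripted_model, TOOLS)
```

In this sketch the read-only lookup runs, the email is held for approval rather than sent, and both outcomes are appended to the context as structured observations before the next model call.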
When to activate this skill
Use this skill for prompts involving any of these intents:
- build an agent, agentic workflow, AI worker, autonomous assistant, or harness;
- create a domain-specific MVP agent design, starter harness, implementation blueprint, or first production-safe version;
- choose between OpenAI, Anthropic, OpenAI-compatible APIs, direct tool loops, hosted tools, or SDKs;
- design tools, permissions, guardrails, approval flows, or sandboxing;
- create planning mode, goal mode, todo tracking, or long-running task behavior;
- add context compaction, memory, retrieval, scoped instructions, or prompt hierarchies;
- attach Agent Skills, reusable workflows, MCP servers, external connectors, or tool search;
- audit an existing agent for reliability, cost, prompt-cache hit rate, safety, latency, or observability;
- create system prompts or developer instructions for a domain-specific agent;
- make source-of-truth knowledge, validation signals, logs, metrics, or workflow state legible to an agent.
Do not use this skill for ordinary single-turn writing, translation, or Q&A unless the user is asking about the design of an agent that will perform those tasks.
How to use this skill
First, identify the user's design problem:
- Domain: what work the agent performs.
- Autonomy level: answer-only, draft-only, approval-gated action, or autonomous action within policy.
- Risk level: read-only, internal write, external communication, financial, legal, healthcare, security, destructive, or privileged.
- State duration: single turn, multi-turn session, resumable workflow, or long-running goal.
- Tool surface: internal APIs, hosted tools, MCP/external connectors, browser, sandbox, filesystem, database, communication, or computation.
- Validation: what proves the task is complete.
Then load only the most relevant reference files rather than every file. If the user asks to make or build an agent for a domain, default to MVP Builder Mode.
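One way to keep the design dimensions above explicit is to record them in a small typed structure and list whatever is still unset, so the gaps can be stated as assumptions. The sketch below is illustrative only: DesignProfile, Autonomy, StateDuration, and unresolved are hypothetical names, and the enum values simply mirror the options listed above.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional

class Autonomy(Enum):
    ANSWER_ONLY = "answer-only"
    DRAFT_ONLY = "draft-only"
    APPROVAL_GATED = "approval-gated action"
    AUTONOMOUS = "autonomous action within policy"

class StateDuration(Enum):
    SINGLE_TURN = "single turn"
    MULTI_TURN = "multi-turn session"
    RESUMABLE = "resumable workflow"
    LONG_RUNNING = "long-running goal"

@dataclass
class DesignProfile:
    domain: Optional[str] = None
    autonomy: Optional[Autonomy] = None
    risk_levels: tuple = ()      # e.g. ("read-only", "financial")
    state_duration: Optional[StateDuration] = None
    tool_surface: tuple = ()     # e.g. ("internal APIs", "database")
    validation: Optional[str] = None  # what proves the task is complete

def unresolved(profile: DesignProfile) -> list:
    """Return the dimensions that are still unset, in declaration order."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]

profile = DesignProfile(domain="procurement", autonomy=Autonomy.DRAFT_ONLY)
gaps = unresolved(profile)
```

Here gaps names the four dimensions the user has not specified yet, which is the list to cover with stated assumptions.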
MVP Builder Mode
When the user asks to make, build, design, scaffold, or specify an agent for a domain, produce a concrete domain-specific MVP harness blueprint, not only advice. Use mvp-agent-blueprint.md as the primary reference and load other references as needed.
Default behavior:
- Infer a reasonable first version from the user's domain and stated constraints.
- State assumptions briefly instead of blocking on missing details.
- Design the smallest safe harness that can accomplish useful work.
- Include the core agentic loop, tool registry, permission matrix, context/memory/compaction, planning mode, goal-like loop criteria, skills/connectors, prompt-cache/cost strategy, observability, evals, and launch path.
- Mark high-risk actions as draft-only or approval-gated by default.
- Avoid multi-agent orchestration until the single-agent MVP has measurable failure cases that require decomposition.
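A permission matrix for an MVP can start as a plain lookup table. The sketch below assumes the risk levels named earlier in this skill and maps each one to a default decision. The specific assignments, the Decision enum, and the decide helper are illustrative assumptions, not prescribed values. Unknown risk classes fall through to the most restrictive decision.

```python
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    DRAFT_ONLY = "draft_only"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"

# Assumed defaults: only read-only actions run unattended in the first version.
DEFAULT_MATRIX = {
    "read-only": Decision.ALLOW,
    "internal write": Decision.REQUIRE_APPROVAL,
    "external communication": Decision.DRAFT_ONLY,
    "financial": Decision.REQUIRE_APPROVAL,
    "legal": Decision.DRAFT_ONLY,
    "healthcare": Decision.DRAFT_ONLY,
    "security": Decision.REQUIRE_APPROVAL,
    "destructive": Decision.REQUIRE_APPROVAL,
    "privileged": Decision.REQUIRE_APPROVAL,
}

def decide(risk_class, overrides=None):
    """Look up the default decision; anything unclassified is denied."""
    matrix = {**DEFAULT_MATRIX, **(overrides or {})}
    return matrix.get(risk_class, Decision.DENY)

# A deployment can loosen one entry explicitly once measured results justify it.
relaxed = decide("internal write", overrides={"internal write": Decision.ALLOW})
```

Keeping the matrix in application code rather than in the prompt means the harness enforces the policy even when the model proposes something else.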
Reference map
- Read mvp-agent-blueprint.md first when the user asks to create a new domain-specific agent or MVP harness.
- Read architecture.md for the full harness model and component boundaries.
- Read agent-legibility-feedback-loops.md for source-of-truth knowledge bases, agent-legible environments, validation loops, mechanical invariants, and recurring cleanup.
- Read agentic-loop.md for the provider-neutral loop, step budgets, retries, and loop variants.
- Read tools-and-permissions.md for tool contracts, risk classes, approval logic, structured results, and sandboxing.
- Read context-memory-compaction.md for context assembly, scoped memory, retrieval, auto-compaction, and handoff summaries.
- Read prompt-caching-and-cost.md for stable-prefix design, cache-aware context ordering, compaction/cache tradeoffs, telemetry, and cost control.
- Read planning-and-goals.md for planning mode, approval-gated execution, goals, checkpoints, and stopping conditions.
- Read skills-and-connectors.md for Agent Skills, progressive disclosure, MCP, external connectors, tool search, and attachment strategy.
- Read system-prompts-instructions.md for system/developer/user instruction hierarchy and prompt templates.
- Read provider-api-patterns.md for OpenAI, Anthropic, and OpenAI-compatible API implementation patterns.
- Read security-evals-observability.md for guardrails, threat models, tracing, evals, and launch gates.
- Read checklists.md for condensed implementation and audit checklists.
- Read source-links.md for official links and provider-specific references.
- Read coverage-audit.md to verify the skill covers the requested harness topics.
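The reference map lends itself to a small lookup that loads only the files a request needs. In the sketch below the file names come from the list above, while the topic keys and the select_references helper are hypothetical. A real skill would derive topics from the user's request rather than take them as arguments.

```python
# Topic keys are illustrative; file names are the ones listed in the reference map.
REFERENCE_MAP = {
    "architecture": "architecture.md",
    "legibility": "agent-legibility-feedback-loops.md",
    "loop": "agentic-loop.md",
    "tools": "tools-and-permissions.md",
    "permissions": "tools-and-permissions.md",
    "context": "context-memory-compaction.md",
    "memory": "context-memory-compaction.md",
    "caching": "prompt-caching-and-cost.md",
    "planning": "planning-and-goals.md",
    "skills": "skills-and-connectors.md",
    "prompts": "system-prompts-instructions.md",
    "providers": "provider-api-patterns.md",
    "security": "security-evals-observability.md",
    "checklists": "checklists.md",
}

def select_references(topics, new_agent=False):
    """Return the reference files to load, in order and without duplicates."""
    selected = ["mvp-agent-blueprint.md"] if new_agent else []
    for topic in topics:
        ref = REFERENCE_MAP.get(topic)
        if ref and ref not in selected:
            selected.append(ref)
    return selected

to_load = select_references(["tools", "permissions", "caching"], new_agent=True)
```

Two topics that point at the same file produce a single entry, and unrecognized topics load nothing.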
Default answer structure when advising a user
When the user asks for guidance, produce a concrete architecture, not generic principles:
- MVP boundary: smallest useful version, assumptions, non-goals, and launch criteria.
- Harness boundary: what the model does versus what application code does.
- Loop: how model calls, tool calls, tool results, stopping, and retries work.
- Instructions: system/developer/user instruction hierarchy and scoped memory.
- Tools: tool registry, schemas, outputs, risk classes, permissions, and approval points.
- Context: retrieval, memory, summarization, cache-aware ordering, compaction triggers, and rehydration.
- Planning/goals: when to enter planning mode, when to run a goal-like loop, and how to stop.
- Skills/connectors: how skills and MCP/external connectors are discovered, loaded, permissioned, and audited.
- Safety: prompt injection boundaries, secrets, sandboxing, data access, and guardrails.
- Observability/evals: traces, metrics, test cases, and failure probes.
- Rollout: minimal viable harness first, then add autonomy only when measured results justify it.
- Legibility loop: source-of-truth artifacts, validation signals, feedback capture, and recurring cleanup.
Non-negotiable principles
- The model does not execute actions directly; the harness does.
- Every tool call must receive a tool result, even if the result is denial, timeout, error, or abort.
- Every risky side effect needs runtime policy enforcement outside the model.
- Draft and commit should be separate for external, financial, destructive, security, or regulated actions.
- Tool schemas must be narrow, typed, validated locally, and auditable.
- Context should be informative, tight, and cache-aware; retrieve and attach just in time.
- Skills and external connectors should use progressive di