MCP Architecture
Vanilla MCP servers exposing tools, resources, and prompts to LLM clients over JSON-RPC 2.0. Spec target: 2025-11-25 (current stable; the 2026-07-28 release candidate is locked but not yet final — see §12). Server-focused; brief client section in RECIPES §10. Pinned deps in STACK.md.
Pairs with:
- security-reviewer — MCP servers expand the agent's blast radius; treat every tool call as untrusted input.
- rest-api-architect and grpc-architect — many MCP servers wrap an existing API; reuse those error and pagination conventions.
1. When to pick MCP (and when not to)
MCP exists to let an LLM client (Claude Desktop, Cursor, VS Code, ChatGPT, an agent) plug into your capabilities without per-client glue code. Reach for it when:
- The same capability is consumed by multiple LLM clients and you don't want N integrations.
- You need an LLM to call your tools, read your resources, or render your prompt templates inside a chat.
- The agent needs to dynamically discover what your service can do: REST/gRPC contracts are static, while MCP tools/list is dynamic per session.
Don't use MCP when:
- The caller is server-to-server backend code — use gRPC or REST. MCP is for LLM-mediated calls.
- You need cache semantics, public API discoverability, or curl-first ergonomics. REST wins.
- The work is a stable batch pipeline. MCP's interactivity overhead is wasted.
A common pattern: keep your REST/gRPC backend as the system of record; build a thin MCP server that wraps a curated subset of operations safe for an LLM to invoke.
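A minimal sketch of that wrapper pattern, assuming a hypothetical backend catalog (`BACKEND_OPS`), a hand-curated allowlist, and per-session scopes; none of these names come from an SDK. It shows how the tools/list result can differ per session while the backend contract stays fixed.

```python
from dataclasses import dataclass


@dataclass
class BackendOp:
    """One operation on the existing REST/gRPC backend (hypothetical)."""
    name: str
    description: str
    required_scope: str
    input_schema: dict


def string_params(**descriptions: str) -> dict:
    """Build a JSON Schema object whose properties are all required strings."""
    return {
        "type": "object",
        "properties": {
            key: {"type": "string", "description": text}
            for key, text in descriptions.items()
        },
        "required": list(descriptions),
    }


# Everything the backend can do (illustrative data, not a real API).
BACKEND_OPS = [
    BackendOp("search_issues", "Search issues by free-text query. Read-only.",
              "issues:read", string_params(query="Text to search for.")),
    BackendOp("create_issue", "Create an issue. Side effect: writes to the tracker.",
              "issues:write", string_params(title="Issue title.", body="Issue body.")),
    BackendOp("purge_project", "Delete a project and all of its data.",
              "admin", string_params(project_id="Project to delete.")),
]

# Curated subset considered safe for an LLM to invoke. purge_project is
# deliberately absent, so no scope can ever expose it over MCP.
MCP_ALLOWLIST = {"search_issues", "create_issue"}


def list_tools(session_scopes: set[str]) -> dict:
    """Return a tools/list result tailored to one session's scopes."""
    tools = [
        {"name": op.name, "description": op.description, "inputSchema": op.input_schema}
        for op in BACKEND_OPS
        if op.name in MCP_ALLOWLIST and op.required_scope in session_scopes
    ]
    return {"tools": tools}
```

A session holding only `issues:read` sees one tool; a session holding every scope still never sees `purge_project`, because the allowlist is applied before the scope check.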
2. Server primitives — tools, resources, prompts
| Primitive | Who controls invocation | Use for | Selector |
|---|---|---|---|
| Tool | Model decides | Side-effecting or compute actions (create_issue, search_db, run_query) | The model picks based on description + JSON schema |
| Resource | App (or user) decides | Read-only data injected into context (file contents, schemas, dashboards) | URI; supports templates and subscriptions |
| Prompt | User decides (slash-command UI) | Reusable templates the user invokes intentionally | Name + arguments |
The decision tree:
- If the LLM should autonomously call it → tool.
- If it's data the LLM should read (not act on) → resource.
- If the user picks it from a menu to start a workflow → prompt.
Picking the wrong primitive is the #1 design mistake. A read_file tool that the model calls 40 times per turn should probably be a resource template the client subscribes to. Conversely, a dangerous_delete resource is a category error: resources are read-only.
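To make the three primitives concrete, here is a rough sketch of the JSON-RPC 2.0 request each one maps to on the wire. The method names (tools/call, resources/read, prompts/get) are the spec's; the tool name, URI, prompt name, and arguments are illustrative assumptions.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> dict:
    """Wrap params in a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


# Tool: the model decided to act, so the client sends tools/call.
call_tool = jsonrpc_request(1, "tools/call", {
    "name": "create_issue",  # hypothetical tool
    "arguments": {"title": "Login page returns 500"},
})

# Resource: the app (or user) decided to pull data into context by URI.
read_resource = jsonrpc_request(2, "resources/read", {
    "uri": "file:///project/schema.sql",  # hypothetical URI
})

# Prompt: the user picked a template from a menu; arguments are strings.
get_prompt = jsonrpc_request(3, "prompts/get", {
    "name": "triage_bug",  # hypothetical prompt
    "arguments": {"issue_id": "123"},
})

if __name__ == "__main__":
    for request in (call_tool, read_resource, get_prompt):
        print(json.dumps(request))
```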
3. Tool design
Tools are the primary surface and the primary risk. Treat them like API endpoints, not RPC methods.
- One verb, narrow scope. create_pull_request, not github_action. Models pick better tools when names are specific and descriptions are short.
- JSON Schema required. Every parameter typed, with descriptions. Omit no field. The model reads the schema to decide arguments; sloppy schemas produce sloppy calls.
- Description is the contract. It's what the model reads to decide whether to call. Lead with the action; end with one line on side effects and any required confirmations. Keep it under 300 tokens.
- Two output channels. Populate both when you declare outputSchema. See §3a below.
- Tool annotations are hints, not guarantees (see §4). They drive client UX (confirm vs auto-approve) but never enforce policy server-side. Validate on the server regardless of what the client claims.
- Return errors via isError: true in the tool result, not as JSON-RPC errors. JSON-RPC errors mean protocol failures; tool-level failures (bad input, downstream API said 404) belong inside the result so the model can read and adapt. See §10.
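The rules above can be sketched as a tool definition plus a handler, written as plain dictionaries rather than against any SDK. The field names (name, title, description, inputSchema, content, isError) follow the spec; the handler is a stub that never contacts a real backend, and using code -32602 for an unknown tool is an assumption based on the spec's own example.

```python
CREATE_PR_TOOL = {
    "name": "create_pull_request",
    "title": "Create pull request",
    "description": (
        "Open a pull request from a head branch into a base branch. "
        "Side effect: creates a PR visible to all repository members."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "repo": {"type": "string", "description": "Repository as owner/name."},
            "head": {"type": "string", "description": "Branch containing the changes."},
            "base": {"type": "string", "description": "Branch to merge into."},
            "title": {"type": "string", "description": "Pull request title."},
        },
        "required": ["repo", "head", "title"],
        "additionalProperties": False,
    },
}


def tool_error(message: str) -> dict:
    """Tool-level failure: lives inside the result so the model can adapt."""
    return {"content": [{"type": "text", "text": message}], "isError": True}


def handle_tools_call(request: dict) -> dict:
    """Answer one tools/call request (stub: no real PR is created)."""
    params = request.get("params", {})
    if params.get("name") != CREATE_PR_TOOL["name"]:
        # Protocol-level failure: the request names a tool that does not exist.
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"},
        }
    args = params.get("arguments", {})
    missing = [key for key in CREATE_PR_TOOL["inputSchema"]["required"] if key not in args]
    if missing:
        result = tool_error(f"Missing required argument(s): {', '.join(missing)}")
    else:
        base = args.get("base", "main")  # "main" is an assumed default
        text = f"Opened pull request '{args['title']}' in {args['repo']} ({args['head']} -> {base})."
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

The server validates arguments itself even though the schema is published; the schema guides the model, but nothing guarantees a client enforced it.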
3a. Tool output — unstructured content[] vs structuredContent
Every tool result carries content[] (always — the "unstructured" channel the model reads as text/media). Tools that declare outputSchema ALSO carry structuredContent (the typed channel the client app parses programmatically). The two are not alternatives; they coexist.
Unstructured — content[] is an ordered list of content blocks. The model reads these directly into its context. Block types:
| Type | Use for | Notes |
|---|---|---|
| text | The default: prose, JSON dumps, tables | Always safe; every client renders it |
| image | Inline images (data base64 + mimeType) | For diagrams, screenshots, chart renders; not all clients display |
| audio | Inline audio clips (since 2025-03-26) | Rare; transcribe to text for broader client support |
| resource_link | Pointer to a resource by URI (no body) | Client decides whether to follow and fetch via resources/read. Cheaper than embedding |
| resource (embedded resource) | Full resource contents inlined (a nested resource object with uri + mimeType + text/blob) | When the model needs the content right now without a round trip |
- Default to one text block. Reach for the others only when a specific client capability earns its place.
- Mixed blocks are fine. A search tool might return a one-line summary as text plus N resource_link blocks for the hits.
- Don't put secrets or stack traces in text: the model treats it as readable context and may echo it back to the user.
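A small sketch of the mixed-block and no-stack-trace guidance, again with plain dictionaries. The block shapes (type, text, uri, name, mimeType) follow the spec; the helper names, the hit format, and the reference-id scheme are hypothetical.

```python
import logging
import uuid

logger = logging.getLogger("mcp_server")  # hypothetical logger name


def search_result(query: str, hits: list[dict]) -> dict:
    """One text summary followed by one resource_link block per hit."""
    content = [{"type": "text", "text": f"{len(hits)} result(s) for {query!r}."}]
    for hit in hits:
        content.append({
            "type": "resource_link",
            "uri": hit["uri"],
            "name": hit["name"],
            "mimeType": hit.get("mimeType", "text/plain"),
        })
    return {"content": content, "isError": False}


def safe_tool_error(exc: Exception) -> dict:
    """Log the full exception server-side; show the model only a reference id."""
    reference = uuid.uuid4().hex[:8]
    logger.error("tool failure ref=%s", reference, exc_info=exc)
    message = f"The search backend failed (ref {reference}). Retry or narrow the query."
    return {"content": [{"type": "text", "text": message}], "isError": True}
```

The reference id lets an operator find the stack trace in the server logs without the trace ever entering the model's context.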
Structured — structuredContent is a single JSON object validated against the tool's outputSchema (added in spec 2025-06-18). Use it when:
- The client application needs to parse the result (build a UI panel, chart, agent step).
- The model also benefits from a clean JSON view it can reason about.
When you declare outputSchema, the server MUST populate structuredContent AND SHOULD also emit a JSON-stringified copy as a text block in content[] — clients that don't yet render structured output (most chat UIs in 2026) need that fallback to show anything at all. Skeleton in RECIPES §3.
Don't declare outputSchema for free-form prose tools. A summarize_text tool returning a paragraph has no structure worth schematizing; one text block is the right answer.
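As a rough illustration of populating both channels, the sketch below builds a result for a hypothetical get_weather tool. The outputSchema, structuredContent, and content field names are the spec's; the tool, its schema, and the required-key check are assumptions, and the check is a stand-in for full JSON Schema validation.

```python
import json

WEATHER_TOOL = {
    "name": "get_weather",  # hypothetical tool
    "description": "Return current conditions for a city. Read-only.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name."}},
        "required": ["city"],
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "temperature_c": {"type": "number"},
            "conditions": {"type": "string"},
        },
        "required": ["city", "temperature_c", "conditions"],
    },
}


def structured_result(tool: dict, payload: dict) -> dict:
    """Emit structuredContent plus a JSON-stringified text fallback."""
    # Minimal check only: a real server would run a JSON Schema validator.
    missing = [key for key in tool["outputSchema"]["required"] if key not in payload]
    if missing:
        raise ValueError(f"payload missing required field(s): {missing}")
    return {
        "content": [{"type": "text", "text": json.dumps(payload)}],
        "structuredContent": payload,
        "isError": False,
    }
```

Clients that parse structuredContent and clients that only render text then see the same data, since the text block is a serialization of the same payload.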
4. Tool annotations and safety hints
Annotations declare behavioral properties so clients can gate confirmations and parallelism. None of these enforce anything — they are hints to the client UI.
| Annotation | Meaning | When true |
|---|---|---|
| readOnlyHint | Does not modify any environment | Queries, lookups, status checks |
| destructiveHint | May overwrite/delete (only meaningful when readOnlyHint=false) | Delete, force-push, revoke, drop |
| idempotentHint | Repeated identical calls have same effect as one | PUT-style upserts; safe to retry |
| openWorldHint | Touches external/unbounded entities | Web fetch, third-party API |
If you omit an annotation, clients assume the most dangerous value (readOnlyHint=false, destructiveHint=true, idempotentHint=false, openWorldHint=true). Set them explicitly.
- Always set title: it's what the user sees in the client UI when prompted to approve a call. The name is for the model; the title is for the human.
- destructiveHint is broader than "deletes data": overwriting a file, revoking a token, closing an issue, and sending an email are all destructive. Err toward true.
- Clients gate confirmations on these. Auto-approval policies (Claude Desktop's allowlist, Cursor's permissions) read annotations. Mislabeling a destructive tool as readonly turns user trust into a bug.
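A sketch of how a client might resolve these hints, assuming the pessimistic defaults stated above. The annotation keys are the spec's; the two tools and the approval policy (auto-approve only tools explicitly marked read-only) are hypothetical and not how any particular client behaves.

```python
# Values a client assumes when a server omits the hint.
PESSIMISTIC_DEFAULTS = {
    "readOnlyHint": False,
    "destructiveHint": True,
    "idempotentHint": False,
    "openWorldHint": True,
}

SEARCH_TOOL = {
    "name": "search_issues",  # hypothetical tool
    "annotations": {
        "title": "Search issues",
        "readOnlyHint": True,
        "openWorldHint": False,
    },
}

CLOSE_TOOL = {"name": "close_issue"}  # hypothetical tool with no annotations


def effective_hints(tool: dict) -> dict:
    """Merge declared hints over the defaults, ignoring non-hint keys like title."""
    declared = tool.get("annotations", {})
    return {key: declared.get(key, default) for key, default in PESSIMISTIC_DEFAULTS.items()}


def needs_confirmation(tool: dict) -> bool:
    """Hypothetical client policy: auto-approve only explicitly read-only tools."""
    return not effective_hints(tool)["readOnlyHint"]
```

Under this policy an unannotated tool always prompts the user, which is why omitting annotations on a harmless lookup costs a confirmation on every call. The server must still enforce its own authorization, since a client is free to ignore every hint.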
5. Resource design
Resources are read-only data accessed via URI. Use them for content the model should be able t