Context Budget Analysis
Calibration: Tier 3, Opus-primary. See repository README for model compatibility.
Analyze Claude Project context budget health. Determine operating mode (full-context vs. retrieval). Produce tiered file placement recommendations with a strategy matched to the project's growth trajectory, work-phase timing, and content routing needs.
Important
Token estimates are inherently approximate. Always label token counts as "estimated" and note a ±15% margin. Never state a precise token count as fact. Use ls -la /mnt/project/ for byte counts and divide by 4 for rough token estimates.
Evaluate files objectively. Users may resist demoting files they authored or consider important. Score every file on the six dimensions regardless of stated preferences. If a file scores as Tier 3, say so clearly.
Do not assume full-context is always the goal. With a ~66,500 token ceiling, many legitimate projects operate in retrieval mode. Assess the user's workload pattern before prescribing optimization strategy.
Skills and MCPs cannot trigger RAG. Do not recommend reducing Skills or disconnecting MCPs to recover knowledge file headroom. The two budget pools are independent for threshold purposes.
Context quality degrades with total context size, not just at limits. Even with tokens remaining, every token of overhead reduces the model's attention budget. Minimizing overhead improves output quality on every turn.
Compression has a quality floor. Not all content is equally compressible. Classify content type before recommending compression depth. See references/compression-execution.md for the Content-Type Classification (Type A prose vs. Type B precision reference) and structured off-ramps when compression stalls.
Optimize for where the project is going, not just where it is. A plan that restores full-context mode for one session before the next file addition pushes it back into RAG is a cleanup, not a strategy. Always assess growth trajectory before producing an optimization plan.
Model requirements
This Skill performs per-file evaluation against the six File Evaluation Dimensions, growth trajectory analysis, content routing decisions, and phased optimization planning with compression safeguards. Opus is recommended, with effort set to high or xhigh when the deployment context allows it. On Opus at default Adaptive effort, per-file evaluation and compression quality judgment may be abbreviated; set effort higher for intelligence-sensitive audits.
On non-Opus models (Sonnet 4.6, Haiku 4.5 with extended thinking enabled), expect compressed per-file evaluation, surface-level tier recommendations, and reduced synthesis across the growth trajectory. Quick Diagnostic mode degrades less than Full Budget Audit mode. The Skill will execute and produce correctly-shaped output; users should weight findings accordingly. Haiku without extended thinking is not a supported deployment target for this Skill.
Core Concepts
Operating Modes
Claude Projects operate in one of two modes:
- Full-context mode: All knowledge files loaded into context every turn. Default when total knowledge file tokens are under the threshold.
- Retrieval mode (RAG): Knowledge files searched via project_knowledge_search; only relevant chunks loaded per turn. Activates when knowledge files exceed the token threshold.
Detection: Check whether project_knowledge_search is present in available tools. If present, retrieval mode is active. The "Indexing" label in the project UI files panel is also a visible indicator.
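The detection rule above can be sketched as a one-line check. This is a minimal illustration, assuming the available tools are exposed as a list of name strings; the function name and that list shape are hypothetical, and only the tool name project_knowledge_search comes from this document.

```python
def detect_operating_mode(available_tools):
    """Return the operating mode implied by the tool list.

    Assumption: `available_tools` is an iterable of tool-name strings.
    """
    if "project_knowledge_search" in set(available_tools):
        return "retrieval"
    return "full-context"


if __name__ == "__main__":
    # Hypothetical tool lists, for illustration only.
    print(detect_operating_mode(["web_search", "project_knowledge_search"]))
    print(detect_operating_mode(["web_search"]))
```

The "Indexing" label in the UI remains a useful cross-check, since a tool list may not be inspectable in every deployment.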
The Two-Pool Budget Architecture
| Budget Pool | Allocation (200K window) | Contents | RAG Impact |
|---|---|---|---|
| Knowledge file budget | ~66,500 tokens (~33%) | Knowledge files only (all types: .md, .pdf, .docx, .xlsx, .csv, .txt, images, GitHub repos) | Exceeding triggers RAG |
| Conversation budget | ~133,500 tokens (~67%) | Platform overhead, Skills, MCPs, Custom Instructions, Memory, preferences, conversation, responses | Cannot trigger RAG |
Key implication: Adding Skills, connecting MCPs, or expanding Custom Instructions cannot trigger RAG mode. Conversely, reducing them cannot recover knowledge file headroom. The two pools are independent.
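The independence of the two pools can be shown with a small arithmetic sketch. The constants are the approximate figures from the table above; the function name and its return shape are hypothetical, chosen only for illustration.

```python
WINDOW = 200_000                # 200K-token context window
KNOWLEDGE_CEILING = 66_500      # approximate knowledge file budget
CONVERSATION_BUDGET = WINDOW - KNOWLEDGE_CEILING  # 133,500 tokens


def pool_status(knowledge_tokens, overhead_tokens):
    """Report each pool separately; all inputs are estimates."""
    return {
        # RAG depends only on knowledge file tokens.
        "rag_triggered": knowledge_tokens > KNOWLEDGE_CEILING,
        "knowledge_headroom": KNOWLEDGE_CEILING - knowledge_tokens,
        # Skills, MCPs, and Custom Instructions draw only on this pool.
        "conversation_runway": CONVERSATION_BUDGET - overhead_tokens,
    }


if __name__ == "__main__":
    lean = pool_status(60_000, overhead_tokens=25_000)
    heavy = pool_status(60_000, overhead_tokens=60_000)
    # Same knowledge headroom in both cases; only the runway differs.
    print(lean)
    print(heavy)
```

Changing the overhead argument moves the conversation runway but leaves the RAG flag and knowledge headroom untouched, which is the point of the key implication.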
Token Estimation
Estimate knowledge file tokens: run ls -la /mnt/project/ for byte sizes, divide each by 4, sum. Compare against the ~66,500 token threshold.
Conversion rates: English prose ≈ 4 characters/token. Structured content (XML, code, tables) ≈ 3.25 characters/token. Mixed markdown ≈ 3.75 characters/token. Estimates are ±15% per file, ±10% for project totals.
Important: Include ALL knowledge file sources: uploaded files (all types), GitHub-connected repositories, and any other connected content sources.
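The estimation procedure above could be scripted roughly as follows. This is a sketch under stated assumptions: one byte is treated as one character (reasonable for mostly-ASCII text, poor for PDFs, images, and other binary formats), the content-type labels are assigned by hand, and all function names are hypothetical. Only the conversion rates, the ±10% project margin, and the ~66,500 threshold come from this document.

```python
import os

# Characters-per-token rates quoted in this section (approximate).
CHARS_PER_TOKEN = {"prose": 4.0, "structured": 3.25, "mixed": 3.75}
RAG_THRESHOLD = 66_500  # approximate ceiling on a 200K window


def estimate_tokens(num_bytes, content_type="prose"):
    """Rough token estimate; assumes one byte per character."""
    return round(num_bytes / CHARS_PER_TOKEN[content_type])


def directory_sizes(path):
    """Map each regular file in `path` to its byte size (non-recursive)."""
    return {e.name: e.stat().st_size for e in os.scandir(path) if e.is_file()}


def estimate_project(sizes, content_types=None, margin=0.10):
    """Sum per-file estimates and attach the project-level margin."""
    content_types = content_types or {}
    total = sum(
        estimate_tokens(size, content_types.get(name, "prose"))
        for name, size in sizes.items()
    )
    return {
        "estimated_tokens": total,
        "low": round(total * (1 - margin)),
        "high": round(total * (1 + margin)),
        "over_threshold": total > RAG_THRESHOLD,
    }


if __name__ == "__main__":
    # Hypothetical byte sizes, for illustration only.
    sizes = {"guide.md": 200_000, "schema.xml": 65_000}
    print(estimate_project(sizes, {"schema.xml": "structured"}))
```

In practice `directory_sizes("/mnt/project/")` would supply the sizes. Content not visible in that directory, such as GitHub-connected repositories, would need to be added by hand, and every figure reported should still be labeled "estimated".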
Threshold-Exempt Overhead
Skills, MCPs, and other non-knowledge-file components are threshold-exempt. They affect conversation runway and attention quality but cannot trigger RAG. Key overhead sources: platform system prompt (~20–25K tokens), Custom Instructions (bytes ÷ 4), Memory (typically 500–3K tokens).
MCP loading modes: MCP overhead varies dramatically by loading mode. Deferred/load-as-needed (the claude.ai default) adds ~40–60 tokens per connector via a lightweight catalog plus ~5–7K flat for the deferred infrastructure — ~85% less than always-loaded mode (~3K–15K per connector). When the loading mode is unknown, note the estimate as a range and flag the uncertainty. See references/compression-execution.md for detailed overhead estimation.
Estimated conversation runway (assumes deferred MCP loading): lean project ~105K–110K tokens, moderate ~95K–105K, typical (6–8 MCPs, 15+ skills) ~85K–100K, heavy (always-loaded MCPs) ~50K–80K.
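The MCP overhead figures above can be turned into a range calculation. This is a minimal sketch using only the per-connector and flat figures quoted in this section; the function name and mode labels are hypothetical, and applying the flat deferred cost whenever at least one connector is present is an assumption.

```python
def mcp_overhead_range(connectors, mode=None):
    """Return an estimated (low, high) MCP token overhead.

    mode: "deferred", "always", or None when the loading mode is unknown.
    """
    # Deferred: ~40-60 tokens per connector plus ~5-7K flat infrastructure.
    deferred = (connectors * 40 + 5_000, connectors * 60 + 7_000)
    # Always-loaded: ~3K-15K tokens per connector.
    always = (connectors * 3_000, connectors * 15_000)
    if mode == "deferred":
        return deferred
    if mode == "always":
        return always
    # Unknown mode: report the widest range and flag the uncertainty.
    return (min(deferred[0], always[0]), max(deferred[1], always[1]))


if __name__ == "__main__":
    for mode in ("deferred", "always", None):
        print(mode, mcp_overhead_range(8, mode))
```

For eight connectors the deferred range stays under 8K tokens while the always-loaded range starts at 24K, which is why an unknown loading mode should be reported as a range rather than a single number.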
Context Window Sizes by Plan
| Plan | Context Window | Knowledge File Ceiling |
|---|---|---|
| Pro / Max / Team | 200K tokens | ~66,500 tokens (empirically measured) |
| Enterprise | 500K tokens | ~165,000 tokens (estimated at 33%, untested) |
| API | 1M tokens (GA for Opus 4.6, Sonnet 4.6) | User-controlled (no platform RAG) |
Operating Tiers
| Tier | Knowledge File Tokens | Status |
|---|---|---|
| 1 | Under ~30K | Comfortable headroom. Focus on structural quality. |
| 2 | ~30K–50K | Moderate. Optimization beneficial but not urgent. Monitor growth. |
| 3 | ~50K–66K | Approaching threshold. Proactive optimization recommended. |
| 4 | ~66K–500K | Retrieval mode. Optimize for recovery (if near the threshold) or retrieval quality. |
| 5 | Over ~500K | Heavy retrieval. Optimize for retrieval quality. Surface API deployment for cross-document synthesis workloads. |
File count is not a factor in RAG activation. Optimize file count for content organization and retrieval quality.
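The tier table above maps to a simple lookup. The sketch below assumes exact cutoffs of 30,000, 50,000, 66,500, and 500,000 tokens and a particular treatment of values that land on a boundary. Both are illustrative choices, since the table gives only approximate ranges, and the function name is hypothetical.

```python
def operating_tier(estimated_tokens):
    """Map an estimated knowledge file token total to an operating tier (1-5)."""
    if estimated_tokens < 30_000:
        return 1  # comfortable headroom
    if estimated_tokens < 50_000:
        return 2  # moderate
    if estimated_tokens <= 66_500:
        return 3  # approaching threshold
    if estimated_tokens <= 500_000:
        return 4  # retrieval mode
    return 5      # heavy retrieval


if __name__ == "__main__":
    for tokens in (12_000, 45_000, 60_000, 70_000, 600_000):
        print(tokens, "-> Tier", operating_tier(tokens))
```

Because estimates carry a margin, a total within roughly 10% of a cutoff is better reported as straddling two tiers than assigned to one.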
Context Pressure vs. RAG Switching
These are two independent mechanisms. RAG switching is project-level and static (knowledge files exceed the threshold). Context pressure is conversation-level and dynamic (the conversation approaches the 200K limit). A project can experience both, either, or neither.
Diagnostic shortcuts:
- "Claude forgot my instructions" → context pressure OR RAG (behavioral rules not retrieved).
- "Claude can't find content in my files" → RAG (content not retrieved) OR context pressure (compacted).
- "Responses getting shorter/weaker" → context pressure.
In full-context mode, compaction summarizes conversation history but leaves knowledge files intact. In retrieval mode, knowledge files load fresh each turn via search (not subject to compaction) but conversation history still compacts.
Placement Tiers
Files are classified into three tiers based on six evaluation dimensions (see references/evaluation-rubric.md for detailed scoring criteria):
- Tier 1 — Must Keep: High cross-reference density, behavioral content, high query frequency, severe degradation if absent.
- Tier 2 — Optimize or Relocate: Moderate scores. Candidates for compression, relocation, or restructuring.
- **Tier 3 —