capacity-planner

Sizing tool for ops teams that handle queued work — Support, CX, Customer Success, BizOps, IT ops, Finance ops. Built on Erlang-C queueing theory, Little's Law, and the operational-leadership canon (Fournier, Larson, Cleveland, Reinertsen). Deterministic, stdlib-only, no LLM calls.

Purpose

You are an ops leader sized 15 → 35 with no idea how the 35-person org will actually behave at peak load. Or you are at 88% utilization and SLA is starting to slip. Or you have a hiring budget approved and need to sequence it across four quarters without burning out the existing team. This skill answers those questions with arithmetic, not vibes.

It produces three artifacts:

Capacity sizing at 70/80/90% utilization against P50/P90/P99 demand, with P(SLA breach) at each point and a SAFE/WATCH/AT_RISK/CRITICAL risk band.
Utilization health at the per-member traffic-light level plus a team verdict (HEALTHY/SQUEEZED/OVERLOADED/UNBALANCED).
12-month quarterly hiring plan accounting for ramp curves, attrition, QoQ demand growth, and span-of-control manager triggers.

When to use

Annual ops capacity planning (October-November for the following fiscal year).
Quarterly re-sizing if demand changed >15% or attrition spiked.
Pre-budget defense — the math that justifies the headcount ask to your CFO.
Diagnostic when an ops team is missing SLA and you need to know whether it's a sizing problem, a process problem, or a bottleneck problem.
M&A / new-segment launch modeling — sizing a new team or combined org.

Workflow

Intake demand. Pull P50/P90/P99 daily ticket/case volume from your work system (Zendesk, Intercom, JSM, ServiceNow, Salesforce). If you only have averages, stop and pull the distribution. Single- point demand estimates are the most expensive anti-pattern in ops.
Model throughput. Run capacity_modeler.py with your demand, AHT, SLA target, current FTE, and shrinkage. Use --profile for your function (support / cx / bizops / finance-ops / it-ops). Read the 80%-utilization row — that's your sizing point.
Flag utilization risk. Run utilization_analyzer.py against your current team's actual utilization data. Anyone >85% sustained is a throughput-collapse risk per Reinertsen. Spread >30 percentage points across team means UNBALANCED — fix that before hiring.
Sequence hiring. Run hiring_sequencer.py with current FTE, target EOY, ramp time, attrition, and growth. It will front-load hires (Q1 35%, Q4 15%), apply ramp curves, and trigger a manager hire when span of control crosses 7 ICs/manager.
Walk the Forcing-question library (see below). One question at a time. Do not skip ahead. Answers must be written down before you commit the plan.

Scripts

scripts/capacity_modeler.py — Erlang-C sizing with shrinkage adjustment and P50/P90/P99 breach probabilities. --profile for industry defaults.
scripts/utilization_analyzer.py — per-member traffic-light + team-level health verdict with variance detection.
scripts/hiring_sequencer.py — 12-month quarterly plan with ramp, attrition, growth, max-hires-per-quarter constraint, and manager-trigger logic.

All three accept --input <path> (JSON), --output {markdown,json}, --sample (built-in example), and --help. Stdlib only.

References

references/queueing_theory_canon.md — Erlang, Little, Hopp & Spearman, Reinertsen, Kingman, Cleveland, ITIL, Armony et al. (8 sources). The math.
references/ops_workforce_planning_canon.md — Fournier, Larson, Google SRE Workbook, Frei, Lawler, Bersin, Gartner, Grove (8 sources). The people factors.
references/capacity_anti_patterns.md — 11 named anti-patterns with cited sources, tool guards, and the meta-discipline that Lencioni + Goldratt + Christensen impose. (8+ named sources.)

Assets

assets/capacity_brief_template.md — 20-minute fill-out template with JSON skeletons for all three tools and an output checklist.

Assumptions

This skill assumes:

Work is queued (tickets, cases, work items) — not project-style. If your team's work isn't queued, this is the wrong skill.
Demand has a stationary-enough distribution within a quarter. Step-changes (new product launch, M&A, regulatory shift) require re-running mid-quarter.
You have at least 90 days of historical demand data to compute P50/P90/P99. If not, generate the distribution from your sales / user-base forecast first.
Service is single-class within a queue. If you have hard priority tiers (P1/P2/P3 with class-specific SLAs), model each as a separate queue and sum.
Channels are modeled coherently. Multi-channel teams use the appropriate --profile with built-in shrinkage premium.

Anti-patterns

See references/capacity_anti_patterns.md for the full taxonomy with sources. Top eight:

Plan-to-100%-utilization (Reinertsen Principle 12)
Treat-ramp-as-instant (Larson)
Ignore-attrition-in-12-month-plan (Bersin)
Hire-ICs-forever-with-no-manager-trigger (Fournier)
Size-to-P50-demand-only (Cleveland)
No-shrinkage-adjustment (Cleveland, SRE Workbook)
Single-channel-model-for-multi-channel-work (Gartner, Kingman)
No-surge-plan-for-P99-events (Hopp & Spearman, Reinertsen)

Distinct from

c-level-advisor/vpe-advisor measures engineering throughput via DORA 4 metrics, story points, deployment frequency, and cycle time bottlenecks. It is for engineering teams shipping code. This skill is for ops teams handling tickets/cases. Different unit of work, different math (Erlang-C vs. DORA), different bottleneck (queueing-blind staffing vs. WIP + lead time).
c-level-advisor/chro-advisor does strategic workforce planning (1-5 year capability portfolios, talent supply, leadership succession). This skill does operational 0-12 month capacity sizing against demand. Per Lawler: conflating them gets you hired into the wrong jobs.
project-management/* tracks delivery throughput on projects (Jira velocity, sprint capacity). This skill sizes around steady- state queued work.
Sibling process-mapper finds the bottleneck. This skill sizes the team around a known bottleneck. Order of operations: process-mapper first → capacity-planner second. Hiring around the wrong constraint wastes the hires.
business-growth/cs-coverage (if it exists) sizes Customer Success coverage by ARR/CSM ratio and segment. This skill sizes by queued work volume (tickets, cases, escalations). For a CS team that handles both relationship work AND a ticket queue, run both.

Forcing-question library (Matt Pocock grill discipline)

Discipline: walk these one at a time. Do not skip ahead. Answers must be written down. If you can't answer one, that is your next investigation.

Q1 — "What is your bottleneck, and have you confirmed it empirically?"

Recommended answer: a named, measured stage in the workflow with queue-time data showing where work waits. Not a vibe. Not "escalations take too long". An actual measured queue.

Why it's the first question: Goldratt (The Goal, 1984) — every system has exactly one binding constraint at a time. Sizing around the wrong constraint wastes hires entirely. If you do not know your bottleneck, run process-mapper BEFORE this skill.

Canon: Eli Goldratt, The Goal (1984); Reinertsen, Principles of Product Development Flow (2009).

Q2 — "What service trade-off are you accepting?"

Recommended answer: a written, explicit choice — fast vs. empathetic, broad vs. deep, low-cost vs. high-quality. Frances Frei is unambiguous: you cannot win all four. The team that tries wins zero.

Why it matters: AHT, SLA, and shrinkage inputs are the operational expression of this trade-off. If they don't agree (e.g., you set AHT for "empathy" b

capacity-planner

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

webapp-testing

brand-guidelines

frontend-design

web-artifacts-builder

Recibe nuevas skills de Design e Frontend todos los lunes

capacity-planner

Purpose

When to use

Workflow

Scripts

References

Assets

Assumptions

Anti-patterns

Distinct from

Forcing-question library (Matt Pocock grill discipline)

Q1 — "What is your bottleneck, and have you confirmed it empirically?"

Q2 — "What service trade-off are you accepting?"

Comentarios · Sin comentarios