Delivery Metrics Check
Assess delivery health using product-type-appropriate metrics. Check product_type from .claude/diamonds/active.yml to determine which assessment to run.
Product type routing (v0.11.0):
- software: Full DORA + APEX assessment (Parts 1-3 below)
- content_course, content_publication, content_media: Content Delivery Assessment (Part 4 below)
- ai_tool: AI Tool Assessment (Part 5 below) + DORA/APEX if code components exist
- service_offering: Service Delivery Assessment (Part 6 below)
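The routing table above reduces to a lookup. This is a minimal sketch that assumes product_type has already been read from active.yml. `ROUTES`, `assessments_for`, and the `has_code_components` flag are hypothetical names and are not part of any existing tooling.

```python
# Hypothetical lookup mirroring the product type routing table.
ROUTES = {
    "software": ["Part 1", "Part 2", "Part 3"],
    "content_course": ["Part 4"],
    "content_publication": ["Part 4"],
    "content_media": ["Part 4"],
    "ai_tool": ["Part 5"],
    "service_offering": ["Part 6"],
}


def assessments_for(product_type: str, has_code_components: bool = False) -> list[str]:
    """Return the assessment parts to run for a given product_type."""
    if product_type not in ROUTES:
        raise ValueError(f"unknown product_type: {product_type!r}")
    parts = list(ROUTES[product_type])
    # ai_tool additionally gets DORA (Part 1) + APEX (Part 2) when code components exist.
    if product_type == "ai_tool" and has_code_components:
        parts += ["Part 1", "Part 2"]
    return parts
```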
Preflight: Read target canvas file(s) before any Write/Edit
Hard rule. Before issuing Write or Edit against any .claude/canvas/*.yml, use the Read tool on that file in this session. Claude Code's Read-before-Write check requires the Read tool specifically — cat/head/grep via Bash do NOT satisfy it.
Edit vs Write — different cost profiles (verified 2026-05-14):
- `Edit` (exact-string replacement): `Read` with `limit: 1` satisfies the check at ~50 tokens. State-tracking is per-file, not per-byte, so subsequent `Edit` calls work anywhere in the file. Use this for partial updates against large canvas files (e.g., `purpose.yml` at 800+ lines).
- `Write` (full replacement): do a full Read first. Write obliterates the file, so you should see what you're about to replace. The `limit:1` shortcut is not appropriate here.
ID-bearing entries — scan the ID space before assigning (added 2026-05-15, v0.23.19): When adding a new component, opportunity, solution, or any other ID-bearing entry to a canvas file, run a Bash grep first to confirm the next ID in your prefix sequence is actually free:
grep "^ - id: <prefix>-" .claude/canvas/<file>.yml | sort -u
Replace <prefix> with the canvas's ID prefix (comp for landscape, opp for opportunities, sol for solutions, ht for human-tasks, etc.). Then pick the next free integer. validate_canvas.py has a duplicate-ID check (lines 230-239) that catches the failure on CI, but a duplicate can persist in the working tree for days if CI isn't run between edit and discovery — see roadmap-repo corrections.md 2026-05-15 "Duplicate canvas ID created in landscape.yml" for the worked example.
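`sort -u` orders lines lexicographically. As a result, `comp-10` sorts before `comp-2`, and the last line of its output is not always the highest ID. The hypothetical Python helper below compares IDs numerically instead. It is not part of validate_canvas.py, and it assumes "next free" means one past the highest ID in use, so gaps are not reused.

```python
import re


def next_free_id(canvas_text: str, prefix: str) -> str:
    """Return one past the highest <prefix>-<n> ID found in a canvas file's text.

    Hypothetical helper: assumes list entries like "  - id: comp-12" at any indentation.
    """
    pattern = re.compile(rf"^[ \t]*- id: {re.escape(prefix)}-(\d+)\b", re.MULTILINE)
    used = [int(m.group(1)) for m in pattern.finditer(canvas_text)]
    # Numeric max, unlike lexicographic `sort -u`; an unused prefix starts at 1.
    return f"{prefix}-{max(used, default=0) + 1}"
```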
Original failure mode: anti-pattern #7 instance #5, 2026-05-09 — agent conflated Bash head with the Read tool, lost ~14k tokens to a Write-fail → remedial-full-Read → re-Write loop. The limit:1 discipline (graduated 2026-05-14, v0.23.18) prevents the second-order cost where the agent correctly follows the rule but full-Reads every time. The ID-scan discipline (graduated 2026-05-15, v0.23.19) prevents the related class where the agent reads enough of the file to satisfy the Edit check but not enough to see existing ID assignments — kin to anti-pattern #8 (Stale State Read).
If this skill writes to multiple canvas files, register each one first (limit:1 for Edit-only paths; full Read for Write paths) AND ID-scan any prefix you intend to assign.
See CLAUDE.md Canvas writes — Read before Write for the canonical rule.
Software Products
Assess delivery health using Forsgren's five DORA metrics AND LinearB's APEX AI-era metrics.
Part 1: DORA Metrics (Forsgren)
Gather current metrics from CI/CD pipelines, deployment logs, and incident records.
Note: DORA expanded from 4 to 5 metrics. "MTTR" was renamed to "Failed Deployment Recovery Time" (FDRT) for precision — the original name was ambiguous with other mean-time-to-X metrics. "Reliability" was added as the 5th metric in the 2024 State of DevOps report.
Deployment Frequency: How often does code reach production?
- Elite: On-demand (multiple deploys/day)
- High: Between once/day and once/week
- Medium: Between once/week and once/month
- Low: Less than once/month
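To place a team on this scale, count production deploys over a fixed window and convert the count to an average daily rate. This is a sketch. Two choices are assumptions, because the bands leave them open: a month is treated as 30 days, and exactly once per day counts as High.

```python
def deploy_frequency_level(deploy_count: int, window_days: int) -> str:
    """Band the average deploy rate over a window using the levels above.

    Assumptions: "multiple deploys/day" means an average above one per day;
    a month is 30 days; a rate exactly on a boundary gets the higher band.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    per_day = deploy_count / window_days
    if per_day > 1:
        return "Elite"
    if per_day >= 1 / 7:   # at least once per week
        return "High"
    if per_day >= 1 / 30:  # at least once per month
        return "Medium"
    return "Low"
```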
Lead Time for Changes: How long does a commit take to reach production?
- Elite: Less than one hour
- High: Between one day and one week
- Medium: Between one week and one month
- Low: More than one month
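This sketch takes (commit time, deploy time) pairs and assigns a band based on the median. The median keeps one long-lived branch from skewing the result. Three details are assumptions, not part of the DORA bands quoted above: the use of the median, the 30-day month, and folding the unlisted one-hour-to-one-day range into High.

```python
from datetime import datetime, timedelta
from statistics import median


def lead_time_level(changes: list[tuple[datetime, datetime]]) -> tuple[timedelta, str]:
    """Median commit-to-production time and its band.

    Each tuple is (commit_time, deploy_time). Assumption: the gap between
    one hour and one day is treated as High.
    """
    lead_times = [deployed - committed for committed, deployed in changes]
    typical = median(lead_times)
    if typical < timedelta(hours=1):
        level = "Elite"
    elif typical <= timedelta(weeks=1):
        level = "High"
    elif typical <= timedelta(days=30):
        level = "Medium"
    else:
        level = "Low"
    return typical, level
```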
Change Failure Rate: What % of deployments cause a failure in production?
- Elite: 0-15%
- High: 16-30%
- Medium: 31-45%
- Low: 46-100%
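This sketch computes the ratio from one boolean per deployment, set to True if that deployment caused a failure. What counts as a failure (rollback, hotfix, incident) is an assumption each team should settle. The bands above are whole percentages, so fractional rates such as 15.5% are banded by upper bound here.

```python
def change_failure_rate(deploys: list[bool]) -> tuple[float, str]:
    """Percentage of deployments that caused a failure, plus its band.

    Assumption: each flag is True if that deploy led to a rollback, hotfix,
    or incident. Fractional rates fall into the band whose upper bound they meet.
    """
    if not deploys:
        raise ValueError("no deployments in the window")
    rate = 100 * sum(deploys) / len(deploys)
    if rate <= 15:
        level = "Elite"
    elif rate <= 30:
        level = "High"
    elif rate <= 45:
        level = "Medium"
    else:
        level = "Low"
    return rate, level
```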
Failed Deployment Recovery Time (FDRT): Time to restore service after a failed deployment?
- Elite: Less than one hour
- High: Less than one day
- Medium: Between one day and one week
- Low: More than one week
Formerly "Mean Time to Recovery (MTTR)." Renamed for precision — FDRT measures recovery from failed deployments specifically, not all incidents.
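The difference from MTTR shows up in the filter: only failures caused by a deployment count. This sketch assumes a hypothetical `Incident` record with a `caused_by_deploy` flag. Measuring recovery from `started`, rather than from the deploy itself, is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Incident:
    # Hypothetical record shape; adapt to your incident tracker's export.
    started: datetime
    restored: datetime
    caused_by_deploy: bool


def fdrt_level(incidents: list[Incident]) -> tuple[timedelta, str]:
    """Median recovery time for deployment-caused failures, plus its band."""
    # FDRT ignores incidents not caused by a deployment (classic MTTR would include them).
    durations = [i.restored - i.started for i in incidents if i.caused_by_deploy]
    if not durations:
        raise ValueError("no deployment-caused failures in the window")
    typical = median(durations)
    if typical < timedelta(hours=1):
        level = "Elite"
    elif typical < timedelta(days=1):
        level = "High"
    elif typical <= timedelta(weeks=1):
        level = "Medium"
    else:
        level = "Low"
    return typical, level
```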
Reliability: Does the software meet or exceed its reliability targets?
- Elite: Meets or exceeds targets
- High: Slightly below targets
- Medium: Moderately below targets
- Low: Significantly below targets
Added in DORA 2024. Measures operational reliability via SLOs/SLIs. Connects to SRE metrics in Part 3.
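One way to make "meets its reliability targets" measurable is to express each SLI against its SLO as the share of error budget left. A result of 0% or more means the target is met, and a negative value means the budget is overspent. This is a sketch: `slo` and `actual` are assumed to be fractions (0.999 for 99.9%), and the function names are hypothetical.

```python
def error_budget_remaining(slo: float, actual: float) -> float:
    """Percentage of error budget left; negative means the budget is overspent.

    The budget is the allowed failure share (1 - slo) and consumption is
    (1 - actual), so remaining = (actual - slo) / (1 - slo).
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return (actual - slo) / (1 - slo) * 100


def meets_target(slo: float, actual: float) -> bool:
    # "Meets or exceeds targets" is equivalent to a non-negative remaining budget.
    return actual >= slo
```

For example, an SLO of 0.999 with a measured SLI of 0.9995 leaves about 50% of the budget.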
Part 2: APEX Metrics (LinearB)
"Faster coding doesn't mean faster delivery."
Assess the four APEX pillars to detect AI-era delivery problems:
A — AI Leverage
- What % of PRs/code changes are AI-generated or AI-assisted?
- What is the AI suggestion acceptance rate? (Benchmark: 32.7% for AI vs 84.4% for human — LinearB 2026)
- What is the AI rework rate? (% of AI code rewritten within 21 days)
- Is AI code quality comparable to human code? (Check corrections.md origin field)
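This sketch answers the first two AI Leverage questions from PR records. It treats "acceptance" as merged PRs divided by opened PRs, split by origin. That definition is an assumption and may not match how LinearB computes its benchmark. `PullRequest` and `ai_leverage` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical fields for illustration.
    origin: str  # "ai" or "human"
    merged: bool


def ai_leverage(prs: list[PullRequest]) -> dict[str, float]:
    """AI share of PRs and merge-based acceptance rates by origin, in percent."""
    ai = [p for p in prs if p.origin == "ai"]
    human = [p for p in prs if p.origin == "human"]

    def acceptance(group: list[PullRequest]) -> float:
        # Assumption: acceptance = merged / opened; 0.0 for an empty group.
        return 100 * sum(p.merged for p in group) / len(group) if group else 0.0

    return {
        "ai_share_pct": 100 * len(ai) / len(prs) if prs else 0.0,
        "ai_acceptance_pct": acceptance(ai),
        "human_acceptance_pct": acceptance(human),
    }
```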
P — Predictability
- Planning accuracy: % of planned work completed per cycle?
- Rework rate: % of ALL code rewritten within 21 days?
- Are delivery estimates getting more or less reliable with AI?
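Rework rate can be computed from version-control history once each change records when, if ever, it was rewritten. This sketch uses a hypothetical `Change` record, and counting a rewrite on exactly day 21 as rework is an assumption. The same function gives the overall rate and, with `ai_only=True`, the AI rework rate from the AI Leverage pillar.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REWORK_WINDOW = timedelta(days=21)


@dataclass
class Change:
    # Hypothetical per-line or per-hunk record derived from version-control history.
    written: datetime
    rewritten: Optional[datetime]  # None if never rewritten
    ai_generated: bool


def rework_rate(changes: list[Change], ai_only: bool = False) -> float:
    """Percentage of changes rewritten within 21 days of being written."""
    pool = [c for c in changes if c.ai_generated] if ai_only else changes
    if not pool:
        return 0.0
    reworked = sum(
        1 for c in pool
        if c.rewritten is not None and c.rewritten - c.written <= REWORK_WINDOW
    )
    return 100 * reworked / len(pool)
```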
E — Flow Efficiency (The Shifting Bottleneck)
- End-to-end cycle time: is it actually decreasing?
- Review wait time: are PRs waiting longer before first review?
- AI review wait ratio: do AI PRs wait longer than human PRs? (Benchmark: 4.6x — LinearB 2026)
- KEY CHECK: Is coding faster while review, testing, or deployment is slower? If yes, the bottleneck has shifted: AI is generating code faster than the pipeline can absorb.
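This sketch covers the two comparisons in this pillar using hypothetical inputs. The first compares median time-to-first-review for AI PRs against human PRs. The second is a before/after check on stage timings. The stage names and the use of medians are assumptions.

```python
from statistics import median


def review_wait_ratio(ai_waits_h: list[float], human_waits_h: list[float]) -> float:
    """Median hours to first review for AI PRs, relative to human PRs."""
    return median(ai_waits_h) / median(human_waits_h)


def bottleneck_shifted(before: dict[str, float], after: dict[str, float]) -> bool:
    """True if coding got faster while any downstream stage got slower.

    `before`/`after` map hypothetical stage names to median hours for two
    periods, e.g. before and after AI adoption.
    """
    coding_faster = after["coding"] < before["coding"]
    downstream = ("review", "testing", "deployment")
    return coding_faster and any(after[s] > before[s] for s in downstream)
```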
X — Developer Experience
- Developer satisfaction with AI tools (survey or conversation)
- Cognitive load: is AI helping or adding complexity?
- Burnout signals: unsustainable pace? Context-switching? Alert fatigue?
- Maps to BVSSH "Happier" dimension
Output
## DORA + APEX Assessment
### DORA Metrics
| Metric | Current | Level | Target | Gap |
|--------|---------|-------|--------|-----|
| Deploy freq | ... | ... | ... | ... |
| Lead time | ... | ... | ... | ... |
| Change fail rate | ... | ... | ... | ... |
| FDRT | ... | ... | ... | ... |
| Reliability | ... | ... | ... | ... |
### APEX Metrics (AI-Era)
| Pillar | Status | Key Signal |
|--------|--------|-----------|
| AI Leverage | ... | AI acceptance rate: ...% |
| Predictability | ... | Planning accuracy: ...%, Rework rate: ...% |
| Flow Efficiency | ... | Cycle time: ..., Review wait: ... |
| Developer Experience | ... | Satisfaction: ..., Burnout: ... |
### Shifting Bottleneck Check
[Is coding faster but review/deployment slower? Yes/No]
[If yes: where is the new bottleneck?]
### DORA Bottleneck
[The metric most constraining overall performance]
### Value Stream Diagnosis (if bottleneck detected)
If DORA shows a bottleneck, map the value stream to identify WHERE in the flow the constraint lives:
- Run `/mycelium:canvas-update` to update `.claude/canvas/value-stream.yml` with current stage timings
- Apply Theory of Constraints Five Focusing Steps (Goldratt): Identify -> Exploit -> Subordinate -> Elevate -> Repeat
- Look for wait times >> process times (a sign of queuing problems, not capacity problems)
- Look for high handoff counts (each handoff adds delay and information loss)
- Calculate flow efficiency: process_time / lead_time; target >25%
### Top 3 Improvements
1. [specific action]
2. [specific action]
3. [specific action]
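This sketches the flow-efficiency check from the Value Stream Diagnosis step for a hypothetical list of stages, with process and wait times in hours. It assumes stages run sequentially, so lead time is the sum of all process and wait time. For the Identify step, the longest queue is only a first guess at the constraint; confirm it before exploiting or elevating.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    # Hypothetical shape for one value-stream stage; times in hours.
    name: str
    process_time: float  # hands-on work time
    wait_time: float     # time spent queued


def flow_efficiency(stages: list[Stage]) -> float:
    """process_time / lead_time as a percentage (target > 25%)."""
    process = sum(s.process_time for s in stages)
    lead = sum(s.process_time + s.wait_time for s in stages)
    return 100 * process / lead


def likely_constraint(stages: list[Stage]) -> str:
    # Identify step: the stage with the longest queue is a candidate constraint.
    return max(stages, key=lambda s: s.wait_time).name
```

For example, stages of (6 h work, 2 h wait), (1 h, 20 h), and (1 h, 10 h) contain 8 hours of work in a 40-hour lead time. That is a flow efficiency of 20%, below the 25% target, with the 20-hour queue as the likely constraint.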
Part 3: SRE Metrics (Error Budgets)
If SLIs/SLOs are defined in the sre section of .claude/canvas/dora-metrics.yml:
- Review each service's SLI values against SLO targets
- Calculate error budget remaining: (actual - SLO) / (1 - SLO) * 100% (e.g., an SLO of 99.9% with actual 99.95% leaves 50% of the budget)
- Healthy (>50%): Ship feature