Framework Health Check

Mycelium evaluates its own process. This is triple-loop learning — the framework assessing whether it is getting better at producing good outcomes.

When to Use

Quarterly review (scheduled)
After 20 completed leaf cycles (triggered by cycle-history.yml count)
When process friction is suspected
Before major framework changes (baseline measurement)

Workflow

1. Load Cycle Data

Read .claude/canvas/cycle-history.yml.

Framework-self-host detection (per engine/cycle-learning.md#framework-on-framework-exemption): if the project root contains plugins/mycelium/plugin.json AND CLAUDE.md begins with # Mycelium:, this is the framework dogfooding itself. Skip the cycle-count gate and route to a corrections-graduation summary:

Count entries in .claude/memory/corrections.md (total, and ×graduated-to-mechanism in the last 90 days).
Read .claude/memory/cluster-instances.md and list clusters at-or-above their graduation criterion that are not yet graduated (this is the framework analogue of "actual outcome vs predicted ICE").
Skip cycle-derived dimensions (velocity, discard rate, confidence calibration, regression rate) — they do not apply. Still run Steps 2b, 4b, 4c, 4d.

Otherwise (product project, not framework-self-host): if fewer than 5 cycles recorded, report: "Insufficient cycle data for framework health assessment. [N] cycles recorded; minimum 5 needed. Continue recording outcomes."

2. Measure Five Dimensions

For each dimension, compute the metric and compare against trend (if prior assessments exist):

Cycle Velocity:

Average days from diamond creation to completion, grouped by scale
Trend: improving / stable / degrading
If degrading: flag for investigation

Discard Rate:

Count of discards per lifecycle phase
Average discard phase (1-10 scale)
Trend: shifting earlier (good) / shifting later (bad) / stable
If >50% of discards at Phase 7+: flag "late discard pattern"

Confidence Calibration:

For all cycles with predicted confidence and actual outcome:
- Compute: actual success rate per confidence band (0.3-0.5, 0.5-0.7, 0.7-0.9)
- Compare with expected rate (confidence 0.7 should succeed ~70%)
- Report calibration factor: actual/expected
If calibration factor < 0.8 or > 1.2: flag miscalibration

Gate Effectiveness:

For each theory gate, count: times checked, times passed, times failed
Compute hit rate: failures / total checks
Flag rubber stamps (0% failure rate) and hard blocks (>80% failure rate)
Theory X/Y audit (per ${CLAUDE_PLUGIN_ROOT}/harness/theory-tensions.md Tension 7): for any hard-block gate, check it is scaffolding (surfaces its why, an escape hatch exists, leaves the user more capable), not coercion (compliance for its own sake, no surfaced reason, no escape). A high-block gate that fails this audit is a Theory-X drift to remediate, not just a strict gate.

Regression Rate:

Count diamonds that regressed at least once / total diamonds
Trend: decreasing (good) / increasing (bad) / stable

2b. Re-run Deferred Design-Verification Eval Scenarios

Re-run any eval scenario tagged regression AND router-discipline from .claude/evals/scenarios/integration/. These are deferred design-time decisions that need periodic re-verification (the AGENTS.md router design is the canonical case — see agents-md-router-discipline.yml).

For each scenario:

Run via /mycelium:eval-runner against the scenario file
Compare result against the scenario's baseline_reference field
Report:
- Same outcome → design holding; no action
- Improved → either the design got better OR the model improved; investigate which (a model improvement that hides a design regression is a Goodhart trap)
- Regressed → design drifted; flag for remediation in this assessment

If a scenario fails its success_criteria for the first time, log to corrections.md as a new generalizable correction with the scenario name as evidence. Do not auto-remediate — surface the regression for human review.

3. Run Threshold Calibration

If cycle count ≥ minimum_n for any threshold in .claude/canvas/thresholds.yml:

Apply calibration rules from ${CLAUDE_PLUGIN_ROOT}/engine/adaptive-thresholds.md
Update calibrated values
Log changes in .claude/harness/decision-log.md

4. Check Goodhart Counter-Metrics

For each dimension, verify the counter-metric is not degrading:

Velocity improving BUT outcome quality declining? Flag.
Earlier discards BUT false positive rate rising? Flag.
Better calibration BUT decision speed dropping? Flag.

4b. Cluster Graduation-Readiness (added 2026-05-08)

Read .claude/memory/cluster-instances.md. For each cluster:

Compare instance count to graduation criterion. If a cluster has reached or exceeded its stated criterion without being graduated to the corresponding mechanism (e.g., 6+ instances with spec-only status when promotion bar requires implemented detection rules), surface as a graduation-readiness flag.
For spec-status clusters with linked spec docs (e.g., ${CLAUDE_PLUGIN_ROOT}/engine/consistency-check-spec.md): check whether the spec's promotion-bar conditions have been met. Concretely: count detection rules drafted vs. required, FP-rate measurements available vs. needed.
Recursive check: if a cluster's stated graduation criterion has been met for >30 days without graduation action, that's itself an instance of the documented-rule-diverges-from-enforcement cluster — log it.
Output: include cluster status in the dashboard under a new "Cluster Graduation Status" section.

This step closes the recursion the cluster log was created to address: graduation criteria become mechanically auditable rather than promises stored in commit messages.

4c. Receipts Highlights Rotation Cadence (added 2026-05-08)

The README's "How Mycelium got smarter" section shows 5 case headers; the full list lives in docs/receipts/cases/. Stale README highlights are a Goodhart signal: if the receipts surface freezes, the framework's "we get smarter with each cycle" claim degrades to "we got smarter once".

For each case currently on the README:

Check git-log staleness: when did the case header last change? If >90 days, flag as a rotation candidate.
Check for newer cases: are there cases under docs/receipts/cases/ newer than the rotation candidate that better demonstrate the framework's recent behavior?
Recommend rotation: surface specific rotate-out / rotate-in pairs in the dashboard. Rotation is a maintainer decision, not automatic — but the flag forces the decision rather than letting it drift.
Highlight gap signal: if no case has been added to docs/receipts/cases/ in >60 days, flag as a possible-low-friction signal — either the framework genuinely caused no recent friction (rare), or the dogfood loop has weakened (usually).

Per docs/contributing/style.md#highlights-rotation. Cases stay in docs/receipts/cases/ even when rotated off README; only the README mention rotates.

4d. Docs Health Cross-Surface (added 2026-05-08)

Run a lightweight version of /mycelium:canvas-health step 9b on docs/:

Stub freshness (any forthcoming-doc Last updated >60 days)
Length budget compliance (hard caps)
Marketing-voice scan
Information-scent scan on links

Surface in the dashboard. Full details delegate to /mycelium:canvas-health.

4e. Chat-UX Axiom Audit of Skill Output Templates (added 2026-05-30)

The chat-UX nudges in ${CLAUDE_PLUGIN_ROOT}/harness/design-principles.md ("the chat is a UI") shape live output, which has no stored corpus to audit retroactively. What is auditable is the static surface that pre-shapes live output: the ## Output/## Output Format blocks in ${CLAUDE_PLUGIN_ROOT}/skills/*/SKILL.md. Scan each for two axiom violations:

Hick's Law — an output template that instructs the agent to present a *li

framework-health

How to add

Drop this on your repo README

Related skills

internal-comms

babysit

do

smart-explore

Get new DevOps e Infra skills every Monday