Framework Health Check
Mycelium evaluates its own process. This is triple-loop learning — the framework assessing whether it is getting better at producing good outcomes.
When to Use
- Quarterly review (scheduled)
- After 20 completed leaf cycles (triggered by cycle-history.yml count)
- When process friction is suspected
- Before major framework changes (baseline measurement)
Workflow
1. Load Cycle Data
Read .claude/canvas/cycle-history.yml.
Framework-self-host detection (per engine/cycle-learning.md#framework-on-framework-exemption): if the project root contains plugins/mycelium/plugin.json AND CLAUDE.md begins with # Mycelium:, this is the framework dogfooding itself. Skip the cycle-count gate and route to a corrections-graduation summary:
- Count entries in
.claude/memory/corrections.md(total, and ×graduated-to-mechanism in the last 90 days). - Read
.claude/memory/cluster-instances.mdand list clusters at-or-above their graduation criterion that are not yet graduated (this is the framework analogue of "actual outcome vs predicted ICE"). - Skip cycle-derived dimensions (velocity, discard rate, confidence calibration, regression rate) — they do not apply. Still run Steps 2b, 4b, 4c, 4d.
Otherwise (product project, not framework-self-host): if fewer than 5 cycles recorded, report: "Insufficient cycle data for framework health assessment. [N] cycles recorded; minimum 5 needed. Continue recording outcomes."
2. Measure Five Dimensions
For each dimension, compute the metric and compare against trend (if prior assessments exist):
Cycle Velocity:
- Average days from diamond creation to completion, grouped by scale
- Trend: improving / stable / degrading
- If degrading: flag for investigation
Discard Rate:
- Count of discards per lifecycle phase
- Average discard phase (1-10 scale)
- Trend: shifting earlier (good) / shifting later (bad) / stable
- If >50% of discards at Phase 7+: flag "late discard pattern"
Confidence Calibration:
- For all cycles with predicted confidence and actual outcome:
- Compute: actual success rate per confidence band (0.3-0.5, 0.5-0.7, 0.7-0.9)
- Compare with expected rate (confidence 0.7 should succeed ~70%)
- Report calibration factor: actual/expected
- If calibration factor < 0.8 or > 1.2: flag miscalibration
Gate Effectiveness:
- For each theory gate, count: times checked, times passed, times failed
- Compute hit rate: failures / total checks
- Flag rubber stamps (0% failure rate) and hard blocks (>80% failure rate)
- Theory X/Y audit (per
${CLAUDE_PLUGIN_ROOT}/harness/theory-tensions.mdTension 7): for any hard-block gate, check it is scaffolding (surfaces its why, an escape hatch exists, leaves the user more capable), not coercion (compliance for its own sake, no surfaced reason, no escape). A high-block gate that fails this audit is a Theory-X drift to remediate, not just a strict gate.
Regression Rate:
- Count diamonds that regressed at least once / total diamonds
- Trend: decreasing (good) / increasing (bad) / stable
2b. Re-run Deferred Design-Verification Eval Scenarios
Re-run any eval scenario tagged regression AND router-discipline from .claude/evals/scenarios/integration/. These are deferred design-time decisions that need periodic re-verification (the AGENTS.md router design is the canonical case — see agents-md-router-discipline.yml).
For each scenario:
- Run via
/mycelium:eval-runneragainst the scenario file - Compare result against the scenario's
baseline_referencefield - Report:
- Same outcome → design holding; no action
- Improved → either the design got better OR the model improved; investigate which (a model improvement that hides a design regression is a Goodhart trap)
- Regressed → design drifted; flag for remediation in this assessment
If a scenario fails its success_criteria for the first time, log to corrections.md as a new generalizable correction with the scenario name as evidence. Do not auto-remediate — surface the regression for human review.
3. Run Threshold Calibration
If cycle count ≥ minimum_n for any threshold in .claude/canvas/thresholds.yml:
- Apply calibration rules from
${CLAUDE_PLUGIN_ROOT}/engine/adaptive-thresholds.md - Update calibrated values
- Log changes in .claude/harness/decision-log.md
4. Check Goodhart Counter-Metrics
For each dimension, verify the counter-metric is not degrading:
- Velocity improving BUT outcome quality declining? Flag.
- Earlier discards BUT false positive rate rising? Flag.
- Better calibration BUT decision speed dropping? Flag.
4b. Cluster Graduation-Readiness (added 2026-05-08)
Read .claude/memory/cluster-instances.md. For each cluster:
- Compare instance count to graduation criterion. If a cluster has reached or exceeded its stated criterion without being graduated to the corresponding mechanism (e.g., 6+ instances with spec-only status when promotion bar requires implemented detection rules), surface as a graduation-readiness flag.
- For
spec-status clusters with linked spec docs (e.g.,${CLAUDE_PLUGIN_ROOT}/engine/consistency-check-spec.md): check whether the spec's promotion-bar conditions have been met. Concretely: count detection rules drafted vs. required, FP-rate measurements available vs. needed. - Recursive check: if a cluster's stated graduation criterion has been met for >30 days without graduation action, that's itself an instance of the documented-rule-diverges-from-enforcement cluster — log it.
- Output: include cluster status in the dashboard under a new "Cluster Graduation Status" section.
This step closes the recursion the cluster log was created to address: graduation criteria become mechanically auditable rather than promises stored in commit messages.
4c. Receipts Highlights Rotation Cadence (added 2026-05-08)
The README's "How Mycelium got smarter" section shows 5 case headers; the full list lives in docs/receipts/cases/. Stale README highlights are a Goodhart signal: if the receipts surface freezes, the framework's "we get smarter with each cycle" claim degrades to "we got smarter once".
For each case currently on the README:
- Check git-log staleness: when did the case header last change? If >90 days, flag as a rotation candidate.
- Check for newer cases: are there cases under
docs/receipts/cases/newer than the rotation candidate that better demonstrate the framework's recent behavior? - Recommend rotation: surface specific rotate-out / rotate-in pairs in the dashboard. Rotation is a maintainer decision, not automatic — but the flag forces the decision rather than letting it drift.
- Highlight gap signal: if no case has been added to
docs/receipts/cases/in >60 days, flag as a possible-low-friction signal — either the framework genuinely caused no recent friction (rare), or the dogfood loop has weakened (usually).
Per docs/contributing/style.md#highlights-rotation. Cases stay in docs/receipts/cases/ even when rotated off README; only the README mention rotates.
4d. Docs Health Cross-Surface (added 2026-05-08)
Run a lightweight version of /mycelium:canvas-health step 9b on docs/:
- Stub freshness (any forthcoming-doc
Last updated>60 days) - Length budget compliance (hard caps)
- Marketing-voice scan
- Information-scent scan on links
Surface in the dashboard. Full details delegate to /mycelium:canvas-health.
4e. Chat-UX Axiom Audit of Skill Output Templates (added 2026-05-30)
The chat-UX nudges in ${CLAUDE_PLUGIN_ROOT}/harness/design-principles.md ("the chat is a UI") shape live output, which has no stored corpus to audit retroactively. What is auditable is the static surface that pre-shapes live output: the ## Output/## Output Format blocks in ${CLAUDE_PLUGIN_ROOT}/skills/*/SKILL.md. Scan each for two axiom violations:
- Hick's Law — an output template that instructs the agent to present a *li