Diamond Assess Skill
Evaluate current diamond state and recommend next action.
Workflow
0. Cognitive Forcing (ALWAYS FIRST, before any analysis):
Before presenting any assessment, ask the human for their unprimed judgment:
"Before I run the gates — where do you think this diamond stands right now? What feels solid and what feels shaky?"
Wait for the human's response. Record it. Then proceed with the full assessment below. After presenting the assessment (step 10), compare:
"You said [X]. The gates say [Y]. Where do we differ?"
This prevents the agent's analysis from anchoring the human's judgment. The human's pre-assessment often catches things the gates miss (Hoskins consistently outperformed the agent on product judgment calls).
Source: Buçinca, Malaya & Gajos (Cognitive Forcing Functions, Harvard CHI/CSCW 2021) — forcing initial human judgment before AI output significantly reduces automation bias and over-reliance on incorrect AI recommendations.
1. Identify the diamond: Which diamond (ID, scale, phase) is being assessed?
2. Gather current state:
- Current phase (Discover/Define/Develop/Deliver)
- Evidence collected so far
- Confidence score with breakdown
- Blockers or risks
3. Check theory gates for next transition:
- Reference ${CLAUDE_PLUGIN_ROOT}/engine/theory-gates.md for the current transition
- Check `product_type` from `.claude/diamonds/active.yml`. Gates conditioned on `product_type` include:
  - Security Gate: full OWASP for software/ai_tool; platform-only for content; infra-only for service
  - Delivery Metrics Gate: routes to product-type-appropriate metrics canvas
  - Service Quality Gate: Downe applies to consumption experience for all product types; Nielsen only for digital interfaces
- Evaluate each applicable gate: Pass / Fail / Insufficient Evidence / N/A (if gate doesn't apply to this product_type)
- Read-before-claim (HARD RULE; anti-pattern #7 instance #4, 2026-05-09): Before claiming a required-evidence bucket is missing or partial (e.g., "Wardley Map | Missing", "user_research_synthesis | Insufficient"), the agent MUST use the Read tool on the canvas file that bucket maps to (e.g., `landscape.yml`, `user-needs.yml`, `opportunities.yml`, `gist.yml`). Spawn-note text, theory-gates.md references, and prior conversation context do NOT count as evidence of the bucket's actual state; only reading the file does. Treating consistency between spawn-note phrasing and an absence-claim as causal evidence is anti-pattern #7 (Consistency-as-Evidence). The graduation case for instance #4 was the agent recommending "build the Wardley map now" when the map was substantially complete: the agent had not opened `landscape.yml`.
- Document what is missing for failed gates, naming the specific canvas file read and the specific field that is empty/incomplete (not the inference that it should be empty).
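The product-type conditioning and the read-before-claim rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the plugin's implementation: `GateStatus`, `security_gate_scope`, `ReadBeforeClaimError`, `claim_bucket_incomplete`, and the `files_read` record are hypothetical names. Only the four status labels and the Security Gate scope mapping come from the bullets above.

```python
from enum import Enum
from typing import Optional, Set


class GateStatus(Enum):
    """The four outcomes a gate evaluation can report."""
    PASS = "Pass"
    FAIL = "Fail"
    INSUFFICIENT_EVIDENCE = "Insufficient Evidence"
    NOT_APPLICABLE = "N/A"


# Security Gate scope per product_type, copied from the bullet above.
SECURITY_GATE_SCOPE = {
    "software": "full OWASP",
    "ai_tool": "full OWASP",
    "content": "platform-only",
    "service": "infra-only",
}


def security_gate_scope(product_type: str) -> Optional[str]:
    """Return the Security Gate scope, or None for an unlisted product_type."""
    return SECURITY_GATE_SCOPE.get(product_type)


class ReadBeforeClaimError(Exception):
    """Raised when an absence claim targets a canvas file that was never read."""


def claim_bucket_incomplete(canvas_file: str, empty_field: str,
                            files_read: Set[str]) -> str:
    """Build an absence claim, refusing if the canvas file has not been read.

    files_read is a hypothetical record of files opened during this assessment.
    """
    if canvas_file not in files_read:
        raise ReadBeforeClaimError(
            f"read {canvas_file} before claiming it is incomplete"
        )
    return f"{canvas_file}: field '{empty_field}' is empty or incomplete"
```

The point of the guard is that an absence claim carries the file name and field it was derived from, so the claim cannot be produced from spawn-note text or conversation context alone.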
4. Check confidence threshold:
- Reference ${CLAUDE_PLUGIN_ROOT}/engine/confidence-thresholds.yml for the current scale
- Apply `project_type_adaptations` to compute effective threshold (see ${CLAUDE_PLUGIN_ROOT}/engine/confidence-thresholds.yml)
- Compare current confidence to the effective threshold
- Identify what would increase confidence
5. Check for anti-patterns:
- Reference ${CLAUDE_PLUGIN_ROOT}/harness/anti-patterns.md
- Flag any detected failure modes
- For L1/L2 diamonds: also check for system archetypes (Senge) — Fixes That Fail, Shifting the Burden, Limits to Growth, Eroding Goals
- At L3->L4 transitions: also run the Design Completeness Check (quality/CLAUDE.md) to verify all layers of the product design stack have evidence. Source: Mill, building on Garrett.
6. Check canvas health:
- Run the `/mycelium:canvas-health` checks inline: missing required files, stale confidence, inconsistent evidence types
- Report any critical or warning-level findings
- This catches silent canvas degradation before it affects progression decisions
6b. Check metric snapshot freshness (v0.14; L0/L1/L2/L5 only):
- If the current diamond scale is L0, L1, L2, or L5 AND `.claude/jit-tooling/active-metrics.yml` exists:
  - For each `status: active` source, find the newest file in `.claude/evals/metrics/<source>/`.
  - If the newest snapshot is >7 days old (or missing entirely), flag as a warning and recommend `/mycelium:metrics-pull`.
- If `.claude/jit-tooling/active-metrics.yml` is missing, recommend `/mycelium:metrics-detect` (softer: info-level, not a gate).
- Rationale: evidence loops for Purpose/Strategy/Opportunity/Market depend on external signal freshness. A stale snapshot silently anchors confidence.
- Do NOT block progression on stale snapshots — this is a NUDGE, not a gate.
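The freshness check in 6b could look roughly like the sketch below. It assumes that "newest" means the most recent file modification time (the step does not say whether age is judged by mtime or by a date in the file name) and that the list of active source names has already been extracted from `active-metrics.yml`. The function names are hypothetical. The result is a list of warnings only, in keeping with the nudge-not-gate rule.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Dict, Iterable, Optional

MAX_SNAPSHOT_AGE = timedelta(days=7)  # ">7 days old" threshold from step 6b


def newest_snapshot_time(source_dir: Path) -> Optional[datetime]:
    """Return the mtime of the newest file in source_dir, or None if there is none."""
    if not source_dir.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in source_dir.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return datetime.fromtimestamp(max(mtimes), tz=timezone.utc)


def snapshot_warning(source_dir: Path,
                     now: Optional[datetime] = None) -> Optional[str]:
    """Return a warning if the newest snapshot is missing or stale, else None."""
    now = now or datetime.now(timezone.utc)
    newest = newest_snapshot_time(source_dir)
    if newest is None:
        return f"warning: no snapshot in {source_dir}; run /mycelium:metrics-pull"
    age = now - newest
    if age > MAX_SNAPSHOT_AGE:
        return (f"warning: newest snapshot in {source_dir} is {age.days} days old; "
                "run /mycelium:metrics-pull")
    return None


def stale_sources(metrics_root: Path,
                  active_sources: Iterable[str]) -> Dict[str, str]:
    """Map each stale or missing active source to its warning (never blocks)."""
    warnings = {}
    for source in active_sources:
        warning = snapshot_warning(metrics_root / source)
        if warning is not None:
            warnings[source] = warning
    return warnings
```

A caller would pass `Path(".claude/evals/metrics")` as `metrics_root` and report whatever comes back alongside the other findings.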
7. Check corrections.md:
- Any relevant past mistakes to avoid?
7b. Check trio perspective coverage (Torres Product Trio):
- For the current diamond phase, verify all three perspectives (product/design/engineering) have been applied.
- Reference `${CLAUDE_PLUGIN_ROOT}/engine/theory-gates.md` §Trio Perspective Requirement for the per-scale coverage matrix.
- Flag any missing perspectives as a gap: "Design perspective not yet applied at L[X]. Consider running `/mycelium:usability-check` or `/mycelium:service-check`."
- If perspectives are in conflict, recommend `${CLAUDE_PLUGIN_ROOT}/engine/perspective-resolution.md`.
8. Coaching check (Rother's Coaching Kata): Surface these five questions in the output to prompt the human's thinking:
- What is the target condition for this diamond? (What does "done" look like?)
- What is the actual condition right now? (Summarize from steps 2-7 above)
- What obstacles are preventing progress? Which one are you addressing now?
- What is your next step? What do you expect will happen? (Force a prediction before acting)
- When can we check what we learned from that step? (Commit to a review point)
The coach (human) should answer these, not the agent. The agent surfaces them. Source: Rother (Toyota Kata); the 5 questions install scientific thinking as a daily habit.
9. Log assessment in .claude/harness/decision-log.md (MANDATORY):
- APPEND a `### Diamond Assessment` entry to `.claude/harness/decision-log.md`
- Include: diamond ID and scale, gates passed/failed, current confidence with rationale, evidence gaps
- This log entry is essential for auditability; every assessment should be documented
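As an illustration of the append-only logging step, the sketch below formats an entry and appends it to the log. Only the `### Diamond Assessment` heading and the list of required contents come from the step above. The bullet layout inside the entry and the function names `format_assessment_entry` and `append_assessment` are assumptions made for the example.

```python
from pathlib import Path
from typing import Dict, List


def format_assessment_entry(diamond_id: str, scale: str, gates: Dict[str, str],
                            confidence: str, evidence_gaps: List[str]) -> str:
    """Render one log entry; the layout below the heading is hypothetical."""
    lines = ["", "### Diamond Assessment", f"- Diamond: {diamond_id} ({scale})"]
    lines += [f"- Gate {name}: {status}" for name, status in gates.items()]
    lines.append(f"- Confidence: {confidence}")
    lines += [f"- Evidence gap: {gap}" for gap in evidence_gaps]
    return "\n".join(lines) + "\n"


def append_assessment(log_path: Path, entry: str) -> None:
    """Append the entry, leaving all earlier log content untouched."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)
```

Opening the file in append mode (`"a"`) is what keeps the log auditable: earlier assessments are never rewritten, only followed by new ones.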
10. Recommend next action:
- If all gates pass and confidence meets threshold: recommend transition to next phase
- If gates fail: recommend specific actions to address failures
- If confidence is low: recommend evidence-gathering activities
- If anti-patterns detected: recommend corrective actions
- If regression needed: recommend which phase to return to and why
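The branching above can be read as a small decision function. The sketch below is one possible reading, not a specification: the `AssessmentState` fields and the wording of each recommendation are hypothetical, and because the source does not say how the conditions combine when several hold at once, the function simply reports every one that applies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssessmentState:
    """Hypothetical summary of what the earlier steps found."""
    confidence: float
    effective_threshold: float
    failed_gates: List[str] = field(default_factory=list)
    anti_patterns: List[str] = field(default_factory=list)
    regress_to: Optional[str] = None  # phase to return to, if regression is needed


def recommend(state: AssessmentState) -> List[str]:
    """Return one recommendation per condition that holds."""
    recs = []
    meets_threshold = state.confidence >= state.effective_threshold
    if not state.failed_gates and meets_threshold:
        recs.append("Transition to the next phase")
    if state.failed_gates:
        recs.append("Address failed gates: " + ", ".join(state.failed_gates))
    if not meets_threshold:
        recs.append("Gather evidence to raise confidence")
    if state.anti_patterns:
        recs.append("Correct anti-patterns: " + ", ".join(state.anti_patterns))
    if state.regress_to is not None:
        recs.append(f"Return to the {state.regress_to} phase")
    return recs
```

For example, a state with a failed Security Gate and confidence below threshold yields two recommendations and no transition.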
11. Play devil's advocate: Before recommending progression, ask:
- What are we most likely wrong about?
- What evidence have we dismissed?
- Is there a simpler path we're overlooking?
12. Report harness thickness (informational):
- Count: total skills, active guardrails, mandatory reads, hooks, theory gates
- Current: 49 skills, 37 guardrails, 4 mandatory reads, 5 hook layers, 13 gates
- If thickness has increased since last assess, note it
- This is observability, not a gate — purely informational
- Source: Trivedy (Anatomy of an Agent Harness, LangChain blog — "scaffolding should decrease as models improve," but harnesses remain valuable as they engineer systems around model intelligence)
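Noting an increase since the last assessment amounts to a per-component comparison of counts. A minimal sketch, assuming the previous counts are available as a dictionary (where they would be stored is not specified here; `thickness_increases` and the dictionary keys are hypothetical names):

```python
from typing import Dict

# Counts listed under "Current" in the step above.
CURRENT_THICKNESS = {
    "skills": 49,
    "guardrails": 37,
    "mandatory_reads": 4,
    "hook_layers": 5,
    "gates": 13,
}


def thickness_increases(previous: Dict[str, int],
                        current: Dict[str, int]) -> Dict[str, int]:
    """Return {component: delta} for every component whose count went up."""
    return {
        name: count - previous.get(name, 0)
        for name, count in current.items()
        if count > previous.get(name, 0)
    }
```

An empty result means nothing grew; either way the output is reported for observability only and does not affect the recommendation.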
Output Format
ALWAYS output in plain language first, then technical details.
Use ${CLAUDE_PLUGIN_ROOT}/engine/status-translations.md for translations.
ALWAYS render the journey map first. Follow `${CLAUDE_PLUGIN_ROOT}/engine/wayfinding.