Skill Forge
Skill engineering methodology and publishing pipeline. Defines what a "well-engineered skill" is, validates skills against that standard, and produces publishable GitHub repos.
Three-Dimension Mental Model
Skill engineering decisions split along three orthogonal dimensions. Keep them separate — mixing causes self-contradictory choices.
| Dimension | Question | Decides |
|---|---|---|
| A. Entry | Should this be its own skill? | New capability skill / new rule-skill / just a reference file |
| B. Dependency | How do skills relate to each other? | Runtime (setup.sh, cross-repo OK) vs Maintenance (must ship together) |
| C. Publishing | How to package for distribution? | Single Skill repo / Collection repo / In-repo |
Each reference below covers ONE dimension (mostly):
- references/publishing-strategy.md — Dimension C
- references/rule-skill-pattern.md — Dimension A (for rule-skills specifically)
- references/skill-composition.md — Dimension B
- references/anti-graceful-skip.md — orthogonal quality check (applies to all dimensions)
When a decision seems to conflict, check which dimension you're reasoning about. A/B/C answers do not constrain each other.
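One way to keep the dimensions apart is to record each answer in its own field. The sketch below is illustrative only: the enum and class names are hypothetical, and the option values are copied from the table above. Since the answers do not constrain each other, every combination of the three is representable.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Entry(Enum):
    """Dimension A: should this be its own skill?"""
    CAPABILITY_SKILL = "new capability skill"
    RULE_SKILL = "new rule-skill"
    REFERENCE_FILE = "just a reference file"


class Dependency(Enum):
    """Dimension B: how do skills relate to each other?"""
    RUNTIME = "runtime (setup.sh, cross-repo OK)"
    MAINTENANCE = "maintenance (must ship together)"


class Publishing(Enum):
    """Dimension C: how to package for distribution?"""
    SINGLE_SKILL_REPO = "single skill repo"
    COLLECTION_REPO = "collection repo"
    IN_REPO = "in-repo"


@dataclass(frozen=True)
class SkillDecision:
    # Hypothetical record type: one independent answer per dimension.
    entry: Entry
    dependency: Dependency
    publishing: Publishing


def all_decisions():
    """Enumerate every A x B x C combination; none is excluded by another."""
    return [SkillDecision(a, b, c) for a, b, c in product(Entry, Dependency, Publishing)]
```

Storing the answers as separate fields makes it harder to let, say, a publishing choice silently decide the entry question.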
Engagement Principles
These rules always apply. Read them before acting.
- Assess before acting — first step is always understanding the situation (scan, inventory, read)
- Report before modifying — show findings, get user approval, then act
- Security > Structure > Quality > Polish — when multiple issues exist, fix in this priority
- Default to local-ready — forge runs through validation and fixes until local-ready. User can stop at any point
- One skill at a time for changes — diagnose in batch, modify one by one with user confirmation
- Local-ready = publish-ready — publishing only sends to remote, never re-validates
- Understand context — a skill may belong to a tool, or relate to other skills. Don't treat each in isolation
- Follow module interfaces — when the procedure calls a reference file, read the file and follow its Execution Procedure (EP). The module's own EP is the authority, not any inline summary in the parent
- Report what you can't resolve — severity follows the check's own criteria, not assumed user preference. A finding explained by another explicit rule is resolved, not a discrepancy — dismiss it with the reason
- Triage before validate — read the project's directory semantics first. Never grade a workshop against gold-standard skill criteria. If the target is mixed engineering content, run Triage to extract the skill before any audit work. See references/triage.md
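The priority rule (Security > Structure > Quality > Polish) can be sketched as a stable sort over findings. This is a minimal illustration, not part of the skill itself: the finding shape (a dict with `category` and `message` keys) and the function name are assumptions.

```python
# Fix order from the principle above; earlier entries are handled first.
PRIORITY = ("security", "structure", "quality", "polish")


def order_findings(findings):
    """Return findings sorted by fix priority.

    `findings` is a list of dicts with a hypothetical "category" key.
    sorted() is stable, so findings within one category keep their order.
    """
    rank = {name: position for position, name in enumerate(PRIORITY)}
    return sorted(findings, key=lambda finding: rank[finding["category"]])
```

A report built this way lists security issues first, so the user approves the most important fixes before any polish work is proposed.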
Execution Procedure
Follow the pseudocode step by step. At STEP 2, write a plan file with per-item checklists — this IS your execution checklist. Re-read the plan before each item to stay on track.
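As a rough sketch of the per-item plan file, the helper below renders one top-level checklist entry per discovered item, following the plan structure given under STEP 2 of the procedure. The function names are hypothetical, and only the two sub-steps visible in that structure are included; a real plan may carry more.

```python
from pathlib import Path

# Sub-steps shown in the procedure's plan structure (assumed incomplete).
SUB_STEPS = ("Security scan", "Validate (all table rows)")


def render_plan(item_paths):
    """Build a per-item checklist: one numbered entry per item, with sub-steps."""
    lines = ["## Steps"]
    for number, item_path in enumerate(item_paths, start=1):
        lines.append(f"- [ ] {number}. Validate {item_path}")
        lines.extend(f"  - [ ] {step}" for step in SUB_STEPS)
    lines.append("## Findings")
    return "\n".join(lines) + "\n"


def write_plan(plan_path, item_paths):
    """Write a fresh plan file, discarding any plan left by an earlier run."""
    path = Path(plan_path)
    path.unlink(missing_ok=True)  # requires Python 3.8+
    path.write_text(render_plan(item_paths), encoding="utf-8")
    return path
```

Organizing the file per item rather than per check type means the validation loop can walk it top to bottom, ticking off one item at a time.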
Forge
Trigger: "review", "check", "audit", "audit this project", "audit all my skills", "clean up my skills", "create a skill", "forge a skill", "build a skill for X", "extract a skill from this folder", "turn my prototype into a skill", "publish this skill", "push this to GitHub", "put this on GitHub"
def forge(target):
    # STEP 0: Environment
    run("scripts/setup.sh")  # exit non-zero → STOP
    config = assess_config_needs()  # references/skill-configuration.md
    if not config: assess_and_guide(target)  # references/onboarding.md

    # STEP 0.5: Triage — what is the target, before validating it?
    state, skill_path = triage(target)  # references/triage.md
    #   state ∈ {skill_shaped, workshop, empty}
    #   workshop → triage ran HITL dialogue, extracted skill into a clean dir, returned new path
    #   empty → no skill artifacts; falls into Nothing Found branch in STEP 1
    #   skill_shaped → target unchanged
    if state == "workshop": target = skill_path  # re-target to extracted skill
    # AI judgment, not hardcoded thresholds. Read folder names + file types first;
    # do not open file bodies until intent is locked. See references/triage.md §Signals.

    # STEP 1: Discover — paths and classification ONLY
    classified = discover_and_classify(target)  # references/project-audit.md
    # Find: SKILL.md (any depth), rules files, project instructions, setup scripts
    #
    # BOUNDARY: Discovery reads file PATHS and FRONTMATTER (for classification).
    # Discovery also reads project standards (CLAUDE.md, AGENTS.md) — shared context.
    # Discovery does NOT read: SKILL.md body, reference file content.
    # Discovery does NOT validate: quality, structure, reference integrity.
    # Discovery does NOT check git log, git diff, or previous review reports.
    # Every review is a FULL review — no incremental/delta mode, no "nothing changed
    # since last review" shortcuts. Prior results do not reduce current scope.
    # Content reading and validation happen in STEP 3, driven by the plan.
    # If you finish STEP 1 having already validated content → you collapsed the loop.

    if classified:  # --- Existing items ---
    else:  # --- Nothing found ---
        context = detect_existing()  # scan skills dirs + conversation
        if len(context) > 1: context = ask_user("Which existing skill?")
        elif not context: ask_user("What does this skill do? When should it trigger?")
        search_ecosystem(target)  # npx skills find / skills.sh

        # Workspace: standalone + design-heavy → full workspace with backstage
        # Signals: public/publishable, 3+ expected references, multi-session,
        # user mentioned design docs. Rule-skill/in-repo/prototype → skip.
        if assess_workspace_need(name, context):  # HITL — user confirms
            Skill("repo-scaffold", f"scaffold {name}, git init but skip push")
            path = f"{config.skill_workspace}/{name}-project/{name}/"
        else:
            path = f"{config.skill_workspace}/{name}/"

        source_docs = detect_source_documents(context)  # backstage, research, outputs
        scaffold_skill_md(path, context)  # follows references/skill-format.md standards
        ep_contract = extract_ep_signatures(path)  # function calls in SKILL.md EP
        if source_docs:
            transform_references(source_docs, ep_contract)
        else:
            write_references(path, ep_contract)
        assert all_ep_calls_have_matching_defs(path)  # GATE

        readme = Skill("readme-craft", f"create {path}")  # references/templates.md
        assert readme.delivered  # README required for local ready
        write_artifacts(path)  # LICENSE, .gitignore — skip if exist
        items = [SkillItem(path)]

    # From here: all items — new or existing — go through the same pipeline.
    # Capability detection is NOT a separate step. Each reference module defines
    # its own applicability criteria. The validation tables cover every reference.
    # No explicit caps.X enumeration — all references are checked uniformly.

    # STEP 2: Plan — GATE: file must exist AND be per-item structured before Step 3
    plan_path = f"/tmp/skill-forge-{name}.md"
    delete_if_exists(plan_path)  # always fresh, no resume between runs
    # Plan MUST be organized per-item, NOT per-check-type.
    # Each discovered item gets its own top-level checklist entry with sub-steps.
    # Step 3 iterates this plan item by item — no plan means no loop.
    #
    # Plan structure (every plan follows this, no exceptions):
    #
    #   ## Steps
    #   - [ ] 1. Validate <item-path>
    #     - [ ] Security scan
    #     - [ ] Validate (all table rows)
    #   - [ ] 2. Validate <item-path>
    #     - [ ] ...
    #   ## Findings #