Project Context
- git installed: !
git --version 2>/dev/null - current branch: !
git branch --show-current 2>/dev/null - CLAUDE.md: !
find . -maxdepth 1 -name "CLAUDE.md" -type f - project-discovery.md: !
find . -maxdepth 3 -name "project-discovery.md" -type f
Constraints (read before anything else)
This skill writes production and test code in your working tree. It is an execution skill, not a document generator. These constraints shape every step and override any instinct to move faster.
- The observed-failure gate is load-bearing. No production-code change until a test has been run and observed to fail for the intended reason in this loop. A test that passes on first run is a stop-and-diagnose signal, not progress. This single rule is what separates real TDD from TDD-flavored code. The verbatim Three Laws and Canon TDD steps it derives from are in references/tdd-loop.md; pull that reference when a step needs the canon or the implementation gears.
- Two hats. Never refactor while any test is red. See references/tdd-loop.md for the canonical statement.
- One behavior at a time. Exactly one test list item becomes one runnable test per loop. Newly discovered scenarios are written to the list and deferred, never implemented in the current loop.
- BDD framing. Tests describe observable behavior, named in the project's existing test-naming convention, asserting outcomes through the public interface — never private state. The behavior-naming and Given/When/Then protocol is in references/bdd-framing.md; pull it when Step 2 needs it.
- You will be tempted to fake this. The specific ways an agent fakes TDD, and the discipline that catches each, are in references/failure-modes.md; pull it when a loop feels off (a test passes on first run, no red is shown, the implementation has outrun the test, refactor is being skipped).
- YAGNI governs the refactor step and the test list. Apply the rule in ../../references/yagni-rule.md: remove duplication, but do not add abstractions, configuration, or indirection without evidence. Speculative structure added "for flexibility" during refactor is a YAGNI candidate. Speculative scenarios on the test list are deferred with a reopen trigger, never silently added.
Test-Driven Development
Step 1: Resolve Project Config and Confirm Scope
Resolve commands. Read CLAUDE.md's ## Project Discovery section for the
test command (under ### Commands and Tests, not ### Frameworks and Tooling), the lint command, the build command, language, and framework. If
absent, fall back to project-discovery.md. If still absent, run
${CLAUDE_SKILL_DIR}/scripts/detect-tdd-context.sh and parse its output for
git state and manifest-inferred commands. Store the resolved test, lint, and
build commands for use in every later step.
Resolve standards and decisions. Resolve the coding-standards directory and
ADR directory the same way: read CLAUDE.md's ## Project Discovery section;
fall back to project-discovery.md; fall back to Glob defaults (docs/,
docs/adr/, docs/coding-standards/, docs/decisions/). Also check
CLAUDE.md and AGENTS.md for inline standards. Read the standards and
ADRs whose titles, paths, or one-line summaries indicate they govern the area
being built. Cap at five documents; if more than five look relevant, list them
and read only the five with the strongest apparent relevance — defer the rest
until refactor surfaces a need. These govern the green and refactor steps.
If none exist, state that plainly and plan to infer conventions from the
surrounding code instead.
Report scope, then proceed (no gate). This skill runs autonomously after
the initial request: it does not stop for confirmation. State to the user, in a
few lines: the behavior or feature to be built, the resolved test/lint/build
commands, the standards and ADRs found (or that none were), the current branch,
and that the skill will now write code in a red-green-refactor loop. If
current branch from Project Context is the repository's default branch
(main or master), recommend working on a branch, but do not wait for an
answer. This is a report the user reads while the work runs, not a gate.
Continue immediately to Step 2 without waiting for a response.
The one exception. If the initial request or the provided context explicitly states the human wants to review, verify, or approve the plan or test list before implementation, then this becomes a gate: build the test list in Step 2, present it together with this scope report, and wait for approval before starting the Step 3 loop. Absent an explicit request like that, the skill runs to completion without further human input.
The one input that can still block is a missing test command: if it could not
be resolved from CLAUDE.md, project-discovery.md, the discovery script, or
manifest inference, ask the user for it, because TDD is impossible without a
way to run tests. Exhaust inference before asking; this is a hard dependency,
not a discretionary checkpoint.
Step 2: Build the BDD Test List
Turn the requested feature or behavior into a test list (Kent Beck's "test list" pattern). Each item is one observable behavior, phrased as a behavior sentence, not as an implementation note. "Returns the unrounded fee for a sub-dollar charge" is a list item; "use a BigDecimal" is not. Follow references/bdd-framing.md for how to phrase and name behaviors, and which test-naming convention to adopt (the project's existing convention and any discovered coding standard win over a literal "should" default).
Order the list outside-in by user value: the next item is the most important thing the system does not yet do. For an item that is user-observable behavior at a system boundary, write the outer acceptance test for it first (it will be red until its inner behaviors exist) and record it as the outer loop for that item. For internal or utility behavior with no meaningful system boundary, the outer acceptance test is optional; the inner loop alone is correct.
Apply YAGNI to the list itself. A scenario earns a place only with evidence it is needed now (a user-described need, a named dependency, an existing code path that breaks, a regulation, a real incident). Scenarios that fail the evidence test go to a deferred list with the trigger that would reopen them. Do not pad the list for symmetry or completeness.
Report the test list to the user. Unless the verify-plan exception from Step 1 applies, continue to Step 3 immediately without waiting for approval. When that exception applies, present the test list together with the Step 1 scope report and wait for approval before entering the loop.
Step 3: The Red-Green-Refactor Loop
Pick exactly one item from the list. Choose one that teaches you something and that you are confident you can implement in one cycle (Beck's "one step test"). Then run these three phases in order. Do not collapse them.
Read once, don't reread. Within a single loop iteration, do not reread a
file you have already read in this iteration unless you have edited it. When
grep returns a line number, use Read with offset and limit to read
20-40 lines around the target — not the entire file. Rereading whole source
files between Red and Green of the same behavior is overhead, not discipline.
Red
Write exactly one test for the chosen behavior. Name it for the behavior in the project's convention. Assert an observable outcome through the public interface (Given = arrange the state before; When = the one action under test; Then = assert the observable result). Write no more of the test than is sufficient to fail; a compilation failure is a failure.
Run the resolved test command directly with Ba