TDD

One small failing test, just enough code to pass, then a look. Repeat.

Install

Drop in ~/.claude/skills/tdd/ (per-user) or <repo>/.claude/skills/tdd/ (per-project).

Skip TDD for

Spikes, copy/style/visual edits, one-off scripts, generated files. If the codebase has no tests at all, see untested-code.md first.

Two rules that keep TDD honest

Never start a second test before the first one passes.
Never edit production code while red, except to make red green.

If either slips, back up.

What a good test looks like

The name describes a capability, not a method. Read it out loud — it should sound like something a user or caller would care about.

✅ logged_out_user_cannot_publish_a_post
✅ schedule_overlap_returns_409

❌ post_service_calls_repo_save
❌ schedule_returns_object_with_status

The body has three sections in order — set up, do the thing, check what's observed:

test "expired tokens are rejected":
  token = issue_token(ttl_seconds: 60)
  advance_clock(seconds: 120)

  result = verify(token)

  assert result.ok == false
  assert result.reason == "expired"

If renaming a private function tomorrow would break the test even though no behavior changed, the test was tied to internals. Rewrite or delete it. See test-anatomy.md.

The cardinal mistake: tests in bulk

Writing five tests up front and then five implementations produces tests that describe an imagined system. They lock you into the wrong shape and stop pulling their weight once any pair shares a code path.

Each test must exist because of something you learned writing the previous one.

Wrong:  RED  t1 t2 t3 t4 t5    →   GREEN  c1 c2 c3 c4 c5
Right:  t1→c1, t2→c2, t3→c3, ...

Same mistake in miniature: writing one test and reaching inside to "also handle" something it doesn't cover. Don't.

Before the cycle — wire up the test command

The Stop hook runs your tests at the end of every turn (green allows the turn to end, red blocks it). It reads the command from .agents/tdd/test-command.txt at the project root.

If that file is missing, set it up before starting the cycle:

Read CLAUDE.md first — projects often document the canonical test command there.
Otherwise check package.json (scripts.test), pyproject.toml / pytest.ini, Makefile (test: target), Cargo.toml, or analogous config files for the project's stack.
If multiple plausible commands exist (e.g. test:unit vs test:e2e), ask the user which one to use for TDD.

Once decided, create .agents/tdd/ if missing and write the command on a single line to .agents/tdd/test-command.txt. Confirm with the user the first time.

The user can edit the file at any time to change the command (e.g. switching from npm test to npm run test:unit).

The cycle

1. Decide. If a spec/<slug>.md exists for this task, read it — Behaviors + Acceptance criteria are your queue. Otherwise list the behaviors to verify in rough order with whoever cares.

2. Smoke run. Pick the first behavior. Test → fails → minimum code → passes. If this first cycle takes more than 15 minutes, the slice is too big.

3. Each next behavior. Same shape. One test at a time. No speculation. No "while I'm here." If a rule slips, see smells.md.

4. Cleanup. Tests green? Invoke the simplify skill on the recent changes — it reviews code for reuse, quality, and efficiency and applies any improvements found. After simplify, see cleanup.md for additional cleanup. Cleanup only on green; never add behavior during cleanup. (The Stop hook will also remind you to invoke simplify once per session if you forget.)

When you're done

Stop adding tests when all are true:

Every acceptance criterion has a test.
Every interesting branch has a test (skip getters, plumbing).
The domain edge cases (empty input, expired things, races, boundaries) are covered.
Reading just the test file would teach the feature.

100% coverage with bad tests is worse than 70% with good ones.

Per-cycle checklist

Test names a capability, not a method.
Test only touches the public API.
Test would survive a rename of internals.
Code written is the minimum for this test.
No second test queued before this one is green.

Map

test-anatomy.md — body shape, assertions per test, naming.
smells.md — bad-test patterns + warning signs while you work or review.
boundaries.md — mocks, fakes, stubs; what counts as an edge.
untested-code.md — TDD on legacy.
shape.md — deep modules + designing for tests.
cleanup.md — what to do after green.

tdd

How to add

Drop this on your repo README

Related skills

claude-api

skill-creator

oh-my-issues

claude-mem

Get new Desenvolvimento skills every Monday

TDD