Testing: Write Tests That Catch Real Bugs

Write, structure, and maintain tests across unit, integration, E2E, accessibility, and performance layers. The goal is tests that catch regressions, document behavior, and run fast in CI - not tests that exist to inflate coverage numbers.

Target versions (May 2026):

Vitest 4.1.2, Jest 30.3.0
Playwright 1.59.0, Cypress 15.13.0
pytest 9.0.2, pytest-cov 7.1.0
Go 1.26.1 (testing stdlib, testing/synctest GA)
Rust 1.94.1 (cargo test, cargo-nextest 0.9.132)
Testing Library 16.3.2 (@testing-library/react)
axe-core 4.11.2 (@axe-core/playwright)
Grafana k6 1.7.1 (load testing)

When to use

Writing new tests (unit, integration, E2E, accessibility, performance)
Debugging flaky or failing tests
Designing test architecture for a project (fixture strategies, factory patterns, test data)
Setting up test infrastructure in CI (parallelization, sharding, coverage gates)
Choosing testing tools or migrating between test frameworks
Implementing TDD workflow
Adding accessibility or visual regression tests to an existing suite

When NOT to use

Reviewing existing test quality or correctness as part of a code review - use code-review
Security-specific testing (penetration testing, OWASP checks) - use security-audit
Cleaning up verbose/sloppy test code - use anti-slop
Ad-hoc web browsing, scraping, or page interaction outside of tests - use browse
CI/CD pipeline architecture (test jobs run inside pipelines, but pipeline design is ci-cd's domain) - use ci-cd
Database testing patterns at the engine level - use databases
Writing or refining LLM prompts - use prompt-generator
Infrastructure or configuration validation outside tests - use terraform, ansible, or kubernetes
AI/ML model evaluation or LLM output scoring - use ai-ml
Infrastructure-level load or chaos testing beyond application tests - use kubernetes for cluster-level chaos, or ci-cd for pipeline-integrated load test orchestration

AI Self-Check

AI tools consistently produce the same testing mistakes. Before returning any generated test code, verify against this list:

Performance

Split fast unit tests from integration, browser, and performance suites.
Use fixtures and test data builders to avoid repeated expensive setup.
Shard or parallelize only after isolating shared state, ports, databases, and clocks.
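
A minimal sketch of a test data builder, assuming a hypothetical Order type and a hypothetical SAVE10 coupon rule (neither comes from a real codebase). The default order is built once and each test overrides only the field it cares about, which keeps setup cheap and makes the intent of each test obvious.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Order:
    """Hypothetical domain object; the defaults describe a valid, boring order."""
    customer_id: str = "cust-1"
    # Each item is (sku, quantity, unit_price_cents).
    items: Tuple[Tuple[str, int, int], ...] = (("widget", 1, 1_000),)
    coupon: Optional[str] = None


# Built once and shared; safe because Order is frozen (immutable).
_DEFAULT_ORDER = Order()


def make_order(**overrides) -> Order:
    """Builder: copy the default order, changing only the given fields."""
    return replace(_DEFAULT_ORDER, **overrides)


def order_total_cents(order: Order) -> int:
    """Hypothetical code under test: SAVE10 takes 10% off the subtotal."""
    subtotal = sum(qty * price for _, qty, price in order.items)
    if order.coupon == "SAVE10":
        return subtotal * 90 // 100
    return subtotal


def test_save10_coupon_takes_ten_percent_off():
    order = make_order(coupon="SAVE10")
    assert order_total_cents(order) == 900


def test_total_sums_quantity_times_unit_price():
    order = make_order(items=(("widget", 2, 1_000), ("gadget", 1, 500)))
    assert order_total_cents(order) == 2_500
```

In a pytest project the same idea is often wrapped in a fixture; the plain function form is used here so the sketch runs with no dependencies.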

Best Practices

Test behavior through stable public interfaces, not implementation details.
Use stable roles/test IDs for UI tests; do not select generated CSS classes.
Every regression fix gets a failing test that would have caught the bug.
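
As an illustration of the regression rule above, here is a sketch built around an invented bug: a hypothetical page_count helper that used floor division and silently dropped the last partial page. The first test is the one that would have failed before the fix; the names and the bug itself are assumptions made for the example.

```python
def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items.

    The (hypothetical) buggy version returned total_items // page_size,
    which loses the final partial page.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division using only integers.
    return -(-total_items // page_size)


def test_page_count_includes_partial_last_page():
    # Regression test: 21 items at 10 per page need 3 pages, not 2.
    assert page_count(total_items=21, page_size=10) == 3


def test_page_count_is_zero_for_empty_result():
    assert page_count(total_items=0, page_size=10) == 0
```

The test goes through the public function only, so it keeps passing if the implementation is later rewritten.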

Workflow

Step 1: Determine scope

Based on context:

New feature -> write tests alongside or before the code (TDD when appropriate)
Bug fix -> write a failing test first that reproduces the bug, then fix
Existing untested code -> prioritize critical paths, not 100% coverage
Test infrastructure -> set up runners, CI config, coverage gates

Identify the project's existing test framework from config files (vitest.config.ts, jest.config.*, pyproject.toml, Cargo.toml, *_test.go, playwright.config.ts). Match it. Don't introduce a second test runner without a reason.
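
The config-file check can be sketched as a small script. This is a rough heuristic, not a definitive detector: the detect_test_tooling name and the labels are made up for the example, and a match is only a hint (a pyproject.toml, for instance, does not prove the project uses pytest).

```python
from pathlib import Path
from typing import List

# Glob pattern -> label. Patterns are the config files named above;
# the labels are illustrative only.
MARKERS = {
    "vitest.config.ts": "vitest",
    "jest.config.*": "jest",
    "playwright.config.ts": "playwright",
    "pyproject.toml": "python (pyproject.toml)",
    "Cargo.toml": "cargo test",
    "**/*_test.go": "go test",  # Go test files live next to the code
}


def detect_test_tooling(root: Path) -> List[str]:
    """Return a label for every marker pattern that matches under root."""
    found = []
    for pattern, label in MARKERS.items():
        # Path.glob yields lazily; any() stops at the first match.
        if any(root.glob(pattern)):
            found.append(label)
    return found


if __name__ == "__main__":
    print(detect_test_tooling(Path(".")))
```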

Step 2: Choose the test layer

Layer	Tests what	Speed	When to use
Unit	Single function/module in isolation	ms	Pure logic, utilities, data transforms, state machines
Integration	Multiple modules, real dependencies	seconds	API handlers, database queries, service boundaries
E2E	Full user flows through the UI	seconds-minutes	Critical paths, checkout flows, auth, onboarding
Accessibility	WCAG compliance, screen reader compat	seconds	Every user-facing component/page
Visual	Screenshot comparison	seconds	UI components after style changes
Performance	Load, latency, throughput	minutes	Before releases, after arch changes

The testing pyramid still holds: many unit tests, fewer integration tests, fewest E2E tests. Invert it and your CI takes 45 minutes and everyone ignores test failures.

Step 3: Write the test

Follow the language-specific patterns below. Universal principles:

Arrange-Act-Assert (or Given-When-Then):

// Arrange: set up test data and dependencies
// Act: call the thing being tested
// Assert: verify the outcome

Test naming: describe the scenario, not the function.

# Bad:  test_calculate_total
# Good: test_calculate_total_applies_discount_when_cart_exceeds_100
# Good: it("returns 401 when token is expired")
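
Putting Arrange-Act-Assert and scenario-style naming together, a sketch in Python. The calculate_total function and its discount rule (10% off when the cart exceeds 100, tracked in cents) are assumptions invented for the example.

```python
from typing import List


def calculate_total(item_prices_cents: List[int]) -> int:
    """Hypothetical code under test: 10% off when the cart exceeds 100.00."""
    subtotal = sum(item_prices_cents)
    if subtotal > 10_000:
        return subtotal * 90 // 100
    return subtotal


def test_calculate_total_applies_discount_when_cart_exceeds_100():
    # Arrange: a cart worth 110.00
    cart = [4_000, 7_000]

    # Act: call the thing being tested
    total = calculate_total(cart)

    # Assert: 10% off 110.00 is 99.00
    assert total == 9_900


def test_calculate_total_has_no_discount_at_exactly_100():
    # Arrange
    cart = [10_000]

    # Act
    total = calculate_total(cart)

    # Assert: the boundary value itself is not discounted
    assert total == 10_000
```

Each test name states the scenario and the expected outcome, so a failure report reads as a broken requirement rather than a broken function.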

Step 4: Validate

Run the full test suite: failures in other tests may indicate your change broke something
Check coverage delta: new code should be covered, but don't chase vanity numbers
Run in CI if possible - tests that pass locally but fail in CI usually depend on something in your local environment

TDD Workflow

Use TDD when the behavior is well-defined upfront. Skip it when exploring or prototyping.

Red: write a test that fails (confirm it fails for the right reason)
Green: write the minimum code to make the test pass (ugly is fine)
Refactor: clean up without changing behavior (tests still pass)
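
One red-green cycle might look like the sketch below. The authorize function, its parameters, and the integer status codes are hypothetical; the file is laid out in the order the code was written, with the first test before the implementation.

```python
# Red: written first. Against a stub that always returns 200,
# this assertion fails, which confirms the test checks expiry.
def test_returns_401_when_token_is_expired():
    assert authorize(token_expires_at=100, now=101) == 401


# Green: the minimum code that makes the test above pass.
def authorize(token_expires_at: int, now: int) -> int:
    """Return an HTTP-style status for a token with the given expiry."""
    return 401 if now >= token_expires_at else 200


# Next cycle: a second scenario pins down the non-expired path, so a
# later refactor cannot collapse the function to "always return 401".
def test_returns_200_when_token_is_still_valid():
    assert authorize(token_expires_at=100, now=99) == 200
```

Passing the current time in as a parameter, rather than reading the system clock inside authorize, keeps both tests deterministic.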

TDD works best for: pure functions, data transformations, state machines, API contracts, bug reproduction.

TDD works poorly for: UI layout, exploratory prototyping, integration with undocumented APIs.

Mocking Strategy

Mock at boundaries, not everywhere. Over-mocking produces tests that pass while the real code is broken.
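
A sketch of mocking at the boundary only, using the standard library's unittest.mock. The weather client, its get_json method, and the URL path are hypothetical; the point is that the HTTP client is replaced while the pure conversion function runs for real.

```python
from unittest.mock import Mock


def to_fahrenheit(celsius: float) -> float:
    """Pure function: never mocked, exercised for real by the test."""
    return celsius * 9 / 5 + 32


def current_temp_f(client, city: str) -> float:
    """Fetch a temperature via an injected client (the external boundary)."""
    payload = client.get_json(f"/weather/{city}")
    return to_fahrenheit(payload["temp_c"])


def test_current_temp_converts_api_celsius_to_fahrenheit():
    # Arrange: stand in for the HTTP client only.
    client = Mock(spec=["get_json"])
    client.get_json.return_value = {"temp_c": 100}

    # Act
    result = current_temp_f(client, "oslo")

    # Assert: real conversion logic ran, and the boundary was called once.
    assert result == 212
    client.get_json.assert_called_once_with("/weather/oslo")
```

If to_fahrenheit were mocked as well, the test would still pass with a broken formula, which is the over-mocking failure described above.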

What to mock	What NOT to mock
External APIs (HTTP, gRPC)	Your own pure functions
Database (when unit
