Code Review: Deep Correctness Audit
Find bugs that actually break things. Not style, not slop - correctness, reliability, and logic errors that will bite in production.
This skill complements anti-slop (code quality/style) and security-audit (vulnerabilities/OWASP). Those catch "is the code clean?" and "is the code safe?" - this one catches "does the code actually work?"
Covers: TypeScript/JavaScript, Python, Go, Java, Bash/Shell, and Infrastructure as Code (Terraform, Ansible, Helm, Kubernetes, Docker/Compose, Proxmox/LXC). Universal patterns apply everywhere; language-specific sections add targeted checks.
When to use
- Reviewing recent changes for bugs, regressions, edge cases, or fragile assumptions
- Sanity-checking code before merge or release
- Looking for logic errors that static tooling may miss
- Doing a focused correctness review where style and security are secondary
The Three Questions
Every finding answers one of:
- Will it crash? - null derefs, unhandled errors, resource exhaustion, missing imports
- Will it do the wrong thing? - logic errors, off-by-ones, wrong comparisons, missing cases
- Will it break later? - race conditions, implicit ordering, fragile assumptions, API contract drift
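As a rough illustration of the first two questions, the sketch below shows one hypothetical helper that both crashes and returns the wrong answer, next to a corrected version. The function names and the averaging task are invented for this example and do not come from any reviewed codebase.

```python
def average_of_last_buggy(values: list[float], n: int) -> float:
    """Hypothetical helper with two findings in two lines."""
    # Wrong thing: the slice start is off by one, so n + 1 items are averaged.
    window = values[len(values) - n - 1:]
    # Crash: an empty list makes len(window) zero -> ZeroDivisionError.
    return sum(window) / len(window)


def average_of_last(values: list[float], n: int) -> float:
    """One possible fix: validate inputs, then take exactly the last n items."""
    if n <= 0:
        raise ValueError("n must be positive")
    window = values[-n:]
    if not window:
        raise ValueError("values must not be empty")
    return sum(window) / len(window)


data = [1.0, 2.0, 3.0, 4.0]
print(average_of_last_buggy(data, 2))  # 3.0, although the last two values average 3.5
print(average_of_last(data, 2))        # 3.5
```

A finding against the buggy version would cite the slice line for "wrong thing" and the division line for "crash", each with the input that triggers it.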
When NOT to use
- Style, verbosity, or machine-generated code quality issues - use anti-slop
- Exploitable vulnerabilities, auth flaws, or secret scanning - use security-audit
- Pipeline architecture design - use ci-cd
- End-of-session doc hygiene or instruction-file cleanup - use update-docs
AI Self-Check
Before reporting any finding at >= 80% confidence, verify:
- Read full context: read the entire function/file, not just the flagged line
- Check for tests: is there a test covering this case? Is the test correct?
- Check git blame: is this new code or battle-tested? Pre-existing issues are out of scope
- Check for explaining comments: a comment explaining the pattern means someone already considered it
- Cite the evidence: exact file, line, and code that proves the issue. No citation = no finding
- Adversarial self-check: argue against each finding. If the counter-argument is convincing, drop it
- Construct a failing case: for P0 findings, describe the specific input or sequence that triggers the bug
- Verify API/stdlib claims: AI code review suggestions frequently contain factual errors about framework behavior. If unsure, look it up
- Boundary values on numeric inputs flagged: zero, negative, and overflow values on page numbers, sizes, counts, and indices are high-confidence findings - do not suppress with the 80% threshold
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Routing overlap checked: overlapping skills, trigger terms, and "When NOT to use" boundaries are checked before returning guidance
- Spec claims verified: claims about tool behavior, output contracts, or repo conventions are checked against current docs, scripts, or skill files
- Line references verified: every finding points to code that exists in the reviewed diff
- Behavioral claim proven: findings describe a plausible failing input, race, leak, or regression
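The "construct a failing case" and boundary-value items can be made concrete with a small probe. The sketch below assumes a hypothetical 1-based pagination helper, `page_slice`, as the code under review; the helper, the boundary list, and the pass/fail rule are all illustrative assumptions.

```python
def page_slice(items: list, page: int, size: int) -> list:
    """Hypothetical code under review: 1-based pagination with no validation."""
    start = (page - 1) * size
    return items[start:start + size]


# Zero, negative, and very large values for both numeric inputs.
BOUNDARY_CASES = [(0, 10), (-1, 10), (1, 0), (1, -5), (10**9, 10)]


def find_failing_cases(items: list) -> list[tuple[int, int, list]]:
    """Return the boundary inputs for which invalid arguments silently yield data."""
    failures = []
    for page, size in BOUNDARY_CASES:
        result = page_slice(items, page, size)
        # Assumed contract: page < 1 or size < 1 is invalid and must not return rows.
        if (page < 1 or size < 1) and result:
            failures.append((page, size, result))
    return failures


for page, size, result in find_failing_cases(list(range(25))):
    print(f"page={page} size={size} returned {len(result)} items")
```

With 25 items, the probe reports `page=-1, size=10` (negative slicing returns items 5 to 14) and `page=1, size=-5` (returns the first 20 items). Either pair is the kind of specific triggering input a P0 finding should cite.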
Performance
- Start with changed public interfaces, shared utilities, migrations, and concurrency boundaries.
- Use tests and static analysis to validate suspected issues instead of reading the entire repo linearly.
- Merge duplicate findings into one high-signal comment with affected locations.
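Merging duplicates can be as simple as grouping by the kind of issue. The following is a minimal sketch; the `Finding` shape and the `rule` labels are assumptions made for illustration, not a format this skill prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    rule: str  # short label for the issue, e.g. "unchecked-none" (hypothetical)
    file: str
    line: int


def merge_findings(findings: list[Finding]) -> dict[str, list[tuple[str, int]]]:
    """Group findings by rule and list each affected location once, in sorted order."""
    grouped: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for finding in findings:
        grouped[finding.rule].add((finding.file, finding.line))
    return {rule: sorted(locations) for rule, locations in grouped.items()}


raw = [
    Finding("unchecked-none", "api.py", 40),
    Finding("unchecked-none", "api.py", 12),
    Finding("unchecked-none", "api.py", 40),  # exact duplicate
    Finding("off-by-one", "pager.py", 7),
]
print(merge_findings(raw))
```

Each key then becomes one comment, with the sorted locations listed underneath it.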
Best Practices
- Lead with bugs and risks, not style preferences.
- Do not request rewrites unless the current structure blocks correctness or maintainability.
- Call out missing tests only when a specific behavior or risk needs coverage.
Workflow
Step 1: Scope the review
Default scope based on context:
- If invoked right after writing code in this session -> self-check (review what you just wrote)
- If there are uncommitted changes (git diff --name-only) -> recent changes
- If the user specifies files/dirs/commits -> targeted review
- Otherwise -> ask the user
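The default-scope rules above can be sketched as a small decision function. The function and parameter names are hypothetical, and the rules are checked in the order they are listed; only `git diff --name-only` comes from the text.

```python
import subprocess


def uncommitted_files() -> list[str]:
    """List files with unstaged changes, via `git diff --name-only`."""
    proc = subprocess.run(
        ["git", "diff", "--name-only"],
        capture_output=True, text=True, check=False,
    )
    return [line for line in proc.stdout.splitlines() if line.strip()]


def choose_scope(just_wrote_code: bool,
                 changed_files: list[str],
                 user_targets: list[str]) -> str:
    """Apply the default-scope rules in the order listed above."""
    if just_wrote_code:
        return "self-check"
    if changed_files:
        return "recent changes"
    if user_targets:
        return "targeted review"
    return "ask the user"


# Example with explicit inputs (no git call needed):
print(choose_scope(False, ["src/app.py"], []))  # recent changes
```

In practice `changed_files` would come from `uncommitted_files()`; passing it in keeps the decision logic easy to test without a repository.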
Available scopes:
- Full codebase review - scan everything, report by category
- Recent changes - check git diff or specific commits
- Specific files/dirs - targeted review
- Self-check - review code you just wrote in this session
Large diffs (> 500 lines): Chunk by file. Review each file with its surrounding context, then do a cross-file pass looking for integration issues (mismatched types across boundaries, inconsistent error handling, broken call chains). Large diffs are also a code smell worth noting in Observations.
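One way to chunk by file is to split unified diff output on its `diff --git` headers and count changed lines per chunk. This is a simplified sketch: the function names are hypothetical, and the path parsing ignores quoted paths and paths containing " b/".

```python
LARGE_DIFF_LINES = 500  # threshold taken from the guidance above


def chunk_diff_by_file(diff_text: str) -> dict[str, list[str]]:
    """Split unified `git diff` output into per-file chunks keyed by path."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Header form: "diff --git a/<path> b/<path>" (simplified parsing).
            current = line.split(" b/", 1)[-1]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return chunks


def changed_line_count(chunk: list[str]) -> int:
    """Count added/removed lines, skipping the ---/+++ headers before the first hunk."""
    in_hunk = False
    count = 0
    for line in chunk:
        if line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line[:1] in ("+", "-"):
            count += 1
    return count


def is_large_diff(diff_text: str) -> bool:
    """True when the total number of changed lines exceeds the threshold."""
    chunks = chunk_diff_by_file(diff_text)
    return sum(changed_line_count(c) for c in chunks.values()) > LARGE_DIFF_LINES
```

Counting only after the first `@@` hunk header avoids mistaking the `---`/`+++` file headers for changes, and it still counts a removed line whose own content starts with `--` (such as a SQL comment).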
Step 2: Gather project context
Before reviewing any code, build context:
- Read project instruction files (AGENTS.md or equivalent) if present - project conventions, patterns, known gotchas
- Check the project's language/framework versions (package.json, pyproject.toml, go.mod, etc.)
- Understand the architecture - monolith, microservices, CLI tool, library?
- Note any custom error handling patterns, logging conventions, or testing requirements
This context prevents false positives. A pattern that's wrong in a React app might be correct in a Node CLI tool.
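Part of the version check can be automated. The sketch below reads only two of the manifests mentioned above: the optional `engines.node` field in package.json and the `go` directive in go.mod. The function name is hypothetical, and pyproject.toml is left out to keep the example dependency-free.

```python
import json
import re
from pathlib import Path


def declared_versions(root: str) -> dict[str, str]:
    """Collect declared runtime versions from manifests found directly under root."""
    base = Path(root)
    found: dict[str, str] = {}

    package_json = base / "package.json"
    if package_json.is_file():
        data = json.loads(package_json.read_text(encoding="utf-8"))
        node = data.get("engines", {}).get("node")  # optional field
        if node:
            found["node"] = node

    go_mod = base / "go.mod"
    if go_mod.is_file():
        # The go directive is a line of the form "go 1.22".
        match = re.search(r"^go\s+(\S+)", go_mod.read_text(encoding="utf-8"), re.MULTILINE)
        if match:
            found["go"] = match.group(1)

    return found
```

An empty result does not mean the project is unversioned; it only means neither field was declared, so check lockfiles, CI config, or the instruction files instead.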
Step 3: Run mechanical checks first (if available and practical)
Before manual review, run standard tooling to clear obvious issues - but only when it makes sense:
- TypeScript: tsc --noEmit / eslint (skip if no tsconfig.json / .eslintrc*, or if the project has 500+ TS files - too slow)
- Python: ruff check / mypy (skip if no pyproject.toml / ruff.toml / mypy.ini)
- Shell: shellcheck (fast, always worth running if installed)
- Terraform: terraform validate (skip if terraform init hasn't been run - validate requires initialized providers)
- Ansible: ansible-lint (skip if no .ansible-lint config and the project isn't primarily Ansible)
When to skip a tool:
- No config file for it in the project (no tsconfig.json, no pyproject.toml, etc.)
- Reviewing a small diff (< 5 files) - linting the whole project for a 3-file change is wasted effort
- The user just wants a quick review, not a full audit
When a tool isn't installed: Don't silently skip it. Tell the user which tools are missing so they can install them. Example: "shellcheck isn't installed - consider pacman -S shellcheck (or your package manager's equivalent) for shell script linting." This is a one-time heads-up, not a blocker - continue the review without it.
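The skip and missing-tool rules can be combined into one planning step. This sketch covers a subset of the tools listed above; the `TOOLS` table and the function name are assumptions for illustration, and the executable lookup is injectable so the logic can be exercised without the tools installed.

```python
import shutil
from pathlib import Path

# Executable -> config files whose presence suggests the project uses it.
# An empty list means "run whenever installed" (the shellcheck rule above).
TOOLS: dict[str, list[str]] = {
    "tsc": ["tsconfig.json"],
    "ruff": ["pyproject.toml", "ruff.toml"],
    "mypy": ["pyproject.toml", "mypy.ini"],
    "shellcheck": [],
    "ansible-lint": [".ansible-lint"],
}


def plan_checks(root: str, which=shutil.which) -> tuple[list[str], list[str]]:
    """Return (tools to run, tools that are wanted but not installed)."""
    base = Path(root)
    to_run: list[str] = []
    missing: list[str] = []
    for tool, configs in TOOLS.items():
        if configs and not any((base / name).is_file() for name in configs):
            continue  # no config for this tool in the project: skip quietly
        if which(tool):
            to_run.append(tool)
        else:
            missing.append(tool)  # report these to the user once, then continue
    return to_run, missing
```

The `missing` list is what feeds the one-time heads-up; the review proceeds with whatever ends up in `to_run`.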
Linters catch syntax, imports, and known anti-patterns mechanically. This skill focuses on what automated tools miss: logic errors, edge cases, incorrect assumptions, and subtle bugs that require understanding intent. Don't burn time and tokens on linter output - move to the actual review.
Step 4: Review with four focus areas
Review the code through four lenses. These aren't sequential passes - they're dimensions to evaluate as you read. The order reflects priority: understanding intent comes first because everything else depends on it.
Focus 1: Understand Intent
Read the code to understand what it's supposed to do. If reviewing a diff, read the surrounding context too. Check commit messages, PR descriptions, or comments for stated intent. You can't find bugs if you don't know what "correct" looks like.
Focus 2: Trace Logic Paths
Follow every code path. For each branch, loop, or condition:
- What happens on the happy path?
- What happens on each error path?
- What happen