Code Audit
Two-pass audit on recent changes or specified files: an adversarial Opus deep review followed by an independent Codex second opinion.
Runs on the latest Opus with `effort: xhigh`. Correctness over token cost: these settings are deliberate and should not be tuned down.
How this works
Pass 1 — Opus deep audit. Spawn a general-purpose agent with model: "opus" using the prompt template below. The agent reads code, runs verification commands, and returns a structured report. It does NOT fix anything. The prompt explicitly frames the agent as adversarial.
Pass 2 — Codex standard review. Run /codex:review for an independent PR-review-style second opinion on the final fixed state (after Pass 1 findings are resolved). Pass 1 already provides the adversarial lens; Pass 2 answers "does a normal review also think this is clean?"
/codex:adversarial-review is reserved for genuinely security-critical changes. Only use it when a bug in the change would directly cause a security incident. The three triggers:
- Changes to code that enforces the authentication/authorization boundary (auth middleware, token signing/verification, session validation, password hashing). Not routine new endpoints that reuse existing auth.
- Direct use of cryptographic primitives (any new code using `crypto.*`, `webcrypto`, `node:crypto`, signing, encryption, key derivation).
- User explicitly flags the change as security-critical.
If none of these apply, use /codex:review. Reaching for adversarial by default produces noise and leaves no escalation path when something truly warrants it.
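The escalation rule above can be sketched as a small decision function. This is only an illustration: the `ChangeTraits` type, its field names, and `chooseCodexCommand` are hypothetical and not part of any Codex or Claude tooling. Deciding whether a trait is true still takes human or agent judgment.

```typescript
// Hypothetical names throughout: a sketch of the three narrow triggers, not a real API.
type ChangeTraits = {
  // Code that enforces the auth boundary: middleware, token signing/verification,
  // session validation, password hashing. Routine endpoints reusing auth do not count.
  touchesAuthBoundary: boolean;
  // New code that directly uses crypto primitives (crypto.*, webcrypto, node:crypto, ...).
  usesCryptoPrimitives: boolean;
  // The user explicitly called the change security-critical.
  userFlaggedSecurityCritical: boolean;
};

type CodexCommand = "/codex:review" | "/codex:adversarial-review";

function chooseCodexCommand(traits: ChangeTraits): CodexCommand {
  const escalate =
    traits.touchesAuthBoundary ||
    traits.usesCryptoPrimitives ||
    traits.userFlaggedSecurityCritical;
  // Default to the standard review so escalation stays meaningful.
  return escalate ? "/codex:adversarial-review" : "/codex:review";
}
```

For example, a new endpoint that only reuses existing auth middleware sets all three traits to `false` and gets the standard review.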
Scoping the audit
- With arguments (e.g. `/audit src/routes/admin.ts`): scope to those files and their related code.
- Without arguments: audit the most recent feature or fix completed end-to-end in this session. Identify it by:
  - `git diff` / `git status` for uncommitted changes
  - If the tree is clean, `git log origin/HEAD..HEAD` for unpushed commits
  - Focus on the single coherent change, not the session's full output
  - If there are multiple unrelated changes, ask the user which one
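The order of precedence in the scoping rules can be sketched as a pure function. The names here (`AuditScope`, `resolveAuditScope`) are hypothetical. The function assumes the caller has already captured the text output of `git status --porcelain` and `git log --oneline origin/HEAD..HEAD`; those specific flags are an assumption made for easy parsing. It does not try to decide whether the changes form one coherent unit, which remains a judgment call for whoever runs the audit.

```typescript
// Hypothetical sketch: which source of truth defines the audit scope.
type AuditScope =
  | { kind: "files"; files: string[] }        // explicit arguments win
  | { kind: "uncommitted" }                   // dirty working tree
  | { kind: "unpushed"; commits: string[] }   // clean tree, local commits ahead of origin
  | { kind: "nothing-found" };                // ask the user what to audit

function resolveAuditScope(
  args: string[],
  statusPorcelain: string, // assumed: output of `git status --porcelain`
  unpushedLog: string      // assumed: output of `git log --oneline origin/HEAD..HEAD`
): AuditScope {
  if (args.length > 0) {
    return { kind: "files", files: args };
  }
  if (statusPorcelain.trim() !== "") {
    return { kind: "uncommitted" };
  }
  const commits = unpushedLog
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "");
  if (commits.length > 0) {
    return { kind: "unpushed", commits };
  }
  return { kind: "nothing-found" };
}
```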
Fill in the CHANGE_CONTEXT block in the agent prompt below with: summary, all files modified, design decisions made, known edge cases.
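One possible way to keep the four CHANGE_CONTEXT parts from being forgotten is to model them as a typed structure and render it to text. The `ChangeContext` interface, `renderChangeContext`, and the output layout are hypothetical; the skill itself does not prescribe a format for this block.

```typescript
// Hypothetical shape for the CHANGE_CONTEXT block; field names are illustrative.
interface ChangeContext {
  summary: string;
  filesModified: string[];
  designDecisions: string[];
  knownEdgeCases: string[];
}

function renderChangeContext(ctx: ChangeContext): string {
  // Render a list as bullets, making an empty list explicit instead of silent.
  const bullets = (items: string[]): string =>
    items.length > 0 ? items.map((item) => `- ${item}`).join("\n") : "- (none)";

  return [
    `Summary: ${ctx.summary}`,
    "Files modified:",
    bullets(ctx.filesModified),
    "Design decisions:",
    bullets(ctx.designDecisions),
    "Known edge cases:",
    bullets(ctx.knownEdgeCases),
  ].join("\n");
}
```

The rendered string would then replace the `{{CHANGE_CONTEXT}}` placeholder before the subagent is spawned.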
Agent prompt template
Paste everything between the === markers into the Opus subagent's prompt. The subagent reads "you" as itself.
=== BEGIN OPUS AGENT PROMPT ===
You are an adversarial code auditor. Find every bug, type safety issue, dead code artifact, and UX problem in the change below. Assume broken until proven otherwise.
Rules:
- READ/GREP/BASH to verify — do NOT edit or write files.
- Work every applicable checklist item, plus anything off-checklist that looks wrong, smells off, or could break in production.
- Report every finding in the table at the end. Be specific: `file:line`, actual value, expected value; no vague "this seems off".
- Severity: CRITICAL (breaks functionality) / HIGH (user-visible wrong behavior) / MEDIUM (type safety, data integrity) / LOW (dead code, style) / INFO (observations).
- Zero findings is a valid answer. Don't invent problems.
Shell command rules — avoid approval prompts and secret leaks:
- No `$VAR` for secrets. Extract once via `grep KEY= .env`, paste the literal into later commands. Never `source .env && curl -H "Bearer $TOKEN"`.
- No `$(...)` substitution. Run the inner command, paste its output into the outer command.
- Heredocs: always quote the delimiter (`<<'FOO'`). Unquoted heredocs expand `$vars` and run `$(cmds)` in the body. Better: write to a temp file and pass the path.
- Use relative paths (`src/foo.ts`): permission patterns are relative-aware.
- Never print secrets. Your stdout goes to context, then the report, then provider logs. Extract, use, return findings only. Never include API keys, JWTs, passwords, or `.env` contents.
What changed
{{CHANGE_CONTEXT — filled in before spawning}}
Audit checklist
Work through each category. Skip sections that clearly don't apply.
1. Type safety
- Do runtime values match their TypeScript types? Especially nullability: if code sets a field to `null`, is the type `T | null`?
- Are there `as` casts, `any` types, or `!` non-null assertions that bypass the type system? Justified?
- Do API response shapes match the frontend type interfaces field-by-field?
2. API contracts
- Request: does the endpoint validate input correctly (required fields, types, constraints)?
- Response: does every field the frontend reads exist in the backend response? Same name, same type?
- Status codes: success/error codes used correctly (200 vs 201, 400 vs 403 vs 404)?
- Auth: which roles/keys can access this endpoint? Tested?
3. Data consistency
- If data is merged from multiple sources: combined result correctly shaped, sorted, sized?
- If pagination is affected: does `total` match reality? Does `limit` cap correctly? Does `offset` work across pages?
- If filters are added: does every UI filter value exist in the backend's allowed list? Does the backend handle each correctly?
- Are label maps, constant lists, and dropdown options in sync?
4. State and side effects
- Database writes: constraints respected (NOT NULL, FK, CHECK, UNIQUE)?
- Can the operation run twice without corruption (idempotency)?
- Error paths handled? What happens if an external call fails mid-operation?
- Queues, caches, Redis interactions consistent?
5. Auth and access control
- New or changed write endpoints: does every role get the correct access (allowed or blocked)?
- Any endpoint accidentally public or missing an auth gate?
- User identity fields populated (`triggered_by_*`, `created_by_*`, etc.)?
6. Dead code and cleanup
- Imports for removed components/functions also removed?
- CSS classes for removed UI components also removed?
- Type definitions for removed features also removed?
- Commented-out blocks, unused variables, orphaned files?
7. Edge cases
- Empty state: what does the UI show with zero results?
- Null propagation: does consuming code handle optional-null without crashing?
- Boundary values: `limit=0`, `offset=999999`, empty string inputs?
- Concurrent access: can two requests conflict on the same data?
8. Live verification
- Call the changed endpoint(s) with valid auth. Does the response match expectations?
- Call with each auth tier if relevant (viewer, admin, no auth).
- Test at least 3 different parameter combinations including an edge case.
- If UI-visible: do labels, badges, filters all render correctly?
9. Open investigation
- Anything else that looks wrong, fragile, or likely to cause problems?
- Patterns that work today but would break under reasonable future changes?
- Performance concerns (N+1 queries, unbounded loops, missing indexes)?
- Security concerns (injection, privilege escalation, data leakage)?
- Anything that contradicts the project's documented invariants in `CLAUDE.md`?
Report format
## Audit Report: {{change_name}}
### Issues found
| # | File:Line | Severity | Issue | Expected | Actual |
|---|-----------|----------|-------|----------|--------|
| 1 | ... | CRITICAL | ... | ... | ... |
### Verified OK
- [ ] Types match runtime ✓
- [ ] Pagination correct ✓
- [ ] Auth gates verified ✓
- [ ] ...
### Live test results
| Test | Expected | Actual |
|------|----------|--------|
| ... | ... | ✓ / ✗ |
### Open observations
- (Anything not in the checklist that the auditor noticed)
=== END OPUS AGENT PROMPT ===
After the audit
1. Present Pass 1's report to the user.
2. Fix every issue Opus found.
3. Re-run Pass 1 (Opus) before Pass 2 if the fix was substantial: ≥3 CRITICAL or ≥8 total findings, >5 files touched, any function rewrite >30 lines, or this is already a second audit-fix cycle. Otherwise skip to step 4.
4. Run `/codex:review` (or `/codex:adversarial-review` if the change meets one of the three narrow triggers above).
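The "substantial fix" thresholds for re-running Pass 1 can be written as a single predicate. This is a sketch with hypothetical names (`FixStats`, `shouldRerunPass1`); it also assumes the finding counts come from the Pass 1 report and that audit-fix cycles are counted from 1, which is one reading of the rule rather than something the skill states.

```typescript
// Hypothetical sketch of the re-run rule; the numbers mirror the thresholds listed above.
type FixStats = {
  criticalFindings: number;            // CRITICAL rows in the Pass 1 report
  totalFindings: number;               // all rows in the Pass 1 report
  filesTouched: number;                // files changed while fixing
  largestFunctionRewriteLines: number; // biggest single-function rewrite, in lines
  auditFixCycle: number;               // assumed: 1 = first audit-fix cycle, 2 = second, ...
};

function shouldRerunPass1(stats: FixStats): boolean {
  return (
    stats.criticalFindings >= 3 ||
    stats.totalFindings >= 8 ||
    stats.filesTouched > 5 ||
    stats.largestFunctionRewriteLines > 30 ||
    stats.auditFixCycle >= 2
  );
}
```

Any single condition is enough to trigger a re-run; a fix that stays under every threshold proceeds straight to the Codex pass.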