/security-review — ox AI security pipeline
You are orchestrating a Synthesia-style 6-phase security review over the user's diff against origin/main. The pipeline shape, the dedup-before-validate ordering, and the right-size-models-per-phase principle all come from the Synthesia post; the ox specifics (threat model, CLI/daemon primitives, hunter perspective frames) are local.
Trigger phrases
- `/security-review` (no args): review the diff vs `origin/main`. Default.
- `/security-review --scope=<path-glob>`: narrow to a specific path.
- `/security-review --hunter=<name>`: run only one hunter (debug). Valid names: `cli-input`, `secrets-redaction`, `daemon-ipc`, `supply-chain`, `llm-trust`.
- `/security-review --rerun`: re-run on the same diff, dedupe against the previous run's findings.
- `/security-review --cap=<usd>`: raise the per-run cost cap (default $2; persisted in `security/config.yml`).
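The flags above could be parsed roughly as follows. This is a minimal sketch, not the orchestrator's actual parser: `parse_args` and the attribute names are hypothetical, and the sketch does not read the persisted cap from `security/config.yml`. Only the flag names, the hunter names, and the $2 default come from the list above.

```python
import argparse

# Hunter names as listed in the trigger phrases above.
HUNTERS = ["cli-input", "secrets-redaction", "daemon-ipc", "supply-chain", "llm-trust"]


def parse_args(argv):
    """Parse /security-review flags (hypothetical helper, for illustration only)."""
    parser = argparse.ArgumentParser(prog="/security-review")
    parser.add_argument("--scope", default=None, metavar="<path-glob>",
                        help="narrow the review to a path glob")
    parser.add_argument("--hunter", default=None, choices=HUNTERS,
                        help="run only one hunter (debug)")
    parser.add_argument("--rerun", action="store_true",
                        help="re-run on the same diff, dedupe against the previous run")
    # Default of 2.0 mirrors the documented $2 cap; a persisted value is not consulted here.
    parser.add_argument("--cap", type=float, default=2.0, metavar="<usd>",
                        help="per-run cost cap in USD")
    return parser.parse_args(argv)


args = parse_args(["--hunter=daemon-ipc", "--cap=5"])
print(args.hunter, args.cap, args.rerun)  # daemon-ipc 5.0 False
```

Restricting `--hunter` with `choices` means an unknown hunter name fails fast with a usage error instead of silently running nothing.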
What you do
You are not the pipeline. You are the dispatcher. You shell out to security/scripts/orchestrate.sh and surface its output to the user concisely. The pipeline runs the AI subagents itself; do not try to re-implement them in this skill body.
bash security/scripts/orchestrate.sh "$@"
The orchestrator drives all six phases:
- Prep: compute scope (diff vs `origin/main`, language mix, touched packages), write `security/.output/scope.md`.
- Map: run `security/scripts/deterministic.sh` (parallel OSS scanners) + spawn the Cartographer subagent (Haiku) to draw the call graph from entry points (CLI commands, daemon IPC handlers) to sinks. Writes `security/.output/surface.md`.
- Hunt: spawn 5 hunter subagents in parallel (Sonnet). Each has an explicit perspective frame (`cli-input` / `secrets-redaction` / `daemon-ipc` / `supply-chain` / `llm-trust`) to fight finding convergence. Writes `security/.output/findings-raw.jsonl`.
- Dedup: single Sonnet pass merges hunter findings + deterministic findings by root cause. Writes `security/.output/findings-deduped.jsonl`.
- Validate: one call per finding, with a model split. Sonnet for ~90%, Opus for the hard classes (`secrets-redaction-bypass`, `daemon-ipc-authz-bypass`, `supply-chain-tampering`). Stricter than hunters; traces real call paths; checks existing mitigations.
- Aggregate: drop false positives, rank by severity, emit `security/.output/FINDINGS.md` (markdown) + `security/.output/findings.sarif` (machine).
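The Validate-phase model split can be pictured as a simple routing function. The sketch below is illustrative only: the `class` key, the lowercase model labels, and the example class `cli-arg-injection` are assumptions, since the schema of `findings-deduped.jsonl` is not specified here. The three hard classes are the ones named above.

```python
# Finding classes the Validate phase sends to Opus, per the phase list above.
HARD_CLASSES = {
    "secrets-redaction-bypass",
    "daemon-ipc-authz-bypass",
    "supply-chain-tampering",
}


def validator_model(finding):
    """Return the model tier for one deduped finding.

    `finding` is assumed to be a dict with a "class" key; the real
    findings-deduped.jsonl schema may differ.
    """
    return "opus" if finding.get("class") in HARD_CLASSES else "sonnet"


findings = [
    {"id": "F1", "class": "cli-arg-injection"},       # hypothetical class name
    {"id": "F2", "class": "daemon-ipc-authz-bypass"},
]
for f in findings:
    print(f["id"], validator_model(f))
# F1 sonnet
# F2 opus
```

Because this routing runs after Dedup, the more expensive model is only invoked once per root cause rather than once per raw hunter finding.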
After the orchestrator returns
Show the user:
- The headline counts: `N critical, M high, P medium, Q low` (from FINDINGS.md frontmatter).
- The top 3 findings (by severity then exploitability).
- The path to the full report: `security/.output/FINDINGS.md`.
- The cost (from the orchestrator's run-log): `$X.XX, Yth-percentile vs last 30 runs`.
Do not paste the full FINDINGS.md into the chat — it can be hundreds of lines. Summarize, link. Keep the summary under 120 words.
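As an illustration of the headline line, the sketch below pulls severity counts out of a frontmatter block. The frontmatter layout (a leading `---` block of `key: integer` lines) and the helper names `headline_counts` and `headline` are assumptions; the real FINDINGS.md frontmatter may be shaped differently.

```python
def headline_counts(findings_md):
    """Extract severity counts from a leading '---' frontmatter block.

    Assumes simple `key: integer` lines; the real frontmatter may differ.
    """
    lines = findings_md.splitlines()
    counts = {}
    if not lines or lines[0].strip() != "---":
        return counts  # no frontmatter found
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[key.strip()] = int(value.strip())
    return counts


def headline(counts):
    """Format the one-line severity summary shown to the user."""
    order = ("critical", "high", "medium", "low")
    return ", ".join(f"{counts.get(sev, 0)} {sev}" for sev in order)


sample = "---\ncritical: 1\nhigh: 2\nmedium: 0\nlow: 4\n---\n# Findings\n"
print(headline(headline_counts(sample)))  # 1 critical, 2 high, 0 medium, 4 low
```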
Cost behavior
- On-demand runs (this skill) via Claude Code subsidized tokens are effectively $0 marginal. The cost cap still applies as a budget signal, not a billing limit.
- If `ANTHROPIC_API_KEY` is unset and `CC_SUBSIDIZED` is not set, the AI tier won't run. Surface this with: "AI tier disabled (no `ANTHROPIC_API_KEY` and not running under Claude Code). Run `make sec-fast` for the deterministic-only pass."
- If a run hits the cap mid-pipeline, the orchestrator emits a partial `FINDINGS.md` and the run-log notes which phase paused. Re-run with `--cap=5` to continue, or accept the partial report.
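The AI-tier gate described above amounts to a check on two environment variables. A minimal sketch, assuming an empty value counts as unset (the orchestrator may test these variables differently); `ai_tier_enabled` and `DISABLED_MESSAGE` are hypothetical names.

```python
import os

# Message text taken from the cost-behavior notes above.
DISABLED_MESSAGE = (
    "AI tier disabled (no ANTHROPIC_API_KEY and not running under Claude Code). "
    "Run make sec-fast for the deterministic-only pass."
)


def ai_tier_enabled(env=None):
    """True when either credential signal is present.

    Treats an empty value as unset; that is an assumption, not
    documented orchestrator behavior.
    """
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY")) or bool(env.get("CC_SUBSIDIZED"))


# With neither variable present, the message is surfaced instead of running the AI tier.
if not ai_tier_enabled({}):
    print(DISABLED_MESSAGE)
```

Passing the environment as a parameter keeps the check testable without mutating `os.environ`.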
Sensitive paths (auto-elevate severity, always in scope)
- `internal/auth/**`
- `internal/session/**`
- `internal/daemon/**`
- `cmd/ox/adapter.go`
- `cmd/ox/redaction.go`
- `go.mod`, `go.sum`
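One way to picture the auto-elevation is a glob match followed by a severity bump. This is a sketch under stated assumptions: the one-step bump, the four-level severity ladder, and the helper names `is_sensitive` and `elevate` are illustrative, since the source does not say how far severity is raised. The patterns are the ones listed above.

```python
from fnmatch import fnmatchcase

# Patterns copied from the sensitive-paths list above.
SENSITIVE = [
    "internal/auth/**",
    "internal/session/**",
    "internal/daemon/**",
    "cmd/ox/adapter.go",
    "cmd/ox/redaction.go",
    "go.mod",
    "go.sum",
]
SEVERITIES = ["low", "medium", "high", "critical"]


def is_sensitive(path):
    """fnmatch's `*` also matches `/`, so `dir/**` covers nested files."""
    return any(fnmatchcase(path, pattern) for pattern in SENSITIVE)


def elevate(severity, path):
    """Bump severity one step for sensitive paths (the step size is an assumption)."""
    if not is_sensitive(path):
        return severity
    index = SEVERITIES.index(severity)
    return SEVERITIES[min(index + 1, len(SEVERITIES) - 1)]


print(elevate("medium", "internal/auth/token/store.go"))  # high
print(elevate("medium", "cmd/ox/main.go"))                # medium
```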
Specialized agents you can hand off to
If a finding needs deeper expertise, suggest the user route through one of these (don't auto-invoke — let the user decide):
- `@pentester`: confirm exploitability, build attack chain, write reproducer.
- `@threat-modeler`: broader STRIDE/LINDDUN model when a finding reveals a systemic gap.
- `@opengrep-rule-engineer`: encode a new pattern as an OpenGrep rule under `security/rules/` so the next run catches it deterministically.
- `@security-engineer`: for the structural fix design once a finding is confirmed.
Don't
- Don't block the user. Even on critical findings, the merge button stays green; the user decides.
- Don't re-run the pipeline phases manually. Always shell to `security/scripts/orchestrate.sh`.
- Don't paste raw deterministic-tool output into the chat. The orchestrator merges it; show the synthesis.
- Don't ask the user to install tools. If `bin/opengrep` is missing, tell them to run `make sec-install` once; the script idempotently installs everything to workspace `bin/`.
- Don't quote OWASP without a concrete reproducer. The pentester agent enforces this; you should too.