Agent Workflow Loops
This skill defines the operational loops that implementer agents follow when making code changes and writing tests. Each loop has explicit entry criteria, exit criteria, and escalation rules. If you are an agent, follow these loops exactly.
You do not review your own work. All reviews are performed by an independent reviewer. Prefer Claude via the bundled scripts. If Claude is unavailable, use a different model before asking your own model family to review. Same-model shell-outs are the last resort. You never grade your own homework.
Bundled references:
- references/testing-standards.md: Test quality standards (how to write tests)
- references/audit-workflow.md: Test gap discovery (how to find what's missing)
- references/perspective-catalog.md: Review perspective selection (used by primary and fallback code review)
- references/review-prompt.md: Code review prompt template for fallback reviewers
- references/audit-prompt.md: Test audit prompt template for module-scope (full-contract) audits
- references/diff-audit-prompt.md: Test audit prompt template for diff-scope (per-commit) audits; used by diff-test-audit.sh when --git is active
Bundled scripts:
- $SKILL_DIR/scripts/specialist-review.sh: Provider-aware Claude/Gemini/Codex CLI path for code review
- $SKILL_DIR/scripts/diff-test-audit.sh: Provider-aware Claude/Gemini/Codex CLI path for test audit
Locate Scripts
The bundled scripts live inside the installed skill directory, not the project tree.
Before invoking any script, resolve SKILL_DIR so paths work regardless of install scope:
SKILL_DIR="$(ls -d ~/.codex/skills/agent-loops 2>/dev/null || ls -d .codex/skills/agent-loops 2>/dev/null)"
All script invocations below use "$SKILL_DIR/scripts/...". Run the snippet above once
at the start of your session and reuse the variable.
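If neither install location exists, the one-liner above leaves SKILL_DIR empty and later script paths fail in confusing ways. A minimal sketch of a guarded lookup follows; resolve_skill_dir is a hypothetical helper name, not part of the bundled scripts, and the two candidate paths are the ones from the snippet above.

```shell
# Sketch: print the first candidate directory that exists, or fail loudly.
# resolve_skill_dir is a hypothetical helper, not part of the bundled scripts.
resolve_skill_dir() {
  local candidate
  for candidate in "$@"; do
    if [ -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "agent-loops skill directory not found" >&2
  return 1
}

# User-scope install first, then project-scope, as in the one-liner above.
SKILL_DIR="$(resolve_skill_dir "$HOME/.codex/skills/agent-loops" ".codex/skills/agent-loops")" \
  || echo "Stop and escalate: bundled scripts cannot be located." >&2
```

Failing early here keeps a missing install from looking like a reviewer outage later in the loop.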
Architecture: Who Does What
| Role | Agent | How |
|---|---|---|
| Implementer | Codex or Gemini | Writes code changes and test code |
| Code Reviewer | Claude preferred; non-self scripted fallback next; same-model provider last; fresh-context Codex final fallback | specialist-review keeps same-model shell-outs last; fallback reviewer uses bundled prompts and produces a review artifact |
| Test Auditor | Claude preferred; non-self scripted fallback next; same-model provider last; fresh-context Codex final fallback | diff-test-audit keeps same-model shell-outs last; fallback auditor uses bundled prompts and produces an audit artifact |
| Remediator | Codex or Gemini | Fixes findings from the independent review/audit artifact |
Critical rule: Codex and Gemini NEVER self-review unless every independent provider path has already failed. Every review step must be performed by an independent reviewer using this selection order:
- Bundled script with automatic Claude-first, self-last provider selection
- A fresh-context Codex reviewer that did not implement the change
If neither path is available, stop and escalate to the user.
Why Shell-Based Review (Even for Claude)
The bundled scripts aren't a Codex/Gemini accommodation — they exist to enable
cross-model independent review, which every agent benefits from. The
provider rotation explicitly keeps the current agent's own model family last,
so a Claude agent invoking specialist-review.sh gets its review from Gemini
or Codex first, not another Claude instance.
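The rotation described above can be sketched as a small ordering function. This is an illustration only: provider_order is a hypothetical helper, the real selection logic lives in the bundled scripts and may differ, and the relative order of Gemini and Codex for a Claude caller is an assumption.

```shell
# Sketch: print the provider order for a given calling model family.
# Claude is preferred, and the caller's own family is always tried last.
# provider_order is hypothetical; the bundled scripts own the real logic.
provider_order() {
  local self="$1" order="" family
  case "$self" in
    claude|gemini|codex) ;;
    *) echo "unknown model family: $self" >&2; return 1 ;;
  esac
  for family in claude gemini codex; do
    # Skip the caller's family here; it is appended at the end.
    [ "$family" = "$self" ] || order="$order $family"
  done
  order="$order $self"
  printf '%s\n' "${order# }"
}
```

For example, `provider_order codex` prints `claude gemini codex`, while `provider_order claude` prints `gemini codex claude`.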
Two kinds of reviewer independence are in play:
- Cross-model independence (shell scripts): reviewer is a different model family with different training data and alignment. Catches blind spots inherent to the current model. Requires shelling out to a different provider.
- Fresh-context independence (sub-agents): reviewer is the same model family but with no prior context. Catches local anchoring bias. Cheap to obtain via Task-tool sub-agents in Claude Code.
Agent-loops uses the first mechanism as its baseline because cross-model is a
stronger guarantee than fresh-context alone. Claude-native sub-agent flows
(like multi-specialist-review) add within-model multi-perspective diversity
on top of the shell-based baseline when warranted — they don't replace it.
Reviewer Selection Order
When a review or audit is required, use this exact fallback chain:
- Bundled script first. Let the bundled script try Claude, then another model family, and keep the current model family last. The script validates the artifact contract before accepting the generated artifact.
- Fresh-context Codex next. Spawn a reviewer agent with fresh context. That agent must:
  - not have authored or edited the implementation under review
  - receive only the task spec, relevant diff/module/tests, and the bundled references
  - act only as reviewer/auditor, not as implementer
  - write its result to a markdown artifact under .agents/reviews/
- Escalate if no independent reviewer is available.
Treat fallback artifacts exactly like script-generated REVIEW_FILE or
REPORT_FILE outputs in the loops below. Call out fallback usage in the handoff so
humans know whether the review came from Claude, Gemini, Codex, or fresh-context Codex.
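The fallback chain above reduces to a three-way decision. The sketch below encodes it as a pure function so the control flow is easy to check; next_review_step and its output labels are hypothetical names chosen for illustration, and the function does not invoke any reviewer itself.

```shell
# Sketch: decide the next step in the reviewer fallback chain.
#   $1: exit status of the bundled script (0 means its artifact was accepted)
#   $2: "yes" if a fresh-context Codex reviewer that did not implement the
#       change can be spawned
# next_review_step and its labels are hypothetical.
next_review_step() {
  local script_status="$1" fresh_available="$2"
  if [ "$script_status" -eq 0 ]; then
    echo "use-script-artifact"
  elif [ "$fresh_available" = "yes" ]; then
    echo "spawn-fresh-context-reviewer"
  else
    echo "escalate-to-user"
  fi
}
```

Note that no branch returns "review it yourself": when both paths fail, the only remaining step is escalation.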
Skill Invocation Reference
Pre-Review: Impact Analysis with Codanna (Optional)
Before requesting code review, you can use codanna to understand the blast radius of your changes. This provides grounded structural data that helps scope the review and catch issues the diff alone won't reveal.
# What calls the functions you changed?
codanna mcp find_callers process_request --watch
# What's the full impact if this symbol changes?
codanna mcp analyze_impact DatabaseConnection --watch --json
# Feed impact data into review context
IMPACT=$(codanna mcp analyze_impact "$CHANGED_SYMBOL" --watch --json 2>/dev/null)
This is optional — agent-loops works without codanna. But when available, impact data makes reviews more precise and catches downstream breakage the diff doesn't show.
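Because codanna is optional, a workflow that collects impact data should degrade to an empty value when the tool is not installed. A minimal sketch, assuming only the analyze_impact invocation shown above; collect_impact is a hypothetical wrapper name.

```shell
# Sketch: collect impact data when codanna is installed, otherwise print nothing.
# collect_impact is a hypothetical wrapper; the codanna invocation is the one
# shown in the commands above.
collect_impact() {
  local symbol="$1"
  if ! command -v codanna >/dev/null 2>&1; then
    # codanna is not on PATH: leave the impact data empty and carry on.
    return 0
  fi
  codanna mcp analyze_impact "$symbol" --watch --json 2>/dev/null || true
}

IMPACT="$(collect_impact "${CHANGED_SYMBOL:-}")"
```

An empty IMPACT then simply means the review proceeds on the diff alone.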
specialist-review — Request Code Review
When: After completing implementation, after each remediation cycle.
What you get back: Findings with severity levels (P0-P3) and a verdict (BLOCKED / PASS WITH ISSUES / CLEAN).
IMPORTANT: Source Files Only
Scope specialist-review to source files only. Do NOT include test files
(*.test.*, *.spec.*, __tests__/) in the path filter — tests are reviewed
separately in Loop 2 via diff-test-audit.
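One way to enforce the source-only rule is to filter the changed-file list before building the path filter. The sketch below applies the three test patterns named above; is_test_file and source_files_only are hypothetical helpers, and how the resulting paths are passed to the review script is not shown because it depends on that script's interface.

```shell
# Sketch: succeed when a path matches one of the test-file patterns
# (*.test.*, *.spec.*, __tests__/). Helper names are hypothetical.
is_test_file() {
  case "$1" in
    *.test.*|*.spec.*|__tests__/*|*/__tests__/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read one path per line on stdin and print only the non-test paths.
source_files_only() {
  local path
  while IFS= read -r path; do
    is_test_file "$path" || printf '%s\n' "$path"
  done
}

# Example use: git diff --name-only | source_files_only
```

Projects with other test naming conventions would need extra patterns in the case statement.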
IMPORTANT: Do Not Review the Code Yourself
Your ONLY job is to invoke an independent reviewer and read the output artifact. Do NOT analyze the diff as the reviewer. Do NOT write review comments yourself. Do NOT adopt perspectives yourself. Route the review to Claude first, then fallback if needed.
Claude is still the preferred reviewer because it can load domain-specific skills such as
owasp-top-10, secure-coding-practices, and python-testing-patterns. The bundled
script tries Claude first, then a different model family, and keeps same-model
shell-outs as a last resort. Codex and Gemini are both available as explicit providers.
If the automated providers are unavailable or fail, continue with a fresh-context Codex
reviewer instead of reviewing the code yourself.
Automated Path: Provider-Aware Script
LONG-RUNNING CALL: USE THE POLLING PATTERN BELOW. This script invokes an external LLM and takes 3-5 minutes for larger diffs. Do NOT start remediation, tests, or commits while the review is in progress. When calling from a Bash tool, set timeout: 600000 (10 min); the default 120-second timeout will kill the subprocess before the reviewer finishes. The review is only done when you have a REVIEW_FILE path in hand.
Polling invocation (REQUIRED) — run the review in the background and poll so the Bash tool receives periodic output and does not time out:
# Start review