Gemini Orchestrator
When the user invokes /gemini <instruction>, Claude delegates the implementation
to Gemini CLI via its headless mode (-p/--prompt), monitors in real time, and reports.
Known Limits
Hard constraints of the Gemini CLI — not config options.
1. No --max-turns flag
Vibe lets you cap turn count (--max-turns 8). Gemini CLI has no equivalent.
Timeout is the only runaway-control lever. A stuck run burns the full timeout
before dying. Set timeouts conservatively and decompose tasks.
2. High context overhead (~900–10k tokens before your task starts)
Gemini CLI loads a large default system prompt on every run:
- Simple prompt → ~883 tokens before the model responds
- File-read task → ~10k tokens of context before first tool call
This means:
- Each run costs more than token-naive estimates suggest
- Short timeouts can expire during context-loading on a slow connection
- The overhead is mostly cached on repeated calls to the same model in a session
3. 503 backoff eats your timeout silently
On the free-tier Gemini API, the model is frequently "under high demand." The CLI auto-retries with exponential backoff — observed taking 60–90s before work even begins. This is invisible until you see the first tool call.
Always add a 90s backoff buffer and a 30s context-load allowance to your "real work" estimate:
Timeout budget = expected_work_secs + 90s backoff buffer + 30s context load
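The budget formula above can be written as a small helper. This is a minimal sketch: the function name `timeout_budget` and the constant names are illustrative and not part of the delegate script.

```python
BACKOFF_BUFFER_SECS = 90  # observed 503 retry delay before work begins
CONTEXT_LOAD_SECS = 30    # allowance for loading the default context


def timeout_budget(expected_work_secs: int) -> int:
    """Return the total timeout in seconds for a given work estimate."""
    if expected_work_secs < 0:
        raise ValueError("expected_work_secs must be non-negative")
    return expected_work_secs + BACKOFF_BUFFER_SECS + CONTEXT_LOAD_SECS


print(timeout_budget(60))  # 60s of real work -> 180
```

With this accounting, the default 180s timeout leaves about 60s for the task itself.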
4. No --agent flag
Gemini CLI is single-mode only. There is no way to switch to a review-only or
plan-only agent. Use plan mode (--approval-mode plan) as a partial substitute.
5. No --workdir flag
The delegate script handles this by cd-ing into the workdir before running.
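The source does not show how the delegate script is implemented. As a hedged illustration of the same idea in Python, `subprocess.run` accepts a `cwd` argument (the equivalent of cd-ing first) and a `timeout` that kills the child when it expires. The helper name `run_in_workdir` is hypothetical, and the demo uses a stand-in command rather than the Gemini CLI.

```python
import subprocess
import sys
import tempfile


def run_in_workdir(cmd, workdir, timeout_secs):
    """Run cmd with workdir as its current directory.

    Returns (exit_code, stdout). exit_code is None when the timeout
    expired and the child process was killed.
    """
    try:
        proc = subprocess.run(
            cmd,
            cwd=workdir,
            timeout=timeout_secs,
            capture_output=True,
            text=True,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout


# Demo with a stand-in command that prints its working directory.
with tempfile.TemporaryDirectory() as tmp:
    code, out = run_in_workdir(
        [sys.executable, "-c", "import os; print(os.getcwd())"], tmp, 30
    )
    print(code, out.strip())
```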
6. No pseudo-TTY needed (positive difference vs Vibe)
Gemini CLI works fine in a plain pipe — no script -q -c wrapper needed.
7. Orchestration chain has 5 independent failure points
The delegation pipeline is: Gemini CLI -> plain pipe -> Python stream parser -> result event tokens -> git diff -> JSON log. Each link can fail independently:
| Link | Failure mode | Symptom |
|---|---|---|
| Gemini CLI | Auth expired, quota hit, 503 | Immediate exit or silent 90s hang |
| Stream parser | Gemini changes its JSON event schema | Tool calls not detected, token count 0 |
| result event | Missing on timeout or crash | Tokens logged as 0, cost not computed |
| git diff | Not a git repo, or Gemini committed mid-run | Wrong file count |
| JSON log | ~/.local/share/ not writable | Silent log skip |
When a run produces unexpected results, check these links top to bottom.
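Some of these links can be checked cheaply before launching a run. The sketch below is an assumption about what such a preflight could look like, not part of the delegate script: `preflight` is a hypothetical helper, the `.git` test is a heuristic that only works at the repository top level, and auth, quota, and schema problems are out of its reach.

```python
import os
import shutil


def preflight(workdir, log_dir, cli="gemini"):
    """Return a list of problems in pipeline order (empty list = all clear)."""
    problems = []
    if shutil.which(cli) is None:
        problems.append(f"{cli} not found on PATH")
    if not os.path.isdir(workdir):
        problems.append(f"workdir missing: {workdir}")
    elif not os.path.exists(os.path.join(workdir, ".git")):
        problems.append(f"no .git in workdir: {workdir}")
    if not (os.path.isdir(log_dir) and os.access(log_dir, os.W_OK)):
        problems.append(f"log dir not writable: {log_dir}")
    return problems


# Example: check the current directory and the log location named in the table.
print(preflight(os.getcwd(), os.path.expanduser("~/.local/share")))
```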
Step 1 — Detect workdir
- Run git rev-parse --show-toplevel in the current directory.
- If ambiguous or no git repo → ask with AskUserQuestion.
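A minimal sketch of the detection step, assuming it is driven from Python. The helper name `detect_workdir` is illustrative. A `None` result is the signal to fall back to asking the user.

```python
import subprocess


def detect_workdir(start_dir="."):
    """Return the git top-level directory for start_dir, or None."""
    try:
        proc = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=start_dir,
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # git is not installed, or start_dir does not exist
    if proc.returncode != 0:
        return None  # start_dir is not inside a git repository
    return proc.stdout.strip() or None


print(detect_workdir("."))
```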
Step 2 — Choose mode
| Mode | Flag | Writes files? | Use for |
|---|---|---|---|
| impl | --yolo | Yes | Implementing changes (default) |
| plan | --approval-mode plan | No | Safe exploration, reading, planning |
Use plan mode when you want Gemini to read the codebase and report back without
touching any files. Proposed writes appear as [plan-write] and are blocked.
Step 3 — Decompose the task
Critical rule: Gemini works best on atomic, focused tasks. Given the context overhead and 503 risk, keep tasks smaller than you might expect.
Decide whether to delegate at all:
gemini-delegate has real overhead (503 backoff, context load, stream parser, git diff, JSON log). For trivial changes the setup cost exceeds the savings.
| Signal | Action |
|---|---|
| 1 file, ≤ ~10 lines to change, location already known | Do it directly — don't delegate |
| 1 file, logic non-trivial OR location unclear | Delegate |
| 2–3 files, single objective | Delegate |
| >3 files OR multi-step logic OR migrations | Delegate, broken into sub-tasks |
The sweet spot is medium to heavy tasks, with anything beyond three files split into atomic sub-tasks.
| Size | Definition | Approach |
|---|---|---|
| Trivial | 1 file, change is obvious and located | Skip delegation — edit directly |
| Simple | 1 file, non-trivial logic or unknown location | 1 gemini call, impl mode |
| Medium | 2–3 related files, 1 goal | 1 gemini call with structured prompt |
| Complex | >3 files OR business logic OR DB migrations | Decompose |
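The two tables above can be read as one decision function. The sketch below is one possible encoding, not an authoritative rule: `classify_task` and its parameter names are hypothetical, `multi_step` stands in for multi-step logic, business logic, or migrations, and treating a single-file change of more than 10 lines as Simple is an assumption the tables do not state explicitly.

```python
def classify_task(files, lines_changed=0, location_known=True,
                  nontrivial_logic=False, multi_step=False):
    """Map task signals to a (size, approach) pair."""
    if files > 3 or multi_step:
        return "Complex", "decompose into sub-tasks"
    if files >= 2:
        return "Medium", "1 gemini call with structured prompt"
    if nontrivial_logic or not location_known or lines_changed > 10:
        return "Simple", "1 gemini call, impl mode"
    return "Trivial", "skip delegation, edit directly"


print(classify_task(1, lines_changed=5))       # ('Trivial', 'skip delegation, edit directly')
print(classify_task(1, location_known=False))  # ('Simple', '1 gemini call, impl mode')
print(classify_task(5))                        # ('Complex', 'decompose into sub-tasks')
```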
Decomposition for complex tasks:
Sub-task 1: Explore relevant files — plan mode, 120s
Sub-task 2: Implement change A in file X — impl mode, 180s
Sub-task 3: Implement change B in file Y — impl mode, 180s
Sub-task 4: Verify / test — plan mode, 120s
→ Check git diff between sub-tasks before launching the next.
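The decomposition above amounts to a sequential loop with a gate between steps. The following sketch shows that control flow only. `SubTask`, `run_plan`, and the `run`/`check` callables are hypothetical names; in a real run, `run` would invoke gemini-delegate and `check` would inspect git diff.

```python
from dataclasses import dataclass


@dataclass
class SubTask:
    prompt: str
    mode: str          # "plan" or "impl"
    timeout_secs: int


def run_plan(subtasks, run, check):
    """Run sub-tasks in order and stop at the first failure.

    run(task) returns an exit code; check(task) returns True when the
    state after the task looks right. Returns how many tasks completed.
    """
    done = 0
    for task in subtasks:
        if run(task) != 0 or not check(task):
            break
        done += 1
    return done


plan = [
    SubTask("Explore relevant files", "plan", 120),
    SubTask("Implement change A in file X", "impl", 180),
    SubTask("Implement change B in file Y", "impl", 180),
    SubTask("Verify / test", "plan", 120),
]

# Stand-in callables that always succeed.
completed = run_plan(plan, run=lambda t: 0, check=lambda t: True)
print(completed)  # 4
```

Stopping at the first failed gate keeps a bad intermediate state from being compounded by later sub-tasks.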
Step 4 — Write the Gemini prompt
Gemini has no context from the parent conversation. The prompt must be self-contained.
Structure of a good Gemini prompt:
Stack: Python/Flask, SQLAlchemy, SQLite
Key files: app.py (routes + fetch), models.py (Entry)
TASK: [one single thing to do, stated as an imperative]
CONSTRAINTS:
- [what must not break]
- [expected format if relevant]
VERIFY: grep for "def function_name" in file.py and confirm it exists.
Formulation rules:
- One task per prompt — never "also do X and Y"
- Name the exact files to modify
- Include a grep-based verification criterion (not a file re-read)
- Language: English (better Gemini performance)
- Keep prompts under ~500 words — longer prompts increase context overhead
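The structure and rules above can be enforced mechanically when prompts are assembled in code. This is a minimal sketch under that assumption. `build_prompt` is a hypothetical helper, and the 500-word limit is applied as a hard error here, although the guideline itself is approximate.

```python
def build_prompt(stack, key_files, task, constraints, verify, max_words=500):
    """Assemble a self-contained prompt in the Stack/TASK/CONSTRAINTS/VERIFY layout."""
    lines = [f"Stack: {stack}", f"Key files: {key_files}", f"TASK: {task}"]
    if constraints:
        lines.append("CONSTRAINTS:")
        lines.extend(f"- {c}" for c in constraints)
    lines.append(f"VERIFY: {verify}")
    prompt = "\n".join(lines)
    if len(prompt.split()) > max_words:
        raise ValueError("prompt exceeds the word limit; split the task")
    return prompt


print(build_prompt(
    stack="Python/Flask",
    key_files="app.py (routes + fetch)",
    task="In fetch_data(), convert id to str before returning.",
    constraints=["Existing routes must keep working"],
    verify='grep for "str(" in app.py and confirm it exists.',
))
```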
Verification — always use grep, not file re-read:
VERIFY: grep for "def extract_labels" in app.py and confirm it exists.
A grep is unambiguous. A file re-read can miss content outside the context window.
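The same criterion can be re-checked from the orchestrator side once the run finishes. The sketch below is a plain fixed-string search in Python, offered as an illustration. `grep_verify` is a hypothetical helper, and the demo file is created on the fly.

```python
import tempfile
from pathlib import Path


def grep_verify(path, needle):
    """Return the 1-based numbers of the lines in path that contain needle."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return [n for n, line in enumerate(text.splitlines(), start=1) if needle in line]


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "app.py"
    target.write_text(
        "import os\n\ndef extract_labels(rows):\n    return rows\n",
        encoding="utf-8",
    )
    print(grep_verify(target, "def extract_labels"))  # [3]
```

An empty list means the criterion failed, regardless of what the model reported.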
Examples:
❌ Bad (too vague, too wide):
Fix the API, add a signal classifier, update the UI with colored badges
✅ Good (atomic, verifiable):
Stack: Python/Flask. File: app.py
TASK: In fetch_data(), convert the date string (format "YYYY-MM-DD")
to datetime.date before returning, and convert id to str.
VERIFY: grep for "datetime.date" in app.py and confirm it exists.
Step 5 — Launch Gemini
~/tools/gemini-delegate "<workdir>" "<prompt>" [timeout-secs] [mode]
| Argument | Default | Notes |
|---|---|---|
| workdir | (required) | Absolute path, must exist |
| prompt | (required) | Self-contained task description |
| timeout-secs | 180 | Budget: work + 90s backoff + 30s context load |
| mode | impl | impl (writes ok) or plan (read-only) |
Recommended timeouts:
- Plan/explore only: 120
- Simple change (1 file): 180
- Medium change (2–3 files): 270
- Hard ceiling: 300 (decompose instead)
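When the script is launched from code rather than typed by hand, the argument rules above can be validated up front. The sketch below is a hypothetical wrapper, not part of gemini-delegate. The name `delegate_argv` and the choice to reject timeouts above 300s are assumptions based on the table and the recommended ceiling.

```python
import os

VALID_MODES = ("impl", "plan")
HARD_CEILING_SECS = 300


def delegate_argv(workdir, prompt, timeout_secs=180, mode="impl"):
    """Build the argument list for ~/tools/gemini-delegate."""
    if not os.path.isabs(workdir):
        raise ValueError("workdir must be an absolute path")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    if not 0 < timeout_secs <= HARD_CEILING_SECS:
        raise ValueError("timeout outside 1-300s; decompose the task instead")
    script = os.path.expanduser("~/tools/gemini-delegate")
    return [script, workdir, prompt, str(timeout_secs), mode]


print(delegate_argv("/path/to/project", "Read app.py and describe the route structure", 120, "plan"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems when the prompt itself contains quotes.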
Examples:
# Explore only — safe, no writes
~/tools/gemini-delegate "/path/to/project" "Read app.py and describe the route structure" 120 plan
# Implement a single-file change
~/tools/gemini-delegate "/path/to/project" "Stack: Flask. File: app.py. TASK: ..." 180 impl
# Background run
~/tools/gemini-delegate "/path/to/project" "..." 240 impl > /tmp/gemini_out.txt 2>&1 &
# Monitor with: tail -f /tmp/gemini_out.txt
Step 6 — Supervise in real time
The script prints live:
=== GEMINI START ===
Workdir : /path/to/project
Mode : impl (yolo)
Timeout : 180s
Prompt : Stack: Python/Flask. File: app.py ...
====================
[init] model=gemini-2.5-flash
[read] app.py
[write] app.py
[gemini] Done. Converted date to datetime.date in fetch_data().
Tool calls: 3
Gemini tokens: 1,234 (900 in + 334 out, 0 cached) | ~$0.0003 (8.2s)
Claude Sonnet 4.6 eq: same tokens ~$0.0077 (ratio x25.7)
=== GEMINI DONE (exit: 0) ===
=== SYNTAX OK (1 file(s) checked) ===
=== UNCOMMITTED CHANGES ===
app.py | 4 ++--
[log]
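If the token summary needs to be consumed programmatically, the "Gemini tokens" line can be parsed with a regular expression. This is a sketch that assumes the line keeps exactly the layout shown in the sample output above. `parse_token_line` is a hypothetical helper, and a `None` result corresponds to the missing-result case described under Known Limits.

```python
import re

TOKENS_RE = re.compile(
    r"Gemini tokens:\s*([\d,]+)\s*"
    r"\(([\d,]+) in \+ ([\d,]+) out, ([\d,]+) cached\)"
)


def parse_token_line(line):
    """Extract token counts from the summary line, or return None."""
    match = TOKENS_RE.search(line)
    if match is None:
        return None
    total, tok_in, tok_out, cached = (
        int(group.replace(",", "")) for group in match.groups()
    )
    return {"total": total, "in": tok_in, "out": tok_out, "cached": cached}


sample = "Gemini tokens: 1,234 (900 in + 334 out, 0 cached) | ~$0.0003 (8.2s)"
print(parse_token_line(sample))
# {'total': 1234, 'in': 900, 'out': 334, 'cached': 0}
```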