Research Idea Creator
Generate publishable research ideas for: $ARGUMENTS
Overview
Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill composes with /research-lit, /novelty-check, and /research-review to form a complete idea discovery pipeline.
Constants
- PILOT_MAX_HOURS = 2 — Skip any pilot estimated to take > 2 hours per GPU. Flag as "needs manual pilot".
- PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill pilots exceeding 3 hours. Collect partial results if available.
- MAX_PILOT_IDEAS = 3 — Pilot at most 3 ideas in parallel. Additional ideas are validated on paper only.
- MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget for all pilots combined.
- REVIEWER_MODEL = gpt-5.5 — Default model for the Codex backend. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o). Manual backend uses whatever model the user chooses, but it must be a non-Claude model: the executor is Claude, so pasting into any Claude product makes Claude judge Claude and voids the cross-model invariant (see shared-references/reviewer-routing.md).
- REVIEWER_BACKEND = codex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for Oracle MCP, or — reviewer: manual for Manual Review MCP. If manual-review MCP is unavailable, stop and print the install command; do not fall back to Codex. See shared-references/reviewer-routing.md.
- OUTPUT_DIR = idea-stage/ — All idea-stage outputs go here. Create the directory if it doesn't exist.
💡 Override via argument, e.g., /idea-creator "topic" — pilot budget: 4h per idea, 20h total.
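As a rough illustration of how the pilot constants interact, the sketch below assigns each candidate idea one of three outcomes. The function name `plan_pilots`, the `PilotPlan` type, and the rule that every pilot's estimate counts directly against the total budget (one GPU per pilot) are assumptions made for this example, not part of the skill. The hard timeout (PILOT_TIMEOUT_HOURS) applies at run time and is not modeled here.

```python
from dataclasses import dataclass

# Values copied from the Constants list above.
PILOT_MAX_HOURS = 2
MAX_PILOT_IDEAS = 3
MAX_TOTAL_GPU_HOURS = 8


@dataclass
class PilotPlan:
    idea: str
    decision: str  # "pilot", "needs manual pilot", or "paper only"


def plan_pilots(estimates):
    """Assign a decision to each (idea, estimated_gpu_hours) pair, in ranked order.

    Assumption: each pilot runs on a single GPU, so its estimate counts
    directly against MAX_TOTAL_GPU_HOURS.
    """
    plans = []
    piloted = 0
    hours_used = 0.0
    for idea, hours in estimates:
        if hours > PILOT_MAX_HOURS:
            # Too long for an automatic pilot: flag it for the user.
            plans.append(PilotPlan(idea, "needs manual pilot"))
        elif piloted < MAX_PILOT_IDEAS and hours_used + hours <= MAX_TOTAL_GPU_HOURS:
            plans.append(PilotPlan(idea, "pilot"))
            piloted += 1
            hours_used += hours
        else:
            # Pilot slots or total budget exhausted: validate on paper only.
            plans.append(PilotPlan(idea, "paper only"))
    return plans
```

With five ranked ideas estimated at 1.5, 3, 2, 1, and 0.5 hours, this sketch pilots the first, third, and fourth, flags the second for a manual pilot, and leaves the fifth for paper-only validation because the three pilot slots are taken.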
Reviewer Calling Convention
When calling the reviewer for idea evaluation, branch on REVIEWER_BACKEND:
If REVIEWER_BACKEND = codex:
Use mcp__codex__codex for new review threads.
Use mcp__codex__codex-reply for follow-up rounds (reuse threadId).
If REVIEWER_BACKEND = manual:
Use mcp__manual_review__review for new review threads with:
prompt: [exact same prompt that would go to Codex]
config: {"model_reasoning_effort": "xhigh"}
Save the returned threadId.
Use mcp__manual_review__review_reply for follow-up rounds with:
threadId: [saved manual-review threadId]
prompt: [follow-up prompt]
config: {"model_reasoning_effort": "xhigh"}
Prompt fidelity: the manual prompt must be exactly the same text that Codex would receive. Review tracing applies equally to both backends.
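The branching above can be sketched as a small helper that returns the tool name and arguments for a given backend. The tool names and the manual-review fields come from the convention above. The helper name `build_review_call` and the argument shape used for the Codex tools (mirroring the manual fields) are assumptions for illustration only. Oracle MCP tool names are not given in this section, so the sketch refuses that backend instead of guessing.

```python
REASONING_CONFIG = {"model_reasoning_effort": "xhigh"}


def build_review_call(backend, prompt, thread_id=None):
    """Return (tool_name, arguments) for a new thread or a follow-up round.

    A follow-up round is any call that passes a saved thread_id.
    """
    if backend == "codex":
        # Assumption: Codex arguments mirror the manual-review fields.
        if thread_id is None:
            return "mcp__codex__codex", {
                "prompt": prompt,
                "config": dict(REASONING_CONFIG),
            }
        return "mcp__codex__codex-reply", {"threadId": thread_id, "prompt": prompt}
    if backend == "manual":
        if thread_id is None:
            return "mcp__manual_review__review", {
                "prompt": prompt,  # must be the exact text Codex would receive
                "config": dict(REASONING_CONFIG),
            }
        return "mcp__manual_review__review_reply", {
            "threadId": thread_id,
            "prompt": prompt,
            "config": dict(REASONING_CONFIG),
        }
    # Other backends (e.g. oracle-pro) are not specified in this section.
    raise ValueError(f"no calling convention sketched for backend: {backend}")
```

Because both branches take the same `prompt` value, the prompt-fidelity requirement holds by construction in this sketch.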
Workflow
Phase 0: Load Research Wiki (if active)
Skip this phase entirely if research-wiki/ does not exist.
If research-wiki/ exists, resolve the canonical helper using the
shared resolution chain (see ../research-wiki/SKILL.md for the
contract):
# Run from the repository root when inside a git checkout; otherwise stay in the current directory.
cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
# If ARIS_REPO is unset, fall back to the repo_root entry in .aris/installed-skills.txt (tab-separated).
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null)}"
# Try the project-local install first, then tools/, then $ARIS_REPO/tools/.
WIKI_SCRIPT=".aris/tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || WIKI_SCRIPT="tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || { [ -n "${ARIS_REPO:-}" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"; }
# No helper found: warn, then clear WIKI_SCRIPT so later steps skip wiki integration.
[ -f "$WIKI_SCRIPT" ] || {
    echo "WARN: research_wiki.py not found at .aris/tools/, tools/, or \$ARIS_REPO/tools/." >&2
    echo " The idea-creation primary output (idea ranking) will still be produced." >&2
    echo " Wiki integration (load query_pack, write idea pages, add edges, rebuild query_pack) will be skipped." >&2
    echo " Fix: rerun 'bash tools/install_aris.sh', export ARIS_REPO, or 'cp <ARIS-repo>/tools/research_wiki.py tools/'." >&2
    WIKI_SCRIPT=""
}
if research-wiki/query_pack.md exists AND is less than 7 days old:
    Read query_pack.md and use it as initial landscape context:
    - Treat listed gaps as priority search seeds
    - Treat failed ideas as a banlist (do NOT regenerate similar ideas)
    - Treat top papers as known prior work (do not re-search them)
    Still run Phase 1 below for papers from the last 3-6 months (wiki may be stale)
else if research-wiki/ exists but query_pack.md is stale or missing:
    if [ -n "$WIKI_SCRIPT" ]: python3 "$WIKI_SCRIPT" rebuild_query_pack research-wiki/
    Then read query_pack.md as above
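The freshness test in the pseudocode above could be implemented along these lines. This is a minimal sketch that assumes "age" means the file's modification time, which the skill does not specify. The function name `query_pack_status` and its return labels are hypothetical.

```python
import os
import time

MAX_AGE_DAYS = 7  # freshness window from the pseudocode above


def query_pack_status(wiki_dir="research-wiki", now=None):
    """Classify the wiki state as 'no-wiki', 'missing', 'stale', or 'fresh'."""
    if not os.path.isdir(wiki_dir):
        return "no-wiki"  # Phase 0 is skipped entirely
    pack = os.path.join(wiki_dir, "query_pack.md")
    if not os.path.isfile(pack):
        return "missing"  # rebuild only if the helper script was found
    now = time.time() if now is None else now
    # Assumption: age is measured from the file's last modification time.
    age_days = (now - os.path.getmtime(pack)) / 86400
    return "fresh" if age_days < MAX_AGE_DAYS else "stale"
```

A result of "stale" or "missing" corresponds to the rebuild branch, and "fresh" corresponds to reading query_pack.md directly.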
Phase 1: Landscape Survey (5-10 min)
Map the research area to understand what exists and where the gaps are.
- Scan local paper library first: Check papers/ and literature/ in the project directory for existing PDFs. Read first 3 pages of relevant papers to build a baseline understanding before searching online. This avoids re-discovering what the user already knows.
- Search recent literature using WebSearch:
  - Top venues in the last 2 years (NeurIPS, ICML, ICLR, ACL, EMNLP, etc.)
  - Recent arXiv preprints (last 6 months)
  - Use 5+ different query formulations
  - Read abstracts and introductions of the top 10-15 papers
- Build a landscape map:
  - Group papers by sub-direction / approach
  - Identify what has been tried and what hasn't
  - Note recurring limitations mentioned in "Future Work" sections
  - Flag any open problems explicitly stated by multiple papers
- Identify structural gaps:
  - Methods that work in domain A but haven't been tried in domain B
  - Contradictory findings between papers (opportunity for resolution)
  - Assumptions that everyone makes but nobody has tested
  - Scaling regimes that haven't been explored
  - Diagnostic questions that nobody has asked
Phase 1.5: Parallel lens fan-out (Tier-aware) — breadth, not verdict
Idea generation benefits from breadth: more independent analytic angles
surface more candidate ideas. This skill fans out candidate generation
across analytic lenses, then funnels every candidate through the single
Phase-4 cross-model jury. Fan-out widens the jury's input; it never makes the
accept/reject decision. This follows
shared-references/fan-out-pattern.md;
the verdict stays cross-model per
shared-references/acceptance-gate.md
(idea novelty/quality is a Type-B verdict — same-family generation is fine,
same-family acquittal is not).
Lenses (the structural-gap angles from Phase 1, step 3):
method-transfer (works in domain A, untried in B) · contradiction
(conflicting findings to resolve) · untested-assumption (everyone assumes,
nobody tested) · scaling-regime (unexplored regime) · diagnostic
(question nobody asked). This set is a floor, not a ceiling — add a
domain-specific lens when the direction warrants.
Tier-portable dispatch (the Phase-4 jury downstream is identical on every tier):
- Tier 1 (Workflow available): spawn one Claude subagent per lens; each runs the Phase-1 survey through its lens and the Phase-2 generation prompt restricted to that lens, returning candidates as structured output.
- Tier 2 (Agent tool, no Workflow): spawn the same per-lens subagents via the Agent tool.
- Tier 3 (no spawning): enumerate the lenses sequentially in one pass — the original single-thread behavior, made explicit. No capability assumed.
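The tier choice and lens list above can be summarized in a short sketch. The lens identifiers are the ones named in this section. The function names, the capability flags, and the "parallel"/"sequential" labels are assumptions for illustration, and actual subagent spawning is outside what this sketch covers.

```python
# Lens identifiers as listed above; the set is a floor, not a ceiling.
LENSES = [
    "method-transfer",
    "contradiction",
    "untested-assumption",
    "scaling-regime",
    "diagnostic",
]


def select_tier(workflow_available, agent_tool_available):
    """Pick the highest tier the environment supports."""
    if workflow_available:
        return 1
    if agent_tool_available:
        return 2
    return 3  # no spawning: enumerate lenses in one pass


def dispatch_plan(workflow_available, agent_tool_available, extra_lenses=()):
    """Describe how candidate generation is fanned out across lenses."""
    tier = select_tier(workflow_available, agent_tool_available)
    # Append any domain-specific lenses without duplicating the defaults.
    lenses = LENSES + [lens for lens in extra_lenses if lens not in LENSES]
    mode = "parallel" if tier in (1, 2) else "sequential"
    return {"tier": tier, "mode": mode, "lenses": lenses}
```

Whatever tier is selected, the plan only describes generation; every candidate still goes to the same Phase-4 jury.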
Why the lens shards are Claude, not Codex. Generation is candidate production, not a verdict, so same-family is safe — and Codex MCP is serial (concurrent codex calls hang), so spending its scarce capacity on parallel generation is both unsafe-to-parallelize and wasteful. Reserve Codex for the one Phase-4 jury call. On Tier 1/2 the lens subagents are the generators; the single Phase-2 codex brainstorm below still runs once as an optional cross-model seed (a generator, not a judge), and its ideas join the merged pool.
Per-shard output (the generation-fan-out schema from
fan-out-pattern.md — shard_id +
candidates[] + per-item dedup_key):
{"shard_id": "<lens id>", "candidates": [{"summary": "...", "hypothesis"