Research Idea Creator

Generate publishable research ideas for: $ARGUMENTS

Overview

Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill composes with /research-lit, /novelty-check, and /research-review to form a complete idea discovery pipeline.

Constants

PILOT_MAX_HOURS = 2 — Skip any pilot estimated to take > 2 hours per GPU. Flag as "needs manual pilot".
PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill pilots exceeding 3 hours. Collect partial results if available.
MAX_PILOT_IDEAS = 3 — Pilot at most 3 ideas in parallel. Additional ideas are validated on paper only.
MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget for all pilots combined.
REVIEWER_MODEL = gpt-5.5 — Default model for the Codex backend. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o). Manual backend uses whatever model the user chooses, but it must be a non-Claude model — the executor is Claude, so pasting into any Claude product makes Claude judge Claude and voids the cross-model invariant (see shared-references/reviewer-routing.md).
REVIEWER_BACKEND = codex — Default: Codex MCP (xhigh). Override with — reviewer: oracle-pro for Oracle MCP, or — reviewer: manual for Manual Review MCP. If manual-review MCP is unavailable, stop and print the install command; do not fall back to Codex. See shared-references/reviewer-routing.md.
OUTPUT_DIR = idea-stage/ — All idea-stage outputs go here. Create the directory if it doesn't exist.

💡 Override via argument, e.g., /idea-creator "topic" — pilot budget: 4h per idea, 20h total.
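
The budget constants above could be applied as in the following minimal sketch. The `Idea` class, its `est_gpu_hours` field, and the `plan_pilots` function are hypothetical names introduced for illustration. The selection policy is an assumption: ideas are taken in priority order and each pilot is assumed to run on a single GPU. Only the constant values come from this document.

```python
from dataclasses import dataclass

# Values mirror the Constants section; the selection policy is an assumption.
PILOT_MAX_HOURS = 2
MAX_PILOT_IDEAS = 3
MAX_TOTAL_GPU_HOURS = 8


@dataclass
class Idea:
    name: str
    est_gpu_hours: float  # hypothetical field: estimated single-GPU pilot cost


def plan_pilots(ideas, max_hours=PILOT_MAX_HOURS, max_ideas=MAX_PILOT_IDEAS,
                total_budget=MAX_TOTAL_GPU_HOURS):
    """Split ideas (assumed to be in priority order) into three buckets."""
    pilots, manual, paper_only = [], [], []
    spent = 0.0
    for idea in ideas:
        if idea.est_gpu_hours > max_hours:
            manual.append(idea.name)          # flagged "needs manual pilot"
        elif len(pilots) < max_ideas and spent + idea.est_gpu_hours <= total_budget:
            pilots.append(idea.name)          # fits the per-idea and total budgets
            spent += idea.est_gpu_hours
        else:
            paper_only.append(idea.name)      # validated on paper only
    return {"pilot": pilots, "needs_manual_pilot": manual, "paper_only": paper_only}
```

An override such as "pilot budget: 4h per idea, 20h total" would then correspond to calling `plan_pilots(ideas, max_hours=4, total_budget=20)`.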

Reviewer Calling Convention

When calling the reviewer for idea evaluation, branch on REVIEWER_BACKEND:

If REVIEWER_BACKEND = codex: Use mcp__codex__codex for new review threads. Use mcp__codex__codex-reply for follow-up rounds (reuse threadId).

If REVIEWER_BACKEND = manual: Use mcp__manual_review__review for new review threads with:
- prompt: [exact same prompt that would go to Codex]
- config: {"model_reasoning_effort": "xhigh"}
Save the returned threadId. Use mcp__manual_review__review_reply for follow-up rounds with:
- threadId: [saved manual-review threadId]
- prompt: [follow-up prompt]
- config: {"model_reasoning_effort": "xhigh"}

Prompt fidelity: the manual prompt must be exactly the same text that Codex would receive. Review tracing applies equally to both backends.
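
The branching above can be pictured as a small lookup that turns a backend name into a tool call. This is a sketch under stated assumptions, not the skill's actual dispatcher. The tool names and the manual-review arguments come from this section. The `build_reviewer_call` helper, the returned dictionary layout, and the use of the same `prompt`/`config` arguments for Codex are assumptions. No mapping is sketched for oracle-pro because this section does not name its tools.

```python
REVIEW_CONFIG = {"model_reasoning_effort": "xhigh"}

# (new-thread tool, follow-up tool) per backend, as named in this section.
TOOLS = {
    "codex": ("mcp__codex__codex", "mcp__codex__codex-reply"),
    "manual": ("mcp__manual_review__review", "mcp__manual_review__review_reply"),
}


def build_reviewer_call(backend, prompt, thread_id=None):
    """Return a hypothetical {"tool", "arguments"} description of the call."""
    if backend not in TOOLS:
        raise ValueError(f"no tool mapping sketched for backend {backend!r}")
    new_tool, reply_tool = TOOLS[backend]
    # Assumption: Codex accepts the same prompt/config arguments as manual review.
    args = {"prompt": prompt, "config": dict(REVIEW_CONFIG)}
    if thread_id is None:
        return {"tool": new_tool, "arguments": args}   # new review thread
    args["threadId"] = thread_id                        # follow-up round
    return {"tool": reply_tool, "arguments": args}
```

Because the prompt is passed through unchanged for either backend, the prompt-fidelity rule holds by construction in this sketch.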

Workflow

Phase 0: Load Research Wiki (if active)

Skip this phase entirely if research-wiki/ does not exist.

If research-wiki/ exists, resolve the canonical helper using the shared resolution chain (see ../research-wiki/SKILL.md for the contract):

cd "$(git rev-parse --show-toplevel 2>/dev/null || pwd)" || exit 1
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills.txt 2>/dev/null)}"
WIKI_SCRIPT=".aris/tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || WIKI_SCRIPT="tools/research_wiki.py"
[ -f "$WIKI_SCRIPT" ] || { [ -n "${ARIS_REPO:-}" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"; }
[ -f "$WIKI_SCRIPT" ] || {
  echo "WARN: research_wiki.py not found at .aris/tools/, tools/, or \$ARIS_REPO/tools/." >&2
  echo "      The idea-creation primary output (idea ranking) will still be produced." >&2
  echo "      Wiki integration (load query_pack, write idea pages, add edges, rebuild query_pack) will be skipped." >&2
  echo "      Fix: rerun 'bash tools/install_aris.sh', export ARIS_REPO, or 'cp <ARIS-repo>/tools/research_wiki.py tools/'." >&2
  WIKI_SCRIPT=""
}

if research-wiki/query_pack.md exists AND is less than 7 days old:
    Read query_pack.md and use it as initial landscape context:
    - Treat listed gaps as priority search seeds
    - Treat failed ideas as a banlist (do NOT regenerate similar ideas)
    - Treat top papers as known prior work (do not re-search them)
    Still run Phase 1 below for papers from the last 3-6 months (wiki may be stale)
else (research-wiki/ exists but query_pack.md is stale or missing):
    If WIKI_SCRIPT is non-empty, rebuild the pack: python3 "$WIKI_SCRIPT" rebuild_query_pack research-wiki/
    Then read query_pack.md as above; if WIKI_SCRIPT is empty, skip wiki integration for this run
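
The freshness decision above could look like the following sketch. The `query_pack_action` function and its return labels are hypothetical. Two further assumptions: the file's modification time is used as the age of query_pack.md, and an empty `wiki_script` means wiki integration is skipped, in line with the warning printed by the resolution chain.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 7  # "less than 7 days old" in the rule above


def query_pack_action(wiki_dir="research-wiki", wiki_script="", now=None):
    """Decide what Phase 0 should do; return labels are illustrative only."""
    wiki = Path(wiki_dir)
    if not wiki.is_dir():
        return "skip_phase"                    # no research-wiki/: skip Phase 0
    pack = wiki / "query_pack.md"
    now = time.time() if now is None else now
    if pack.is_file():
        # Assumption: mtime is an adequate proxy for when the pack was built.
        age_days = (now - pack.stat().st_mtime) / 86400
        if age_days < MAX_AGE_DAYS:
            return "read"                      # fresh: use as landscape context
    # Stale or missing: rebuild only if the helper script was resolved.
    return "rebuild_then_read" if wiki_script else "skip_wiki_integration"
```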

Phase 1: Landscape Survey (5-10 min)

Map the research area to understand what exists and where the gaps are.

Scan local paper library first: Check papers/ and literature/ in the project directory for existing PDFs. Read first 3 pages of relevant papers to build a baseline understanding before searching online. This avoids re-discovering what the user already knows.
Search recent literature using WebSearch:
- Top venues in the last 2 years (NeurIPS, ICML, ICLR, ACL, EMNLP, etc.)
- Recent arXiv preprints (last 6 months)
- Use 5+ different query formulations
- Read abstracts and introductions of the top 10-15 papers
Build a landscape map:
- Group papers by sub-direction / approach
- Identify what has been tried and what hasn't
- Note recurring limitations mentioned in "Future Work" sections
- Flag any open problems explicitly stated by multiple papers
Identify structural gaps:
- Methods that work in domain A but haven't been tried in domain B
- Contradictory findings between papers (opportunity for resolution)
- Assumptions that everyone makes but nobody has tested
- Scaling regimes that haven't been explored
- Diagnostic questions that nobody has asked
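
One of the structural gaps above, a method that works in domain A but is untried in domain B, lends itself to a mechanical first pass. The sketch below assumes the survey has been reduced to (method, domain) pairs. That representation and the `method_transfer_gaps` function are hypothetical, and the output is only a list of candidates to check by hand, not evidence that a combination is truly unexplored.

```python
from collections import defaultdict


def method_transfer_gaps(papers):
    """papers: iterable of (method, domain) pairs noted during the survey."""
    domains_by_method = defaultdict(set)
    all_domains = set()
    for method, domain in papers:
        domains_by_method[method].add(domain)
        all_domains.add(domain)
    gaps = []
    for method in sorted(domains_by_method):
        # Domains seen in the survey where this method never appeared.
        for domain in sorted(all_domains - domains_by_method[method]):
            gaps.append((method, domain))
    return gaps
```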

Phase 1.5: Parallel lens fan-out (Tier-aware) — breadth, not verdict

Idea generation benefits from breadth: more independent analytic angles surface more candidate ideas. This skill fans out candidate generation across analytic lenses, then funnels every candidate through the single Phase-4 cross-model jury. Fan-out widens the jury's input; it never makes the accept/reject decision. This follows shared-references/fan-out-pattern.md; the verdict stays cross-model per shared-references/acceptance-gate.md (idea novelty/quality is a Type-B verdict — same-family generation is fine, same-family acquittal is not).

Lenses (the structural-gap angles from the "Identify structural gaps" step of Phase 1): method-transfer (works in domain A, untried in B) · contradiction (conflicting findings to resolve) · untested-assumption (everyone assumes, nobody tested) · scaling-regime (unexplored regime) · diagnostic (question nobody asked). This set is a floor, not a ceiling; add a domain-specific lens when the direction warrants.

Tier-portable dispatch (the Phase-4 jury downstream is identical on every tier):

Tier 1 (Workflow available): spawn one Claude subagent per lens; each runs the Phase-1 survey through its lens and the Phase-2 generation prompt restricted to that lens, returning candidates as structured output.
Tier 2 (Agent tool, no Workflow): spawn the same per-lens subagents via the Agent tool.
Tier 3 (no spawning): enumerate the lenses sequentially in one pass — the original single-thread behavior, made explicit. No capability assumed.
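
The three tiers differ only in how the per-lens work is scheduled, which the sketch below tries to make concrete. It is an illustration under assumptions: threads stand in for Claude subagents, `generate` is a hypothetical callable that takes a lens and returns one shard, and `can_spawn` collapses Tier 1 and Tier 2 into a single flag.

```python
from concurrent.futures import ThreadPoolExecutor

# Lens ids as listed in this phase; a domain-specific lens may be appended.
LENSES = ["method-transfer", "contradiction", "untested-assumption",
          "scaling-regime", "diagnostic"]


def fan_out(generate, lenses=LENSES, can_spawn=True):
    """Run `generate(lens)` once per lens and return the shards in lens order."""
    if can_spawn:
        # Tier 1/2 stand-in: one worker per lens, running in parallel.
        with ThreadPoolExecutor(max_workers=max(1, len(lenses))) as pool:
            return list(pool.map(generate, lenses))
    # Tier 3: enumerate the lenses sequentially in one pass.
    return [generate(lens) for lens in lenses]
```

Either path returns the same shards in the same order, so the downstream jury input does not depend on the tier.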

Why the lens shards are Claude, not Codex. Generation is candidate production, not a verdict, so same-family is safe — and Codex MCP is serial (concurrent codex calls hang), so spending its scarce capacity on parallel generation is both unsafe-to-parallelize and wasteful. Reserve Codex for the one Phase-4 jury call. On Tier 1/2 the lens subagents are the generators; the single Phase-2 codex brainstorm below still runs once as an optional cross-model seed (a generator, not a judge), and its ideas join the merged pool.

Per-shard output (the generation-fan-out schema from fan-out-pattern.md — shard_id + candidates[] + per-item dedup_key):

{"shard_id": "<lens id>", "candidates": [{"summary": "...", "hypothesis"
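
Merging the per-lens shards into one candidate pool could be sketched as follows. The `shard_id`, `candidates`, and `dedup_key` fields are the ones named above. The `merge_shards` function and the added `lenses` field are hypothetical, as is the policy of keeping the first candidate seen for each key.

```python
def merge_shards(shards):
    """Merge shard outputs into one pool, keeping one candidate per dedup_key."""
    merged = {}
    for shard in shards:
        for cand in shard["candidates"]:
            key = cand["dedup_key"]
            # Assumption: the first candidate seen for a key is the one kept.
            entry = merged.setdefault(key, {**cand, "lenses": []})
            if shard["shard_id"] not in entry["lenses"]:
                entry["lenses"].append(shard["shard_id"])  # hypothetical provenance
    return list(merged.values())
```

Recording which lenses produced a candidate is optional, but it keeps the merged pool traceable before it reaches the Phase-4 jury.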
