Research Idea Creator

Generate publishable research ideas for: $ARGUMENTS

Overview

Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill composes with /research-lit, /novelty-check, and /research-review to form a complete idea discovery pipeline.

Constants

PILOT_MAX_HOURS = 2 — Skip any pilot estimated to take > 2 hours per GPU. Flag as "needs manual pilot".
PILOT_TIMEOUT_HOURS = 3 — Hard timeout: kill pilots exceeding 3 hours. Collect partial results if available.
MAX_PILOT_IDEAS = 3 — Pilot at most 3 ideas in parallel. Additional ideas are validated on paper only.
MAX_TOTAL_GPU_HOURS = 8 — Total GPU budget for all pilots combined.
REVIEWER_MODEL = gpt-5.5 — Model used via a secondary Codex agent for brainstorming and review. Must be an OpenAI model (e.g., gpt-5.5, o3, gpt-4o).
REVIEWER_BACKEND = codex — Default: Codex xhigh reviewer through spawn_agent / send_input. Use --reviewer: oracle-pro only when explicitly requested; if Oracle is unavailable, warn and fall back to Codex xhigh.
OUTPUT_DIR = idea-stage/ — All idea-stage outputs go here. Create the directory if it doesn't exist.

💡 Override via argument, e.g., /idea-creator "topic" — pilot budget: 4h per idea, 20h total.
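
The constants above interact, so a minimal sketch of how they might gate a ranked list of ideas may help. This is an illustration under stated assumptions, not the skill's actual implementation: the function name `plan_pilots`, the tab-separated input format, and the choice to count each estimate against the total budget as GPU-hours are all hypothetical.

```shell
# Hypothetical sketch: decide which ranked ideas get a GPU pilot.
# stdin : one idea per line, "name<TAB>estimated_gpu_hours", best-ranked first.
# stdout: "name<TAB>decision", where decision is one of
#         pilot | needs-manual-pilot | paper-only
plan_pilots() {
  awk -F'\t' \
      -v max_h="${PILOT_MAX_HOURS:-2}" \
      -v max_n="${MAX_PILOT_IDEAS:-3}" \
      -v budget="${MAX_TOTAL_GPU_HOURS:-8}" '
    {
      hours = $2 + 0
      if (hours > max_h)               decision = "needs-manual-pilot"  # over the per-pilot cap
      else if (n >= max_n)             decision = "paper-only"          # pilot slots exhausted
      else if (used + hours > budget)  decision = "paper-only"          # total budget exhausted
      else { decision = "pilot"; n++; used += hours }
      printf "%s\t%s\n", $1, decision
    }'
}
```

With the defaults, `printf 'a\t1.5\nb\t3\n' | plan_pilots` marks `a` as `pilot` and `b` as `needs-manual-pilot`. Setting `PILOT_MAX_HOURS=4 MAX_TOTAL_GPU_HOURS=20` in the environment mirrors the override example.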

Workflow

Phase 0: Load Research Wiki (if active)

Skip this phase entirely if research-wiki/ does not exist.

Resolve the wiki helper using the Codex-side canonical chain (see ../shared-references/wiki-helper-resolution.md):

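# Use ARIS_REPO if already set; otherwise read repo_root from the tab-separated install manifest.
ARIS_REPO="${ARIS_REPO:-$(awk -F'\t' '$1=="repo_root"{print $2; exit}' .aris/installed-skills-codex.txt 2>/dev/null)}"
WIKI_SCRIPT=""
# First choice: the helper inside the ARIS repo.
[ -n "$ARIS_REPO" ] && [ -f "$ARIS_REPO/tools/research_wiki.py" ] && WIKI_SCRIPT="$ARIS_REPO/tools/research_wiki.py"
# Second choice: a helper in the current project.
[ -z "$WIKI_SCRIPT" ] && [ -f tools/research_wiki.py ] && WIKI_SCRIPT="tools/research_wiki.py"
# Last choice: the helper installed with the Codex skills. WIKI_SCRIPT stays empty if none exist.
[ -z "$WIKI_SCRIPT" ] && [ -f "$HOME/.codex/skills/research-wiki/research_wiki.py" ] && WIKI_SCRIPT="$HOME/.codex/skills/research-wiki/research_wiki.py"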

If research-wiki/query_pack.md exists and is less than 7 days old, read it as initial landscape context:

- Treat listed gaps as priority search seeds.
- Treat failed ideas as a banlist.
- Treat top papers as known prior work.
- Still run Phase 1 for papers from the last 3-6 months, because the wiki may be stale.

If research-wiki/ exists but query_pack.md is stale or missing, rebuild it only when WIKI_SCRIPT is available. If the helper is unavailable, continue without rebuilding and report that wiki refresh was skipped.
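
The freshness rule and the fallback can be sketched as two small functions. This is a hedged illustration: `query_pack_state` and `wiki_action` are hypothetical names, and the actual rebuild command is deliberately not shown because the helper's command-line interface is not specified here. The age test relies on the standard `find -mtime -7` predicate, which matches files modified less than 7 full days ago.

```shell
# Hypothetical sketch: classify the query pack as fresh, stale, or missing.
query_pack_state() {
  qp_path="${1:-research-wiki/query_pack.md}"
  if [ ! -f "$qp_path" ]; then
    echo "missing"
  elif [ -n "$(find "$qp_path" -mtime -7 2>/dev/null)" ]; then
    echo "fresh"     # modified less than 7 days ago
  else
    echo "stale"
  fi
}

# Map the state plus the resolved helper path to the action Phase 0 describes.
# $1: state from query_pack_state, $2: WIKI_SCRIPT (may be empty)
wiki_action() {
  case "$1" in
    fresh) echo "read" ;;
    *)     if [ -n "$2" ]; then echo "rebuild"; else echo "skip-refresh"; fi ;;
  esac
}
```

A caller might run `wiki_action "$(query_pack_state)" "$WIKI_SCRIPT"` and, on `skip-refresh`, report that the wiki refresh was skipped before moving on to Phase 1.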

Phase 1: Landscape Survey (5-10 min)

Map the research area to understand what exists and where the gaps are.

Scan the local paper library first: Check papers/ and literature/ in the project directory for existing PDFs. Read the first 3 pages of relevant papers to build a baseline understanding before searching online. This avoids rediscovering what the user already knows.
Search recent literature using WebSearch:
- Top venues in the last 2 years (NeurIPS, ICML, ICLR, ACL, EMNLP, etc.)
- Recent arXiv preprints (last 6 months)
- Use 5+ different query formulations
- Read abstracts and introductions of the top 10-15 papers
Build a landscape map:
- Group papers by sub-direction / approach
- Identify what has been tried and what hasn't
- Note recurring limitations mentioned in "Future Work" sections
- Flag any open problems explicitly stated by multiple papers
Identify structural gaps:
- Methods that work in domain A but haven't been tried in domain B
- Contradictory findings between papers (opportunity for resolution)
- Assumptions that everyone makes but nobody has tested
- Scaling regimes that haven't been explored
- Diagnostic questions that nobody has asked

Phase 2: Idea Generation (brainstorm with external LLM)

Use a secondary Codex agent for divergent thinking:

spawn_agent:
  model: REVIEWER_MODEL
  reasoning_effort: xhigh
  message: |
    You are a senior ML researcher brainstorming research ideas.

    Research direction: [user's direction]

    Here is the current landscape:
    [paste landscape map from Phase 1]

    Key gaps identified:
    [paste gaps from Phase 1]

    Generate 8-12 concrete research ideas. For each idea:
    1. One-sentence summary
    2. Core hypothesis (what you expect to find and why)
    3. Minimum viable experiment (what's the cheapest way to test this?)
    4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
    5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
    6. Estimated effort: days / weeks / months

    Prioritize ideas that are:
    - Testable with moderate compute (8x RTX 3090 or less)
    - Likely to produce a clear positive OR negative result (both are publishable)
    - Not "apply X to Y" unless the application reveals genuinely surprising insights
    - Differentiated from the 10-15 papers above

    Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes.

Save the agent id for follow-up.

Save a Review Tracing record for this spawn_agent call following ../shared-references/review-tracing.md, including the landscape summary, prompt summary, raw idea list path, reviewer route, and saved agent id.

Phase 3: Mechanical consolidation + objective feasibility gate

This phase does NOT judge idea quality, novelty, or impact — those are the job of the Phase-4 cross-model reviewer (a different model family). Dropping ideas here on a same-family novelty or impact call would pre-filter the reviewer's input with same-family judgment — the opposite of why ARIS uses a cross-model reviewer at all. Phase 3 only (a) clusters near-duplicate ideas and (b) drops ideas that are OBJECTIVELY out of budget; everything else passes through ANNOTATED, not eliminated.

Objective feasibility gate (safe to gate here): drop an idea ONLY on a mechanical, budget-based fact — estimated compute > 1 week of available GPU time, OR a dataset that is provably unavailable. Do NOT drop on "implementation looks complex" — annotate complexity instead.
Novelty signal — ANNOTATE, do not eliminate: do 2-3 targeted searches and attach a prior_work note (what looks related, with links). This is input for the Phase-4 reviewer, not a filter; full /novelty-check runs in Phase 4. Do NOT drop an idea here because it "might already be done."
Impact signal — ANNOTATE, do not eliminate: attach a one-line so_what note (why the result would matter either way). Do NOT drop on a same-family "a reviewer wouldn't care" call — that is exactly what the Phase-4 cross-model reviewer is for.

Every feasible, non-duplicate idea — with its prior_work and so_what annotations — proceeds to Phase 4, where the cross-model reviewer does the quality/novelty narrowing.
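
As an illustration of how mechanical this gate is meant to be, the sketch below drops an idea only on the two objective facts named above and lets everything else through. It is a sketch under assumptions: the function name `feasibility_gate`, the tab-separated input columns, and passing "one week of available GPU time" as a GPU-hour figure are all hypothetical.

```shell
# Hypothetical sketch of the objective feasibility gate.
# $1    : GPU-hours available in one week (assumption: supplied by the caller)
# stdin : "name<TAB>estimated_gpu_hours<TAB>dataset_available(yes|no)"
# stdout: "name<TAB>keep|drop<TAB>reason"
feasibility_gate() {
  awk -F'\t' -v week="$1" '
    {
      if ($2 + 0 > week)    verdict = "drop\tcompute exceeds one week of available GPU time"
      else if ($3 == "no")  verdict = "drop\tdataset unavailable"
      else                  verdict = "keep\t-"   # complexity, novelty, impact are annotated elsewhere
      printf "%s\t%s\n", $1, verdict
    }'
}
```

Nothing in the gate looks at novelty, impact, or implementation complexity; those stay as annotations for the Phase-4 reviewer.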

Phase 4: Deep Validation (for top ideas)

For each surviving idea, run a deeper evaluation:

Novelty check: Use the /novelty-check workflow (multi-source search plus cross-verification by the reviewer model) for each idea.

Critical review: Use REVIEWER_MODEL via send_input (same agent as in Phase 2):

send_input:
  target: [agent id saved in Phase 2]
  message: |
    Here are our top ideas after filtering:
    [paste surviving ideas with novelty check results]

    For each, play devil's advocate:
    - What's the strongest objection a reviewer would raise?
    - What's the most likely failure mode?
    - How would you rank these for a top venue submission?
    - Which 2-3 would you actually work on?

Combine rankings: Merge your assessment with the reviewer model's ranking. Select the top 2-3 ideas for pilot experiments.
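
The merge rule is not specified, so the following is only one possible sketch: sum each idea's position in the two rankings and keep the lowest totals. The function name `merge_rankings`, the one-name-per-line file format, and the decision to ignore ideas that appear in only one list are assumptions.

```shell
# Hypothetical sketch: merge two best-first rankings by summed rank.
# $1, $2: files with one idea name per line, best first
# $3    : how many ideas to keep (default 3)
merge_rankings() {
  tab="$(printf '\t')"
  awk '
    { rank_sum[$0] += FNR; seen[$0]++ }   # FNR is the position within the current file
    END {
      for (idea in rank_sum)
        if (seen[idea] == 2)              # keep only ideas ranked in both lists
          printf "%d\t%s\n", rank_sum[idea], idea
    }
  ' "$1" "$2" | sort -t "$tab" -k1,1n -k2,2 | head -n "${3:-3}" | cut -f2
}
```

Ties are broken by name here, which is arbitrary; in practice the devil's-advocate answers above are probably a better tie-breaker than alphabetical order.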

Phase 5: Parallel Pilot Experiments (for top 2-3 ideas)

Before committing to a full
