agent-research-aggregator
Should I run? (decision gate)
Before starting Phase 1, check whether aggregation is actually needed:
| Situation | Action |
|---|---|
| workspace/inputs/idea.md and workspace/inputs/experimental_log.md both exist and are non-empty | Skip this skill entirely. Proceed directly to paper-orchestra. |
| Either file is missing or empty, and the user provided a directory path | Run this skill with that directory as --search-roots. |
| Either file is missing or empty, and no directory was provided | Scan cwd and ~ by default; show the discovery summary to the user before continuing. |
| The inputs exist but look thin (e.g. idea.md has < 5 lines, no numeric data in experimental_log.md) | Ask the user whether to supplement with aggregation or proceed as-is. |
The skill is intentionally a pre-pass — it is cheap to skip and should only run when the structured inputs don't already exist.
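The decision table above can be sketched as a small function. This is only an illustration, not part of the skill's scripts: the name gate, the returned labels, and the "thin input" heuristics (counting non-blank lines, treating any digit as numeric data) are assumptions made for the example.

```python
from pathlib import Path

THIN_IDEA_LINES = 5  # "idea.md has < 5 lines" from the table above


def gate(workspace, search_root=None):
    """Return 'skip', 'run', 'scan-defaults', or 'ask-user' (hypothetical labels)."""
    inputs = Path(workspace) / "inputs"
    idea = inputs / "idea.md"
    log = inputs / "experimental_log.md"

    def non_empty(path):
        return path.is_file() and path.stat().st_size > 0

    # Either file missing or empty: aggregation is needed.
    if not (non_empty(idea) and non_empty(log)):
        return "run" if search_root else "scan-defaults"

    # Both exist: check whether they look thin (crude heuristics, assumed here).
    idea_lines = [ln for ln in idea.read_text(encoding="utf-8").splitlines() if ln.strip()]
    has_numbers = any(ch.isdigit() for ch in log.read_text(encoding="utf-8"))
    if len(idea_lines) < THIN_IDEA_LINES or not has_numbers:
        return "ask-user"
    return "skip"
```

A caller would map "run" to invoking the skill with the provided directory as --search-roots, and "scan-defaults" to scanning cwd and ~.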
A pre-processing skill for PaperOrchestra (arXiv:2604.05018). Reads scattered
experimentation artifacts from AI coding-agent cache directories and synthesizes
them into the structured (I, E) input pair the PaperOrchestra pipeline expects.
[.claude/]   [.cursor/]  [.antigravity/]   [.openclaw/]
    │            │              │               │
    └────────────┴──────────────┴───────────────┘
                          │
                 Phase 1: Discovery
                 (discover_logs.py)
                          │
                discovered_logs.json
                          │
                 Phase 2: Extraction
              (LLM call per log batch)
                          │
                raw_experiments.json
                          │
                 Phase 3: Synthesis
               (LLM call: consolidate)
                          │
                   synthesis.json
                          │
                 Phase 4: Formatting
                (format_po_inputs.py)
                          │
             ┌────────────┴────────────┐
     workspace/inputs/          workspace/ara/
     idea.md                    aggregation_report.md
     experimental_log.md        discovered_logs.json
                                raw_experiments.json
                                synthesis.json
The output drops directly into workspace/inputs/ so the user can immediately
run paper-orchestra on the same workspace.
Inputs
| Parameter | Required | Default | Description |
|---|---|---|---|
| --search-roots | no | cwd, ~ | Comma-separated directories to scan for agent caches |
| --agents | no | all | Comma-separated subset: claude,cursor,antigravity,openclaw |
| --workspace | no | ./workspace | PaperOrchestra workspace root |
| --depth | no | 4 | Max directory scan depth (prevents runaway scans on large home dirs) |
| --since | no | none | Only include logs modified after this date (ISO 8601: 2025-01-01) |
The user specifies these when invoking the skill, or you may ask them for
--search-roots if the current directory has no detectable agent caches.
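As a rough illustration of how the parameters in the table might be parsed and defaulted, here is a sketch using argparse. The flag names and defaults come from the table; the parser itself, the helper names (csv, build_parser, parse), and the validation of agent names are assumptions and may not match how the skill's scripts handle arguments.

```python
import argparse
import os
from datetime import date

KNOWN_AGENTS = ["claude", "cursor", "antigravity", "openclaw"]


def csv(value):
    """Split a comma-separated string into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def build_parser():
    p = argparse.ArgumentParser(prog="agent-research-aggregator")
    p.add_argument("--search-roots", type=csv, default=None)
    p.add_argument("--agents", type=csv, default=None)
    p.add_argument("--workspace", default="./workspace")
    p.add_argument("--depth", type=int, default=4)
    p.add_argument("--since", type=date.fromisoformat, default=None)  # e.g. 2025-01-01
    return p


def parse(argv):
    args = build_parser().parse_args(argv)
    if args.search_roots is None:
        # Default from the table: cwd and ~
        args.search_roots = [os.getcwd(), os.path.expanduser("~")]
    if args.agents is None:
        args.agents = list(KNOWN_AGENTS)  # "all"
    unknown = set(args.agents) - set(KNOWN_AGENTS)
    if unknown:
        raise ValueError(f"unknown agents: {sorted(unknown)}")
    return args
```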
Phase 1 — Discovery (deterministic)
Run the discovery script to catalog every relevant log file:
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots <roots> \
--agents <agents> \
--depth <depth> \
--since <since> \
--out workspace/ara/discovered_logs.json
The script exits with code 2 when no --project filter is set (this is
expected on the first run). It prints a "Projects found" list to stdout —
show it to the user immediately.
If no logs are found at all: stop and ask the user to specify
--search-roots or point you at a directory that contains agent cache folders.
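To make the roles of --depth and --since concrete, the following sketch shows one way a depth-limited scan for agent cache folders could work. It is not the code of discover_logs.py: the function names, the way depth is counted (directories directly under the root are depth 1), and the mtime-based filter are assumptions for illustration.

```python
import os
from pathlib import Path

# Cache folder names taken from the diagram above.
AGENT_DIRS = {".claude": "claude", ".cursor": "cursor",
              ".antigravity": "antigravity", ".openclaw": "openclaw"}


def find_agent_caches(root, max_depth=4):
    """Yield (agent, cache_dir) for cache folders at most max_depth levels below root."""
    root = Path(root).expanduser()
    base_depth = len(root.parts)
    for dirpath, dirnames, _ in os.walk(root):
        depth = len(Path(dirpath).parts) - base_depth
        for name in list(dirnames):
            if name in AGENT_DIRS:
                yield AGENT_DIRS[name], Path(dirpath) / name
                dirnames.remove(name)  # do not descend into the cache itself
        if depth + 1 >= max_depth:
            dirnames[:] = []  # prune: deeper levels would exceed max_depth


def recent_files(cache_dir, since=None):
    """List files under cache_dir modified after `since` (a datetime), or all if None."""
    cutoff = since.timestamp() if since else None
    return sorted(p for p in Path(cache_dir).rglob("*")
                  if p.is_file() and (cutoff is None or p.stat().st_mtime > cutoff))
```

Pruning dirnames in place is what keeps a scan of a large home directory bounded, which is the purpose the Inputs table gives for --depth.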
Phase 1.5 — Project Selection (mandatory)
A paper can only be written from a single project. You must ask the user which project to use before any LLM processing begins.
- Display the numbered project list from the discovery summary, e.g.:
  Projects found:
    [1] /home/alice/projects/my-rl-experiment (42 files)
    [2] /home/alice/projects/llm-eval-suite (17 files)
    [3] /home/alice/projects/old-demo (3 files)
- Ask: "Which project should this paper be based on? Please choose a number or paste the project path."
- Do not proceed to Phase 2 until the user has answered.
- Re-run discovery with the chosen project to filter the manifest:
python skills/agent-research-aggregator/scripts/discover_logs.py \
--search-roots <roots> \
--agents <agents> \
--depth <depth> \
--since <since> \
--project "<chosen project path>" \
--out workspace/ara/discovered_logs.json
This overwrites discovered_logs.json so only the selected project's files
remain. The script exits 0 on success.
If the discovery finds only one project: skip the question and inform the
user: "Only one project found: <path>. Using it for the paper." — then
re-run with --project automatically.
If the discovery summary shows irrelevant files after filtering: ask the user whether to include or exclude them before continuing to Phase 2. Err on the side of inclusion — the extraction prompt is conservative.
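The selection step accepts either a list number or a pasted path. A minimal sketch of resolving the user's reply is shown below; resolve_project is a hypothetical helper, and returning None to signal "ask again" is an assumption of this example rather than documented behavior.

```python
def resolve_project(answer, projects):
    """Map a user's reply (list number or pasted path) to one of `projects`.

    Returns None when the reply matches nothing, so the caller can ask again.
    """
    if len(projects) == 1:
        return projects[0]  # only one project found: no question needed
    reply = answer.strip().strip("[]")
    if reply.isdecimal():
        index = int(reply)  # numbers in the list are 1-based
        return projects[index - 1] if 1 <= index <= len(projects) else None
    reply = reply.rstrip("/")
    return reply if reply in projects else None
```

The resolved path is what would be passed to --project when discovery is re-run.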
Phase 2 — Extraction (LLM-assisted)
Process discovered logs in batches (group by agent type; keep batches under ~50 KB of raw text to stay within context limits):
For each batch:
- Read the log files in the batch (the script's --list output tells you which file paths to read).
- Apply the extraction prompt from references/extraction-prompt.md as your system message.
- Pass the raw log text as the user message.
- Collect the structured JSON the LLM returns (see schema in the prompt).
- Append to workspace/ara/raw_experiments.json.
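The batching rule (group by agent type, keep each batch around 50 KB or less) can be sketched as follows. The record shape (agent, path, size_bytes) and the function name make_batches are assumptions for this example; the real manifest in discovered_logs.json may be structured differently.

```python
from collections import defaultdict

MAX_BATCH_BYTES = 50 * 1024  # "~50 KB of raw text" from the guidance above


def make_batches(logs, max_bytes=MAX_BATCH_BYTES):
    """Group (agent, path, size_bytes) records by agent, then split each group
    into batches whose total size stays at or under max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    by_agent = defaultdict(list)
    for agent, path, size in logs:
        by_agent[agent].append((path, size))

    batches = []
    for agent in sorted(by_agent):
        current, current_size = [], 0
        for path, size in by_agent[agent]:
            # Start a new batch when adding this file would exceed the limit.
            if current and current_size + size > max_bytes:
                batches.append({"agent": agent, "paths": current})
                current, current_size = [], 0
            current.append(path)
            current_size += size
        if current:
            batches.append({"agent": agent, "paths": current})
    return batches
```

Oversized files are not split here; in practice they would likely need truncation or chunking before being sent to the LLM.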
After all batches:
python skills/agent-research-aggregator/scripts/extract_experiments.py \
--discovered workspace/ara/discovered_logs.json \
--out workspace/ara/raw_experiments.json \
--validate-only
The --validate-only flag checks that the combined JSON is well-formed and
meets the minimum schema: the experiments array is non-empty, and each entry
has at least one of hypothesis, method, or results. Fix any malformed entries before Phase 3.
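For reference, the minimum-schema check described above amounts to something like the sketch below. It does not reproduce extract_experiments.py: the function name, the assumption that the file is a JSON object with a top-level "experiments" key, and the choice to treat empty values as missing are all illustrative.

```python
import json

REQUIRED_ANY = ("hypothesis", "method", "results")


def validate_raw_experiments(text):
    """Return a list of problems; an empty list means the minimum schema is met."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    experiments = data.get("experiments") if isinstance(data, dict) else None
    if not isinstance(experiments, list) or not experiments:
        return ["'experiments' must be a non-empty array"]
    problems = []
    for i, entry in enumerate(experiments):
        # Each entry needs at least one of the three fields, with a non-empty value.
        if not isinstance(entry, dict) or not any(entry.get(k) for k in REQUIRED_ANY):
            problems.append(f"experiments[{i}] has none of {', '.join(REQUIRED_ANY)}")
    return problems
```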
Phase 3 — Synthesis (LLM-assisted)
Consolidate possibly-redundant experiment records from multiple agent caches into a single coherent research narrative. This is ONE LLM call.
System message: Use references/synthesis-prompt.md verbatim.
User message:
<raw_experiments>
{contents of workspace/ara/raw_experiments.json}
</raw_experiments>
The LLM must return a synthesis.json with keys:
- research_question: the overarching question being investigated
- hypothesis: the core proposed solution / claim
- method_summary: how the approach works (concise, no data leakage)
- key_contributions: 2–5 bullet strings
- experimental_setup: datasets, metrics, baselines, implementation notes
- results_tables: array of {title, headers[], rows[]} markdown-table objects
- qualitative_observations: free-form text blocks (what worked, what didn't, failure modes, ablation insights)
- iteration_history: ordered list of {iteration_id, change_description, outcome} entries if multiple iterations are detected
- open_questions: questions that remain unanswered in the logs
Save to workspace/ara/synthesis.json.
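Before saving, it can help to confirm that the LLM's response actually carries the expected keys. The check below is a sketch under assumptions: check_synthesis is a hypothetical helper, and treating iteration_history as optional is one reading of "if multiple iterations are detected", not something the skill specifies.

```python
REQUIRED_KEYS = [
    "research_question", "hypothesis", "method_summary", "key_contributions",
    "experimental_setup", "results_tables", "qualitative_observations",
    "open_questions",
]
# Assumed optional: only expected when multiple iterations are detected.
OPTIONAL_KEYS = ["iteration_history"]


def check_synthesis(synthesis):
    """Return a list of problems found in a parsed synthesis.json dict."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in synthesis]
    if "key_contributions" in synthesis and not 2 <= len(synthesis["key_contributions"]) <= 5:
        problems.append("key_contributions should have 2 to 5 entries")
    for i, table in enumerate(synthesis.get("results_tables", [])):
        missing = {"title", "headers", "rows"} - set(table)
        if missing:
            problems.append(f"results_tables[{i}] lacks {sorted(missing)}")
    return problems
```

Any reported problems could be fixed by hand or by re-running the synthesis call before moving on to Phase 4.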
Note: By this point, the user has already selected a single project in Phase 1.5. The synthesis should represent one coherent research thread. If the LLM still surfaces multiple disconnected research questions, flag this as a data quality warning in the audit report (Phase 5) but do not re-ask for project selection — that decision was made earlier.
Phase 4 — Formatting (deterministic)
Convert synthesis.json into PaperOrchestra input files:
pyt