Cortivex Agent Replay
You have access to an execution replay system that records every decision, tool call, and output an agent makes during a pipeline run, then lets you replay, diff, and analyze those traces. When a pipeline produces unexpected results, replay is how you find out why -- step through the agent's reasoning, compare two runs side-by-side, or re-execute the same trace with different models or inputs to isolate the cause.
Overview
Agent replay operates on execution traces. A trace is a complete, ordered record of everything an agent did during a pipeline node execution: the input it received, every tool call it made (with arguments and responses), every intermediate reasoning step, every decision point, and the final output it produced. Traces are stored as structured JSON in .cortivex/traces/ and can be replayed, diffed, or analyzed at any time after the original run.
Replay is not re-running the pipeline from scratch. Replay re-executes the agent's decision logic against the recorded inputs and tool responses, optionally substituting different models, prompts, or configurations to see how the output changes. This makes it fast (no actual file I/O or test execution) and repeatable (the recorded inputs and tool responses are identical on every replay, even when the agent's fresh decisions differ).
When to Use
- A pipeline produced an incorrect or suboptimal result and you need to understand which agent decision led to it
- You want to compare how two different models handle the same task (swap claude-sonnet for claude-haiku and diff the outputs)
- A pipeline that previously worked has started producing worse results and you need to identify when the regression began
- You need to debug a specific node failure without re-running the entire pipeline
- You want to optimize agent prompts by replaying the same trace with modified system instructions and comparing outputs
- You want to feed execution data into the cortivex-learn system for pattern detection and insight generation
When NOT to Use
- For live monitoring of running pipelines -- use /cortivex status instead
- As a substitute for unit tests -- replay validates agent behavior, not code correctness
- For traces older than 30 days unless explicitly archived -- traces are automatically pruned by default
- When the original trace was recorded against a codebase that has since changed substantially -- tool responses will no longer match reality
How It Works
Trace Structure
A trace captures the full execution timeline of a single agent within a single pipeline node. It contains metadata (run_id, node_id, model, timestamp, duration, cost), the input data from upstream nodes, an ordered array of steps, the final output, and aggregate metrics.
Each step has a type: reasoning (chain of thought), tool_call (tool name, arguments, response), decision (chosen action and alternatives considered), or output (final structured result). Steps include timestamps, duration, and token counts.
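The trace layout described above can be sketched as a pair of Python dataclasses. This is only an illustration: the field names (`metadata`, `data`, `duration_ms`, `tokens`) and the helpers `parse_trace` and `steps_of_type` are assumptions made for the example, not the actual Cortivex trace schema.

```python
from dataclasses import dataclass, field
from typing import Any

# The four step types named in the text.
STEP_TYPES = {"reasoning", "tool_call", "decision", "output"}


@dataclass
class Step:
    type: str                 # one of STEP_TYPES
    timestamp: str            # assumed to be an ISO 8601 string
    duration_ms: int
    tokens: int
    data: dict[str, Any] = field(default_factory=dict)  # type-specific payload (hypothetical)

    def __post_init__(self) -> None:
        if self.type not in STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type!r}")


@dataclass
class Trace:
    metadata: dict[str, Any]  # run_id, node_id, model, timestamp, duration, cost
    input: dict[str, Any]     # data received from upstream nodes
    steps: list[Step]         # ordered execution timeline
    output: dict[str, Any]    # final output of the node
    metrics: dict[str, Any] = field(default_factory=dict)


def parse_trace(raw: dict[str, Any]) -> Trace:
    """Build a Trace from an already-decoded JSON object."""
    return Trace(
        metadata=raw["metadata"],
        input=raw["input"],
        steps=[Step(**step) for step in raw["steps"]],
        output=raw["output"],
        metrics=raw.get("metrics", {}),
    )


def steps_of_type(trace: Trace, step_type: str) -> list[Step]:
    """Return the steps of one type, preserving their original order."""
    return [s for s in trace.steps if s.type == step_type]
```

Rejecting unknown step types at load time means a malformed trace fails at parse time rather than partway through a replay.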
Recording
Recording is automatic when enabled. Every pipeline run with trace: true in its config captures traces for all nodes. Traces are written to .cortivex/traces/{run_id}/{node_id}.trace.json as each node completes.
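As a small illustration of the storage layout above, the path for a given run and node could be built as follows. The helper `trace_path` and its validation rule are hypothetical; only the `{run_id}/{node_id}.trace.json` pattern comes from the text.

```python
from pathlib import PurePosixPath


def trace_path(run_id: str, node_id: str, root: str = ".cortivex/traces") -> PurePosixPath:
    """Return <root>/<run_id>/<node_id>.trace.json."""
    for part in (run_id, node_id):
        # Assumed safeguard: refuse identifiers that could escape the trace root.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"unsafe path component: {part!r}")
    return PurePosixPath(root) / run_id / f"{node_id}.trace.json"


print(trace_path("ctx-a1b2c3", "code_review"))
# .cortivex/traces/ctx-a1b2c3/code_review.trace.json
```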
Recording adds minimal overhead (2-5% duration increase, no extra API calls) because it captures data the agent is already producing.
Replay Modes
Full Replay re-executes the agent's decision logic from step 0 through the final output, using the recorded tool responses. The agent receives the same input and sees the same tool results, but makes fresh decisions. This reveals whether the agent's behavior is deterministic or whether it makes different choices on the same inputs.
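One way to picture full replay is a stand-in tool layer that answers from the recording instead of running real tools. The sketch below assumes each `tool_call` step carries `tool`, `arguments`, and `response` keys and that the agent is a plain callable; both are assumptions for illustration, as is the choice to raise an error when the agent requests a call that was not recorded.

```python
from typing import Any, Callable

ToolFn = Callable[[str, dict], Any]


class RecordedTools:
    """Serves tool responses from a trace instead of executing tools."""

    def __init__(self, steps: list[dict]) -> None:
        self._calls = [s for s in steps if s["type"] == "tool_call"]
        self._cursor = 0

    def call(self, name: str, arguments: dict) -> Any:
        if self._cursor >= len(self._calls):
            raise LookupError(f"no recorded response left for {name!r}")
        recorded = self._calls[self._cursor]
        if recorded["tool"] != name or recorded["arguments"] != arguments:
            # The agent made a different choice than in the original run.
            raise LookupError(f"replay diverged at tool call {self._cursor}")
        self._cursor += 1
        return recorded["response"]


def full_replay(trace: dict, agent: Callable[[dict, ToolFn], Any]) -> Any:
    """Re-run the agent on the recorded input with recorded tool results."""
    tools = RecordedTools(trace["steps"])
    return agent(trace["input"], tools.call)
```

Because no real tool runs, a replay like this touches neither the file system nor the test suite, which is where the speed described earlier comes from.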
Selective Replay re-executes only specific steps or step ranges. Use this to focus on a particular decision point without replaying the entire trace. You can start replay from any step and the system will inject the recorded state up to that point.
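The state injection described above amounts to splitting the recorded steps into a prefix that is handed to the agent as-is and a range that is re-executed. A minimal sketch, assuming zero-based step indices and an inclusive end index (the function name is hypothetical):

```python
from typing import Optional


def split_for_selective_replay(
    steps: list[dict], start: int, end: Optional[int] = None
) -> tuple[list[dict], list[dict]]:
    """Return (injected, to_replay) for a replay of steps start..end inclusive."""
    if not 0 <= start < len(steps):
        raise IndexError(f"start step {start} is outside the trace")
    stop = len(steps) if end is None else end + 1
    if stop <= start or stop > len(steps):
        raise IndexError(f"end step {end} is outside the trace or before start")
    injected = steps[:start]       # recorded state, supplied without re-execution
    to_replay = steps[start:stop]  # steps the agent decides afresh
    return injected, to_replay
```

For example, `split_for_selective_replay(steps, 8)` would inject steps 0 through 7 and replay everything from step 8 onward.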
Modified Replay re-executes the trace with substitutions: a different model, different system prompt, different temperature, or different input data. The tool responses remain the same (from the recording), but the agent's reasoning and decisions may differ. This is the primary mechanism for A/B testing agent configurations.
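The substitution step of a modified replay can be sketched as merging overrides into the recorded configuration while keeping a summary of what changed. The function below is an illustration under assumptions: the shape of the recorded configuration and the summary strings are guesses for the example, and `system_prompt_append` is treated as "concatenate to the recorded prompt".

```python
from typing import Any


def apply_overrides(
    recorded: dict[str, Any], overrides: dict[str, Any]
) -> tuple[dict[str, Any], dict[str, str]]:
    """Return (effective_config, summary) without mutating the recorded config."""
    effective = dict(recorded)
    summary: dict[str, str] = {}
    for key, new_value in overrides.items():
        if key == "system_prompt_append":
            # Appending keeps the recorded prompt and adds extra instructions.
            effective["system_prompt"] = recorded.get("system_prompt", "") + new_value
            summary["system_prompt"] = "appended"
            continue
        old_value = recorded.get(key)
        effective[key] = new_value
        if old_value != new_value:
            summary[key] = f"{old_value} -> {new_value}"
    return effective, summary
```

Tool responses are deliberately absent from the override keys here, since the text states they always come from the recording.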
Pipeline Configuration
Recording Traces in a Pipeline
name: pr-review-traced
version: "1.0"
description: PR review with full execution tracing
trace: true # enable tracing for all nodes
trace_config:
  storage_path: .cortivex/traces/
  retention_days: 30
  capture_reasoning: true # include chain-of-thought
  capture_tool_responses: true # include full tool output
  max_trace_size_mb: 50 # cap per-trace file size
nodes:
  - id: security_scan
    type: SecurityScanner
    config:
      scan_depth: deep
  - id: code_review
    type: CodeReviewer
    depends_on: [security_scan]
    config:
      review_scope: changed_files
  - id: auto_fix
    type: AutoFixer
    depends_on: [code_review]
    config:
      fix_categories: [style, bugs]
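To illustrate what retention_days implies for stored traces, the sketch below picks out the runs that a 30-day retention window would prune. How Cortivex actually decides what to prune is not specified here; the completion-time mapping, the `archived` set, and the function name are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def expired_runs(
    finished_at: dict[str, datetime],
    now: datetime,
    retention_days: int = 30,
    archived: frozenset = frozenset(),
) -> list[str]:
    """Return run ids whose traces are older than the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return sorted(
        run_id
        for run_id, finished in finished_at.items()
        if finished < cutoff and run_id not in archived  # archived runs are kept
    )


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
runs = {
    "ctx-a1b2c3": datetime(2025, 1, 15, tzinfo=timezone.utc),  # 45 days old
    "ctx-d4e5f6": datetime(2025, 2, 20, tzinfo=timezone.utc),  # 9 days old
}
print(expired_runs(runs, now))
# ['ctx-a1b2c3']
```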
Replay and Diff Pipeline
name: replay-comparison
version: "1.0"
description: Replay a trace with a different model and compare results
nodes:
  - id: replay_original
    type: ReplayAgent
    config:
      trace_id: "ctx-a1b2c3"
      node_id: "code_review"
      mode: full
  - id: replay_modified
    type: ReplayAgent
    config:
      trace_id: "ctx-a1b2c3"
      node_id: "code_review"
      mode: modified
      overrides:
        model: claude-haiku-4-20250414
        temperature: 0.2
  - id: diff_results
    type: ReplayAgent
    depends_on: [replay_original, replay_modified]
    config:
      action: diff
      left: replay_original
      right: replay_modified
      diff_format: structured
Integration with Learning System
name: replay-to-learn
version: "1.0"
description: Analyze replay data and feed insights to cortivex-learn
nodes:
  - id: analyze_traces
    type: ReplayAgent
    config:
      action: analyze
      trace_ids: ["ctx-a1b2c3", "ctx-d4e5f6", "ctx-g7h8i9"]
      analysis_type: failure-patterns
  - id: record_insights
    type: CustomAgent
    depends_on: [analyze_traces]
    config:
      system_prompt: |
        Take the trace analysis results and record actionable insights
        using cortivex_insights. Focus on patterns that appear across
        multiple traces: common failure points, model performance
        differences, and configuration optimizations.
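A failure-patterns analysis of the kind configured above could, in its simplest form, count which tool errors recur across traces. This is a rough sketch and not the ReplayAgent's actual analysis: it assumes that a failed `tool_call` step has a dictionary `response` with an `error` key, which is a hypothetical convention.

```python
from collections import Counter
from typing import Optional


def _error_of(step: dict) -> Optional[str]:
    """Return the error string of a tool_call step, if it recorded one."""
    response = step.get("response")
    return response.get("error") if isinstance(response, dict) else None


def failure_patterns(traces: list[dict], min_traces: int = 2) -> list[tuple]:
    """Return ((tool, error), trace_count) pairs seen in at least min_traces traces."""
    seen: Counter = Counter()
    for trace in traces:
        # A set counts each pattern once per trace, however often it repeats.
        patterns = {
            (step["tool"], _error_of(step))
            for step in trace["steps"]
            if step["type"] == "tool_call" and _error_of(step)
        }
        seen.update(patterns)
    return [(pattern, n) for pattern, n in seen.most_common() if n >= min_traces]
```

Counting per trace rather than per step keeps one noisy run from dominating the result, which matches the stated focus on patterns that appear across multiple traces.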
MCP Tool Reference
Record a Trace
Recording is typically automatic via pipeline config, but can be started manually:
cortivex_replay({
  action: "record",
  run_id: "ctx-a1b2c3",
  node_id: "code_review",
  config: {
    capture_reasoning: true,
    capture_tool_responses: true,
    max_steps: 500
  }
})
Response:
{
  "trace_id": "trace-7f3a",
  "status": "recording",
  "run_id": "ctx-a1b2c3",
  "node_id": "code_review",
  "started_at": "2025-01-15T09:30:00Z",
  "storage_path": ".cortivex/traces/ctx-a1b2c3/code_review.trace.json"
}
Replay a Trace
cortivex_replay({
  action: "replay",
  trace_id: "trace-7f3a",
  mode: "full",
  overrides: {
    model: "claude-haiku-4-20250414",
    temperature: 0.3,
    system_prompt_append: "\nFocus only on security-related issues."
  }
})
Response:
{
  "replay_id": "replay-2c9d",
  "source_trace": "trace-7f3a",
  "overrides_applied": {
    "model": "claude-sonnet-4-20250514 -> claude-haiku-4-20250414",
    "temperature": "0.5 -> 0.3",
    "system_prompt": "appended 1 instruction"
  },
  "result": {
    "output_changed": true,
    "steps_total": 23,
    "steps_diverged_at": 8,
    "original_issues_found": 7,
    "replay_issues_found": 4,
    "matching_issues": 4,
    "missing_issues": 3,
    "cost": { "original": "$0.018", "replay": "$0.003" },
    "duration": { "original_ms": 48000, "replay_ms": 12000 }
  }
}
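The comparison fields in the response above (steps_diverged_at, matching_issues, missing_issues) can be read as simple sequence and set comparisons. The sketch below shows one plausible derivation; how Cortivex computes them internally is not documented here, and treating issues as comparable identifiers is an assumption.

```python
from typing import Iterable, Optional


def first_divergence(original: list, replay: list) -> Optional[int]:
    """Return the index of the first differing step, or None if identical."""
    for index, (a, b) in enumerate(zip(original, replay)):
        if a != b:
            return index
    if len(original) != len(replay):
        # One run is a strict prefix of the other.
        return min(len(original), len(replay))
    return None


def compare_issues(original: Iterable[str], replay: Iterable[str]) -> dict[str, int]:
    """Summarize the overlap between the issues found by two runs."""
    original_set, replay_set = set(original), set(replay)
    return {
        "original_issues_found": len(original_set),
        "replay_issues_found": len(replay_set),
        "matching_issues": len(original_set & replay_set),
        "missing_issues": len(original_set - replay_set),
    }
```

With 7 original issues and 4 replayed issues that all appear in the original, this yields 4 matching and 3 missing, consistent with the sample response.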
Diff Two Runs
cor