Cortivex Agent Replay

You have access to an execution replay system that records every decision, tool call, and output an agent makes during a pipeline run, then lets you replay, diff, and analyze those traces. When a pipeline produces unexpected results, replay is how you find out why -- step through the agent's reasoning, compare two runs side-by-side, or re-execute the same trace with different models or inputs to isolate the cause.

Overview

Agent replay operates on execution traces. A trace is a complete, ordered record of everything an agent did during a pipeline node execution: the input it received, every tool call it made (with arguments and responses), every intermediate reasoning step, every decision point, and the final output it produced. Traces are stored as structured JSON in .cortivex/traces/ and can be replayed, diffed, or analyzed at any time after the original run.

Replay is not re-running the pipeline from scratch. Replay re-executes the agent's decision logic against the recorded inputs and tool responses, optionally substituting different models, prompts, or configurations to see how the output changes. This makes it fast (no actual file I/O or test execution) and repeatable (every replay sees the same recorded inputs and tool responses, even though the agent's fresh decisions may differ).

When to Use

A pipeline produced an incorrect or suboptimal result and you need to understand which agent decision led to it
You want to compare how two different models handle the same task (swap claude-sonnet for claude-haiku and diff the outputs)
A pipeline that previously worked has started producing worse results and you need to identify when the regression began
You need to debug a specific node failure without re-running the entire pipeline
You want to optimize agent prompts by replaying the same trace with modified system instructions and comparing outputs
You want to feed execution data into the cortivex-learn system for pattern detection and insight generation

When NOT to Use

For live monitoring of running pipelines -- use /cortivex status instead
As a substitute for unit tests -- replay validates agent behavior, not code correctness
For traces older than the retention window (30 days by default) unless explicitly archived -- older traces are pruned automatically
When the original trace was recorded against a codebase that has since changed substantially -- tool responses will no longer match reality

How It Works

Trace Structure

A trace captures the full execution timeline of a single agent within a single pipeline node. It contains metadata (run_id, node_id, model, timestamp, duration, cost), the input data from upstream nodes, an ordered array of steps, the final output, and aggregate metrics.

Each step has a type: reasoning (chain of thought), tool_call (tool name, arguments, response), decision (chosen action and alternatives considered), or output (final structured result). Steps include timestamps, duration, and token counts.
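
To make the shape concrete, here is a minimal sketch of a trace modeled as Python dataclasses. The field names follow the description above, but the exact keys (for example `duration_ms`, `tokens`, and `data`) are assumptions for illustration, not the confirmed on-disk schema.

```python
from dataclasses import dataclass, field
from typing import Any

# Step types named in the text above.
STEP_TYPES = {"reasoning", "tool_call", "decision", "output"}


@dataclass
class Step:
    type: str                 # one of STEP_TYPES
    timestamp: str            # ISO 8601 string (assumed format)
    duration_ms: int          # assumed key name and unit
    tokens: int               # assumed key name for the token count
    data: dict[str, Any] = field(default_factory=dict)  # type-specific payload

    def __post_init__(self) -> None:
        if self.type not in STEP_TYPES:
            raise ValueError(f"unknown step type: {self.type}")


@dataclass
class Trace:
    run_id: str
    node_id: str
    model: str
    input: dict[str, Any]
    steps: list[Step]
    output: dict[str, Any]

    def tool_calls(self) -> list[Step]:
        """Return only the tool_call steps, in recorded order."""
        return [s for s in self.steps if s.type == "tool_call"]

    def total_tokens(self) -> int:
        """Aggregate metric: token count summed over all steps."""
        return sum(s.tokens for s in self.steps)
```

Keeping the steps in one ordered list, rather than one list per type, preserves the timeline that replay and diff rely on.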

Recording

Recording is automatic when enabled. Every pipeline run with trace: true in its config captures traces for all nodes. Traces are written to .cortivex/traces/{run_id}/{node_id}.trace.json as each node completes.

Recording adds minimal overhead (2-5% duration increase, no extra API calls) because it captures data the agent is already producing.
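
The storage layout above can be resolved with a few lines of Python. This is a sketch only: `trace_path` and `load_trace` are hypothetical helper names, not part of Cortivex, and the trace is read as a plain dict because the exact schema is not specified here.

```python
import json
from pathlib import Path


def trace_path(run_id: str, node_id: str, root: str = ".cortivex/traces") -> Path:
    """Build the path described above: {root}/{run_id}/{node_id}.trace.json."""
    return Path(root) / run_id / f"{node_id}.trace.json"


def load_trace(run_id: str, node_id: str, root: str = ".cortivex/traces") -> dict:
    """Read a recorded trace as a plain dict; raises FileNotFoundError if absent."""
    with trace_path(run_id, node_id, root).open(encoding="utf-8") as fh:
        return json.load(fh)
```

Because a trace is written only when its node completes, a missing file for a node in a still-running pipeline is expected rather than an error.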

Replay Modes

Full Replay re-executes the agent's decision logic from step 0 through the final output, using the recorded tool responses. The agent receives the same input and sees the same tool results, but makes fresh decisions. This reveals whether the agent's behavior is deterministic or whether it makes different choices on the same inputs.
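
The idea behind Full Replay can be sketched as a loop that asks the agent for a fresh decision at each step and serves the recorded tool response instead of running the tool. The agent interface and the `tool`, `args`, and `response` keys below are assumptions made for illustration; the real replay engine may work differently.

```python
from typing import Any, Callable, Optional

Action = dict[str, Any]
# Hypothetical agent interface: given the input and the history of
# (action, tool response) pairs so far, return the next action.
Agent = Callable[[Any, list[tuple[Action, Any]]], Action]


def full_replay(agent: Agent, trace_input: Any, recorded_calls: list[dict]) -> Optional[int]:
    """Replay recorded tool calls; return the index of the first divergence, or None."""
    history: list[tuple[Action, Any]] = []
    for index, recorded in enumerate(recorded_calls):
        action = agent(trace_input, history)  # fresh decision on the same inputs
        expected = {"tool": recorded["tool"], "args": recorded["args"]}
        if action != expected:
            return index  # no recorded response exists for a different call
        # Serve the recorded response instead of executing the tool.
        history.append((action, recorded["response"]))
    return None
```

A return value of `None` means the agent repeated every recorded choice; an integer marks the first step where it chose differently on identical inputs.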

Selective Replay re-executes only specific steps or step ranges. Use this to focus on a particular decision point without replaying the entire trace. You can start replay from any step and the system will inject the recorded state up to that point.
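
As a rough sketch of the state injection described above, a selective replay can be planned by splitting the recorded steps into a prefix that is injected as-is and a range that is re-executed. The function name and the inclusive `end` convention are assumptions, not Cortivex API.

```python
from typing import Optional


def plan_selective_replay(
    steps: list[dict], start: int, end: Optional[int] = None
) -> tuple[list[dict], list[dict]]:
    """Split recorded steps into (injected_state, steps_to_replay).

    Steps before `start` are injected as-is; steps in [start, end] are replayed.
    `end` is inclusive and defaults to the last step.
    """
    last = len(steps) - 1
    if end is None:
        end = last
    if not 0 <= start <= end <= last:
        raise ValueError(f"invalid range {start}-{end} for {len(steps)} steps")
    return steps[:start], steps[start : end + 1]
```

For a 23-step trace, `plan_selective_replay(steps, 8)` injects steps 0 through 7 and replays steps 8 through 22.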

Modified Replay re-executes the trace with substitutions: a different model, different system prompt, different temperature, or different input data. The tool responses remain the same (from the recording), but the agent's reasoning and decisions may differ. This is the primary mechanism for A/B testing agent configurations.
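
One way to picture the substitution step is a merge of the recorded configuration with the overrides, plus a summary of what changed. The sketch below is an assumption about how this could be done; its "old -> new" strings mirror the `overrides_applied` field shown in the replay response later in this document.

```python
def apply_overrides(recorded: dict, overrides: dict) -> tuple[dict, dict]:
    """Return (effective_config, changes) without mutating the recorded config.

    `changes` maps each overridden key to an "old -> new" string.
    """
    effective = {**recorded, **overrides}  # overrides win on key conflicts
    changes = {
        key: f"{recorded.get(key)} -> {new}"
        for key, new in overrides.items()
        if recorded.get(key) != new  # skip overrides that change nothing
    }
    return effective, changes
```

Leaving the recorded configuration untouched matters for A/B testing: the same trace can be replayed many times with different overrides and each run starts from the original settings.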

Pipeline Configuration

Recording Traces in a Pipeline

name: pr-review-traced
version: "1.0"
description: PR review with full execution tracing
trace: true                               # enable tracing for all nodes
trace_config:
  storage_path: .cortivex/traces/
  retention_days: 30
  capture_reasoning: true                  # include chain-of-thought
  capture_tool_responses: true             # include full tool output
  max_trace_size_mb: 50                    # cap per-trace file size
nodes:
  - id: security_scan
    type: SecurityScanner
    config:
      scan_depth: deep

  - id: code_review
    type: CodeReviewer
    depends_on: [security_scan]
    config:
      review_scope: changed_files

  - id: auto_fix
    type: AutoFixer
    depends_on: [code_review]
    config:
      fix_categories: [style, bugs]

Replay and Diff Pipeline

name: replay-comparison
version: "1.0"
description: Replay a trace with a different model and compare results
nodes:
  - id: replay_original
    type: ReplayAgent
    config:
      trace_id: "ctx-a1b2c3"              # same value as run_id in the MCP examples below
      node_id: "code_review"
      mode: full

  - id: replay_modified
    type: ReplayAgent
    config:
      trace_id: "ctx-a1b2c3"
      node_id: "code_review"
      mode: modified
      overrides:
        model: claude-haiku-4-20250414
        temperature: 0.2

  - id: diff_results
    type: ReplayAgent
    depends_on: [replay_original, replay_modified]
    config:
      action: diff
      left: replay_original
      right: replay_modified
      diff_format: structured

Integration with Learning System

name: replay-to-learn
version: "1.0"
description: Analyze replay data and feed insights to cortivex-learn
nodes:
  - id: analyze_traces
    type: ReplayAgent
    config:
      action: analyze
      trace_ids: ["ctx-a1b2c3", "ctx-d4e5f6", "ctx-g7h8i9"]
      analysis_type: failure-patterns

  - id: record_insights
    type: CustomAgent
    depends_on: [analyze_traces]
    config:
      system_prompt: |
        Take the trace analysis results and record actionable insights
        using cortivex_insights. Focus on patterns that appear across
        multiple traces: common failure points, model performance
        differences, and configuration optimizations.
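
To illustrate what "patterns that appear across multiple traces" could mean in practice, the sketch below counts tools whose calls fail in more than one trace. The per-step `error` and `tool` keys are assumptions (the trace schema above does not name them), and this is not how the `failure-patterns` analysis is confirmed to work.

```python
from collections import Counter


def recurring_failures(traces: dict[str, list[dict]], min_traces: int = 2) -> dict[str, int]:
    """Count, per tool, how many traces contain at least one failed call to it.

    Only tools that fail in `min_traces` or more traces are returned.
    """
    seen: Counter = Counter()
    for steps in traces.values():
        failed_tools = {
            s["tool"]
            for s in steps
            if s.get("type") == "tool_call" and s.get("error")  # "error" key is assumed
        }
        seen.update(failed_tools)  # count each tool once per trace
    return {tool: n for tool, n in seen.items() if n >= min_traces}
```

Counting each tool once per trace keeps a single noisy run from looking like a cross-run pattern.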

MCP Tool Reference

Record a Trace

Recording is typically automatic via pipeline config, but can be started manually:

cortivex_replay({
  action: "record",
  run_id: "ctx-a1b2c3",
  node_id: "code_review",
  config: {
    capture_reasoning: true,
    capture_tool_responses: true,
    max_steps: 500
  }
})

Response:

{
  "trace_id": "trace-7f3a",
  "status": "recording",
  "run_id": "ctx-a1b2c3",
  "node_id": "code_review",
  "started_at": "2025-01-15T09:30:00Z",
  "storage_path": ".cortivex/traces/ctx-a1b2c3/code_review.trace.json"
}

Replay a Trace

cortivex_replay({
  action: "replay",
  trace_id: "trace-7f3a",
  mode: "modified",
  overrides: {
    model: "claude-haiku-4-20250414",
    temperature: 0.3,
    system_prompt_append: "\nFocus only on security-related issues."
  }
})

Response:

{
  "replay_id": "replay-2c9d",
  "source_trace": "trace-7f3a",
  "overrides_applied": {
    "model": "claude-sonnet-4-20250514 -> claude-haiku-4-20250414",
    "temperature": "0.5 -> 0.3",
    "system_prompt": "appended 1 instruction"
  },
  "result": {
    "output_changed": true,
    "steps_total": 23,
    "steps_diverged_at": 8,
    "original_issues_found": 7,
    "replay_issues_found": 4,
    "matching_issues": 4,
    "missing_issues": 3,
    "cost": { "original": "$0.018", "replay": "$0.003" },
    "duration": { "original_ms": 48000, "replay_ms": 12000 }
  }
}
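
The issue counts in this response are related by simple set arithmetic. The sketch below reproduces them under the assumption that each issue has a stable string identifier; how Cortivex actually matches issues between runs is not specified here.

```python
def compare_issues(original: set[str], replay: set[str]) -> dict[str, int]:
    """Summarize how a replay's findings relate to the original run's findings."""
    return {
        "original_issues_found": len(original),
        "replay_issues_found": len(replay),
        "matching_issues": len(original & replay),  # found by both runs
        "missing_issues": len(original - replay),   # found originally, lost in replay
    }
```

With 7 original issues and a replay that finds 4 of them and nothing new, this yields 4 matching and 3 missing, as in the response above.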

Diff Two Runs

cor
