REPL Scratchpad

A persistent Python REPL session that acts as a scratchpad for coding agents. Compose multi-step operations in code. Variables persist across turns. Only print() output enters context.

Based on the Recursive Language Models (RLM) approach by Zhang, Kraska, and Khattab — extended with cross-turn persistence via tmux.

The Problem

Coding agents waste context on raw data. Every tool call result — file contents, API responses, query results — lands in the conversation and stays there forever. By turn 30, the model has forgotten why it started.

The REPL scratchpad fixes this: the agent writes code that processes data inside the REPL and only print()s what matters. The raw data never enters the conversation.

When to Use

Multi-step data exploration (query -> filter -> analyze -> summarize)
Tasks requiring 3+ sequential tool calls that could be composed in one code block
Working with structured data (JSON, CSV, API responses, database results)
Accumulating state across turns (storing intermediate results for later use)
Any task where raw tool output would bloat context unnecessarily

When NOT to Use

Simple single-file reads or edits (use Read/Edit tools directly)
Git operations (use Bash directly)
Tasks with no intermediate data to process

Prerequisites

tmux installed and available in PATH
python3 installed and available in PATH

Setup

Start the scratchpad session by running:

bash <skill-path>/scripts/setup.sh

Where <skill-path> is the directory where you cloned this skill (e.g., ~/.claude/skills/repl-scratchpad).

This creates a tmux session named scratchpad with a persistent Python interpreter.

Core Workflow

Sending code to the scratchpad

Write the code to a temp file
Tell the scratchpad to execute it
Read only the output

# Step 1: Write code to temp file
cat > /tmp/scratchpad_cmd.py << 'PYEOF'
import json
with open("/path/to/file.json") as f:
    data = json.load(f)
filtered = [x for x in data if x["status"] == "error"]
print(f"{len(filtered)} errors found")
for item in filtered[:5]:
    print(f"  - {item['name']}: {item['message']}")
PYEOF

# Step 2: Execute in persistent session
tmux send-keys -t scratchpad "_scratchpad_exec('/tmp/scratchpad_cmd.py')" Enter

# Step 3: Wait, then read the output file (each execution overwrites the previous output)
sleep 1  # increase this for code that takes longer than a second to run
cat /tmp/_scratchpad_output.txt

The Print Contract

CRITICAL PRINCIPLE: Only print() output should enter the conversation context.

DO: print(f"{len(results)} items found") — summary enters context
DO: print(json.dumps(summary, indent=2)) — structured summary enters context
DO NOT: return raw query results, file contents, or API responses into context
DO NOT: use the Bash tool to cat large files — read them inside the scratchpad instead

The scratchpad processes everything. Context sees only what was explicitly printed.
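
One way to honor the contract when the shape of the data is still unknown is to print a bounded summary instead of the data itself. The helper below is a minimal sketch, not part of the skill: `print_summary` and its `sample` and `max_chars` parameters are hypothetical names, and it assumes the data is a list of dict-like records.

```python
import json

def print_summary(records, sample=3, max_chars=200):
    """Print the count, the field names, and a few truncated sample records."""
    print(f"{len(records)} records")
    if not records:
        return
    # Collect every key seen across the dict records.
    keys = sorted({k for r in records if isinstance(r, dict) for k in r})
    print(f"fields: {', '.join(keys)}")
    for r in records[:sample]:
        text = json.dumps(r, default=str)
        if len(text) > max_chars:
            text = text[:max_chars] + "..."  # cap each sample line
        print(f"  {text}")
```

The output size depends on `sample` and `max_chars`, not on how large the underlying data is.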

Variable Persistence

Variables assigned in the scratchpad persist across turns:

# Turn 1 (load_services() is a placeholder for your own loading code)
services = load_services()
print(f"Loaded {len(services)} services")

# Turn 2 — 'services' is still here
degraded = [s for s in services if s["error_rate"] > 0.05]
print(f"{len(degraded)} degraded")

# Turn 3 — both 'services' and 'degraded' are still here
for s in degraded:
    print(f"  {s['name']}: {s['error_rate']:.1%}")

Use this to build up working state incrementally. Store intermediate results in variables instead of dumping them into the conversation.
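
After many turns it can be hard to remember which variables exist. A small inventory helper lets the agent check without printing the values themselves. This is a sketch under the assumption that executed code shares one namespace dictionary; `describe_vars` is a hypothetical name, not something the skill provides.

```python
import types

def describe_vars(namespace):
    """Return one line per data variable: name, type, and length when it has one."""
    lines = []
    for name, value in sorted(namespace.items()):
        # Skip private names, functions, classes, and imported modules.
        if name.startswith("_") or callable(value) or isinstance(value, types.ModuleType):
            continue
        size = f", len={len(value)}" if hasattr(value, "__len__") else ""
        lines.append(f"{name}: {type(value).__name__}{size}")
    return lines
```

Inside the scratchpad this could be called as print("\n".join(describe_vars(globals()))), which prints names and sizes only.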

Composition Pattern

Instead of making individual tool calls:

BAD (3 turns, all output in context):
  Turn 1: Bash -> curl API -> full JSON response in context
  Turn 2: Bash -> jq filter -> filtered output in context
  Turn 3: Bash -> analyze -> analysis in context

Compose in one scratchpad execution:

# GOOD (1 turn, only summary in context):
import urllib.request, json
resp = json.loads(urllib.request.urlopen("http://api/services").read())
broken = [s for s in resp if s["error_rate"] > 0.05]
# get_deps() is a placeholder for your own dependency lookup, defined in an earlier turn
deps = {s["id"]: get_deps(s["id"]) for s in broken}
print(f"{len(broken)} services degraded:")
for s in broken:
    print(f"  {s['name']} -> {', '.join(deps[s['id']])}")

Advanced Patterns

Map-Reduce: Fan-Out Processing

Use Python's built-in concurrency to process many files/items in parallel inside the REPL. All processing stays in the scratchpad — only the summary enters context.

import concurrent.futures, os, glob

def analyze_file(path):
    with open(path) as f:
        lines = f.readlines()
    # Trailing spaces keep names such as "important_value" or "from_cache" from matching
    imports = [l for l in lines if l.startswith(("import ", "from "))]
    classes = [l for l in lines if l.strip().startswith("class ")]
    return {"path": path, "lines": len(lines), "imports": len(imports), "classes": len(classes)}

files = glob.glob("src/**/*.py", recursive=True)
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(analyze_file, files))

# However many files are processed, only this summary reaches context:
print(f"{len(results)} files: {sum(r['lines'] for r in results):,} lines")
big = sorted(results, key=lambda r: -r['lines'])[:5]
for r in big:
    print(f"  {r['lines']:>5} lines  {r['path']}")

Recursive Drill-Down

Process data in a loop, drilling deeper on each iteration. The REPL accumulates state without re-querying.

# Turn 1: broad scan
import os, glob
all_files = glob.glob("**/*.ts", recursive=True)
by_dir = {}
for f in all_files:
    d = os.path.dirname(f)
    by_dir.setdefault(d, []).append(f)
print(f"{len(all_files)} files across {len(by_dir)} dirs")
for d in sorted(by_dir, key=lambda d: -len(by_dir[d]))[:5]:
    print(f"  {len(by_dir[d]):>3} files  {d}")

# Turn 2: drill into the largest directory (by_dir still in memory)
target = max(by_dir, key=lambda d: len(by_dir[d]))
details = []
for f in by_dir[target]:
    with open(f) as fh:
        lines = fh.read().splitlines()
    # Count lines that begin with an export statement, not every line mentioning "export"
    exports = [l for l in lines if l.lstrip().startswith("export")]
    details.append({"file": os.path.basename(f), "lines": len(lines), "exports": len(exports)})
print(f"\n{target}/ deep dive:")
for d in sorted(details, key=lambda x: -x['lines']):
    print(f"  {d['lines']:>4} lines  {d['exports']:>2} exports  {d['file']}")

Spawning Subagents for Heavy Lifting

When the REPL hits limits (needs LLM reasoning, must read hundreds of files, or requires framework-specific tools), delegate to subagents. The REPL orchestrates the work and collects results.

Claude Code

Claude Code subagents are spawned via the Agent tool from the main conversation. The scratchpad prepares the work, the main agent fans out subagents, and results flow back.

Pattern: use the REPL to identify what needs processing, then ask the main agent to spawn subagents for each item.

# Step 1: REPL identifies the targets
import glob, os
files = glob.glob("src/**/*.ts", recursive=True)
large = [f for f in files if os.path.getsize(f) > 10000]
print("Files needing deep review:")
for f in large:
    print(f"  {f} ({os.path.getsize(f)//1000}KB)")

Then tell the main agent:

Spawn parallel subagents to review each of these files:
- src/agent/turn-loop.ts
- src/store/index.ts
- src/server/routes.ts
Each subagent should analyze imports, exports, and complexity.
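
The REPL can also draft that request, so the file list comes straight from the variable it already holds. A minimal sketch, where `build_review_request` is a hypothetical helper and the wording simply mirrors the message above:

```python
def build_review_request(paths, task="analyze imports, exports, and complexity"):
    """Format a handoff message asking the main agent to fan out one subagent per file."""
    lines = ["Spawn parallel subagents to review each of these files:"]
    lines += [f"- {p}" for p in paths]
    lines.append(f"Each subagent should {task}.")
    return "\n".join(lines)
```

Calling print(build_review_request(large)) in the scratchpad would put only the request text into context.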

Claude Code spawns subagents using the Agent tool with these key parameters:

subagent_type: built-in types (Explore, Plan, general-purpose) or custom agents
run_in_background: set true for parallel execution
model: haiku for fast/cheap tasks, sonnet/opus for complex analysis
isolation: "worktree" for subagents that modify files

Custom subagents are defined as markdown files in .claude/agents/ or ~/.claude/agents/. See Claude Code subagent docs for full reference.

OpenAI Codex

Codex supports multi-agent workflows (ex
