Performance Reviewer
Reviews code for performance issues before they show up in production metrics. Measurement-grounded — every finding references a profiler output, an EXPLAIN plan, or a metric from observability-architect. Guesses don't go in the report. Findings table + severity guide + tooling reference in RECIPES.md.
1. When to invoke
- PR touches a hot path (request handler, DB query, loop over large data, queue consumer).
- Endpoint shows up in latency p99 / error-rate dashboards.
- Before a planned load test.
- After a metric regression — dashboard burn, SLO degradation.
- New code with
for+await/for+ DB call patterns (the N+1 detector).
2. Output format
Same shape as other reviewers — findings table + summary. Sample row layout and severity rubric in RECIPES § 1–2.
Every finding must include a measurement — EXPLAIN ANALYZE, a pprof flame, a histogram bucket, a hyperfine result. "Looks slow" is not a finding.
3. Review approach
- Measure first. Without a measurement, there's nothing to review. Run
EXPLAIN ANALYZEon the suspect query, attach the profiler to the running process, or check the dashboard for the endpoint's p99. - Read the diff with measurement in hand. The measurement narrows where to look.
- Propose fixes that produce another measurement. A fix without "and the new latency is X" is incomplete.
4. What to check — by category
Database — N+1 and unindexed scans
The single most common perf bug. Per sql-architect §5:
- N+1 detection. A list endpoint that fires one query for parents, then one per parent for children. Look for
for x in xs: y = repo.get(x.id). Fix: batch withIN (?, ?, ?), a single JOIN, or a DataLoader-style batcher. - In tests: log every SQL statement; a handler emitting > 3 queries per request is suspect.
- Missing indexes. Per sql-architect §3: index every FK; index columns used in
WHERE; composite column order is most-selective-first. EXPLAIN ANALYZEshows the truth. Look forSeq Scanon large tables,Sortoperations spilling to disk,actual rows>>plan rows(stale stats — runANALYZE).- Pagination by OFFSET is O(N). Switch to cursor per sql-architect §4.
- Connection pooling — any handler opening a connection per request is a bug. Pool in the lifespan.
Async — blocking I/O in event loop
The Python async equivalent of N+1.
time.sleepinsideasync def— blocks the event loop. Useawait asyncio.sleep(d).requests.get(...)insideasync def— blocking HTTP. Usehttpx.AsyncClient.- Sync DB driver (
psycopg2) insideasync def— blocking. Usepsycopg 3async. - Detection:
asyncio.get_event_loop().slow_callback_duration = 0.1in dev logs callbacks over 100 ms. asyncio.to_thread(blocking_fn, args)when you genuinely must call a sync API.asyncio.TaskGroup, not bareasyncio.gather— handles cancellation and exception aggregation properly per python-architect §4.
Go — goroutine and allocation issues
- Unbounded goroutine spawn. A loop firing
go func()per item without backpressure. Use a worker pool orerrgroupwithSetLimit(n). sync.WaitGroup.Go(fn)per go-architect §7 — Go 1.25+, cleaner thanAdd(1) / defer Done().- Channels with wrong direction or buffer. Unbuffered + goroutine on each side = deadlock waiting to happen.
context.Background()in handlers — request context not propagated; cancellation never reaches downstream calls.- Allocation hot paths.
pprofheap profile; ifruntime.makeslice/runtime.newobjecttopsalloc_objects, reuse buffers viasync.Pool. - Slice growth — appending without
make([]T, 0, capacity)reallocates. If you know the size, set it.
Algorithms and data structures
- O(N²) loops over growing collections. Nested
forwith linear lookups (if x in list) — convert the inner lookup to aset/map. sort.Slicein a hot path when items arrive sorted from upstream — trust upstream order.- Eager materialization of iterators.
list(big_generator)defeats the purpose. Stay lazy. - String concatenation in loops — quadratic in both languages with
+. Use"".join(parts)/strings.Builder. - Regex in hot paths — compile once at module load, never inside the function.
Memory
- Unbounded caches. A
dict/mapgrowing without eviction is a memory leak. LRU with max size, or TTL. - Long-lived references to short-lived data. Closures capture the parent frame; if the closure outlives the function, the frame stays alive.
- Large struct copies. A 2 KB struct passed by value through 10 layers — pointer once size justifies it (~64 bytes is a rough Go cutoff).
- Goroutine leaks — a goroutine waiting on a never-closed channel is a leak. Go 1.26's experimental
goroutineleakprofile catches them.
Network / external calls
- Missing timeouts — every outbound HTTP / DB / queue / RPC call has a context deadline. Unbounded waits cause cascading failures.
- No retry with backoff — flaky downstream gets hammered in a tight loop. Jittered exponential backoff is mandatory.
- HTTP keep-alive disabled — TLS handshake per request. Reuse the client.
- Single-flight — N concurrent requests for the same key dedupe via
singleflight(Go) / a request-scoped cache (Python). Otherwise a cache miss for a hot key floods downstream.
Sampling and observability cost
- High-cardinality labels in metrics — per observability-architect §2: metric explosion is a perf issue for Prometheus too.
- Logging in tight loops —
log.Infoinside a 1M-iteration loop dominates the runtime. Log once at the end with a summary. - Tracing every internal call — per-call spans for hot inner loops kill throughput. Trace request boundaries + major sub-ops, not every function.
5. Tooling
Tools that ground the findings — full reference in RECIPES § 3. Always reference the tool output in the Evidence column.
Quick picks: SQL → EXPLAIN (ANALYZE, BUFFERS) + pg_stat_statements. Go → pprof. Python → py-spy. Bench → hyperfine (CLI), go test -bench, pytest-benchmark. Load → wrk / k6. Symptom → Grafana RED+USE.
6. What this skill does NOT do
- Load testing. Generating load is operational work —
k6/wrkruns against staging, not in a code review. - Capacity planning. "How many instances at 2× traffic" is a different conversation.
- Premature optimization. A finding requires a measurement; gut-feel optimizations are rejected. If the code isn't measurably slow, don't change it.
7. Cross-skill ties
- sql-architect §3–§5, §9 — indexing, query patterns, N+1,
EXPLAIN ANALYZE. Most common source of findings. - observability-architect — RED/USE metrics reveal where to look. Findings often start with "this dashboard shows..."
- grafana-architect — burn rate / SLO dashboards trigger reviews.
- go-architect §7 / python-architect §4 — concurrency primitives.
- rest-api-architect §5 / [§14](../../protocols/rest-api-architect/SKILL.md#