Error and Correctness Traps

Overview

Common bugs grouped by domain: floats that won't compare equal, retries that hammer a downed service, singletons that wreck testability, and others. When you write code in one of these domains, stop and run the matching checks before you commit.

This is a rigid skill. Jump to the sub-section that matches what you're writing and run that sub-section's checks.

These checks matter most when code will reach real users in production. In MVPs, prototypes, internal dev tools, and one-off scripts where the architecture is still in flux, prefer the simplest thing that works.

When to invoke

Invoke when you're about to:

Add or change error-handling around a call that can fail
Compare, sum, or accumulate floating-point numbers
Write concurrent, parallel, or threaded code, or share mutable state between threads
Call a remote process, web service, database, or another machine
Introduce a singleton or any globally-shared mutable state
Choose a data structure or algorithm on a path that runs often or on large inputs
Add or change log statements that may fire at high volume
Review code that handles errors, floats, concurrency, remote calls, singletons, or hot-path data structures

Non-triggers — do NOT invoke for

Renaming a local variable inside one function
Adding a docstring to an existing function
Fixing a typo in a comment
Formatting-only changes handled by a formatter
Adjusting a config value in a config file with no logic change
Skimming code for context without producing findings or edits
An early-stage MVP or prototype where the architecture is still in flux
An internal dev tool, debugging endpoint, or one-off script
Throwaway code expected to be replaced before reaching users

If the change touches one of these domains even slightly, invoke anyway — the per-domain check is short and the bugs are not.

Checks by domain

Errors (97/21, 97/26, 97/29)

Distinguish business exceptions from technical ones. A technical exception means the system can't proceed — bad arguments, broken DB connection, programming error. Let it bubble to a top-level handler that puts the system in a safe state (rollback, log, alert, friendly user message); the caller can't fix it. A business exception is part of the contract — withdrawing from an empty account, booking an unavailable slot — and is an alternative return path the caller is expected to handle. Give them separate types or hierarchies; mixing them blurs the contract. (Bergh Johnsson, 97/21.)
Never write the empty catch. try { ... } catch (...) {} silently swallows everything. Same for ignoring return codes (printf's return value, write()'s short-write count) and pretending errno doesn't exist. Example: a service-call wrapper swallows every exception and returns null, so every downstream caller has to invent their own theory of what null means. Expose erroneous conditions in your interfaces; if handling errors feels onerous, the interface is wrong. (Goodliffe, 97/26.)
Don't rely on unexplained magic. If your change depends on behavior nobody can explain (build picks a DLL by load order, deployment reads an undocumented env var, a job runs because of a side effect in a config file), surface it in your summary to the user before shipping — don't bury the dependency. (Griffiths, 97/29.)
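Here is a minimal Python sketch of the business/technical split. It assumes a hypothetical `Account` that holds integer cents. `BusinessError`, `InsufficientFundsError`, `withdraw`, and `withdraw_for_user` are illustrative names, not part of any library.

- The business outcome gets its own type, and the caller catches it by name.
- Bad arguments raise a technical `ValueError` that nobody here catches.
- There is no bare `except`, and nothing returns None on failure.

```python
class BusinessError(Exception):
    """Base for contract-level outcomes the caller is expected to handle."""


class InsufficientFundsError(BusinessError):
    def __init__(self, balance_cents: int, requested_cents: int) -> None:
        super().__init__(f"balance {balance_cents} < requested {requested_cents}")
        self.balance_cents = balance_cents
        self.requested_cents = requested_cents


class Account:
    """Hypothetical account holding an integer number of cents."""

    def __init__(self, balance_cents: int) -> None:
        self.balance_cents = balance_cents

    def withdraw(self, amount_cents: int) -> int:
        # A non-positive amount is a programming error: technical, so raise a
        # standard exception and let it bubble to the top-level handler.
        if amount_cents <= 0:
            raise ValueError(f"amount must be positive, got {amount_cents}")
        # An overdraft attempt is part of the contract: business exception.
        if amount_cents > self.balance_cents:
            raise InsufficientFundsError(self.balance_cents, amount_cents)
        self.balance_cents -= amount_cents
        return self.balance_cents


def withdraw_for_user(account: Account, amount_cents: int) -> str:
    # Catch only the business exception, by name. Technical failures
    # (ValueError, a dropped DB connection, bugs) propagate unchanged.
    try:
        remaining = account.withdraw(amount_cents)
    except InsufficientFundsError as e:
        return f"declined: only {e.balance_cents} cents available"
    return f"ok: {remaining} cents left"
```

Because failure is a named type rather than a None return, downstream callers never have to guess what an absent value means.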

Numerics (97/33)

Never compare floats with ==. 0.1 + 0.2 != 0.3 in IEEE 754 double precision; that is the canonical demonstration. Compare with a tolerance scaled to the magnitude of the values involved: a small multiple of ε·max(|x|, |y|), where ε is machine epsilon (about 1.2e-7 for float, 2.2e-16 for double). Near zero a relative tolerance shrinks to nothing, so add an absolute floor.
Watch for catastrophic cancellation. Subtracting nearly-equal floats cancels the leading digits and promotes roundoff to the most significant remaining digits. Example: solving x² - 100000x + 1 = 0 directly via the quadratic formula loses most of the small root's significant digits. For that root, the numerator -b - sqrt(b² - 4ac) subtracts two numbers that agree to about ten digits. Instead, compute the larger-magnitude root first (no cancellation) and derive the other from r1 * r2 = c/a. The same shape of error appears in any series with alternating signs of similar magnitude.
Don't use float for money. Use a fixed-point or decimal type. Floats are for scientific calculation where you accept ε-level error; financial code does not accept it. (Allison, 97/33.)
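The three checks above can be sketched with Python's standard library:

- `math.isclose` supplies a relative tolerance plus an absolute floor.
- `stable_quadratic_roots` is an illustrative helper, not a library function. It assumes real roots and `a != 0`, computes the larger-magnitude root without cancellation, and gets the other from r1 * r2 = c/a.
- `decimal.Decimal` stands in for a fixed-point money type.

```python
import math
from decimal import Decimal, ROUND_HALF_EVEN

# 1. Tolerance-based comparison instead of ==.
print(0.1 + 0.2 == 0.3)                                           # False
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9, abs_tol=1e-12))  # True


# 2. Quadratic roots without catastrophic cancellation (assumes a != 0).
def stable_quadratic_roots(a: float, b: float, c: float) -> tuple[float, float]:
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("complex roots are out of scope for this sketch")
    sq = math.sqrt(disc)
    # b and copysign(sq, b) share a sign, so this adds magnitudes:
    # no subtraction of nearly-equal values.
    q = -0.5 * (b + math.copysign(sq, b))
    if q == 0.0:  # only when b == 0 and c == 0: both roots are zero
        return 0.0, 0.0
    # q / a is the larger-magnitude root; c / q equals c / (a * r1).
    return q / a, c / q


big, small = stable_quadratic_roots(1.0, -100000.0, 1.0)
# Naive small root: 100000 - sqrt(...) cancels most significant digits.
naive_small = (100000.0 - math.sqrt(100000.0 ** 2 - 4.0)) / 2.0
print(big, small, naive_small)

# 3. Money in Decimal, not float.
print(sum([Decimal("0.10")] * 3) == Decimal("0.30"))  # True
print(sum([0.10] * 3) == 0.30)                        # False (0.30000000000000004)
# When a rounding step is required, round explicitly to whole cents.
print(Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN))  # 2.68
```

Half-to-even is shown only as one common rounding mode. Use whatever rounding rule the business or regulator actually specifies.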

Concurrency & IPC (97/41, 97/57)

Default to message passing over shared mutable state. When you reach for a lock around shared data, ask first whether the data could be owned by one process/actor that others message. Message-passing designs (Erlang processes, Go channels, actor frameworks in mainstream languages) sidestep most race / deadlock / livelock bugs by construction. Reserve shared-memory + locks for cases you have measured and understood. (Winder, 97/57.)
Count IPCs per user stimulus, not lines of code. Each remote call adds non-trivial latency, and sequential calls add up. Example: ORM lazy-loading produces 1,000 sequential 10ms DB calls for one page render, a minimum of 10s response time before any rendering work. Slow apps routinely make thousands of IPCs per user stimulus. Apply parsimony (one round-trip carrying the right data), parallelism (overall latency ≈ the longest call, not the sum), or caching. (Stafford, 97/41.)
Retry with backoff and a cap, never in a tight loop. Example: while (!call()) {} against a downed service hammers it for as long as it is down, and when many clients do the same they flood it the moment it comes back. Exponential backoff, jitter, and a max-retries ceiling are the minimum; idempotency on the server side is what makes retry safe at all.
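Two of the checks above, sketched with Python's standard library:

- **Message passing.** The counter is owned by a single thread that drains a `queue.Queue`. No other thread ever touches it, so no lock is needed.
- **Retry with backoff.** `retry_with_backoff` is an illustrative helper, not a library API. It retries only the exception types you name and grows the delay exponentially up to a cap. It uses "full jitter": a random sleep between zero and the current cap, one common jitter scheme.

The default retryable types and delay constants are assumptions to tune per downstream.

```python
import queue
import random
import threading
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def counter_owner(inbox: queue.Queue) -> None:
    """Sole owner of `count`: other threads send messages instead of locking."""
    count = 0
    while True:
        msg, reply = inbox.get()
        if msg == "incr":
            count += 1
        elif msg == "get":
            reply.put(count)
        elif msg == "stop":
            return


def demo_counter(n_workers: int = 4, increments: int = 1000) -> int:
    inbox: queue.Queue = queue.Queue()
    owner = threading.Thread(target=counter_owner, args=(inbox,))
    owner.start()

    def worker() -> None:
        for _ in range(increments):
            inbox.put(("incr", None))

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    # Every "incr" is already queued and the queue is FIFO, so this
    # "get" is answered only after all increments have been applied.
    reply: queue.Queue = queue.Queue()
    inbox.put(("get", reply))
    total = reply.get()
    inbox.put(("stop", None))
    owner.join()
    return total


class RetriesExhausted(Exception):
    pass


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.1,
    max_delay_s: float = 5.0,
    retryable: tuple = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except retryable as exc:
            if attempt == max_attempts - 1:
                raise RetriesExhausted(f"gave up after {max_attempts} attempts") from exc
            # Exponential growth capped at max_delay_s, then full jitter so
            # many clients recovering at once do not retry in lockstep.
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0.0, cap))
    raise AssertionError("unreachable")
```

Exceptions outside `retryable`, such as a `KeyError` from a bug, propagate on the first attempt instead of being retried.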

Limits & Performance (97/46, 97/89)

Know the complexity of the data structure you picked. Linked list vs. hash vs. balanced tree on a million items is the difference between snappy and unusable. Pick by access pattern (lookup-heavy → hash; ordered iteration → tree; tiny + cache-friendly → array), not by what's familiar. (van Winkel, 97/89.)
Don't recompute invariants inside loops. Example: for (i = 0; i < strlen(s); ++i) — strlen runs every iteration, scanning the whole string each time, turning O(n) work into O(n²). Hoist the length out. The same shape applies to repeated DB lookups, repeated config parses, and repeated regex compilations inside hot loops. (van Winkel, 97/89.)
Respect the cache hierarchy when it dominates. Register and L1 are nanoseconds; RAM is ~20ns; disk is ~10ms; network is ~20–100ms — orders of magnitude apart. A "worse" big-O algorithm with a predictable access pattern can beat a "better" one that thrashes cache. When perf matters, measure rather than reason from complexity alone. (Colvin, 97/46.)
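Here is a small Python illustration of the first two checks, using hypothetical helpers:

- The slow versions scan a list for every lookup and re-parse a JSON config on every iteration.
- The fast versions build a `set` once and hoist the parse out of the loop.

Python's `len` is O(1), so the `strlen` trap has no direct analogue here, but the config-parse case has the same shape. Following the third check, `timeit` measures rather than assumes. No timings are claimed, since they depend on the machine.

```python
import json
import timeit


def count_allowed_slow(items: list[int], allowed: list[int]) -> int:
    # `x in list` is a linear scan: O(len(items) * len(allowed)) overall.
    return sum(1 for x in items if x in allowed)


def count_allowed_fast(items: list[int], allowed: list[int]) -> int:
    allowed_set = set(allowed)  # built once: O(len(allowed))
    return sum(1 for x in items if x in allowed_set)  # average O(1) per lookup


def clamp_slow(values: list[int], config_text: str) -> list[int]:
    out = []
    for v in values:
        limit = json.loads(config_text)["limit"]  # invariant re-parsed every iteration
        out.append(min(v, limit))
    return out


def clamp_fast(values: list[int], config_text: str) -> list[int]:
    limit = json.loads(config_text)["limit"]  # hoisted: parsed once
    return [min(v, limit) for v in values]


if __name__ == "__main__":
    items = list(range(5_000))
    allowed = list(range(0, 5_000, 7))
    for fn in (count_allowed_slow, count_allowed_fast):
        # Measure on realistic sizes instead of reasoning from big-O alone.
        print(fn.__name__, timeit.timeit(lambda: fn(items, allowed), number=1))
```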

Globals & Singletons (97/73)

Resist the singleton. Most singletons encode a single-instance assumption that turns out to be premature, broadcast across the design as hidden coupling. They wreck unit-test independence (you can't substitute a mock), introduce subtle multi-threading bugs (naive locking slow, double-checked locking famously broken in several languages), and have no defined cleanup order at shutdown. Example: a Logger.getInstance() called from every layer means tests can't intercept output, can't run in parallel, and inherit log state from previous tests.
If you genuinely need one instance, hide it behind an interface. Restrict the global access to a few well-defined construction sites; everywhere else, accept the dependency through a parameter typed by interface. Callers don't know whether a singleton or a fresh object satisfies the interface — and tests can substitute either. (Saariste, 97/73.)
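Here is a minimal Python sketch of that shape, with hypothetical names:

- `Logger` is a `typing.Protocol`, so any class with a matching `info` method satisfies it structurally.
- `OrderService` receives its logger as a constructor parameter instead of calling a global `getInstance()`.
- The single shared instance is created in one place, `main`, and a test can pass a recording fake instead.

```python
from typing import Protocol


class Logger(Protocol):
    def info(self, msg: str) -> None: ...


class StdoutLogger:
    """Production implementation; main() decides there is exactly one."""

    def info(self, msg: str) -> None:
        print(msg)


class RecordingLogger:
    """Test double: captures messages instead of writing them."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def info(self, msg: str) -> None:
        self.messages.append(msg)


class OrderService:
    def __init__(self, logger: Logger) -> None:
        # The dependency arrives through the interface; no global lookup.
        self._logger = logger

    def place(self, order_id: str) -> None:
        self._logger.info(f"placed order {order_id}")


def main() -> None:
    # The one well-defined construction site for the shared instance.
    logger = StdoutLogger()
    OrderService(logger).place("A-1")


if __name__ == "__main__":
    main()
```

In a test, `OrderService(RecordingLogger())` gives each test its own isolated log state. Tests can then run in parallel without inheriting output from one another.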

Production resilience (`RI/*`)

When the call will run under load against a downstream that can fail, write the per-call hardening first. These checks matter most in production code.

Set an explicit timeout on every remote call. Library defaults are wrong (None, "infinity", "many minutes"). Pick a per-call budget based on the downstream's realistic latency plus margin, and cap retries inside that budget. *(
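Here is a sketch of a per-call timeout inside an overall budget, assuming the client accepts a timeout in seconds. `call_within_budget` and `BudgetExceeded` are hypothetical names.

- `call` receives the timeout it must enforce, which is never longer than the time left in the budget.
- Retries stop when either the attempt count or the budget runs out.
- Backoff between attempts is left out to keep the focus on the budget.
- The injectable `clock` exists only so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class BudgetExceeded(Exception):
    pass


def call_within_budget(
    call: Callable[[float], T],
    budget_s: float,
    per_call_timeout_s: float,
    max_attempts: int = 3,
    retryable: tuple = (TimeoutError, ConnectionError),
    clock: Callable[[], float] = time.monotonic,
) -> T:
    deadline = clock() + budget_s
    last_exc: Optional[BaseException] = None
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget spent: stop retrying even if attempts remain
        try:
            # Explicit timeout on every attempt, capped by what is left.
            return call(min(per_call_timeout_s, remaining))
        except retryable as exc:
            last_exc = exc
    raise BudgetExceeded(
        f"no success within {budget_s}s / {max_attempts} attempts"
    ) from last_exc
```

With the standard library, `call` might be `lambda t: urllib.request.urlopen(url, timeout=t).read()`. In that case, `retryable` would need to match what that client actually raises; urllib, for instance, wraps many connection failures in `urllib.error.URLError`.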
