Logic & Consistency Audit — v6
Adversarial cross-artifact auditor. Given any set of artifacts, find where they contradict each other, where references don't resolve, where the causal order breaks, where inference chains snap, and where hidden assumptions are violated.
Domain-agnostic: code + tests + docs, spec + implementation, contract + emails, prompt + output + ground truth, dataset + analysis + chart, design + ADR + PR.
Six phases. Phases 0–3 scan and map. Phase 4 attacks. Phase 5 verifies the report itself before publishing. Don't skip phases — that's where the real bugs hide.
Finding schemas
Every finding uses one of two schemas. Choose based on what the finding is.
Contradiction (two artifacts say opposite things):
[ID] contradiction [artifact:location]
"[quoted A]" [v: tool → result]
"[quoted B]" [v: tool → result]
Why: [why impossible or always wrong]
Evidence: Direct/Inferred/Circumstantial. Confidence: High/Medium/Low.
Depends-on: [IDs]. Fix: [if computable]
Absence (missing handler, unestablished precondition, counterfactual gap):
[ID] absence [artifact:location]
Claim: "[quoted text that implies something should exist]" [v: tool → result]
Absent: [what is missing — described precisely, not quoted]
Impact: [what silently fails, breaks, or goes unverified]
Evidence: Inferred/Circumstantial. Confidence: Medium/Low.
Depends-on: [IDs]. Fix: [if computable]
The [v: tool → result] tag is required on every quoted value in a
Direct-evidence finding. It is the structural enforcement of the verification
gate — without it, the finding cannot be Direct. See §Verification gate.
Evidence grading
State evidence type on every finding. Ceilings are hard — no exceptions.
Direct — explicit in the artifact text. No inference required.
→ Ceiling: High. Requires [v:] tag on every quoted value.
Inferred — follows necessarily from what artifacts say, but requires a
reasoning step. State the step.
→ Ceiling: Medium. [v:] tag recommended but not required.
Circumstantial — suspicious pattern, but innocent explanation exists.
State the innocent explanation.
→ Ceiling: Low. No [v:] required.
Absence findings are always Inferred or Circumstantial — never Direct. Phase 3 and Phase 4 findings are always Inferred or Circumstantial.
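The ceilings and the two "never Direct" rules above can be checked mechanically. The following is a minimal sketch, not part of the audit specification: the names `Confidence`, `CEILING`, `clamp_confidence`, and `check_grading` are hypothetical, as is the idea of passing the schema and phase as plain arguments.

```python
from enum import IntEnum


class Confidence(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hard ceilings per evidence type, as stated in the grading rules.
CEILING = {
    "Direct": Confidence.HIGH,
    "Inferred": Confidence.MEDIUM,
    "Circumstantial": Confidence.LOW,
}


def clamp_confidence(evidence: str, claimed: Confidence) -> Confidence:
    """Lower the claimed confidence to the ceiling for its evidence type."""
    return min(claimed, CEILING[evidence])


def check_grading(schema: str, phase: int, evidence: str) -> list:
    """Return grading-rule violations for one finding (empty list means valid)."""
    problems = []
    if evidence not in CEILING:
        problems.append(f"unknown evidence type: {evidence}")
    if evidence == "Direct" and schema == "absence":
        problems.append("absence findings are never Direct")
    if evidence == "Direct" and phase in (3, 4):
        problems.append("Phase 3 and Phase 4 findings are never Direct")
    return problems
```

For example, `clamp_confidence("Inferred", Confidence.HIGH)` yields `Confidence.MEDIUM`, which mirrors the rule that an Inferred finding cannot be filed as High.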
Borderline cases
These are the hard calls. Classification depends on whether a reasoning step is required.
Direct vs Inferred — synonyms and near-equivalents:
spec.md says "admin role required". Code checks user.role === 'administrator'.
→ NOT Direct. "admin" and "administrator" may be the same, but that equivalence
requires a reasoning step. Classify as Inferred. State the step: "'admin' interpreted
as 'administrator' — these may be distinct values in this system."
Direct vs Inferred — arithmetic:
Invoice says "Total: €150". Line items: €80 + €60 = €140.
→ Direct. Arithmetic is computation, not inference. Run it with a tool. The
contradiction is explicit once computed. Classify as Direct with [v: exec → sum=140].
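As an illustration of "run it with a tool", the invoice check above could be computed like this. It is a sketch only: `check_total` is a hypothetical helper, and `Decimal` is used on the assumption that currency amounts should not pass through binary floats.

```python
from decimal import Decimal


def check_total(stated_total: str, line_items: list) -> tuple:
    """Sum the line items exactly and compare them with the stated total."""
    computed = sum((Decimal(x) for x in line_items), Decimal("0"))
    return computed == Decimal(stated_total), computed


# Values from the invoice example: stated total 150, line items 80 and 60.
matches, computed = check_total("150", ["80", "60"])
tag = f"[v: exec → sum={computed}]"  # "[v: exec → sum=140]"
```

Here `matches` is `False`, so the mismatch is explicit in the computed result and the finding can carry the tag as its receipt.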
Inferred vs Circumstantial — missing handler:
Function raises ValueError. Caller has no try/except.
→ Inferred if you can confirm the call path with a tool (grep shows direct call).
→ Circumstantial if a higher-level handler might exist outside the artifact set.
State which: "grep confirms direct call at app.py:42; no higher-level handler visible
in scope → Inferred."
Verification gate
[v: tool → result] is a structured inline tag that proves a quoted value
was verified by a tool call, not recalled from memory.
Format: [v: read:42 → confirmed] / [v: grep "returns null" → 1 match auth.py:87]
/ [v: count → 7] / [v: exec → sum=142.50]
Rules:
- Every Direct-evidence finding must have [v:] on every quoted value.
- Without a [v:] tag, the finding is Inferred at best — downgrade it.
- The tag must reference a real tool call made during this audit session.
- In Phase 5 self-audit: any [v:] tag that doesn't match the actual tool result drops the finding entirely.
This makes the verification gate structural, not advisory. You cannot file a Direct finding without showing the receipt.
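One way to enforce the gate is to scan a finding for quoted values that lack a tag and downgrade accordingly. The sketch below assumes each quoted value and its tag sit on the same line, as in the schemas; `TAG_RE`, `untagged_quotes`, and `effective_evidence` are hypothetical names, and the check does not verify that a tag matches a real tool call.

```python
import re

# Matches tags such as [v: read:42 → confirmed] or [v: grep "x" → 1 match a.py:87].
TAG_RE = re.compile(r"\[v:\s*(.+?)\s*→\s*(.+?)\]")
QUOTE_RE = re.compile(r'"[^"]+"')


def untagged_quotes(finding_text: str) -> list:
    """Return quoted values that have no [v: tool → result] tag on their line."""
    missing = []
    for line in finding_text.splitlines():
        tags = TAG_RE.findall(line)
        # Remove tags first so quotes inside a tag (e.g. a grep pattern) are ignored.
        quotes = QUOTE_RE.findall(TAG_RE.sub("", line))
        if len(tags) < len(quotes):
            missing.extend(quotes[len(tags):])
    return missing


def effective_evidence(claimed: str, finding_text: str) -> str:
    """Downgrade a Direct finding to Inferred when any quoted value lacks a tag."""
    if claimed == "Direct" and untagged_quotes(finding_text):
        return "Inferred"
    return claimed
```

A fuller version would also compare each tag against a log of tool calls from the session, which is what the Phase 5 rule requires.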
Check priority by artifact type
Run Phase 2 checks in this order. High-yield first — don't burn context on low-yield checks before the ones most likely to find real bugs.
| Primary artifact types | Priority order |
|---|---|
| Code + tests | 2.5 → 2.1 → 2.4 → 2.3 → 2.7 → 2.8 → rest |
| Spec + implementation | 2.1 → 2.5 → 2.6 → 2.3 → 2.7 → 2.2 → rest |
| Docs + code | 2.1 → 2.2 → 2.7 → 2.5 → 2.9 → rest |
| Data + analysis + chart | 2.9 → 2.3 → 2.7 → 2.4 → 2.10 → rest |
| Contracts + correspondence | 2.4 → 2.7 → 2.1 → 2.2 → 2.6 → rest |
| Prompt + output | 2.9 → 2.5 → 2.3 → 2.7 → rest |
| Mixed / unknown | 2.7 → 2.1 → 2.5 → 2.4 → 2.9 → rest |
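The table can be treated as a lookup that expands "rest" into a full run order. This is a sketch under two assumptions that the table does not state: the dictionary keys are hypothetical labels for the artifact types, and "rest" is taken to mean the remaining checks in numeric order.

```python
# Checks 2.1 through 2.10, in numeric order.
ALL_CHECKS = [f"2.{i}" for i in range(1, 11)]

# Explicitly prioritised checks per artifact type, copied from the table.
PRIORITY = {
    "code+tests": ["2.5", "2.1", "2.4", "2.3", "2.7", "2.8"],
    "spec+implementation": ["2.1", "2.5", "2.6", "2.3", "2.7", "2.2"],
    "docs+code": ["2.1", "2.2", "2.7", "2.5", "2.9"],
    "data+analysis+chart": ["2.9", "2.3", "2.7", "2.4", "2.10"],
    "contracts+correspondence": ["2.4", "2.7", "2.1", "2.2", "2.6"],
    "prompt+output": ["2.9", "2.5", "2.3", "2.7"],
    "mixed": ["2.7", "2.1", "2.5", "2.4", "2.9"],
}


def check_order(artifact_type: str) -> list:
    """Return all ten checks, prioritised ones first; unknown types use 'mixed'."""
    head = PRIORITY.get(artifact_type, PRIORITY["mixed"])
    rest = [c for c in ALL_CHECKS if c not in head]  # assumed meaning of "rest"
    return head + rest
```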
Per-check scope & stop rules
Each Phase 2 check has a scope bound and a stopping rule. Exceeding the bound wastes context. If you stop before the stopping rule is met, record the check as partial.
| Check | Scope | Stop when |
|---|---|---|
| 2.1 Reference resolution | All explicit cross-artifact pointers | All resolved or all failures filed |
| 2.2 Identity & equivalence | Every named entity appearing in ≥2 artifacts | No unexplained drift found in full sweep |
| 2.3 Quantifier & set consistency | Every universal/existential claim ("all", "every", "no", "always", "never", "N items") | ≤20 instances: check all. >20: check first 10 + random 5 + last 5. Stop at 3 violations |
| 2.4 Causal & temporal | All dated/ordered events across artifacts | Full timeline built and checked |
| 2.5 I/O coherence | All producer/consumer pairs at artifact boundaries | All pairs checked or 3 violations found |
| 2.6 Completeness | All requirements/sections in the authoritative artifact | Full gap map produced |
| 2.7 Contradiction | All claims about values, states, behaviour that appear in ≥2 artifacts | Full sweep or 5 contradictions found |
| 2.8 Boundary & edge | All boundary values and unit declarations | Full sweep, flag first instance of each drift type |
| 2.9 Self-reference | All self-describing claims ("N sections", "complete", "all passed") | Full sweep — these are few and high-value |
| 2.10 Realism | Placeholder scan + statistical checks | Full scan for placeholders; stats only if ≥20 data points |
If context budget forces an early stop, record: 2.X partial — checked N/M [units], stopped: context budget.
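The sampling and stop rule for check 2.3 can be sketched as follows. The function names are hypothetical, and one detail is an assumption: the "random 5" are drawn from the instances between the first 10 and the last 5, so the three groups never overlap.

```python
import random
from typing import Callable, Optional


def select_instances(instances: list, rng: Optional[random.Random] = None) -> list:
    """Apply the 2.3 rule: check all if <=20, else first 10 + random 5 + last 5."""
    if len(instances) <= 20:
        return list(instances)
    rng = rng or random.Random()
    # Draw the random 5 from the middle so they cannot repeat the head or tail.
    middle = rng.sample(range(10, len(instances) - 5), 5)
    return instances[:10] + [instances[i] for i in sorted(middle)] + instances[-5:]


def run_check(instances: list, is_violation: Callable, max_violations: int = 3) -> tuple:
    """Check the selected instances, stopping once max_violations are found."""
    violations = []
    checked = 0
    for item in select_instances(instances):
        checked += 1
        if is_violation(item):
            violations.append(item)
            if len(violations) >= max_violations:
                break
    return violations, checked
```

The `checked` count returned by `run_check` is the N in the `checked N/M` partial record; M is `len(instances)`.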
Large artifact strategy (single file >500 lines): Do not read the entire file into context. Instead:
- Read the header/imports/exports section (first ~30 lines) to understand shape.
- For each check, use targeted grep/search rather than full reads.
- Read specific line ranges only when a grep match needs context (±10 lines).
- If a check genuinely requires full-file reads (e.g. completeness), record it as ⚠ partial, citing the file size and what was sampled.
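The targeted-read steps above can be sketched with two small helpers. Both names are hypothetical. The helper reads the file inside the process but returns only the matching windows, so only those lines need to enter the auditor's context.

```python
import re
from itertools import islice


def read_header(path: str, n: int = 30) -> list:
    """Return the first n lines of the file (imports/exports, overall shape)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def grep_with_context(path: str, pattern: str, context: int = 10) -> list:
    """Return (start_line, end_line, lines) windows of ±context lines around matches.

    Line numbers are 1-based and inclusive; overlapping windows are merged.
    """
    rx = re.compile(pattern)
    with open(path, encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    windows = []
    for i, line in enumerate(lines):
        if rx.search(line):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            if windows and start <= windows[-1][1]:
                windows[-1] = (windows[-1][0], end)  # merge with previous window
            else:
                windows.append((start, end))
    return [(s + 1, e, lines[s:e]) for s, e in windows]
```

Each returned window carries its line range, which can be cited directly in a `[v: read:N → ...]` tag.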
Differential mode
Activated automatically when a previous audit report is in scope alongside the current artifacts.
Matching algorithm:
- For each finding in the prior report, extract its fingerprint: (normalized_artifact_path, normalized_quoted_content_or_claim)
  - Normalize path: strip leading ./, lowercase, collapse //.
  - Normalize content: strip whitespace, lowercase.
- Search the current artifact set for each fingerprint:
- Exact match → PERSISTS (same artifact, same content).
- Content matches, path changed → SHIFTED (artifact renamed/moved). Re-verify the finding at the new location.
- Content no longer present at cited location → RE-VERIFY (artifact changed). Re-run the specific check against the new content. Outcome: RESOLVED (problem fixed) or PERSISTS (still present, possibly shifted in the same file) or SHIFTED (content moved to a different artifact).
- **No ma