Counterexample Explainer
Overview
Analyze counterexamples that violate specifications and produce clear, structured explanations showing step-by-step how and why the violation occurs, with root cause analysis and impact assessment.
Workflow
1. Understand the Specification
Identify what property is being checked.
Questions to ask:
- What is the specification or requirement?
- Is it formal (invariant, temporal logic) or informal (requirement doc)?
- What should happen vs what actually happened?
- Is this from a test failure, model checker, or runtime error?
See specification-types.md for comprehensive specification catalog.
Common specification types:
Formal specifications:
- Invariants:
balance >= 0 - Temporal logic:
G(request → F grant) - Pre/postconditions:
@requires(x > 0),@ensures(result >= 0) - State machines: Valid state transitions
- Concurrency: Atomicity, deadlock freedom
Informal requirements:
- User stories with acceptance criteria
- Functional requirements
- API contracts
- Expected behavior descriptions
Test specifications:
- Assertions:
assert result == expected - Property-based tests
- Integration test expectations
2. Collect Counterexample Information
Gather all relevant data about the violation.
From test failures:
# Run test to get failure details
pytest test_file.py::test_name -v
# Get stack trace
pytest test_file.py::test_name -v --tb=long
# Get variable values at failure
pytest test_file.py::test_name -v -l
Information to extract:
- Input values that triggered failure
- Expected output/behavior
- Actual output/behavior
- Intermediate state changes
- Stack trace or execution path
- Error messages
From runtime violations:
# Assertion failure
AssertionError: balance must be non-negative
File "account.py", line 45
assert self.balance >= 0
# Extract:
# - Variable: self.balance
# - Expected: >= 0
# - Actual: (check value)
From model checkers:
Counterexample trace:
State 0: request=false, grant=false
State 1: request=true, grant=false
State 2: request=true, grant=false
...
State 50: request=true, grant=false (VIOLATION)
Property violated: G(request → F grant)
3. Identify Violation Point
Pinpoint exactly where and when specification is broken.
For invariants:
- Find the operation that breaks the invariant
- Identify the state transition that causes violation
- Note variable values before and after
For temporal properties:
- Identify the state where property fails
- Trace back to find root cause
- Show path through states
For assertions:
- Locate the assertion that fails
- Check values of variables in assertion
- Find operation that produced wrong value
Example analysis:
# Specification
assert balance >= 0 # Invariant
# Counterexample
initial_balance = 100
withdraw(150)
# balance is now -50
# Violation point: withdraw operation
# Before: balance = 100 (satisfies invariant)
# After: balance = -50 (violates invariant)
4. Generate Step-by-Step Trace
Show execution path leading to violation.
See explanation-patterns.md for detailed patterns.
Trace structure:
## Counterexample Trace
**Specification:** [What property should hold]
### Initial State
- [variable1]: [value]
- [variable2]: [value]
- Status: ✅ Satisfies specification
### Step 1: [Operation/Event]
**Action:** [What happened]
**State Changes:**
- [variable]: [old_value] → [new_value]
**Status:** ✅ Still satisfies specification
**Why this matters:** [Explanation of significance]
### Step 2: [Operation/Event]
**Action:** [What happened]
**State Changes:**
- [variable]: [old_value] → [new_value]
**Status:** ❌ VIOLATES specification
**Violation Details:**
- Expected: [what should be true]
- Actual: [what is actually true]
- Violated property: [specific clause/condition]
**Why it violates:** [Clear explanation]
### Final State
- [variable1]: [value]
- [variable2]: [value]
- Status: ❌ Specification violated
Example:
## Balance Invariant Violation
**Specification:** Account balance must remain non-negative (`balance >= 0`)
### Initial State
- account.balance: 100
- Status: ✅ Satisfies invariant (100 >= 0)
### Step 1: User initiates withdrawal
**Action:** `withdraw(150)` called
**Validation Check:**
- Amount to withdraw: 150
- Current balance: 100
- Sufficient funds? NO (150 > 100)
**Expected behavior:** Reject withdrawal, raise InsufficientFundsError
**Actual behavior:** Withdrawal proceeds (bug: no validation)
### Step 2: System processes withdrawal
**Action:** Balance updated
**State Changes:**
- account.balance: 100 → -50
**Status:** ❌ VIOLATES INVARIANT
**Violation Details:**
- Expected: balance >= 0
- Actual: balance = -50
- Violated property: balance >= 0 (non-negativity)
**Why it violates:**
-50 is NOT >= 0. The balance has gone negative, which violates the core
invariant that account balances must never be negative.
### Final State
- account.balance: -50
- Status: ❌ Overdraft occurred
### Root Cause
Missing validation in `withdraw` method:
```python
def withdraw(self, amount):
# BUG: No check if amount > balance
self.balance -= amount # This can go negative!
# Should be:
def withdraw(self, amount):
if amount > self.balance:
raise InsufficientFundsError(f"Cannot withdraw {amount}, balance is {self.balance}")
self.balance -= amount
### 5. Compare Expected vs Actual
Show side-by-side what should happen vs what happened.
```markdown
## Expected vs Actual Behavior
| Aspect | Expected (Specification) | Actual (Counterexample) | Match? |
|--------|-------------------------|-------------------------|--------|
| [Property 1] | [Expected value] | [Actual value] | ✅/❌ |
| [Property 2] | [Expected value] | [Actual value] | ✅/❌ |
| [Property 3] | [Expected value] | [Actual value] | ✅/❌ |
**Key Differences:**
- [Property X]: Expected [value] but got [value]
**Why this matters:**
[Explanation of impact]
6. Identify Root Cause
Find the underlying bug or design flaw.
Common root causes:
Missing validation:
# Root cause: No input validation
def divide(a, b):
return a / b # Crashes when b=0
# Fix: Add validation
def divide(a, b):
if b == 0:
raise ValueError("Cannot divide by zero")
return a / b
Wrong algorithm:
# Root cause: Using > instead of <
def sort_ascending(arr):
for i in range(len(arr)):
for j in range(i+1, len(arr)):
if arr[i] > arr[j]: # BUG: should be <
arr[i], arr[j] = arr[j], arr[i]
Race conditions:
# Root cause: Non-atomic update
def increment_counter():
count = get_count() # Read
count += 1 # Modify
set_count(count) # Write
# Another thread can interleave here
Off-by-one errors:
# Root cause: Loop doesn't process last element
for i in range(len(arr) - 1): # BUG: should be len(arr)
process(arr[i])
# Last element never processed
7. Assess Impact
Explain consequences of the violation.
## Impact Analysis
**What Went Wrong:**
[Brief description of violation]
**Immediate Consequences:**
- [Direct effect 1]
- [Direct effect 2]
**Downstream Effects:**
1. [Cascading effect 1]
2. [Cascading effect 2]
**Severity:** [Critical/High/Medium/Low]
**Real-World Impact:**
[What this means for users/system]
**Example Scenario:**
[Concrete example of harm]
8. Present Complete Explanation
Combine all analysis into clear explanation.
Explanation structure:
# Counterexample Explanation: [Title]
## Summary
[One-sentence description of violation]
## Specification
**What should happen:**
[Specification statement]
**In plain English:**
[Non-technical explanation]
## Counterexample
**What actually happened:**
[Description of v