Bug Localization
Precisely identify the location of bugs in source code by analyzing error messages, stack traces, failing tests, and code patterns. Provides ranked suspect locations with confidence scores and evidence.
Core Capabilities
1. Error Analysis
Extract information from error sources:
- Stack traces - Parse and analyze call stacks
- Error messages - Interpret exception details
- Failing tests - Analyze test failures and assertions
- Crash reports - Process crash dumps and core files
- Log messages - Trace execution through logs
- Debugger output - Interpret breakpoint and watch data
2. Code Analysis
Examine code for bug indicators:
- Data flow - Trace variables and values
- Control flow - Analyze execution paths
- Type mismatches - Detect type-related issues
- Null/undefined access - Find potential null dereferences
- Boundary violations - Detect array/buffer overflows
- Concurrency issues - Identify race conditions
3. Suspect Ranking
Prioritize likely bug locations:
- Confidence scores - Rank suspects by likelihood
- Evidence strength - Quantify supporting evidence
- Historical data - Consider past bug patterns
- Code complexity - Factor in cyclomatic complexity
- Recent changes - Weigh recent modifications
- Code churn - Consider frequently modified code
4. Investigation Guidance
Provide actionable next steps:
- Verification steps - How to confirm the bug
- Debugging strategies - Where to set breakpoints
- Test cases - Tests to reproduce the bug
- Related code - Other potentially affected areas
Bug Localization Workflow
Step 1: Gather Evidence
Collect all available information:
From stack trace:
Traceback (most recent call last):
  File "app.py", line 45, in process_order
    total = calculate_total(items)
  File "billing.py", line 23, in calculate_total
    price = item['price'] * item['quantity']
KeyError: 'price'
Extract:
- Error type: KeyError
- Missing key: 'price'
- Exception location: billing.py:23
- Call chain: app.py:45 → billing.py:23
- Function: calculate_total
- Context: Processing items dict
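The extraction above can be automated for standard Python tracebacks. The following is a minimal sketch, not a complete parser: `parse_traceback` and `FRAME_RE` are hypothetical names introduced for illustration, and the regex assumes the default CPython traceback layout (chained exceptions and multi-line messages are not handled).

```python
import re

# Matches the 'File "...", line N, in func' lines of a CPython traceback.
FRAME_RE = re.compile(r'File "(?P<file>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)')


def parse_traceback(text):
    """Return the frames (outermost first) and the final exception line."""
    frames = [
        {"file": m["file"], "line": int(m["line"]), "func": m["func"]}
        for m in FRAME_RE.finditer(text)
    ]
    # The last non-empty line of a traceback is "ErrorType: message".
    last_line = [ln for ln in text.splitlines() if ln.strip()][-1].strip()
    error_type, _, message = last_line.partition(": ")
    locations = [f"{f['file']}:{f['line']}" for f in frames]
    return {
        "error_type": error_type,
        "message": message,
        "frames": frames,
        "exception_location": locations[-1] if locations else None,
        "call_chain": " -> ".join(locations),
        "function": frames[-1]["func"] if frames else None,
    }


TRACE = """Traceback (most recent call last):
  File "app.py", line 45, in process_order
    total = calculate_total(items)
  File "billing.py", line 23, in calculate_total
    price = item['price'] * item['quantity']
KeyError: 'price'
"""

report = parse_traceback(TRACE)
print(report["error_type"], report["exception_location"], report["call_chain"])
```

The last frame is treated as the exception location because Python prints the most recent call last.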
From failing test:
FAILED tests/test_auth.py::test_login_with_valid_credentials
AssertionError: assert False is True
Expected: User logged in successfully
Actual: Login failed with invalid credentials
Extract:
- Test file: tests/test_auth.py
- Test function: test_login_with_valid_credentials
- Failure type: Assertion mismatch
- Expected behavior: Successful login
- Actual behavior: Failed login
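The test file, test function, and failure type can be pulled out of such output with two regular expressions. This is a rough sketch that assumes the `FAILED path::test_name` line format shown above; `parse_failed_test` is a hypothetical helper, and the expected/actual descriptions still have to be read by a person.

```python
import re

# "FAILED <file>::<test id>" as it appears in the sample output above.
FAILED_RE = re.compile(r"^FAILED (?P<file>[^:\s]+)::(?P<test>\S+)", re.MULTILINE)
# A line that starts with an exception class name such as AssertionError.
ERROR_RE = re.compile(r"^(?P<type>\w+(?:Error|Exception))\b", re.MULTILINE)


def parse_failed_test(output):
    """Extract test file, test id, and exception type from failure output."""
    failed = FAILED_RE.search(output)
    if failed is None:
        return None
    error = ERROR_RE.search(output)
    return {
        "test_file": failed["file"],
        "test_function": failed["test"],
        "failure_type": error["type"] if error else None,
    }


OUTPUT = """FAILED tests/test_auth.py::test_login_with_valid_credentials
AssertionError: assert False is True
"""

info = parse_failed_test(OUTPUT)
print(info)
```

For tests defined inside a class, the captured test id would include the class name (for example `TestAuth::test_login`), which is usually what you want when re-running a single test.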
Step 2: Analyze Error Context
Understand what caused the error:
For KeyError example:
- Direct cause: Accessing non-existent 'price' key
- Root causes (hypotheses):
  - Item dict missing 'price' field
  - Key name mismatch ('price' vs 'Price')
  - Item is a different kind of mapping than expected (a None or list item would raise TypeError, not KeyError)
  - Data corruption in item dict
For login test failure:
- Direct cause: Login returned False
- Root causes (hypotheses):
  - Credential validation logic incorrect
  - Database query failing
  - Password hashing mismatch
  - Session creation failure
Step 3: Locate Suspect Code
Identify likely buggy locations:
Primary suspects (KeyError):
1. billing.py:23 (95% confidence)
   - Direct location of exception
   - Line: price = item['price'] * item['quantity']
   - Issue: No validation before dict access
2. app.py:40-45 (70% confidence)
   - Calls calculate_total with items
   - Possible: Items data structure incorrect
   - Need to verify items content
3. Data source (50% confidence)
   - Where items are created/loaded
   - Possible: Missing field in data
   - Check database schema or API response
Code locations to examine:
# billing.py:20-25 (Primary suspect)
def calculate_total(items):
    total = 0
    for item in items:
        price = item['price'] * item['quantity']  # Line 23 - BUG HERE
        total += price
    return total

# app.py:40-45 (Secondary suspect)
def process_order(order_id):
    order = get_order(order_id)
    items = order.get('items', [])
    total = calculate_total(items)  # Line 45
    return total
Step 4: Rank Suspects
Assign confidence scores:
Ranking factors:
- Stack trace depth - Closer to exception = higher confidence
- Error message - Directly mentioned code = higher
- Code complexity - More complex = more likely
- Recent changes - Recently modified = higher
- Test coverage - Low coverage = higher risk
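One way to combine these factors is a weighted sum of normalized signals. The sketch below is illustrative only: the `Suspect` fields, the weights, and the 7-day recency cutoff are all assumptions made for this example, and the resulting scores are not meant to reproduce the percentages used elsewhere in this document.

```python
from dataclasses import dataclass


@dataclass
class Suspect:
    location: str
    frames_from_exception: int  # 0 = the frame that raised
    named_in_error: bool        # code appears in the error message
    cyclomatic_complexity: int
    days_since_change: int
    test_coverage: float        # 0.0 to 1.0


# Hypothetical weights; they sum to 1.0 so the score stays within [0, 1].
WEIGHTS = {
    "depth": 0.40,
    "named": 0.25,
    "complexity": 0.10,
    "recency": 0.15,
    "coverage": 0.10,
}


def score(s):
    """Weighted sum of the five ranking factors, each scaled to [0, 1]."""
    signals = {
        "depth": 1.0 / (1 + s.frames_from_exception),
        "named": 1.0 if s.named_in_error else 0.0,
        "complexity": min(s.cyclomatic_complexity, 20) / 20,
        "recency": 1.0 if s.days_since_change <= 7 else 0.0,
        "coverage": 1.0 - s.test_coverage,
    }
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def rank(suspects):
    """Return suspects ordered from most to least likely."""
    return sorted(suspects, key=score, reverse=True)


# Made-up inputs for the two files from the KeyError example.
suspects = [
    Suspect("app.py:45", 1, False, 3, 30, 0.8),
    Suspect("billing.py:23", 0, True, 2, 3, 0.5),
]
ranked = rank(suspects)
for s in ranked:
    print(f"{s.location}: {score(s):.2f}")
```

Stack trace depth carries the largest weight here because it is the only factor tied directly to the observed failure; the others are priors about where bugs tend to live.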
Example ranking:
Rank 1: billing.py:23 in calculate_total() - 95%
Evidence:
- Direct exception location
- No null/existence check before dict access
- Simple fix: Add key validation
Rank 2: app.py:45 in process_order() - 70%
Evidence:
- Calls buggy function
- items might be malformed
- Check: order.get('items') might return bad data
Rank 3: models.py:78 in get_order() - 50%
Evidence:
- Data source for items
- Possible missing fields in database
- Check: Database schema and migrations
Rank 4: api.py:112 in create_order() - 30%
Evidence:
- Creates order data
- Might not include all required fields
- Check: API contract validation
Step 5: Provide Investigation Plan
Guide debugging efforts:
Immediate actions:
- Add validation in billing.py:23
- Add logging before line 23 to inspect item
- Check what items contains at app.py:45
Verification steps:
- Add print: print(f"Item: {item}") before line 23
- Run failing test again
- Check if 'price' exists in item dict
Long-term fixes:
- Add schema validation for items
- Use type hints and static analysis
- Add integration test for full order flow
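As one possible shape for the validation fix, the sketch below checks each item before the multiplication and fails with a message that names the offending item. `REQUIRED_KEYS`, `InvalidItemError`, and `validate_item` are hypothetical names introduced here; a real project might prefer a schema library or typed models instead.

```python
REQUIRED_KEYS = ("price", "quantity")  # assumed required fields for an item


class InvalidItemError(ValueError):
    """Raised when an order item does not have the expected shape."""


def validate_item(item, index):
    """Fail early with a message that identifies the bad item."""
    if not isinstance(item, dict):
        raise InvalidItemError(
            f"item {index}: expected dict, got {type(item).__name__}"
        )
    missing = [key for key in REQUIRED_KEYS if key not in item]
    if missing:
        raise InvalidItemError(
            f"item {index}: missing keys {missing}; present keys {sorted(item)}"
        )


def calculate_total(items):
    """Same arithmetic as the original, with validation before dict access."""
    total = 0
    for index, item in enumerate(items):
        validate_item(item, index)
        total += item["price"] * item["quantity"]
    return total


print(calculate_total([{"price": 2, "quantity": 3}, {"price": 1, "quantity": 4}]))
```

Listing the keys that are present makes a key name mismatch such as 'Price' visible directly in the error message.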
Localization Patterns
Pattern 1: Stack Trace Analysis
Error:
Traceback (most recent call last):
  File "main.py", line 100, in run
    result = processor.execute()
  File "processor.py", line 45, in execute
    data = self.transform(input_data)
  File "processor.py", line 78, in transform
    return data.upper()
AttributeError: 'NoneType' object has no attribute 'upper'
Analysis:
Stack trace (bottom to top):
1. processor.py:78 - data.upper() fails (EXCEPTION POINT)
2. processor.py:45 - self.transform(input_data) called
3. main.py:100 - processor.execute() triggered
Primary suspect: processor.py:78
- Directly referenced in error
- Calls .upper() on None
- Confidence: 95%
Secondary suspect: processor.py:45
- Passes input_data to transform
- input_data might be None
- Confidence: 80%
Tertiary suspect: main.py:100
- Initial call site
- Processor initialization might be wrong
- Confidence: 40%
Localization:
# processor.py:75-80 (PRIMARY SUSPECT - 95%)
def transform(self, data):
    # MISSING CHECK: data is never tested for None before use.
    # A guard such as `if data is None: return ""` is absent.
    return data.upper()  # Line 78 - Bug location

# processor.py:43-46 (SECONDARY - 80%)
def execute(self):
    input_data = self.get_input()  # Might return None
    data = self.transform(input_data)  # Line 45
    return self.process(data)

# main.py:98-101 (TERTIARY - 40%)
def run(self):
    processor = Processor(config)
    result = processor.execute()  # Line 100
    return result
Evidence:
- Direct: AttributeError on line 78
- Contextual: data is None
- Root cause: No null check before .upper()
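To confirm the primary suspect, the failure can be reproduced in isolation. The sketch below uses a stand-in `Processor` that models only the suspect path; its constructor and the `reproduces_none_bug` helper are assumptions for illustration and do not mirror the real class, which takes a config and calls `self.process`.

```python
class Processor:
    """Stand-in for the class in processor.py; only the suspect path is modeled."""

    def __init__(self, input_data):
        self._input_data = input_data

    def get_input(self):
        return self._input_data  # may be None, as hypothesized

    def transform(self, data):
        return data.upper()  # unguarded call, as at processor.py:78

    def execute(self):
        return self.transform(self.get_input())


def reproduces_none_bug():
    """Return True if a None input triggers the AttributeError from the trace."""
    try:
        Processor(None).execute()
    except AttributeError as exc:
        return "'NoneType' object has no attribute 'upper'" in str(exc)
    return False


print(reproduces_none_bug())
print(Processor("abc").execute())
```

If the reproduction succeeds with a None input and passes with a string, the remaining question is why `get_input` returns None, which points the investigation at the secondary suspect.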
Pattern 2: Assertion Failure Localization
Failing test:
def test_calculate_discount():
    # Given
    price = 100.0
    discount_percent = 20.0
    # When
    result = calculate_discount(price, discount_percent)
    # Then
    assert result == 80.0  # FAILS: result is 120.0
Analysis:
Test expectation: 80.0
Actual result: 120.0
Difference: +40 (opposite direction)
Hypothesis:
- Expected: price - (price * discount / 100) = 100 - 20 = 80
- Actual: price + (price * discount / 100) = 100 + 20 = 120
- Likely bug: Using + instead of -
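The hypothesis can be checked numerically before opening the implementation. In this sketch, `discount_subtract` and `discount_add` are hypothetical stand-ins for the two candidate formulas; neither is taken from the code under test.

```python
def discount_subtract(price, discount_percent):
    """Expected formula: reduce the price by the discount."""
    return price - (price * discount_percent / 100)


def discount_add(price, discount_percent):
    """Hypothesized buggy formula: the discount is added instead."""
    return price + (price * discount_percent / 100)


expected = discount_subtract(100.0, 20.0)
observed_if_buggy = discount_add(100.0, 20.0)
print(expected, observed_if_buggy)
```

The added-discount variant yields exactly the value the failing test observed, which supports the sign-error hypothesis without proving it; the implementation still needs to be inspected.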
**Localizati