Vulnerability Analysis
Every vulnerability you miss is one an attacker can find. Systematic analysis traces untrusted data from source to sink, evaluates filters for bypass, and questions every trust boundary assumption.
When to Activate
- Reviewing any code for security vulnerabilities
- Auditing authentication, authorization, or session logic
- Evaluating input handling and output encoding
- Assessing cryptographic implementations
- Reviewing file operations, command execution, or deserialization
- Checking for race conditions in concurrent code
- Analyzing dependency security and supply chain risks
Core Methodology
Taint Analysis: Mark untrusted data at its origin (source) and track its propagation to dangerous operations (sinks). A vulnerability exists when tainted data reaches a sink without adequate sanitization.
Source-Forward: Start from data entry points, trace every path to sinks. Comprehensive but time-consuming.
Sink-Backward: Start from dangerous operations (eval, exec, SQL, innerHTML), trace backward to sources. Faster and targeted.
Hybrid Approach: Sink-backward for rapid high-risk identification, then source-forward for complete coverage.
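The sink-backward pass can be partly automated. The following is a minimal sketch, assuming Python source and a small hand-picked sink list (`DANGEROUS_CALLS`, `dotted_name`, `find_sinks`, and `SAMPLE` are illustrative names, not part of any tool). It only locates sink calls; tracing each hit back to a source remains manual work.

```python
import ast

# Hypothetical sink list for illustration; extend it per codebase.
DANGEROUS_CALLS = {"eval", "exec", "os.system", "os.popen", "pickle.loads"}


def dotted_name(node: ast.AST) -> str:
    """Return 'a.b.c' for Name/Attribute chains, or '' for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""


def find_sinks(source: str) -> list[tuple[int, str]]:
    """List (line number, call name) for every call to a known sink."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append((node.lineno, name))
    return sorted(hits)


# Made-up handler used only to exercise the scanner.
SAMPLE = '''
import os

def handler(request):
    name = request.args["name"]
    os.system("ping " + name)
    return len(name)
'''

sink_hits = find_sinks(SAMPLE)
print(sink_hits)
```

Each reported line is a starting point for backward tracing: here the argument to `os.system` is built from `request.args`, so the path from source to sink is direct.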
Rule Categories by Priority
| Priority | Category | Impact |
|---|---|---|
| 1 | Taint Analysis | CRITICAL |
| 2 | Memory Safety | CRITICAL |
| 3 | Injection Attacks | CRITICAL |
| 4 | Authentication & Authorization | HIGH |
| 5 | Cryptographic Vulnerabilities | HIGH |
| 6 | Concurrency & Race Conditions | HIGH |
| 7 | Web & API Security | MEDIUM-HIGH |
| 8 | Supply Chain & Dependencies | MEDIUM |
Audit Protocol
- Reconnaissance: Identify language, frameworks, trust boundaries, sensitive data, high-value targets. Establish threat model.
- Attack Surface Enumeration: Map all entry points — HTTP endpoints, CLI args, file inputs, IPC, deserialization points.
- Systematic Analysis: Apply hybrid taint analysis across all source-sink paths.
- False Positive Reduction:
  - Trace validation chains upstream: is the value bounded before it reaches the sink?
  - Confirm reachability: can an attacker actually trigger this path?
  - Evaluate against the threat model: does exploitation require capabilities the attacker doesn't have?
  - Check for established patterns: is this a recognized safe idiom in the domain?
- Exploitability Gate: Before reporting ANY finding:
  - Are you certain this isn't expected functionality?
  - Is this a valid vulnerability worth reporting?
  - Is it actually exploitable in production?
- Findings: Document CWE, severity, root cause, exploitation scenario, remediation.
- Variant Hunting: Generalize each finding into a pattern and search for variants.
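One way to keep findings consistent is a small record type that holds the fields named in the Findings step. This is a sketch only: the `Finding` class, the `SEVERITIES` tuple (taken from the impact levels in the priority table), and the example values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass

# Severity labels borrowed from the priority table; adjust to your own scale.
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM-HIGH", "MEDIUM")


@dataclass(frozen=True)
class Finding:
    cwe: str
    severity: str
    root_cause: str
    exploitation: str
    remediation: str

    def __post_init__(self) -> None:
        # Reject free-form severities so reports stay comparable.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


# Illustrative entry; the function name is a hypothetical example.
example = Finding(
    cwe="CWE-89",
    severity="CRITICAL",
    root_cause="String concatenation in the query builder",
    exploitation="Attacker-controlled id reaches the SQL text in getUserById()",
    remediation="Use parameterized statements",
)
print(example.to_json())
```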
Vulnerability Classes
Injection Attacks
- SQL Injection: string concat in queries, ORM raw methods, second-order injection
- Command Injection: user input in system(), exec(), backticks, $()
- XSS: reflected, stored, DOM-based, template injection
- SSTI: user input in template engines (Jinja2, Twig, Freemarker)
- XXE: XML parsing with external entities enabled
- SSRF: user-controlled URLs in server-side requests
- Deserialization: untrusted data in pickle, Java ObjectInputStream, PHP unserialize
- Path Traversal: ../../../etc/passwd in file operations
- ReDoS: catastrophic backtracking in regex with user input
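To make the first item concrete, here is a minimal sketch of SQL injection through string concatenation next to its parameterized counterpart. It uses an in-memory SQLite database; the `users` table and both function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name: str):
    # Tainted data becomes part of the SQL text itself.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(name: str):
    # The driver binds the value; it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


payload = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(payload)  # the injected OR clause matches every row
safe_rows = find_user_safe(payload)      # no user has that literal name
print(len(unsafe_rows), len(safe_rows))
```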
Memory Safety
- Buffer overflow: unbounded copies, integer overflow in size calculations
- Use-after-free: dangling pointers, double-free
- Integer overflow: unchecked arithmetic in size/offset calculations
- Null pointer dereference: missing null checks on fallible operations
- Format string: user-controlled format specifiers
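Python integers do not overflow, so the sketch below only simulates 32-bit unsigned arithmetic with a mask. It shows why an unchecked `count * elem_size` can produce an undersized allocation in C-like code. The function names and the 32-bit width are assumptions chosen for illustration.

```python
UINT32_MAX = 0xFFFFFFFF


def alloc_size_unchecked(count: int, elem_size: int) -> int:
    """Mimic a C uint32_t multiplication, which silently wraps."""
    return (count * elem_size) & UINT32_MAX


def alloc_size_checked(count: int, elem_size: int) -> int:
    """Reject sizes that would not fit in 32 bits instead of wrapping."""
    if count < 0 or elem_size < 0:
        raise ValueError("negative size")
    total = count * elem_size
    if total > UINT32_MAX:
        raise OverflowError("size calculation exceeds 32 bits")
    return total


# An attacker-chosen count makes the product wrap to a tiny value,
# so a later copy of `count` elements would overrun the buffer.
wrapped = alloc_size_unchecked(0x40000001, 4)
print(wrapped)
```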
Authentication & Authorization
- Auth bypass: missing checks, JWT algorithm confusion, middleware ordering
- IDOR: direct object references without ownership verification
- Privilege escalation: role checks on client side only
- Session management: session fixation (no ID regeneration at login), predictable tokens
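The IDOR item reduces to a missing ownership check. A minimal sketch, assuming an in-memory store and hypothetical names (`Invoice`, `INVOICES`, `Forbidden`, `get_invoice`):

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    id: int
    owner_id: int
    total: float


# Stand-in for a database table.
INVOICES = {
    1: Invoice(id=1, owner_id=10, total=99.0),
    2: Invoice(id=2, owner_id=20, total=15.0),
}


class Forbidden(Exception):
    pass


def get_invoice_unsafe(current_user_id: int, invoice_id: int) -> Invoice:
    # IDOR: any authenticated user can read any invoice by guessing its id.
    return INVOICES[invoice_id]


def get_invoice(current_user_id: int, invoice_id: int) -> Invoice:
    invoice = INVOICES.get(invoice_id)
    # Same error for "missing" and "not yours", so ids cannot be enumerated.
    if invoice is None or invoice.owner_id != current_user_id:
        raise Forbidden("invoice not found")
    return invoice
```

Returning the same error for a missing object and a foreign object is a design choice here: it avoids confirming which ids exist.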
Cryptographic Issues
- Weak algorithms: MD5 or SHA-1 used for security purposes, DES, RC4
- ECB mode: pattern-preserving encryption
- Unauthenticated encryption: ciphertext without a MAC (e.g. HMAC) or an AEAD mode
- Hardcoded keys/IVs: secrets in source code
- Insufficient randomness: Math.random() for security tokens
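For the randomness and authentication items, Python's standard library already provides suitable primitives. The sketch below assumes hypothetical helper names (`new_session_token`, `sign`, `verify`) and a key generated at runtime rather than hardcoded.

```python
import hashlib
import hmac
import secrets


def new_session_token() -> str:
    # secrets draws from the OS CSPRNG; the random module does not.
    return secrets.token_urlsafe(32)


def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag for the message."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    # compare_digest runs in constant time, which avoids timing leaks.
    return hmac.compare_digest(sign(key, message), tag)


key = secrets.token_bytes(32)  # in practice, load this from a secret store
tag = sign(key, b"amount=10")
print(verify(key, b"amount=10", tag), verify(key, b"amount=1000", tag))
```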
Concurrency
- TOCTOU: check-then-act without atomicity
- Race conditions: shared state without proper locking
- Double-spend: financial operations without idempotency
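A common TOCTOU shape is "check that a file does not exist, then create it". The sketch below contrasts that with an atomic create using `O_CREAT | O_EXCL`. The function names are illustrative.

```python
import os


def create_unsafe(path: str, data: bytes) -> bool:
    # Check-then-act: another process can create (or symlink) the path
    # between the exists() check and the open() call.
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(data)
        return True
    return False


def create_atomic(path: str, data: bytes) -> bool:
    # O_EXCL makes "create only if absent" a single atomic operation.
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True
```

The same principle applies to database work: prefer a single conditional statement or a unique constraint over a separate read followed by a write.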
Security Review Checklist
- All user inputs validated server-side with allowlists
- Database queries use parameterized statements exclusively
- Command execution avoids shell interpretation
- Output encoding matches rendering context (HTML, JS, CSS, URL)
- Authentication checks on every sensitive endpoint
- Authorization verifies ownership, not just authentication
- Cryptographic operations use modern algorithms with proper key management
- Session tokens have sufficient entropy with Secure, HttpOnly, SameSite
- File operations validate paths against traversal
- Deserialization never operates on untrusted data without safe loaders
- Race conditions mitigated with atomic operations or proper locking
- Dependencies pinned, audited, free of known CVEs
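As an illustration of the path-validation item, the following sketch resolves a user-supplied path and rejects anything that escapes a base directory. It assumes Python 3.9+ for `Path.is_relative_to`; `resolve_inside` is a hypothetical helper name.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Return the resolved path, or raise if it escapes base_dir."""
    base = Path(base_dir).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

An absolute `user_path` replaces the base when joined, so it fails the same containment check as a `../` sequence does.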
Advanced: Variant Analysis Methodology
Pattern Generalization
# When you find a vulnerability, generalize it into a pattern:
# 1. Identify the root cause (not the symptom)
# 2. Abstract the pattern: what makes this exploitable?
# 3. Search for the same pattern across the codebase
# 4. Check related codepaths (same developer, same module, same framework)
# Example: Found SQL injection in getUserById()
# Root cause: string concatenation in query builder
# Pattern: any function using raw() or format() with user input in DB layer
# Search: grep -rnE "\.raw\(.*\+|\.format\(" --include="*.py" src/
# Variants found: getOrdersByUser(), searchProducts(), adminLookup()
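The same search can be scripted when the hits need further processing. This is a rough sketch: `PATTERN` mirrors the idea of the grep expression in the example (a `raw()` call with concatenation, or any `.format(` call), and `find_variants` is a hypothetical helper.

```python
import re
from pathlib import Path

# raw(...) containing a "+", or any .format( call.
PATTERN = re.compile(r"\.raw\(.*\+|\.format\(")


def find_variants(root: str, glob: str = "*.py") -> list[tuple[str, int, str]]:
    """Return (file, line number, stripped line) for every pattern match."""
    hits = []
    for path in sorted(Path(root).rglob(glob)):
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

A line-based regex like this is noisy (every `.format(` call matches), so treat each hit as a lead to trace, not as a finding.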
Taint Propagation Rules
# Define how taint flows through operations:
# Direct propagation (output is tainted if input is):
# - String concatenation: tainted + clean = tainted
# - String formatting: f"{tainted}" = tainted
# - Array indexing: arr[tainted] = tainted index (potential OOB)
# - Assignment: x = tainted → x is tainted
# Indirect propagation:
# - Length: len(tainted) = clean value, but attacker-influenced size (check before allocating or looping)
# - Type conversion: int(tainted) = tainted (may throw; the value is still attacker-chosen until range-checked)
# - Hash: hash(tainted) = clean (one-way, fixed output)
# - Comparison result: tainted == x → clean boolean
# Sanitization (removes taint if correct):
# - Parameterized queries: cursor.execute("SELECT * FROM users WHERE id=%s", (tainted,))
# - HTML encoding: html.escape(tainted) → safe for HTML element and quoted-attribute context
# - URL encoding: urllib.parse.quote(tainted, safe="") → safe for a single URL path or query component
# - Input validation: if re.fullmatch(r'[a-z0-9]+', tainted) → bounded (re.match with ^...$ also accepts a trailing newline)
# FALSE sanitization (does NOT remove taint):
# - Blacklist filtering: tainted.replace("'", "") → bypassable
# - Client-side validation: JavaScript checks → attacker skips
# - WAF rules: can often be bypassed with encoding
# - Type casting without range check: (int)tainted → overflow possible
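The gap between real and false sanitization can be checked directly. The sketch below uses hypothetical helper names and a made-up `users` query. It shows a denylist filter that leaves a quote-free payload untouched, a Python-specific caveat with `$` in `re.match`, and context-appropriate HTML encoding.

```python
import html
import re


def strip_quotes(value: str) -> str:
    """Denylist 'sanitizer': removes single quotes and nothing else."""
    return value.replace("'", "")


# A numeric-context payload needs no quotes, so the filter changes nothing.
payload = "1 OR 1=1"
unsafe_query = "SELECT * FROM users WHERE id = " + strip_quotes(payload)


def is_slug_loose(value: str) -> bool:
    # In Python, "$" also matches just before a trailing newline.
    return re.match(r"^[a-z0-9]+$", value) is not None


def is_slug_strict(value: str) -> bool:
    # fullmatch requires the whole string to match the allowlist.
    return re.fullmatch(r"[a-z0-9]+", value) is not None


encoded = html.escape("<script>alert(1)</script>")

print(unsafe_query)
print(is_slug_loose("abc\n"), is_slug_strict("abc\n"))
print(encoded)
```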
Source-Sink Mapping by Language
# Python
sources:
  - request.args, request.form, request.json, request.headers
  - sys.argv, os.environ
  - open().read(), socket.recv()
sinks:
  sql: [cursor.execute(f"..."), engine.execute(text(...))]
  command: [os.system(), subprocess.call(shell=True), os.popen()]
  xss: [render_template_string(), Markup(), |safe filter]
  ssrf: [requests.get(user_url), urllib.request.urlopen()]
  deserialization: [pickle.loads(), yaml.load(), jsonpickle.decode()]
  path: [open(user_path), send_file(user_path)]
# JavaScript/Node.js
sources:
  - req.params, req.query, req.body, req.headers
  - process.argv, process.env
sinks:
  sql: [connection.query(`...${input}`), sequelize.literal()]
  command: [child_process.exec(), child_process.spawn(cmd, args, {shell: true})]
  xss: [innerHTML, document.write(), dangerouslySe