Reasoning Tracer
Anti-black-box engine that makes reasoning chains visible, auditable, and decomposable.
Addresses the cognitive failure mode of black-box reasoning -- Claude gives an answer but the user cannot see what assumptions were relied on, what alternatives were rejected, or which part of the reasoning is weakest.
Rules (Absolute)
- Never present a single-path narrative. Every trace must show at least one rejected alternative at a meaningful decision fork. "I considered X but chose Y because Z" is the minimum; two rejected alternatives are preferred.
- Confidence decomposition requires 3+ sub-components. Overall confidence is always broken into at least three independent dimensions, each with its own percentage and justification.
- Every assumption gets rated. Each assumption must have an explicit criticality rating (High/Medium/Low) and verifiability rating (Directly Verifiable / Indirectly Verifiable / Unverifiable). No unrated assumptions.
- Weakest Link is MANDATORY. Never skip it. This is the highest-value section -- it tells the user exactly where to focus their own verification effort.
- No confidence theater. Do not assign high confidence (>80%) without specific justification. Vague appeals to "experience" or "common knowledge" are banned. Every confidence level must cite a concrete basis.
- Distinguish evidence types. Separate empirical evidence (benchmarks, data, test results) from theoretical reasoning (design principles, heuristics) from authority (docs, expert consensus). Label which type supports each claim.
- Trace must be falsifiable. Every conclusion must include conditions under which it would be wrong. If you cannot state what would disprove your conclusion, the reasoning is insufficiently rigorous.
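Some of these rules are mechanical enough to check automatically. The following is a minimal sketch, not part of the skill itself: `SubComponent`, `lint_confidence`, and the `BANNED_BASES` list are hypothetical names introduced for illustration, and the substring match is a crude heuristic rather than a real test for vagueness.

```python
from dataclasses import dataclass

# Illustrative list of vague bases named by the "No confidence theater" rule.
BANNED_BASES = ("experience", "common knowledge")


@dataclass
class SubComponent:
    """Hypothetical record for one confidence dimension."""
    name: str
    percent: float
    evidence_type: str  # "Empirical", "Theoretical", "Authority", or "Mixed"
    justification: str


def lint_confidence(components: list[SubComponent]) -> list[str]:
    """Return rule violations as messages; an empty list means none were found."""
    problems = []
    if len(components) < 3:
        problems.append("Confidence decomposition needs at least 3 sub-components.")
    for c in components:
        text = c.justification.strip().lower()
        if not text:
            problems.append(f"{c.name}: no justification given.")
        elif c.percent > 80 and any(phrase in text for phrase in BANNED_BASES):
            problems.append(f"{c.name}: >80% confidence rests on a vague basis.")
    return problems
```

A check like this only catches the countable rules (minimum sub-components, missing or vague justifications); whether a justification is actually concrete still needs human review.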
Mode Selection
Quick Mode (Default)
When invoked without --full, execute only:
- Stage 1: Claim Isolation — break into atomic claims
- Stage 2: Assumption Inventory — enumerate assumptions with criticality/verifiability
- Stage 5: Weakest Link & Alternative Conclusion — identify the single most fragile assumption
Skip Stages 3 (Decision Tree) and 4 (Confidence Decomposition).
Quick mode output format:
## Reasoning Trace: [Claim]
### Atomic Claims
1. [Claim 1]
2. [Claim 2]
### Assumption Inventory
| # | Assumption | Criticality | Verifiability |
|---|-----------|-------------|---------------|
| A1 | ... | High/Med/Low | Direct/Indirect/Unverifiable |
### Weakest Link
**Assumption [A#]:** [restate]
- **Why weakest:** [explanation]
- **If wrong:** [alternative conclusion]
- **How to verify:** [concrete steps]
Full Mode (--full)
When invoked with --full, execute all 5 stages as documented below.
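The mode switch amounts to choosing which stage numbers run. A minimal sketch, assuming the invocation arguments arrive as a list of strings; `select_stages`, `QUICK_STAGES`, and `FULL_STAGES` are illustrative names, not something this skill defines.

```python
QUICK_STAGES = (1, 2, 5)        # Claim Isolation, Assumption Inventory, Weakest Link
FULL_STAGES = (1, 2, 3, 4, 5)   # adds Decision Tree and Confidence Decomposition


def select_stages(args: list[str]) -> tuple[int, ...]:
    """Return the stages to execute; quick mode is the default."""
    return FULL_STAGES if "--full" in args else QUICK_STAGES


print(select_stages([]))          # (1, 2, 5)
print(select_stages(["--full"]))  # (1, 2, 3, 4, 5)
```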
Process
In full mode, execute these 5 stages sequentially and do NOT skip any. Quick mode runs only Stages 1, 2, and 5, as described under Mode Selection.
Stage 1: Claim Isolation
Identify the exact claim(s) being traced. Separate compound questions into atomic claims.
Input: "Why did you recommend microservices over a monolith?"
Atomic claims:
1. Microservices are a better architectural fit for this project
2. The team can handle microservices operational complexity
3. The migration cost is justified by long-term benefits
Each atomic claim gets its own assumption inventory and, in full mode, its own confidence score.
Stage 2: Assumption Inventory
For each atomic claim, enumerate every assumption the reasoning depends on. Each assumption gets an ID, a criticality rating, and a verifiability rating:
| # | Assumption | Criticality | Verifiability |
|---|---|---|---|
| A1 | [Statement] | High -- conclusion changes if wrong | Directly Verifiable -- can test/measure |
| A2 | [Statement] | Medium -- conclusion weakens if wrong | Indirectly Verifiable -- can infer from proxy data |
| A3 | [Statement] | Low -- conclusion survives if wrong | Unverifiable -- must be accepted or rejected on judgment |
Criticality scale:
- High: If this assumption is wrong, the conclusion flips or becomes unjustifiable.
- Medium: If wrong, the conclusion weakens significantly but may still hold with caveats.
- Low: If wrong, the conclusion is largely unaffected.
Verifiability scale:
- Directly Verifiable: Can be tested, measured, or confirmed from authoritative sources.
- Indirectly Verifiable: Can be inferred from related data, benchmarks, or analogies.
- Unverifiable: Requires judgment, prediction, or depends on future unknowns.
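One way to guarantee that no assumption goes unrated is to make both ratings required fields. The sketch below is illustrative only: the `Assumption` class, the two enums, and `to_row` are hypothetical names, with the enum values taken from the scales above.

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


class Verifiability(Enum):
    DIRECT = "Directly Verifiable"
    INDIRECT = "Indirectly Verifiable"
    UNVERIFIABLE = "Unverifiable"


@dataclass(frozen=True)
class Assumption:
    """One inventory row; both ratings are required constructor arguments."""
    id: str
    statement: str
    criticality: Criticality
    verifiability: Verifiability

    def to_row(self) -> str:
        """Render as a Markdown table row matching the inventory columns."""
        return (f"| {self.id} | {self.statement} | "
                f"{self.criticality.value} | {self.verifiability.value} |")


a2 = Assumption("A2", "Deployment is multi-instance",
                Criticality.HIGH, Verifiability.DIRECT)
print(a2.to_row())
# | A2 | Deployment is multi-instance | High | Directly Verifiable |
```

Because the dataclass has no defaults, constructing an `Assumption` without a rating raises a `TypeError`, which mirrors the "No unrated assumptions" rule.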
Stage 3: Decision Tree with Branch Justifications
At each significant fork in the reasoning, document:
- Decision point: What question needed answering?
- Options considered: At least 2 (the chosen path + minimum 1 rejected alternative).
- Evaluation criteria: What factors determined the choice?
- Chosen path: Which option was selected?
- Rejection rationale: Why each alternative was rejected -- with specifics, not hand-waving.
- Reversal condition: What would need to be true for the rejected alternative to become the better choice?
Decision Point: Database selection
├─ Option A: PostgreSQL [CHOSEN]
│ Strengths: ACID compliance, JSON support, ecosystem maturity
│ Evidence type: Empirical (benchmarks) + Authority (industry adoption data)
│
├─ Option B: SQLite [REJECTED]
│ Strengths: Zero-config, embedded, fast for reads
│ Rejection: Single-writer limit (one write transaction at a time) incompatible with
│ multi-instance deployment requirement (Assumption A2)
│ Reversal: If deployment is single-instance AND write volume < 100/sec,
│ SQLite becomes the simpler, better choice
│
└─ Option C: MongoDB [REJECTED]
Strengths: Schema flexibility, horizontal scaling
Rejection: Data has strong relational structure (7 FK relationships);
denormalization cost outweighs flexibility benefit
Reversal: If schema changes weekly or data is primarily document-shaped
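The per-fork requirements listed above can also be expressed as a small structure that reports what is missing. This is a sketch under assumptions: `Option`, `DecisionPoint`, and `problems` are hypothetical names, and the checks cover only the countable requirements (one chosen path, at least one rejected alternative, a rationale and reversal condition for each rejection).

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    name: str
    strengths: str
    chosen: bool = False
    rejection: str = ""  # why it was rejected (required when not chosen)
    reversal: str = ""   # what would make it the better choice


@dataclass
class DecisionPoint:
    question: str
    options: list[Option] = field(default_factory=list)

    def problems(self) -> list[str]:
        """List the ways this fork falls short of the Stage 3 checklist."""
        found = []
        chosen = [o for o in self.options if o.chosen]
        rejected = [o for o in self.options if not o.chosen]
        if len(chosen) != 1:
            found.append("Exactly one option must be marked chosen.")
        if not rejected:
            found.append("At least one rejected alternative is required.")
        for o in rejected:
            if not o.rejection.strip():
                found.append(f"{o.name}: missing rejection rationale.")
            if not o.reversal.strip():
                found.append(f"{o.name}: missing reversal condition.")
        return found
```

Calling `problems()` on a fork with only a chosen option returns the "at least one rejected alternative" message, which is the single-path narrative the rules forbid.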
Stage 4: Confidence Decomposition
Break overall confidence into independent sub-components. Minimum 3, recommended 4-6.
Each sub-component gets:
- A percentage (0-100%)
- The evidence type supporting it (Empirical / Theoretical / Authority / Mixed)
- A 1-2 sentence justification citing the specific basis
Overall Confidence: 71%
Sub-components:
Technical Feasibility: 90% [Empirical] -- proven in similar systems (refs: X, Y benchmarks)
Timeline Estimate: 45% [Theoretical] -- based on analogy to past project, but team composition differs
Cost Projection: 60% [Mixed] -- infrastructure costs are empirical, opportunity cost is estimated
Team Capability Match: 75% [Authority] -- based on stated team skills; not independently verified
Risk Assessment: 80% [Theoretical] -- standard failure modes well-understood; novel integration untested
Weighted Overall: (90*0.30 + 45*0.25 + 60*0.15 + 75*0.15 + 80*0.15) = 70.5% ≈ 71%
The overall confidence is NOT the simple (unweighted) average. Weight each sub-component by its importance to the conclusion, with the weights summing to 1 as in the example.
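The weighting step can be sketched as a short function. The name `weighted_confidence` and the check that weights sum to 1 are assumptions of this sketch rather than requirements stated by the skill; the percentages and weights are those of the example above.

```python
def weighted_confidence(components: dict[str, tuple[float, float]]) -> float:
    """Combine (percent, weight) pairs into one weighted overall percentage."""
    total_weight = sum(weight for _, weight in components.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    overall = sum(percent * weight for percent, weight in components.values())
    return round(overall, 1)  # one decimal place is enough for a confidence figure


example = {
    "Technical Feasibility": (90, 0.30),
    "Timeline Estimate": (45, 0.25),
    "Cost Projection": (60, 0.15),
    "Team Capability Match": (75, 0.15),
    "Risk Assessment": (80, 0.15),
}
print(weighted_confidence(example))
```

Computing the figure instead of estimating it by eye keeps the stated overall confidence consistent with the listed weights.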
Stage 5: Weakest Link & Alternative Conclusion
Weakest Link Identification: Which single assumption or sub-conclusion, if wrong, would MOST change the final answer?
Criteria for selecting the weakest link:
- Highest criticality among assumptions
- Lowest verifiability (hardest to confirm)
- Lowest confidence among sub-components
- The intersection of these three factors is the weakest link
Alternative Conclusion: "If [weakest link assumption] is wrong, then the conclusion changes to [X]."
This is not hypothetical filler -- it must be a genuinely reasoned alternative conclusion that follows logically from negating the weakest assumption.
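The selection criteria can be approximated with a sort key. This is one possible reading of "intersection", not the skill's definition: it ranks by criticality first, then by difficulty of verification, then by lowest confidence. `RatedAssumption`, the rank tables, and the idea that each assumption carries the confidence of the sub-component it feeds are all assumptions of the sketch.

```python
from dataclasses import dataclass

CRITICALITY_RANK = {"High": 3, "Medium": 2, "Low": 1}
# Higher rank means harder to confirm.
VERIFIABILITY_RANK = {
    "Directly Verifiable": 1,
    "Indirectly Verifiable": 2,
    "Unverifiable": 3,
}


@dataclass
class RatedAssumption:
    id: str
    criticality: str
    verifiability: str
    confidence: float  # percent of the sub-component this assumption feeds (assumed link)


def weakest_link(assumptions: list[RatedAssumption]) -> RatedAssumption:
    """Pick the most critical, then hardest to verify, then least confident assumption."""
    return max(
        assumptions,
        key=lambda a: (
            CRITICALITY_RANK[a.criticality],
            VERIFIABILITY_RANK[a.verifiability],
            -a.confidence,
        ),
    )
```

A lexicographic order is the simplest choice; a trace author may reasonably prefer a different tie-breaking scheme, and the final pick should still be justified in prose under "Why weakest".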
Output Format
## Reasoning Trace: [Claim/Question]
### Atomic Claims
1. [Claim 1]
2. [Claim 2]
3. [Claim N]
### Assumption Inventory
| # | Assumption | Criticality | Verifiability | Tied to Claim |
|---|-----------|-------------|---------------|---------------|
| A1 | [Statement] | Hi