Codebase Onboarding
Systematic orientation. Stop guessing. Build the right mental model before touching anything — then keep it live as you work.
How this works: Claude runs the investigation (executing commands, reading files, tracing paths) and writes CODEBASE.md as a living orientation document. The human provides the repository and answers the questions the code cannot. Think of it as pair programming where Claude does the archaeology and you provide the context only humans have.
When to Use
| Situation | Mode |
|---|---|
| Joining a new team or repo for the first time | join |
| Returning to your own code after 3+ months away | return |
| Evaluating an OSS project before contributing | audit |
| Need "what do I avoid" in 15 minutes — no time for full investigation | quick |
| Leaving this codebase — write the handoff document you wish existed | sunset |
| About to modify a specific file mid-ramp | touch |
| About to push a PR — catch issues before review | preflight |
| Assigned a ticket or feature — map it to the codebase | task |
Default to join if unclear. quick is a triage tool — not a substitute
for full orientation. touch, preflight, and task are ongoing modes —
they require an existing CODEBASE.md from a prior session.
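The mode rules above can be expressed as a small guard. This is a minimal sketch rather than part of the workflow itself: `resolve_mode` is a hypothetical helper name, and it assumes CODEBASE.md lives at the repository root.

```shell
#!/usr/bin/env bash
# resolve_mode MODE [REPO_DIR]
# Hypothetical helper: prints the mode to run, or fails when an ongoing mode
# (touch, preflight, task) is requested without an existing CODEBASE.md.
# Assumption: CODEBASE.md sits at the repository root.
resolve_mode() {
  local mode="${1:-join}" repo="${2:-.}"   # default to join if unclear
  case "$mode" in
    touch|preflight|task)
      if [ ! -f "$repo/CODEBASE.md" ]; then
        echo "mode '$mode' needs an existing CODEBASE.md; run join first" >&2
        return 1
      fi
      ;;
  esac
  echo "$mode"
}
```

Calling `resolve_mode task .` in a repository without CODEBASE.md returns a non-zero status, which is the cue to fall back to join.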
Intake: Ask First
Before starting an orientation mode (join / return / audit), ask two questions. The answers reshape every phase that follows.
Question 1: Technical profile
Ask:
"Are you a developer who can read code and run terminal commands, or are you non-technical — a PM, designer, analyst, or executive who needs to understand the system without diving into the code itself?"
Then explain the difference:
If you're technical: I'll run shell commands, read source files, trace execution paths, and map git history. Output includes code snippets, file paths, and conventions — things you can act on directly. You'll also get a local dev guide and PR pre-flight support.
If you're non-technical: I'll run all the same investigation but translate everything into plain language. No code in the output. You'll get a visual architecture diagram, priority-ranked questions for your next engineering meeting, and an executive brief you can share with stakeholders.
Question 2: Goal
Wait for the answer to Question 1, then tailor the examples:
If technical:
- Make a contribution or fix a specific bug
- Take ownership — become the go-to maintainer
- Review for quality, security, or architecture concerns
- Evaluate an OSS project before contributing
- Get up to speed after being away for months
If non-technical:
- Understand what the system does and how it fits together
- Assess risk before a launch, acquisition, or vendor decision
- Identify what's slowing the team down
- Have a more informed conversation with engineers
- Prepare for a roadmap, sprint planning, or board conversation
Profile + Goal → what changes:
| Profile + Goal | What changes |
|---|---|
| Technical + contribute | Full workflow: Phases 0–6 (including the local dev guide), then Phases 8 and 8b |
| Technical + own/maintain | Full depth; extra attention to Danger Zones and authorship |
| Technical + review | Phases 0–6; security/quality lens; skip Phase 8 |
| Technical + evaluate OSS | audit mode — contributor signal, merge rate, PR velocity |
| Non-technical + understand | Phases 0–6; plain language; diagram; executive brief |
| Non-technical + decide | Phases 0–6 + recommendation section in executive brief |
| Non-technical + evaluate | audit mode; go/no-go framing in executive brief |
Large codebases (>100k LOC): After Phase 0, ask: "Which subsystem or area is most relevant to your goal?" Scope Phases 1–4 to that area. Investigating a 500k-line Rails monolith end-to-end produces noise, not orientation.
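One way to decide whether the scoping question applies is a rough line count over tracked files. The sketch below is an approximation under stated assumptions: both function names are hypothetical, the count includes binary and vendored files, and the only number taken from the text is the 100k threshold.

```shell
#!/usr/bin/env bash
# Both helpers are hypothetical names; the 100000 threshold mirrors the
# ">100k LOC" rule above.

# count_tracked_loc [REPO_DIR]: total lines across git-tracked files.
# Rough by design: binary and vendored files are counted too.
count_tracked_loc() {
  local repo="${1:-.}"
  (
    cd "$repo" || exit 1
    git ls-files -z | xargs -0 cat 2>/dev/null | wc -l | tr -d ' '
  )
}

# classify_size LOC: prints "scoped" when the subsystem question should be
# asked after Phase 0, otherwise "full".
classify_size() {
  local loc="${1:-0}"
  if [ "$loc" -gt 100000 ]; then
    echo "scoped"
  else
    echo "full"
  fi
}

# Usage: classify_size "$(count_tracked_loc /path/to/repo)"
```

A dedicated line counter would give a cleaner number, but this version needs nothing beyond git and standard utilities, which is usually enough to tell a 20k-line service from a 500k-line monolith.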
Phase Order by Mode
| Phase | join | return | audit | quick | sunset |
|---|---|---|---|---|---|
| 0 — Bootstrap | ✓ first | ✓ first | ✓ first | ✓ first | ✓ first |
| 1 — Critical Paths | ✓ | ✓ | ✓ | skip | skip |
| 2 — Conventions | ✓ | ✓ after Phase 9 | ✓ | skip | skip |
| 3 — Danger Zones | ✓ | ✓ after Phase 9 | ✓ | ✓ | ✓ |
| 4 — Gotcha Detector | ✓ | ✓ | ✓ | ✓ | ✓ |
| 5 — Local Dev Guide | technical only | technical only | skip | skip | skip |
| 6 — Team Questions | technical: 1:1 format; non-technical: meeting format | technical: 1:1 format; non-technical: meeting format | technical: 1:1 format; non-technical: meeting format | skip | skip |
| 7 — Executive Brief | non-technical only | non-technical only | non-technical only | skip | skip |
| 8 — First Contribution | technical only | technical only | skip | skip | skip |
| 8b — Ramp-up Timeline | technical only | technical only | skip | skip | skip |
| 9 — Archaeology | skip | ✓ before Phase 2 | skip | skip | skip |
| 10 — Contributor Signal | skip | skip | ✓ | skip | skip |
| 11 — Sunset | skip | skip | skip | skip | ✓ |
In return mode: run Phase 9 (Archaeology) immediately after Phase 1. In quick mode: no CODEBASE.md written — output is a single briefing. In sunset mode: produces a Handoff Document, not a CODEBASE.md update.
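The table and the ordering notes above can be condensed into a lookup. This is a sketch under assumptions: `phases_for_mode` is a hypothetical name, it returns the superset of phases before the technical / non-technical filter is applied (5, 8, and 8b are technical only; 7 is non-technical only), and placing Phases 10 and 11 last is a guess, since the table does not order them.

```shell
#!/usr/bin/env bash
# phases_for_mode MODE
# Hypothetical lookup: prints the phase order for a mode as a space-separated
# list. Profile filtering (technical vs non-technical) is NOT applied here.
# Assumption: Phases 10 and 11 run last in their modes.
phases_for_mode() {
  case "$1" in
    join)   echo "0 1 2 3 4 5 6 7 8 8b" ;;
    return) echo "0 1 9 2 3 4 5 6 7 8 8b" ;;   # Archaeology right after Phase 1
    audit)  echo "0 1 2 3 4 6 7 10" ;;
    quick)  echo "0 3 4" ;;                    # briefing only, no CODEBASE.md
    sunset) echo "0 3 4 11" ;;                 # produces a Handoff Document
    *)      echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```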
Output: CODEBASE.md
CODEBASE.md
├── What This Is # one-paragraph system description
├── Architecture Map # Mermaid diagram + component description
├── Critical Paths # entry points → processing → exit
├── External Integrations # third-party APIs, queues, webhooks — what needs mocking locally
├── Local Dev Guide # technical only: step-by-step to get it running
├── Conventions # implicit rules the README doesn't mention
├── Danger Zones # what not to touch first, and why
├── Gotchas # what silently burns new contributors
├── Team Questions # technical: 1:1 format | non-technical: meeting format
├── Executive Brief # non-technical only: one-page health summary
├── Ramp-up Timeline # technical only: week-by-week gates derived from findings
├── Open Questions # still unclear — actively maintained
└── Contribution Log # join/return: changes + learnings
# audit: merge rate, PR velocity, go/no-go
Confidence calibration
Every section carries a confidence tag:
| Tag | Meaning |
|---|---|
| ✅ Verified | Based on CI config, git history, or explicit documentation |
| ⚠️ Inferred | Based on patterns — likely but not confirmed |
| ❓ Gap | Couldn't assess from code — needs human confirmation |
Gap sections automatically feed into Team Questions. If you wrote ❓, there must be a corresponding question.
Update CODEBASE.md at the end of each phase. Do not defer.
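The rule that every ❓ needs a matching question can be spot-checked mechanically. The sketch below is a hypothetical lint, not something the workflow prescribes: it assumes sections use `## ` headings, that team questions are `- ` bullets under a `## Team Questions` heading, and that comparing counts is a good enough proxy for "corresponding".

```shell
#!/usr/bin/env bash
# check_gaps FILE
# Hypothetical lint: succeeds only when the Team Questions section holds at
# least as many bullet items as there are lines tagged with the gap marker
# elsewhere in the file.
# Assumptions: "## " section headings and "- " bullets; neither is mandated.
check_gaps() {
  local file="$1" counts gaps questions
  counts="$(awk '
    /^## /       { in_questions = ($0 ~ /^## Team Questions/) }
    in_questions { if ($0 ~ /^- /) q++; next }
    /❓/         { g++ }
    END          { print g + 0, q + 0 }
  ' "$file")"
  gaps="${counts% *}"        # first field: gap-tagged lines
  questions="${counts#* }"   # second field: question bullets
  if [ "$questions" -lt "$gaps" ]; then
    echo "$gaps gap(s) but only $questions team question(s) in $file" >&2
    return 1
  fi
  echo "ok: $gaps gap(s), $questions team question(s)"
}
```

Running `check_gaps CODEBASE.md` at the end of each phase fits the "do not defer" rule: a failing check means a gap was recorded without a question to close it.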
Phase 0: Bootstrap
1. README.md / README.rst → what does it claim to do?
2. CLAUDE.md / AGENTS.md → what has an AI already learned here?
3. CONTRIBUTING.md → what does the team care about?
4. package.json / go.mod / pyproject.toml / Cargo.toml → language, deps, run scripts
5. Makefile / justfile → available commands
6. .github/workflows/ → what CI runs — the ground truth
CI is the most honest documentation. If it conflicts with the README, CI wins.
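A quick inventory shows which of the files in the reading list above actually exist before any of them are opened. This is a minimal sketch: `bootstrap_inventory` is a hypothetical name, and it only checks the repository root, so monorepos with nested manifests would need a wider search.

```shell
#!/usr/bin/env bash
# bootstrap_inventory [REPO_DIR]
# Hypothetical helper: reports which Phase 0 files are present, in the same
# order as the reading list. Only the repository root is checked.
bootstrap_inventory() {
  local repo="${1:-.}" f
  for f in README.md README.rst CLAUDE.md AGENTS.md CONTRIBUTING.md \
           package.json go.mod pyproject.toml Cargo.toml \
           Makefile justfile .github/workflows; do
    if [ -e "$repo/$f" ]; then
      echo "found   $f"
    else
      echo "missing $f"
    fi
  done
}
```

A missing `.github/workflows` entry is worth noting early, since it means CI lives elsewhere or does not exist, and the "CI wins" rule has nothing to arbitrate with yet.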
AI-Generated Codebase Detection
Check these signals before the rest of Phase 0. AI-generated codebases have different failure modes: surface quality looks fine, but error handling is thin, edge cases aren't covered, and tests pass because they only exercise the happy path.
# Thin or compressed history
git log --format="%ad" --date=short | wc -l # total commits
git log --format="%ae" | sort -u | wc -l # distinct authors
git log --format="%ad" --date=short | tail -1 # first commit date
# Generic commit message signatures
git log --format="%s" | grep -ciE \
"^(add|fix|update|initial commit|feat|implement|create|ref