Repo Estimator
This skill produces a thorough, defensible estimate of the human time and cost required to build a codebase from scratch. It's designed for founders evaluating acquisitions, engineers scoping migrations, freelancers pricing projects, or anyone who wants to understand the real effort baked into a repository.
The output should read like something a senior engineering consultant would hand over: not a naive line-count calculation, but an analysis grounded in judgment calls about complexity, architecture, rework cycles, and team dynamics.
What You're Estimating
You're answering: "If a competent team started from zero today, how long would it take and what would it cost to build what's in this repo?"
This is subtly different from:
- "How long did it take the original team?" (You don't know their pace, mistakes, or false starts)
- "How many lines of code are there?" (LOC is a famously poor proxy for effort)
- "What's the maintenance cost?" (That's a different question — don't conflate it unless asked)
Assume the hypothetical team is competent but unfamiliar with the domain. They need to design, build, test, and document — not just type.
Step 1: Understand the Request
Before diving in, understand what the user actually needs:
- Target: Is this a local path, a GitHub URL, or a zip? Handle accordingly.
- Purpose: Are they buying/selling, scoping a rewrite, hiring, or just curious? This shapes how you frame the output.
- Team assumption: Should you estimate for a solo developer, a small startup team (2–4), or a mid-size engineering org? Ask if unclear — it affects hours significantly.
- Rate card: Do they want costs in USD? A specific region or seniority mix? Default to US market rates if unspecified (see references/rate-cards.md).
If the user provides a GitHub URL, use shell tools to clone it to a temp directory. If they provide a local path, work from there directly.
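The target handling described above might look like the following minimal sketch. The helper names `is_remote` and `resolve_repo` and the `repo-estimate-` temp prefix are hypothetical, it assumes `git` is available on the PATH, and it does not cover the zip case.

```python
import subprocess
import tempfile
from pathlib import Path


def is_remote(target: str) -> bool:
    """Treat anything that looks like a Git URL as remote; everything else is a local path."""
    return target.startswith(("https://", "http://", "git@", "ssh://"))


def resolve_repo(target: str) -> Path:
    """Return a local directory containing the repo, cloning it first if needed."""
    if not is_remote(target):
        path = Path(target).expanduser().resolve()
        if not path.is_dir():
            raise FileNotFoundError(f"not a directory: {path}")
        return path

    # mkdtemp leaves the directory in place; the caller removes it when done.
    workdir = Path(tempfile.mkdtemp(prefix="repo-estimate-"))
    dest = workdir / "repo"
    # Full clone (no --depth) so commit history stays available for later review.
    subprocess.run(["git", "clone", "--quiet", target, str(dest)], check=True)
    return dest
```

Cloning the full history is a design choice rather than a requirement: it keeps contributor and release information on hand for the manual walkthrough, at the cost of a slower clone on large repositories.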
Step 2: Repository Reconnaissance
Run both analysis scripts to gather hard data before making any judgment calls:
    # Source code analysis
    python scripts/analyze_repo.py <repo_path>

    # Log, validation, and artifact analysis
    python scripts/scan_logs_and_validation.py <repo_path>
The first script covers source code composition and complexity signals. The second surfaces evidence of effort that lives outside the code itself: test output, compliance documents, migration histories, CI/CD configs, changelogs, ADRs, and more. Both are needed for a complete picture.
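If you prefer to drive both scripts from one place, a small wrapper along these lines could work. This is only a sketch: `build_commands` and `run_analysis` are hypothetical names, and it assumes each script writes its results to stdout (as JSON where possible), which is not confirmed here.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script paths as given above, relative to the skill directory.
SCRIPTS = ("scripts/analyze_repo.py", "scripts/scan_logs_and_validation.py")


def build_commands(repo_path: str) -> list[list[str]]:
    """One argv list per analysis script, using the current Python interpreter."""
    return [[sys.executable, script, repo_path] for script in SCRIPTS]


def run_analysis(repo_path: str) -> dict[str, object]:
    """Run each script and keep its output, parsed as JSON when possible."""
    results: dict[str, object] = {}
    for cmd in build_commands(repo_path):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
        name = Path(cmd[1]).stem
        try:
            results[name] = json.loads(proc.stdout)  # assumption: JSON on stdout
        except json.JSONDecodeError:
            results[name] = proc.stdout  # fall back to the raw text
    return results
```

Using `check=True` means a failing script raises immediately, so a broken run is not silently mistaken for an empty result.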
Also do a manual walkthrough of the repo structure. The scripts catch what they can measure; you catch what requires judgment:
- Read the README, any architecture docs, and top-level config files
- Note the overall architecture pattern (monolith, microservices, monorepo, etc.)
- Identify the primary language(s) and frameworks
- Look for evidence of complexity: auth systems, payment integrations, real-time features, ML pipelines, complex state management, multi-tenancy, etc.
- Check test coverage and quality — well-tested code represents more total work than the implementation alone
- Note infrastructure-as-code, CI/CD pipelines, and DevOps configuration
- Identify any non-obvious work: data migrations, seed scripts, custom tooling, generated code
- Note whether this appears to be a product run by a human team (multiple contributors, release history, PM artifacts) vs. a solo side project
Step 3: Complexity Classification + Rebuild Difficulty Rating
This step produces two distinct outputs that serve different purposes. Do both before moving on.
3a. Component Tier Classification
Use the complexity taxonomy in references/complexity-guide.md to classify each major component of the codebase.
Every component falls into one of four tiers:
| Tier | Label | Description |
|---|---|---|
| 1 | Boilerplate | Standard scaffolding, CRUD, config files, generated code |
| 2 | Moderate | Custom business logic, non-trivial integrations, standard auth |
| 3 | Complex | Custom algorithms, real-time systems, complex state, multi-service orchestration |
| 4 | Specialized | ML/AI pipelines, custom protocols, novel architecture, research-grade work |
Be honest about tier assignment. The biggest estimation errors come from miscategorizing Tier 3 work as Tier 2. When in doubt, round up.
3b. Rebuild Difficulty Rating
This is not the same as component complexity. Rebuild difficulty answers: How hard would it be for a competent team to reconstruct the knowledge encoded in this repo from scratch?
LOC and tier classifications measure volume and technical sophistication. Rebuild difficulty measures knowledge density — the specialized understanding, institutional context, and hard-won production experience that can't be acquired by reading the code alone.
Read references/rebuild-difficulty.md for the full scoring model. The analyze_repo.py script outputs a rebuild_difficulty block in its JSON — use that as a starting point, but apply your own judgment based on what you observed in the manual walkthrough.
Score the repository across five dimensions:
| Dimension | What It Measures | Source |
|---|---|---|
| Domain Knowledge | Specialized domains (fintech, healthcare, crypto, compilers, etc.) | Script + manual review |
| Infrastructure Coupling | Depth of infra-as-code, k8s, Terraform, GitOps | Script + key files |
| Data Model Complexity | Tables, migrations, schema evolution depth | Script counts + migrations dir |
| Integration Surface Area | External API count, enterprise API weight | Script + package files |
| Operational Maturity | SLOs, runbooks, load tests, chaos engineering | Script + scan_logs output |
Compute the composite score and assign a rating:
| Score | Rating | Effort Multiplier |
|---|---|---|
| 0–2 | LOW | 1.0× |
| 3–4 | MODERATE | 1.1–1.2× |
| 5–7 | HIGH | 1.25–1.45× |
| 8–10 | VERY HIGH | 1.5–1.7× |
| 11–14 | EXTREME | 1.9–2.5× |
| 15+ | EXCEPTIONAL | 2.5–4.0× |
This multiplier is applied at the end of Step 9, after all other multipliers, as a final adjustment to total hours. It represents the knowledge ramp-up cost that component-level estimation systematically misses.
Always be specific about what drives the rating. "VERY HIGH (score: 9) — fintech payment domain (+2), Kubernetes+Terraform (+2), 67-table data model with 182 migrations (+3), 14 external integrations (+2)" is a defensible finding. "VERY HIGH" alone is not.
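The score-to-rating lookup can be sketched as below. It assumes the composite score is the plain sum of per-dimension points, which matches the worked finding above (2 + 2 + 3 + 2 = 9); the full scoring model lives in references/rebuild-difficulty.md and may differ. The names `Rating`, `BANDS`, and `rate` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    label: str
    low: float   # lower bound of the effort multiplier
    high: float  # upper bound of the effort multiplier


# (minimum score, rating) pairs from the table above, highest band first.
BANDS = [
    (15, Rating("EXCEPTIONAL", 2.5, 4.0)),
    (11, Rating("EXTREME", 1.9, 2.5)),
    (8, Rating("VERY HIGH", 1.5, 1.7)),
    (5, Rating("HIGH", 1.25, 1.45)),
    (3, Rating("MODERATE", 1.1, 1.2)),
    (0, Rating("LOW", 1.0, 1.0)),
]


def rate(dimension_points: dict[str, int]) -> tuple[int, Rating]:
    """Sum the per-dimension points (an assumption) and return (score, rating band)."""
    score = sum(dimension_points.values())
    for minimum, rating in BANDS:
        if score >= minimum:
            return score, rating
    raise ValueError(f"score must be non-negative, got {score}")


# Points from the example finding above; operational maturity is not mentioned there, so 0.
points = {
    "domain_knowledge": 2,
    "infrastructure_coupling": 2,
    "data_model_complexity": 3,
    "integration_surface_area": 2,
    "operational_maturity": 0,
}
score, rating = rate(points)
print(f"{rating.label} (score: {score}), multiplier {rating.low}-{rating.high}x")
# prints: VERY HIGH (score: 9), multiplier 1.5-1.7x
```

Keeping the per-dimension points in a dictionary makes it easy to report what drives the rating alongside the label.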
Step 4: Component Breakdown
Decompose the codebase into logical components. Good components are things a project manager would actually track — not individual files, not vague categories like "backend."
Examples of good component granularity:
- User authentication & authorization system
- Payment processing integration (Stripe, etc.)
- Admin dashboard UI
- REST API layer
- Real-time notification system
- Data pipeline / ETL jobs
- Infrastructure & deployment configuration
- Test suite
- Documentation
For each component, estimate:
- Tier: 1–4 (from above)
- Raw hours: Core implementation time for a competent solo developer
- Complexity multiplier: From references/complexity-guide.md
- Adjusted hours: Raw × multiplier
Commit to specific figures. For a component you can actually analyze, "80 hours" is more credible than "80–120 hours."
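One way to record the per-component fields is a small data class, sketched here. The `Component` class is hypothetical, and the hours and multipliers are placeholder values for illustration; real multipliers come from references/complexity-guide.md.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    tier: int                     # 1-4, from the tier table in Step 3a
    raw_hours: float              # core implementation time, competent solo developer
    complexity_multiplier: float  # placeholder; look up the real value in the guide

    def __post_init__(self) -> None:
        if self.tier not in (1, 2, 3, 4):
            raise ValueError(f"tier must be 1-4, got {self.tier}")

    @property
    def adjusted_hours(self) -> float:
        """Adjusted hours = raw hours x complexity multiplier."""
        return self.raw_hours * self.complexity_multiplier


# Illustrative values only, not taken from any real repository.
components = [
    Component("User authentication & authorization system", tier=2,
              raw_hours=80, complexity_multiplier=1.5),
    Component("Admin dashboard UI", tier=1,
              raw_hours=60, complexity_multiplier=1.0),
]
total_adjusted = sum(c.adjusted_hours for c in components)
print(total_adjusted)  # prints 180.0
```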
Step 5: Apply Estimation Multipliers
Raw component hours are never the full story. Apply these multipliers to the total adjusted hours:
Rework & iteration factor: 1.3–1.6×
Real development isn't linear: designs change, bugs surface, PRs go through review, and architecture decisions get revisited. Use 1.3× for simple projects and 1.6× for complex or novel ones.
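As a quick arithmetic sketch, the rework factor applies to the total adjusted hours, not to individual components. The function name `apply_rework` and the boolean switch are assumptions for illustration; nothing stops you from choosing a value between the two endpoints.

```python
REWORK_SIMPLE = 1.3   # simple projects
REWORK_COMPLEX = 1.6  # complex or novel projects


def apply_rework(total_adjusted_hours: float, complex_or_novel: bool) -> float:
    """Scale total adjusted hours by the rework & iteration factor."""
    factor = REWORK_COMPLEX if complex_or_novel else REWORK_SIMPLE
    return total_adjusted_hours * factor


# Hypothetical 1,000 adjusted hours: roughly 1,300 if simple, 1,600 if complex.
simple_total = apply_rework(1000, complex_or_novel=False)
complex_total = apply_rework(1000, complex_or_novel=True)
```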
Testing & QA factor: depends on test coverage observed
- No tests: add 0% (but note it in caveats — the hours are "artificially low")
- Light tests: add 15%
- Thorough unit tests: add 25%
- Full test suite with integration/e2e: