dbt-agent-readiness (orchestrator)

Target project path: $0 (or current directory if not specified)

This file orchestrates the audit. Phase instructions live in phases/ files. Each subagent reads only its own phase file. The orchestrator passes structured JSON between phases.

Design principles

The audit is framed around what an agent actually hits (code evidence) vs what an audit can only forecast (missing tests, unenforced relationships). Findings that require querying the warehouse to confirm (duplicates, orphans, bad data) are not reported as facts.

Report split into Blockers vs Hygiene. Blockers require code evidence (scope divergence, copy-paste descriptions, broken refs, polymorphic columns, within-model collisions, unit mismatch, measure/agg mismatch, high-confidence phantom columns). Hygiene lists missing tests and unenforced relationships, each with a runnable verification query the user can run in ten minutes to confirm or dismiss.
Broken refs always Blocker #1. issues.broken_refs is the most evidence-backed possible finding: the query fails at compile time.
Phantom columns suppressed when evidence is weak. If a model uses dbt_utils.star, SELECT * that can't be resolved, or Jinja for-loops, and no compiled manifest is present, the phantom finding is not emitted. It goes to catalogs.phantom_columns_suppressed_no_manifest instead. Synthesis emits one aggregate "run dbt compile" notice rather than per-model provisional rows. All rows in phantom_columns_by_model carry confidence: 'high' and are evidence-backed.
Phantom columns traced through multi-hop CTEs and column lineage. _extract_columns_via_sqlglot resolves CTE output columns recursively (depth 10), so a YAML column that survives through base -> mid -> top is not flagged phantom. When the simple YAML-vs-SQL diff still flags something, cross_reference retries via sqlglot.lineage.lineage; resolved findings land in catalogs.phantom_columns_resolved_by_lineage rather than phantom_columns. Unit / currency drift candidates (col * 100, col / 100.0, etc.) are collected into catalogs.potential_unit_drift for synthesis to treat as Blocker candidates when the description doesn't call out the conversion.
catalogs.description_contradicts_sql catches three high-signal failure modes deterministically: copy-paste descriptions, scope contradictions ("toutes les lignes" + non-trivial WHERE), and measure/agg mismatches ("count of customers" + SUM(...)).
catalogs.effective_description_coverage reports raw vs effective coverage. Effective = raw minus weak / phantom-documented / contradicts-SQL columns. The gap is the share of docs an agent cannot trust.
Severity, grain-declared, and zero-tests are not root issues. Severity gets one line under Hygiene; grain-declared is informational unless the description is also silent on cardinality; zero-tests is appendix-only.
"Safe today" criteria. No Blocker flags, key columns agent-ready, no high-confidence phantoms, either has a PK test OR has 0 inbound refs, not a staging-only alternative to a core model. Grain qualifies via description text ("one row per customer per day") even without meta.grain:.

The script-generated review queue drives the flag-driven review group: the Python inventory script extracts SQL snippets (WHERE, CASE WHEN, COALESCE, JOINs), builds a concept index, and generates cross-model flags. The LLM reviews flagged concept neighborhoods rather than importance-scored individual models. Granular catalogs (weak descriptions, convention drift, concept variants, same_name_different_grain, phantom_columns_by_model, enum_value_gaps, seeds_not_tested, unit_variants, unprefixed_booleans, overlapping_concept_columns_within_model, lineage_cycles, yaml_vs_sql_column_count_diff) are consumed by synthesis as appendix tables with exact file paths and column names.

Routing table

Project size	Inventory	Flag-driven review	Per-model review	Glossary ask
≤30 models	Script	Inline (no subagents)	Inline (no subagents)	After inventory
31-50 models	Script	2 parallel subagents	3 parallel subagents	After inventory
51-200 models	Script	3-4 parallel subagents	3-4 parallel subagents	After checkpoint
>200 models	Script + user confirm	4 parallel subagents	4 parallel subagents	After checkpoint

Ground rules

Never hallucinate findings. Every issue must reference a specific file path, model name, and column name you actually read.
Never invent model names, column names, or file paths. Every name must come from a file you read.
If unsure, label it "possible issue" with reasoning. Do not state uncertain findings as fact.
Be honest about limitations. The "What this audit cannot detect" section is mandatory.
Exact counts only. Never use approximate counts (~40%, ~200) in the report. Every metric must be an exact fraction (e.g., 47/118). If you cannot count precisely, say "not assessed" rather than guessing.

Locate the skill's phase files

The phase instruction files are located relative to this SKILL.md file. Determine the absolute path to this skill's directory, then construct paths like {skill_dir}/phases/inventory.md, etc. If the skill is installed at ~/.claude/skills/dbt-agent-readiness/SKILL.md, then phase files are at ~/.claude/skills/dbt-agent-readiness/phases/inventory.md.

Use Glob to find the phase files: **/dbt-agent-readiness/phases/*.md. Record the base path.

Step 1: Discovery and scoping (inline, ~2 min)

Run this yourself. Do not delegate.

1a. Find the dbt project

Find dbt_project.yml at the target path. Read it. Extract project name, model paths, vars, and global configs.

Check for global test severity. If data_tests: +severity: warn is set project-wide, record it under Hygiene as one line ("project severity defaults to warn"). Do NOT build a root issue around it — the team likely knows and may monitor via Elementary, Dagster asset checks, or re_data.

Jinja-aware severity parsing. If +severity: is a Jinja expression like {{ env_var('CI_SEVERITY', 'warn') }}, extract the default argument ('warn') and treat that as the effective default. Note "inferred from env_var default" in the Hygiene line.

1b. Build quick file counts

Use Glob (exclude dbt_packages/, target/):

**/*.sql under model paths → count SQL models
**/*.yml and **/*.yaml under model paths → count schema files

1c. Classify models by layer

Use directory paths and naming prefixes:

Staging: staging/ or stg_ prefix
Intermediate: intermediate/, prep/, base/, or int_ prefix
Core/marts/reference: marts/, reference/, reporting/, core/, presentation/
Other: utility, date spines

Layer classification is used for reporting only. It does not drive scoping.

1d. Quick ref-count

Use the Grep tool to find all ref() calls across the project:

Pattern: ref$['"][^'"]+['"]$ with glob *.sql under each model path, output_mode content
Parse the matches to count how many times each model name appears as a ref target

If the Grep tool returns too many results, use this Bash one-liner instead:

python3 -c "
import re; from pathlib import Path; from collections import Counter
c = Counter()
for f in Path('{model_path}').rglob('*.sql'):
    if 'dbt_packages' not in f.parts:
        c.update(re.findall(r\"ref\(['\\\"]([^'\\\"]+)\", f.read_text()))
for m, n in c.most_common(): print(f'{n:4d} {m}')
"

Store the result as pre_ref_counts. Models with 3+ refs are priority models that will get full inventory treatment. Count them: n_priority.

1e. Check for previous audit

Check if {project_path}/dbt-agent-readiness.md exists. If it does, read it and store its contents as previous_audit. This will be used in Step 6 to produce a "Changes since last aud

dbt-agent-readiness