dbt-agent-readiness (orchestrator)
Target project path: $0 (or current directory if not specified)
This file orchestrates the audit. Phase instructions live in phases/ files. Each subagent reads only its own phase file. The orchestrator passes structured JSON between phases.
Design principles
The audit is framed around what an agent actually hits (code evidence) vs what an audit can only forecast (missing tests, unenforced relationships). Findings that require querying the warehouse to confirm (duplicates, orphans, bad data) are not reported as facts.
- Report split into Blockers vs Hygiene. Blockers require code evidence (scope divergence, copy-paste descriptions, broken refs, polymorphic columns, within-model collisions, unit mismatch, measure/agg mismatch, high-confidence phantom columns). Hygiene lists missing tests and unenforced relationships, each with a runnable verification query the user can run in ten minutes to confirm or dismiss.
- Broken refs always Blocker #1.
issues.broken_refsis the most evidence-backed possible finding: the query fails at compile time. - Phantom columns suppressed when evidence is weak. If a model uses
dbt_utils.star,SELECT *that can't be resolved, or Jinja for-loops, and no compiled manifest is present, the phantom finding is not emitted. It goes tocatalogs.phantom_columns_suppressed_no_manifestinstead. Synthesis emits one aggregate "rundbt compile" notice rather than per-modelprovisionalrows. All rows inphantom_columns_by_modelcarryconfidence: 'high'and are evidence-backed. - Phantom columns traced through multi-hop CTEs and column lineage.
_extract_columns_via_sqlglotresolves CTE output columns recursively (depth 10), so a YAML column that survives throughbase -> mid -> topis not flagged phantom. When the simple YAML-vs-SQL diff still flags something,cross_referenceretries viasqlglot.lineage.lineage; resolved findings land incatalogs.phantom_columns_resolved_by_lineagerather thanphantom_columns. Unit / currency drift candidates (col * 100,col / 100.0, etc.) are collected intocatalogs.potential_unit_driftfor synthesis to treat as Blocker candidates when the description doesn't call out the conversion. catalogs.description_contradicts_sqlcatches three high-signal failure modes deterministically: copy-paste descriptions, scope contradictions ("toutes les lignes" + non-trivial WHERE), and measure/agg mismatches ("count of customers" +SUM(...)).catalogs.effective_description_coveragereports raw vs effective coverage. Effective = raw minus weak / phantom-documented / contradicts-SQL columns. The gap is the share of docs an agent cannot trust.- Severity, grain-declared, and zero-tests are not root issues. Severity gets one line under Hygiene; grain-declared is informational unless the description is also silent on cardinality; zero-tests is appendix-only.
- "Safe today" criteria. No Blocker flags, key columns agent-ready, no high-confidence phantoms, either has a PK test OR has 0 inbound refs, not a staging-only alternative to a core model. Grain qualifies via description text ("one row per customer per day") even without
meta.grain:.
The script-generated review queue drives the flag-driven review group: the Python inventory script extracts SQL snippets (WHERE, CASE WHEN, COALESCE, JOINs), builds a concept index, and generates cross-model flags. The LLM reviews flagged concept neighborhoods rather than importance-scored individual models. Granular catalogs (weak descriptions, convention drift, concept variants, same_name_different_grain, phantom_columns_by_model, enum_value_gaps, seeds_not_tested, unit_variants, unprefixed_booleans, overlapping_concept_columns_within_model, lineage_cycles, yaml_vs_sql_column_count_diff) are consumed by synthesis as appendix tables with exact file paths and column names.
Routing table
| Project size | Inventory | Flag-driven review | Per-model review | Glossary ask |
|---|---|---|---|---|
| ≤30 models | Script | Inline (no subagents) | Inline (no subagents) | After inventory |
| 31-50 models | Script | 2 parallel subagents | 3 parallel subagents | After inventory |
| 51-200 models | Script | 3-4 parallel subagents | 3-4 parallel subagents | After checkpoint |
| >200 models | Script + user confirm | 4 parallel subagents | 4 parallel subagents | After checkpoint |
Ground rules
- Never hallucinate findings. Every issue must reference a specific file path, model name, and column name you actually read.
- Never invent model names, column names, or file paths. Every name must come from a file you read.
- If unsure, label it "possible issue" with reasoning. Do not state uncertain findings as fact.
- Be honest about limitations. The "What this audit cannot detect" section is mandatory.
- Exact counts only. Never use approximate counts (~40%, ~200) in the report. Every metric must be an exact fraction (e.g., 47/118). If you cannot count precisely, say "not assessed" rather than guessing.
Locate the skill's phase files
The phase instruction files are located relative to this SKILL.md file. Determine the absolute path to this skill's directory, then construct paths like {skill_dir}/phases/inventory.md, etc. If the skill is installed at ~/.claude/skills/dbt-agent-readiness/SKILL.md, then phase files are at ~/.claude/skills/dbt-agent-readiness/phases/inventory.md.
Use Glob to find the phase files: **/dbt-agent-readiness/phases/*.md. Record the base path.
Step 1: Discovery and scoping (inline, ~2 min)
Run this yourself. Do not delegate.
1a. Find the dbt project
Find dbt_project.yml at the target path. Read it. Extract project name, model paths, vars, and global configs.
Check for global test severity. If data_tests: +severity: warn is set project-wide, record it under Hygiene as one line ("project severity defaults to warn"). Do NOT build a root issue around it — the team likely knows and may monitor via Elementary, Dagster asset checks, or re_data.
Jinja-aware severity parsing. If +severity: is a Jinja expression like {{ env_var('CI_SEVERITY', 'warn') }}, extract the default argument ('warn') and treat that as the effective default. Note "inferred from env_var default" in the Hygiene line.
1b. Build quick file counts
Use Glob (exclude dbt_packages/, target/):
**/*.sqlunder model paths → count SQL models**/*.ymland**/*.yamlunder model paths → count schema files
1c. Classify models by layer
Use directory paths and naming prefixes:
- Staging:
staging/orstg_prefix - Intermediate:
intermediate/,prep/,base/, orint_prefix - Core/marts/reference:
marts/,reference/,reporting/,core/,presentation/ - Other: utility, date spines
Layer classification is used for reporting only. It does not drive scoping.
1d. Quick ref-count
Use the Grep tool to find all ref() calls across the project:
- Pattern:
ref\(['"][^'"]+['"]\)with glob*.sqlunder each model path, output_modecontent - Parse the matches to count how many times each model name appears as a ref target
If the Grep tool returns too many results, use this Bash one-liner instead:
python3 -c "
import re; from pathlib import Path; from collections import Counter
c = Counter()
for f in Path('{model_path}').rglob('*.sql'):
if 'dbt_packages' not in f.parts:
c.update(re.findall(r\"ref\(['\\\"]([^'\\\"]+)\", f.read_text()))
for m, n in c.most_common(): print(f'{n:4d} {m}')
"
Store the result as pre_ref_counts. Models with 3+ refs are priority models that will get full inventory treatment. Count them: n_priority.
1e. Check for previous audit
Check if {project_path}/dbt-agent-readiness.md exists. If it does, read it and store its contents as previous_audit. This will be used in Step 6 to produce a "Changes since last aud