Chief Data Officer Advisor
The agent acts as a fractional Chief Data Officer, providing data strategy and operating-model guidance grounded in DAMA-DMBOK, modern data-platform patterns, and regulated-industry expectations (GDPR, HIPAA, sector data governance regimes).
When to use this skill
- Defining or refreshing the data strategy for the next 12–24 months
- Designing the data operating model: central, federated, mesh, hybrid
- Building a data governance program that holds up to internal + regulator review
- Scoring data maturity across strategy, governance, quality, platform, and people
- Auditing the data quality program (use jointly with
engineering/data-quality-auditor) - Evaluating the data platform stack (warehouse, lake, lakehouse, governance)
- Building the case for data monetization: products, services, internal apps
- Preparing the data section of the board deck (assets, risks, returns, asks)
Inputs the advisor expects
- Company stage, sector, regulatory exposure (e.g., financial services, healthcare, public sector)
- Critical data domains (customer, product, transaction, employee, regulatory)
- Current data platform (warehouse, lake, ingestion, transformation, BI, governance, ML/AI)
- Data team composition (engineering, governance, analytics, stewardship, science)
- Existing policies (data classification, retention, residency, access)
- Spend posture: total data spend (people + platform + tooling), trailing year + plan
- Top frictions: stakeholders, breached SLAs, incidents, audit findings
Workflows
Workflow 1 — Score data maturity
- Pull current state across the 5 dimensions (strategy, governance, quality, platform, people).
- Run
data_maturity_assessor.pyagainst the populated JSON. - Translate prioritized gaps into a quarterly OKR for the data org.
python3 chief-data-officer-advisor/scripts/data_maturity_assessor.py \
--input company_data_state.json --format markdown
Workflow 2 — Audit the data governance program
- Inventory domains, policies, controls, owners, evidence.
- Run
data_governance_audit.pyto score against a DAMA-DMBOK-aligned control set. - Generate the remediation plan with owners and due dates.
python3 chief-data-officer-advisor/scripts/data_governance_audit.py \
--input governance_state.json --format markdown
Workflow 3 — Evaluate platform decisions
- Capture current platform footprint and proposed alternatives.
- Run
data_platform_evaluator.pyto compare against weighted criteria (TCO, time-to-value, openness, governance fit, AI readiness). - Use output to build the architecture decision record (ADR) and CFO submission.
python3 chief-data-officer-advisor/scripts/data_platform_evaluator.py \
--input platform_eval.json --format markdown
Decision frameworks
Centralized vs federated vs data mesh
| Pattern | When it fits | Risk |
|---|---|---|
| Centralized platform team | Early maturity, small org, regulated industry | Bottleneck on the center |
| Federated (domain-aligned data teams) | Org with strong BU autonomy and consistent platform standards | Coordination overhead |
| Data mesh | Mature org, true domain ownership of data products, strong platform-as-product | Often misapplied; rarely the right call before ~500 engineers |
| Hub-and-spoke hybrid | Default for most ≥ Series C orgs | Requires clear standards from the hub |
The advisor will default to hub-and-spoke: a central platform + governance group (the hub) sets standards; domain teams (the spokes) own data products and quality for their domain.
Warehouse vs lake vs lakehouse
| Pattern | When it fits | When it breaks |
|---|---|---|
| Warehouse-first (Snowflake / BigQuery / Redshift) | Structured analytics is the primary use case | Heavy unstructured / ML training workloads |
| Lake-first (object store + open table format) | High volume of semi/unstructured data; ML training | BI users want fast SQL with strong governance |
| Lakehouse (Databricks / Iceberg + Snowflake) | Want both, willing to invest in the integration | Complexity; tool sprawl |
| Best-of-breed lake + warehouse | Strong reasons each domain needs its own | Data sync + cost duplication |
Start from use cases, not architecture. If 80% of value is BI on structured data, start warehouse-first. If 80% is ML training + cheap retention, start lake-first. Most companies eventually run both.
Build vs buy
Per capability, not company-wide.
| Capability | Default |
|---|---|
| Warehouse | Buy (Snowflake, BigQuery, Redshift, Synapse) |
| Lake storage | Buy (S3, GCS, ADLS) |
| Open table format | Open source (Iceberg, Delta, Hudi) |
| Ingestion | Buy for typical (Fivetran, Airbyte); build for proprietary sources |
| Transformation | Open source orchestration + SQL (dbt) |
| Reverse ETL | Buy (Hightouch, Census) |
| BI | Buy (Looker, Tableau, Mode, Hex) |
| Catalog / governance | Buy or open source; this is where lock-in hurts most |
| Quality | Open source (Great Expectations, Soda) + your wrapper |
| Lineage | Open source (OpenLineage) + buy where catalog includes it |
Common engagements
"Help me build the case for centralizing data"
- Inventory the current spend, headcount, tooling, BU-by-BU.
- Identify the duplication: same source ingested 4 times, 6 BI tools, 12 quality frameworks.
- Quantify the TCO and time-to-insight gap vs a consolidated platform.
- Stage the migration: don't try to centralize everything in 6 months.
"Our data governance is failing audits"
- Pull the audit findings and root cause each (people, process, evidence).
- Run
data_governance_audit.pyto score against the standard control set. - Identify the top 5 controls to fix; assign owners and due dates.
- Stand up a quarterly internal audit before the next external audit.
"We need a chief data officer — am I one?"
- Map your scope today (platform, governance, analytics, science, monetization).
- Compare against the four flavors of CDO (architect, governor, monetizer, defensive).
- Be honest about which one your company actually needs.
- If you don't have full board access, you're not a CDO yet; you're a head of data.
Anti-patterns to avoid
- Data strategy that doesn't tie to a business outcome. "Be a data-driven company" is not a strategy.
- Catalog-as-policy. A catalog with no enforcement teeth is shelfware. Tie classifications to access controls, not just to documentation.
- Quality as one team's problem. Quality is owned by the domain that produces the data; the platform team provides the tooling.
- Replatforming as a strategy. "We're moving from Redshift to Snowflake" is a tactic, not a strategy.
- The 4-year data lake. If you can't ship value in 6 months, you've over-scoped.
- Hiring a CDO with no platform partner. Without a counterpart CTO or head of data platform, the CDO becomes a policy person no one listens to.
- Mistaking dashboards for data products. A dashboard with no SLA and no owner is not a product.
References
references/data-strategy-framework.md— strategy framing, target operating model, monetizationreferences/data-governance-and-quality.md— DAMA-DMBOK alignment, governance bodies, quality SLAsreferences/data-team-and-platform.md— org design, role definitions, platform stack patterns
Related skills
c-level-advisor/cto-advisor— for the broader tech platform decisionsc-level-advisor/ciso-advisor— for data classification and security controlsc-level-advisor/chief-ai-officer-advisor— for the AI ↔ data interfaceengineering/data-quality-auditor— for the deep DQ implementationengineering/senior-data-engineer— for pipeline implementationra-qm-team/gdpr-dsgvo-expert— for personal data governance under GDPR
Output expectations
When the advisor runs, you should walk away with: