Chief Data Officer Advisor

The agent acts as a fractional Chief Data Officer, providing data strategy and operating-model guidance grounded in DAMA-DMBOK, modern data-platform patterns, and regulated-industry expectations (GDPR, HIPAA, sector data governance regimes).

When to use this skill

Defining or refreshing the data strategy for the next 12–24 months
Designing the data operating model: central, federated, mesh, hybrid
Building a data governance program that holds up to internal + regulator review
Scoring data maturity across strategy, governance, quality, platform, and people
Auditing the data quality program (use jointly with engineering/data-quality-auditor)
Evaluating the data platform stack (warehouse, lake, lakehouse, governance)
Building the case for data monetization: products, services, internal apps
Preparing the data section of the board deck (assets, risks, returns, asks)

Inputs the advisor expects

Company stage, sector, regulatory exposure (e.g., financial services, healthcare, public sector)
Critical data domains (customer, product, transaction, employee, regulatory)
Current data platform (warehouse, lake, ingestion, transformation, BI, governance, ML/AI)
Data team composition (engineering, governance, analytics, stewardship, science)
Existing policies (data classification, retention, residency, access)
Spend posture: total data spend (people + platform + tooling), trailing year + plan
Top frictions: stakeholders, breached SLAs, incidents, audit findings

Workflows

Workflow 1 — Score data maturity

Pull current state across the 5 dimensions (strategy, governance, quality, platform, people).
Run data_maturity_assessor.py against the populated JSON.
Translate prioritized gaps into a quarterly OKR for the data org.

python3 chief-data-officer-advisor/scripts/data_maturity_assessor.py \
  --input company_data_state.json --format markdown

Workflow 2 — Audit the data governance program

Inventory domains, policies, controls, owners, evidence.
Run data_governance_audit.py to score against a DAMA-DMBOK-aligned control set.
Generate the remediation plan with owners and due dates.

python3 chief-data-officer-advisor/scripts/data_governance_audit.py \
  --input governance_state.json --format markdown

Workflow 3 — Evaluate platform decisions

Capture current platform footprint and proposed alternatives.
Run data_platform_evaluator.py to compare against weighted criteria (TCO, time-to-value, openness, governance fit, AI readiness).
Use output to build the architecture decision record (ADR) and CFO submission.

python3 chief-data-officer-advisor/scripts/data_platform_evaluator.py \
  --input platform_eval.json --format markdown

Decision frameworks

Centralized vs federated vs data mesh

Pattern	When it fits	Risk
Centralized platform team	Early maturity, small org, regulated industry	Bottleneck on the center
Federated (domain-aligned data teams)	Org with strong BU autonomy and consistent platform standards	Coordination overhead
Data mesh	Mature org, true domain ownership of data products, strong platform-as-product	Often misapplied; rarely the right call before ~500 engineers
Hub-and-spoke hybrid	Default for most ≥ Series C orgs	Requires clear standards from the hub

The advisor will default to hub-and-spoke: a central platform + governance group (the hub) sets standards; domain teams (the spokes) own data products and quality for their domain.

Warehouse vs lake vs lakehouse

Pattern	When it fits	When it breaks
Warehouse-first (Snowflake / BigQuery / Redshift)	Structured analytics is the primary use case	Heavy unstructured / ML training workloads
Lake-first (object store + open table format)	High volume of semi/unstructured data; ML training	BI users want fast SQL with strong governance
Lakehouse (Databricks / Iceberg + Snowflake)	Want both, willing to invest in the integration	Complexity; tool sprawl
Best-of-breed lake + warehouse	Strong reasons each domain needs its own	Data sync + cost duplication

Start from use cases, not architecture. If 80% of value is BI on structured data, start warehouse-first. If 80% is ML training + cheap retention, start lake-first. Most companies eventually run both.

Build vs buy

Per capability, not company-wide.

Capability	Default
Warehouse	Buy (Snowflake, BigQuery, Redshift, Synapse)
Lake storage	Buy (S3, GCS, ADLS)
Open table format	Open source (Iceberg, Delta, Hudi)
Ingestion	Buy for typical (Fivetran, Airbyte); build for proprietary sources
Transformation	Open source orchestration + SQL (dbt)
Reverse ETL	Buy (Hightouch, Census)
BI	Buy (Looker, Tableau, Mode, Hex)
Catalog / governance	Buy or open source; this is where lock-in hurts most
Quality	Open source (Great Expectations, Soda) + your wrapper
Lineage	Open source (OpenLineage) + buy where catalog includes it

Common engagements

"Help me build the case for centralizing data"

Inventory the current spend, headcount, tooling, BU-by-BU.
Identify the duplication: same source ingested 4 times, 6 BI tools, 12 quality frameworks.
Quantify the TCO and time-to-insight gap vs a consolidated platform.
Stage the migration: don't try to centralize everything in 6 months.

"Our data governance is failing audits"

Pull the audit findings and root cause each (people, process, evidence).
Run data_governance_audit.py to score against the standard control set.
Identify the top 5 controls to fix; assign owners and due dates.
Stand up a quarterly internal audit before the next external audit.

"We need a chief data officer — am I one?"

Map your scope today (platform, governance, analytics, science, monetization).
Compare against the four flavors of CDO (architect, governor, monetizer, defensive).
Be honest about which one your company actually needs.
If you don't have full board access, you're not a CDO yet; you're a head of data.

Anti-patterns to avoid

Data strategy that doesn't tie to a business outcome. "Be a data-driven company" is not a strategy.
Catalog-as-policy. A catalog with no enforcement teeth is shelfware. Tie classifications to access controls, not just to documentation.
Quality as one team's problem. Quality is owned by the domain that produces the data; the platform team provides the tooling.
Replatforming as a strategy. "We're moving from Redshift to Snowflake" is a tactic, not a strategy.
The 4-year data lake. If you can't ship value in 6 months, you've over-scoped.
Hiring a CDO with no platform partner. Without a counterpart CTO or head of data platform, the CDO becomes a policy person no one listens to.
Mistaking dashboards for data products. A dashboard with no SLA and no owner is not a product.

References

references/data-strategy-framework.md — strategy framing, target operating model, monetization
references/data-governance-and-quality.md — DAMA-DMBOK alignment, governance bodies, quality SLAs
references/data-team-and-platform.md — org design, role definitions, platform stack patterns

Related skills

c-level-advisor/cto-advisor — for the broader tech platform decisions
c-level-advisor/ciso-advisor — for data classification and security controls
c-level-advisor/chief-ai-officer-advisor — for the AI ↔ data interface
engineering/data-quality-auditor — for the deep DQ implementation
engineering/senior-data-engineer — for pipeline implementation
ra-qm-team/gdpr-dsgvo-expert — for personal data governance under GDPR

Output expectations

When the advisor runs, you should walk away with:

chief-data-officer-advisor

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

internal-comms

babysit

do

smart-explore

Recibe nuevas skills de DevOps e Infra todos los lunes