When this skill is activated, always start your first response with the 🧢 emoji.

Data Warehousing

A practical framework for designing, building, and optimizing analytical data warehouses using dimensional modeling. This skill covers star and snowflake schema design, slowly changing dimension (SCD) patterns, and platform-specific guidance for Snowflake and BigQuery. The focus is on making the right modeling decisions that balance query performance, storage cost, and maintainability for downstream analytics consumers.

When to use this skill

Trigger this skill when the user:

Designs a star schema or snowflake schema for analytical workloads
Implements slowly changing dimensions (Type 1, 2, 3, or hybrid)
Builds fact tables (transactional, periodic snapshot, or accumulating snapshot)
Writes analytical SQL targeting Snowflake or BigQuery
Plans ETL/ELT pipelines that load data into a warehouse
Creates conformed dimensions shared across multiple fact tables
Optimizes warehouse query performance (clustering, partitioning, materialized views)
Chooses between Snowflake and BigQuery for a new project

Do NOT trigger this skill for:

OLTP schema design or transactional database tuning (use database-engineering)
Data pipeline orchestration tools like Airflow or dbt (those have their own skills)

Key principles

Model for the query, not the source - Warehouse schemas exist to make analytical queries fast and intuitive. Denormalize aggressively compared to OLTP. If analysts need to join seven tables to answer a basic question, the model is wrong.
Grain is the single most important decision - Every fact table must have a clearly declared grain (one row = one transaction, one day per customer, etc.). Mixing grains in a single fact table causes double-counting and broken aggregations that are extremely hard to debug.
Conformed dimensions enable cross-process analysis - A shared dim_customer or dim_date table used across all fact tables lets analysts drill across business processes without reconciliation headaches. Build conformed dimensions first.
Slowly changing dimensions must be an explicit design choice - Every dimension attribute changes over time. Decide upfront whether to overwrite (Type 1), track history (Type 2), or store previous value (Type 3). Defaulting to Type 1 and later needing history is a painful migration.
Partition and cluster for your access patterns - Cloud warehouses charge by data scanned. Partitioning by date and clustering by high-cardinality filter columns can reduce costs and query times by 10-100x. Design these at table creation time.

Core concepts

Dimensional modeling entities

Entity	Role	Example
Fact table	Stores measurable business events (metrics)	`fct_orders`, `fct_page_views`
Dimension table	Stores descriptive context for facts	`dim_customer`, `dim_product`, `dim_date`
Surrogate key	Warehouse-generated integer/hash PK for dimensions	`customer_sk` (vs natural key `customer_id`)
Degenerate dimension	Dimension attribute stored directly on the fact table	`order_number` on `fct_order_items`
Conformed dimension	A dimension shared identically across multiple fact tables	`dim_date`, `dim_geography`

Star schema vs snowflake schema

A star schema has fact tables at the center with denormalized dimension tables radiating outward - one join from fact to any dimension. A snowflake schema normalizes dimensions into sub-dimensions (e.g., dim_product -> dim_category -> dim_department). Star schemas are preferred for most analytical workloads because they minimize joins and are easier for BI tools to consume. Snowflake schemas save storage but add join complexity - only use them when dimension tables are extremely large (100M+ rows) and share sub-dimensions across many parents.

Fact table types

Type	Grain	Example	When to use
Transaction	One row per event	`fct_orders`	Most common; captures atomic events
Periodic snapshot	One row per entity per period	`fct_daily_inventory`	Regular status measurements
Accumulating snapshot	One row per process lifetime	`fct_order_fulfillment`	Track milestones (ordered, shipped, delivered)
Factless fact	No measures, only dimension keys	`fct_student_attendance`	Record that an event occurred

Slowly changing dimensions (SCD)

Type	Behavior	Trade-off
Type 0	Never changes (fixed attributes)	Use for birth date, original sign-up date
Type 1	Overwrite old value	Simple but loses history
Type 2	Add new row with version tracking	Preserves full history; most common for analytics
Type 3	Add column for previous value	Tracks one prior value only; rarely sufficient
Type 6 (hybrid 1+2+3)	Type 2 rows + current value column	Best of both: history + easy current-state queries

Common tasks

Design a star schema

Model a retail sales domain with conformed dimensions.

-- Date dimension (conformed - used by all fact tables)
CREATE TABLE dim_date (
  date_sk       INT         PRIMARY KEY,  -- YYYYMMDD integer
  full_date     DATE        NOT NULL,
  day_of_week   VARCHAR(10) NOT NULL,
  month_name    VARCHAR(10) NOT NULL,
  quarter       INT         NOT NULL,
  fiscal_year   INT         NOT NULL,
  is_weekend    BOOLEAN     NOT NULL,
  is_holiday    BOOLEAN     NOT NULL
);

-- Customer dimension
CREATE TABLE dim_customer (
  customer_sk   INT         PRIMARY KEY,  -- surrogate key
  customer_id   VARCHAR(50) NOT NULL,     -- natural key
  name          VARCHAR(200),
  segment       VARCHAR(50),
  region        VARCHAR(100),
  -- SCD Type 2 tracking
  effective_date DATE       NOT NULL,
  expiry_date    DATE       NOT NULL DEFAULT '9999-12-31',
  is_current     BOOLEAN    NOT NULL DEFAULT TRUE
);

-- Product dimension
CREATE TABLE dim_product (
  product_sk    INT         PRIMARY KEY,
  product_id    VARCHAR(50) NOT NULL,
  product_name  VARCHAR(200),
  category      VARCHAR(100),
  subcategory   VARCHAR(100),
  brand         VARCHAR(100),
  unit_cost     DECIMAL(12,2)
);

-- Sales fact table (transaction grain: one row per line item)
CREATE TABLE fct_sales (
  sale_sk       BIGINT      PRIMARY KEY,
  date_sk       INT         NOT NULL REFERENCES dim_date(date_sk),
  customer_sk   INT         NOT NULL REFERENCES dim_customer(customer_sk),
  product_sk    INT         NOT NULL REFERENCES dim_product(product_sk),
  quantity      INT         NOT NULL,
  unit_price    DECIMAL(12,2) NOT NULL,
  discount_amt  DECIMAL(12,2) NOT NULL DEFAULT 0,
  net_amount    DECIMAL(12,2) NOT NULL,
  order_number  VARCHAR(50) NOT NULL  -- degenerate dimension
);

Declare the grain explicitly in a comment or documentation: "One row per order line item per day." Every team member must agree on the grain before building downstream reports.

Implement SCD Type 2 in Snowflake

Track full history of customer attribute changes using MERGE.

-- Snowflake MERGE for SCD Type 2
MERGE INTO dim_customer AS target
USING staging_customers AS source
  ON target.customer_id = source.customer_id
     AND target.is_current = TRUE

-- Existing row where attributes changed: expire it
WHEN MATCHED
  AND (target.segment  != source.segment
    OR target.region   != source.region)
THEN UPDATE SET
  target.expiry_date = CURRENT_DATE - 1,
  target.is_current  = FALSE

-- No match: brand new customer
WHEN NOT MATCHED THEN INSERT (
  customer_sk, customer_id, name, segment, region,
  effective_date, expiry_date, is_current
) VALUES (
  dim_customer_seq.NEXTVAL,
  source.customer_id, source.name, source.segment, source.region,
  CURRENT_DATE, '9999-12-31', TRUE
);

-- Second pass: insert the new current row for changed records
INSERT INTO dim_customer
SELECT dim_customer_seq.NEXTVAL,
       s.customer_id, s.n

data-warehousing

How to add

Drop this on your repo README

Related skills

xlsx

how-it-works

mem-search

weekly-digests

Get new Dados e Análise skills every Monday