Semantic Layer - Capture

This skill covers when and how to capture new patterns into the semantic layer. For schema reference and query grammar, load the sl skill first.

When the current turn produces a reusable pattern (business metric, derived view, join pattern, computed dimension), capture it so future queries can reach for it instead of rediscovering it.

SQL dialect

The user-facing prompt includes a Warehouse: line under the SL Sources index (e.g. Warehouse: BIGQUERY). All expr strings - measure expressions, segment predicates, computed-column SQL - execute on that warehouse and must use its syntax. Date arithmetic in particular varies by dialect:

BigQuery: transaction_date >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY) (when the column is TIMESTAMP); event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY) (when DATE).
Postgres / Redshift: transaction_date >= current_date - interval '90 days'.
Snowflake: transaction_date >= dateadd(day, -90, current_timestamp()).

Match the column's manifest type (type: time → TIMESTAMP/DATETIME on the warehouse) - comparing TIMESTAMP to a DATE-arithmetic result fails on BigQuery. After every sl_edit_source/sl_write_source, the inline validator runs a LIMIT 1 warehouse probe per measure and surfaces dialect mismatches; if you see an error trailer, fix the expression and retry rather than leaving the source for the post-squash gate to revert.

What's worth capturing

Business metric aggregations (ARR, MRR, revenue, churn, retention, conversion, LTV, CAC).
Derived calculations combining multiple signals (risk scores, health scores, composite KPIs).
Multi-table join patterns producing a reusable analytical view.
Computed categories or flags useful as reusable dimensions (case when num_protocols >= 3 then 'power' else 'regular' end).
Missing joins between two sources that both exist but aren't connected in the join graph.

Skip:

Simple SELECT * LIMIT 10 previews.
Trivial COUNT(*) on one table with no business filtering.
One-off ad-hoc explorations unlikely to repeat.
Equivalent measures that already exist (cite the existing one as source.measure_name).

When in doubt, capture. Measures are easy to remove but impossible to recover from a lost conversation.

Generalization rules

The SL must stay small and general over time. Before adding a measure, decide whether it belongs as a generic pattern or a specific constant.

Prefer one generic measure with query-time filters over N hardcoded variants.

Anti-pattern:

- name: revenue_us_region
  expr: sum(case when region = 'US' then amount end)
- name: revenue_eu_region
  expr: sum(case when region = 'EU' then amount end)

Preferred:

- name: total_revenue
  expr: sum(amount)

Callers filter region = 'US' at query time.

Bake constants in only when the filter has named business meaning that won't change (enterprise_arr for a contractually defined tier), cannot be expressed via the source's dimensions, or comes from a regulated/fixed list.

Time anchors and value lists belong in callers' filters, not in measure expressions or source SQL.

Anti-pattern (date anchor inlined): expr: count(distinct case when transaction_date >= '2026-04-12' then customer_id end) - the date will need editing every time the question shifts, and every reader has to discover it.
Anti-pattern (value list inlined in source SQL): WHERE product_category_1 IN ('Testosterone', 'Weight Loss', …) - locks the source to today's catalog and blocks callers from broadening or narrowing.
Preferred: a generic measure (count(distinct customer_id)) plus either a named segment that captures the meaning of the anchor (gh_new_products_since_launch) or a query-time filter. Callers compose; the source stays small.
A date is durable to bake in only when it represents a regulatory cutover, a contractually fixed boundary, or a one-time event that reshapes how the source itself is read.

If you create a segment whose expr matches a measure's filter, the measure MUST reference the segment via segments: [segment_name] rather than re-inlining the predicate. This is the canonical pattern even with a single measure - duplicating the predicate inline defeats the purpose of naming it.

Anti-pattern:

segments:
  - name: engaged_subscriber
    expr: "is_paid = true AND <date-window-90-days-on-transaction_date>"
measures:
  - name: engaged_subscriber_count
    expr: "count(distinct case when is_paid = true and transaction_date >= current_date - interval '90 day' then admin_user_id end)"

Preferred:

segments:
  - name: engaged_subscriber
    expr: "is_paid = true AND <date-window-90-days-on-transaction_date>"
measures:
  - name: engaged_subscriber_count
    expr: "count(distinct admin_user_id)"
    segments: [engaged_subscriber]

Use computed dimensions for derived categories. A flag like is_power_user belongs on columns[] with expr, not inlined into every measure.

Extract repeated filter bundles into named segments. If the same predicate appears on multiple measures of the same source, lift it to a segments[] entry and have each measure reference it. One edit updates every measure that depends on it.

Never write a standalone file on a manifest-backed name. If sl_discover({ query: "<table-or-source-name>" }) finds an existing schema for that name, you MUST write an overlay. A standalone with sql: or table: on a manifest-backed name clobbers the inherited columns and joins; sl_write_source and sl_validate both reject this shape with a clear fix hint. Always run sl_discover before your first write on any existing name.

Overlay before/after examples:

# Wrong: patches an inherited manifest column through columns:
name: fct_orders
columns:
  - name: status
    descriptions:
      user: "Order lifecycle status."

# Right: patch inherited columns with column_overrides:
name: fct_orders
column_overrides:
  - name: status
    descriptions:
      user: "Order lifecycle status."
columns:
  - name: is_large_order
    type: boolean
    expr: "amount > 1000"

Overlay YAML may include measures:, segments:, descriptions:, joins:, disable_joins:, exclude_columns:, column_overrides:, and computed-only columns: entries with expr and type. Do not include sql:, table:, grain:, or base-table columns:.

Prefer overlay decomposition over standalone SQL sources. Before reaching for source_type: sql, check whether the metric decomposes into measures on existing overlays (including cross-source derived measures). Use source_type: sql only when:

The metric requires per-user/per-entity derivation that cannot be expressed as a single expr (e.g., EXISTS over a time-windowed subset), OR
The metric requires multi-step CTEs whose intermediate grain is not a column in any existing source.

When an sql source is unavoidable, note in its descriptions map which SL gap forced the choice so it can be retired once the primitive ships. It must target a name NOT in the manifest - pick a distinct one (e.g. mrr_waterfall_rollup, not fct_orders).

Slim standalone sources via `inherits_columns_from`

When a standalone SQL source filters or projects from a single manifest-backed base table (the common pattern for derived views like aav_consignments over MARTS.CONSIGNMENTS), set inherits_columns_from: to the base table's manifest key and list only column names in columns:. Compose-time enrichment fills type, descriptions, and role from the matching manifest column.

Discover the manifest key with sl_discover - pass the bare name (CONSIGNMENTS), the fully-qualified path (ANALYTICS.MARTS.CONSIGNMENTS), or any suffix; the tool resolves all forms and prints the canonical key in its output.

name: aav_consignments
descriptions:
  user: AAV consignments - fi

sl_capture

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

xlsx

mem-search

weekly-digests

how-it-works

Recibe nuevas skills de Dados e Análise todos los lunes

Semantic Layer - Capture

SQL dialect

What's worth capturing

Generalization rules

Slim standalone sources via `inherits_columns_from`

Comentarios · Sin comentarios

Cómo agregar

Pega en el README de tu repo

Skills relacionadas

xlsx

mem-search

weekly-digests

how-it-works

Recibe nuevas skills de Dados e Análise todos los lunes

Semantic Layer - Capture

SQL dialect

What's worth capturing

Generalization rules

Slim standalone sources via inherits_columns_from

Comentarios · Sin comentarios

Slim standalone sources via `inherits_columns_from`