Semantic Layer
KTX's semantic layer (SL) is a structured catalog. Each source represents a table, a SQL view, or an overlay that enriches a manifest-backed table with measures, computed columns, joins, and named segments. The catalog is the single source of truth for reusable business metrics.
This skill covers two parts:
- Part 1 - Schema reference (what an SL source looks like).
- Part 2 - Querying via
sl_query.
Capture (when and how to add new patterns to the SL) is a separate concern handled by the memory-agent - see the sl_capture skill if you are running in capture mode. The research agent reads and queries the SL via the tools described here; it does not write to it.
For capture-time identifier verification, load sl_capture. Synthesis writer
skills must verify warehouse identifiers with discover_data,
entity_details, and sql_execution before emitting table or column names.
Part 1 - Schema reference
An SL source is a YAML file at semantic-layer/<connectionId>/<source_name>.yaml. There are three flavors:
Overlay sources
Enrich a manifest-backed table with measures, computed columns, joins, and segments. No table or sql field. The base table's columns and grain are inherited from the manifest.
name: fct_orders # must match an existing manifest table
descriptions:
user: "Overlay adding business measures to the orders fact table."
measures:
- name: total_revenue
expr: sum(amount)
description: Total order revenue - filter by status or region at query time
columns: # computed dimensions only
- name: is_large_order
type: boolean
expr: "amount > 1000"
column_overrides: # metadata patches for inherited columns
- name: status
descriptions:
user: "Order lifecycle status."
segments:
- name: paid_non_refunded
expr: "is_paid = true AND is_refunded = false"
joins:
- to: customers
on: "customer_id = customers.id"
relationship: many_to_one
Rules:
- Do not repeat base-table columns, grain,
table, orsource_typein an overlay - those are inherited. - Overlay columns MUST be computed (
expr+type). - Use
column_overridesto add descriptions or metadata to inherited manifest columns. Do not puttypeorexprincolumn_overrides. exclude_columnshides specific manifest columns;disable_joinssuppresses specific auto-detected joins.
Standalone table sources
Self-contained; own their schema. Has source_type: table and table:.
name: account_health_scores
source_type: table
table: "analytics.account_health_scores"
grain: [account_id, snapshot_date]
columns:
- name: account_id
type: string
- name: snapshot_date
type: time
role: time
- name: health_score
type: number
measures:
- name: avg_health_score
expr: avg(health_score)
Standalone SQL sources
Self-contained; schema derived from a SQL query. Has source_type: sql and sql:.
name: monthly_cancellations
source_type: sql
sql: |
SELECT
date_trunc('month', cancelled_at) AS month,
customer_id,
plan_name,
mrr_amount
FROM subscriptions
WHERE status = 'cancelled'
grain: [customer_id, month]
columns:
- name: month
type: time
role: time
- name: customer_id
type: string
- name: plan_name
type: string
- name: mrr_amount
type: number
measures:
- name: cancellation_count
expr: count(*)
An SQL source is a one-shot answer: the aggregation is frozen, callers cannot re-group or re-filter by columns the SQL has collapsed, and the source is disconnected from the join graph. Prefer overlays + measures over SQL sources when possible - the sl_capture skill covers when SQL is justified.
Columns
Every standalone column requires name and type. Overlays have computed columns in columns: and manifest column metadata patches in column_overrides:.
type: one ofstring,number,boolean,time. Map LookMLdate/datetime/timestamp→time. Map LookMLyesno→boolean.role(optional):timeenables time-granularity queries (month, week, day).defaultis the implicit fallback.visibility(optional):public,internal, orhidden.expr(optional for standalone, required for overlay columns): SQL expression that computes the value. Expanded by sqlglot before generating SQL, so you can reference other columns on the same source.
Grain
grain: [col_a, col_b] - the set of columns that uniquely identify one row. The query engine uses grain to prevent fanout in joins. Overlays inherit grain from the manifest unless they override.
Joins
joins:
- to: customers # target source name
on: "customer_id = customers.id" # local_col = TARGET.target_col
relationship: many_to_one # or one_to_many, one_to_one
alias: primary_customer # optional - lets you join the same target twice
onformat:local_col = TARGET.target_col. Always qualify the right side with the target source name.relationshipis the cardinality from this source to the target. Most joins aremany_to_one(FK → PK on the parent).
Measures
measures:
- name: total_arr
expr: sum(arr_amount)
description: Sum of ARR - filter by plan_name at query time
filter: "is_active = true"
segments: [paid_non_refunded]
name(required, snake_case).expr(required): any valid SQL aggregate -sum(x),count(*),count(distinct user_id),avg(score).description(required on capture): what the measure computes and how to use it.filter(optional): SQL predicate applied as a WHERE clause specific to this measure.segments(optional): names of segments defined on the same source. The engine AND-composes each segment'sexprinto this measure's effective filter.
Use safe_divide(num, den) for ratio measures to avoid division by zero.
Segments
segments:
- name: paid_non_refunded
expr: "is_paid = true AND is_refunded = false"
description: Orders that were paid and not refunded
Named, reusable boolean predicates scoped to one source. Reference by bare name in a measure's segments: [], or by dotted form source.segment_name in an sl_query. Segments are predicates only - they are NOT selectable as dimensions. If you need to group by the predicate, add a columns[] entry instead.
Cross-references with the wiki
The reverse edge (wiki pages that cite this source) is derived automatically from each wiki's sl_refs: - you don't emit anything on the SL side. Author the edge once on the wiki via sl_refs:; the post-write reconciler populates the knowledge↔SL index.
Part 2 - Querying via sl_query
The sl_query tool generates correct SQL from a structured query. It handles joins, fanout prevention, aggregation correctness, and filter classification automatically. Prefer it over writing raw SQL whenever the SL has the relevant sources.
When to prefer sl_query over raw SQL
- A pre-defined measure already exists (
source.measure_nameappears in the catalog). - The question combines fields from multiple sources - the engine resolves the join path automatically.
- The question asks for a standard metric (revenue, ARR, churn, retention, LTV, conversion, MAU, etc.) - even if no pre-defined measure exists, a runtime aggregation over a catalog column is usually correct.
Use raw SQL (sql_execution) only when:
- The computation requires multi-step CTEs whose intermediate grain is not a column in any source.
- The question explicitly asks for a one-off exploration that will never be asked again.
Input shape
{
"connectionId": "uuid-of-the-connection",
"measures": ["orders.total_revenue", "sum(orders.amount)"],
"dimensions": ["customers.segment", { "field": "orders.created_at", "granularity":