ggsql Query Writer

ggsql is a SQL extension for declarative data visualization based on Grammar of Graphics principles. It lets users combine SQL data queries with visualization specifications in a single, composable syntax.

When the user describes a visualization they want, write a valid ggsql query. Use ONLY syntax documented below. NEVER invent clauses, settings, aesthetics, or layer types.

Query structure

A ggsql query has two parts:

SQL part (optional): Standard SQL executed on the backend. Any tables, CTEs, or SELECT results are available to the visualization.
VISUALISE part (required): Begins with VISUALISE (or VISUALIZE). Everything after this is the visualization query.

There are two patterns for combining SQL with VISUALISE:

Pattern A: SELECT → VISUALISE

The last SQL statement is a SELECT. Data flows from its result set into VISUALISE, which has no FROM clause.

SELECT name, score_a, score_b FROM 'dataset.csv' WHERE value > 50
VISUALISE score_a AS x, score_b AS y
[DRAW / PLACE / SCALE / FACET / PROJECT / LABEL clauses]

Works with any SQL that ends in a SELECT: bare SELECT, WITH...SELECT, UNION/INTERSECT/EXCEPT.

Pattern B: VISUALISE FROM

VISUALISE provides its own data source via FROM. Use when referencing a table, file, CTE, or built-in dataset directly without a trailing SELECT.

VISUALISE score_a AS x, score_b AS y FROM 'dataset.csv'
DRAW point

WITH summary AS (SELECT category, COUNT(*) AS n FROM 'dataset.csv' GROUP BY category)
VISUALISE category AS x, n AS y FROM summary
DRAW bar

Data sources

Data sources can appear in VISUALISE ... FROM or DRAW ... MAPPING ... FROM:

Table/CTE name (unquoted): FROM sales, FROM my_cte
File path (single-quoted string): FROM 'data.parquet', FROM 'data.csv'
Built-in datasets: FROM ggsql:penguins, FROM ggsql:airquality

VISUALISE clause

Marks the start of the visualization. Optionally defines global mappings inherited by all layers.

VISUALISE <mapping>, ... FROM <data-source>

Mapping forms

Explicit: column AS aesthetic — e.g. revenue AS y
Implicit: column — column name must match aesthetic name, e.g. x maps to x
Wildcard: * — all columns with names matching aesthetics are mapped
Constants: 'red' AS fill, 42 AS size — literal values mapped to aesthetic

VISUALISE bill_len AS x, bill_dep AS y, species AS fill FROM ggsql:penguins
VISUALISE * FROM my_table
VISUALISE FROM ggsql:penguins

DRAW clause

Defines a layer. Multiple DRAW clauses stack layers (first = bottom, last = top).

DRAW <layer-type>
  MAPPING <mapping>, ... FROM <data-source>
  REMAPPING <stat-property> AS <aesthetic>, ...
  SETTING <param> => <value>, ...
  FILTER <condition>
  PARTITION BY <column>, ...
  ORDER BY <column>, ...

All subclauses are optional if VISUALISE provides global mappings and data.

MAPPING

Same syntax as VISUALISE mappings. Layer mappings merge with global mappings (layer takes precedence). Can include FROM for layer-specific data.

Use null to prevent inheriting a global mapping: MAPPING null AS color

REMAPPING

For statistical layers (histogram, density, boxplot, violin, smooth, bar without y). Maps calculated statistics to aesthetics. Each layer documents its available stats and default remapping.

DRAW histogram
  MAPPING body_mass AS x
  REMAPPING density AS y  -- use density instead of default count

SETTING

Set literal aesthetic values or layer parameters. Aesthetics set here bypass scales.

DRAW point
  SETTING size => 5, opacity => 0.7, stroke => 'red'

Position adjustment is a special setting:

SETTING position => 'identity'   -- no adjustment (default for most)
SETTING position => 'stack'      -- stack (default for bar, histogram, area)
SETTING position => 'dodge'      -- side by side (default for boxplot, violin)
SETTING position => 'jitter'     -- random offset

FILTER

SQL WHERE condition applied to layer data. Content is passed to the database:

DRAW point
  FILTER sex = 'female' AND body_mass > 4000

PARTITION BY

Additional grouping columns beyond mapped discrete aesthetics:

DRAW line
  MAPPING Day AS x, Temp AS y
  PARTITION BY Month

ORDER BY

Controls record order (important for path layers):

DRAW path
  ORDER BY timestamp

PLACE clause

Creates annotation layers with literal values only (no data mappings). Supports tuples for multiple annotations.

PLACE <layer-type>
  SETTING <aesthetic/param> => <value>, ...

PLACE point SETTING x => 5, y => 10, color => 'red'
PLACE rule SETTING y => 70, linetype => 'dotted'
PLACE text SETTING x => (34, 44), y => (66, 49), label => ('Mean = 34', 'Mean = 44')

SCALE clause

Controls how data values are translated to aesthetic values. Sensible defaults are always provided.

SCALE <type> <aesthetic> FROM <input-range> TO <output-range> VIA <transform>
  SETTING <param> => <value>, ...
  RENAMING <value> => <label>, ...

All parts except aesthetic are optional.

Scale types (optional, placed before aesthetic)

CONTINUOUS — continuous numeric/temporal data
DISCRETE — categorical/string data
BINNED — bin continuous data into discrete groups (never auto-selected, must be explicit)
ORDINAL — ordered discrete data (never auto-selected, must be explicit)
IDENTITY — pass data through unchanged (no legend created)

If omitted, type is inferred from data.

Aesthetic names

Use the base name: x, y, fill, stroke, color (sets both fill and stroke), opacity, size, linewidth, linetype, shape, panel (facet), row, column.

For position families (xmin/xmax/xend/ymin/ymax/yend), scale with the base name: SCALE x ...

FROM (input range)

Continuous: FROM (min, max) — use null to infer from data: FROM (0, null)
Discrete: FROM ('A', 'B', 'C') — controls order, omitted values are nulled
Include null explicitly: FROM ('Torgersen', 'Biscoe', null)

TO (output range)

Array of values: TO ('red', 'blue', 'green'), TO (1, 6)
Named palette: TO viridis, TO dark2, TO tableau10

VIA (transform)

Continuous transforms: linear, log, log2, ln, exp10, exp2, exp, sqrt, square, asinh, pseudo_log, pseudo_log2, pseudo_ln, integer

Temporal transforms: date, datetime, time — automatically chosen for date/datetime/time columns.

Discrete transforms: string, bool

SCALE x VIA date        -- treat x as temporal
SCALE y VIA log         -- log transform
SCALE size VIA square   -- scale by radius not area

SETTING

Continuous/binned scales:

expand — expansion factor, scalar or (mult, add). Default 0.05. Only for x/y.
oob — out-of-bounds: 'keep' (default for x/y), 'censor' (default for others), 'squish'
breaks — integer count, array of values, or interval string for temporal (e.g. '2 months', 'week')
pretty — boolean, default true. Use Wilkinson's algorithm for nice breaks.
reverse — boolean, default false. Reverse scale direction.

Binned scales additionally:

closed — 'left' (default) or 'right'

Discrete/ordinal scales:

reverse — boolean

SCALE x SETTING breaks => '2 months'
SCALE y FROM (0, 100) SETTING oob => 'squish'
SCALE BINNED x SETTING breaks => 10, pretty => false

RENAMING

Rename break labels. Direct renaming, wildcard formatting, or both (direct takes priority):

RENAMING 'Adelie' => 'Pygoscelis adeliae', 'adelie' => null  -- direct / suppress
RENAMING * => '{} mm'                -- string interpolation
RENAMING * => '{:Title}'             -- formatters: Title, UPPER, lower, time %B %Y, num %.1f

FACET clause

Split data into small multiples.

FACET <column> BY <column>
  SETTING <param> => <value>, ...

ggsql

How to add

Drop this on your repo README

Related skills

xlsx

how-it-works

mem-search

weekly-digests

Get new Dados e Análise skills every Monday

ggsql Query Writer

Query structure

Pattern A: SELECT → VISUALISE

Pattern B: VISUALISE FROM

Data sources

VISUALISE clause

Mapping forms

DRAW clause

MAPPING

REMAPPING

SETTING

FILTER

PARTITION BY

ORDER BY

PLACE clause

SCALE clause

Scale types (optional, placed before aesthetic)

Aesthetic names

FROM (input range)

TO (output range)

VIA (transform)

SETTING

RENAMING

FACET clause

Comments · No comments