Anofox Forecast DuckDB Extension — Cheat Sheet
Extension: anofox_forecast v0.4.6 | DuckDB: v1.4.x+ | Dual naming: ts_* and anofox_fcst_ts_*
Installation
LOAD anofox_forecast;
-- All functions available as ts_* and anofox_fcst_ts_* (identical)
Critical Gotchas
-
Seasonality is NOT auto-detected. You must pass
seasonal_periodexplicitly. Detect first withts_detect_periods_by, then pass to forecasting. -
DO NOT chain
_bytable functions in CTEs. Returns 0 rows silently under parallel execution. AlwaysCREATE TABLEbetween pipeline steps:-- BROKEN (0 rows): WITH step1 AS (SELECT * FROM ts_fill_gaps_by(...)) SELECT * FROM ts_fill_nulls_const_by('step1', ...); -- CORRECT: CREATE TABLE step1 AS SELECT * FROM ts_fill_gaps_by(...); SELECT * FROM ts_fill_nulls_const_by('step1', ...); -
Model names are case-sensitive.
'AutoETS'works,'autoets'errors. -
ts_cv_forecast_byrequires pre-created folds. Input table must havefold_idandsplitcolumns (fromts_cv_folds_byorts_cv_split_by). Passing raw data throws a clear error. -
ts_forecast_byrequires frequency as 7th positional parameter. No default — you must specify it:-- WRONG: missing frequency SELECT * FROM ts_forecast_by('sales', id, date, val, 'Naive', 12); -- CORRECT: SELECT * FROM ts_forecast_by('sales', id, date, val, 'Naive', 12, '1d'); -
Metric
_bytable macros are deprecated. Use scalar functions withGROUP BY:-- Deprecated: SELECT * FROM ts_mae_by(...) -- Use instead: SELECT id, ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS mae FROM results GROUP BY id; -
Always use
ORDER BYinLIST()for temporal correctness:LIST(value ORDER BY date) -- correct LIST(value) -- wrong: order not guaranteed
Three API Styles
1. Table Macros (primary — use these)
Operate on table names as strings. Handle grouping automatically.
SELECT * FROM ts_forecast_by('sales', product_id, date, revenue, 'AutoETS', 14, '1d',
MAP{'seasonal_period': '7'});
2. Scalar Functions
Operate on arrays. Use with LIST() aggregation and GROUP BY.
SELECT product_id,
ts_mae(LIST(actual ORDER BY date), LIST(forecast ORDER BY date)) AS mae
FROM results GROUP BY product_id;
3. Aggregate Functions
Return structs. Access fields with (result).field_name.
SELECT product_id, (ts_stats(LIST(value ORDER BY date))).*
FROM sales GROUP BY product_id;
Parameter Syntax
STRUCT (recommended)
MAP{'seasonal_period': '7'}
MAP{'seasonal_periods': '[7, 365]'}
MAP{'method': 'autoperiod', 'max_period': '28'}
All param values are strings (even numbers). Arrays use JSON syntax: '[7, 365]'.
Frequency Strings
| Format | Examples |
|---|---|
| Polars style | '1d', '1h', '30m', '1w', '1mo', '1q', '1y' |
| DuckDB INTERVAL | '1 day', '1 hour' |
| Raw integer | '1', '7' (interpreted as days) |
Common Workflows
1. Basic Forecast
-- Forecast 14 days ahead with weekly seasonality
SELECT * FROM ts_forecast_by(
'sales', product_id, date, revenue,
'HoltWinters', 14, '1d',
MAP{'seasonal_period': '7'}
);
2. Data Preparation Pipeline (CREATE TABLE between steps!)
-- Step 1: Fill gaps
CREATE TABLE gaps_filled AS
SELECT * FROM ts_fill_gaps_by('raw_data', product_id, date, value, '1d');
-- Step 2: Impute NULLs
CREATE TABLE nulls_filled AS
SELECT * FROM ts_fill_nulls_const_by('gaps_filled', product_id, date, value, 0.0);
-- Step 3: Drop short series
CREATE TABLE clean_data AS
SELECT * FROM ts_drop_short_by('nulls_filled', product_id, 20);
3. Detect Seasonality → Forecast
-- Step 1: Detect
SELECT id, (periods).primary_period
FROM ts_detect_periods_by('sales', product_id, date, value, MAP{});
-- Returns e.g. primary_period = 7 (weekly)
-- Step 2: Forecast with detected period
SELECT * FROM ts_forecast_by(
'sales', product_id, date, value,
'AutoETS', 14, '1d', MAP{'seasonal_period': '7'}
);
4. Cross-Validation & Model Comparison
-- Step 1: Create folds
CREATE TABLE cv_folds AS
SELECT * FROM ts_cv_folds_by('data', unique_id, ds, y, 3, 12, MAP{});
-- Step 2: Forecast per fold (for each model)
CREATE TABLE cv_naive AS
SELECT * FROM ts_cv_forecast_by('cv_folds', unique_id, ds, y, 'Naive', MAP{});
CREATE TABLE cv_autoets AS
SELECT * FROM ts_cv_forecast_by('cv_folds', unique_id, ds, y, 'AutoETS',
MAP{'seasonal_period': '7'});
-- Step 3: Compare metrics
SELECT 'Naive' AS model,
ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS mae,
ts_rmse(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)) AS rmse
FROM cv_naive GROUP BY ALL
UNION ALL
SELECT 'AutoETS',
ts_mae(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds)),
ts_rmse(LIST(y ORDER BY ds), LIST(yhat ORDER BY ds))
FROM cv_autoets GROUP BY ALL;
5. Full Production Pipeline
-- 1. Quality check
SELECT id, (stats).length, (stats).n_nulls, (stats).n_gaps
FROM ts_stats_by('raw', product_id, date, value, '1d');
-- 2. Prep (materialize each step!)
CREATE TABLE step1 AS
SELECT * FROM ts_fill_gaps_by('raw', product_id, date, value, '1d');
CREATE TABLE step2 AS
SELECT * FROM ts_fill_nulls_const_by('step1', product_id, date, value, 0.0);
CREATE TABLE clean AS
SELECT * FROM ts_drop_short_by('step2', product_id, 20);
-- 3. Detect seasonality
SELECT id, (periods).primary_period
FROM ts_detect_periods_by('clean', product_id, date, value, MAP{});
-- 4. Backtest
CREATE TABLE cv_folds AS
SELECT * FROM ts_cv_folds_by('clean', product_id, date, value, 5, 14, MAP{});
CREATE TABLE backtest AS
SELECT * FROM ts_cv_forecast_by('cv_folds', product_id, date, value, 'AutoETS',
MAP{'seasonal_period': '7'});
-- 5. Evaluate
SELECT product_id,
ts_mae(LIST(y ORDER BY date), LIST(yhat ORDER BY date)) AS mae,
ts_rmse(LIST(y ORDER BY date), LIST(yhat ORDER BY date)) AS rmse
FROM backtest GROUP BY product_id;
-- 6. Forecast
CREATE TABLE forecasts AS
SELECT * FROM ts_forecast_by('clean', product_id, date, value,
'AutoETS', 14, '1d', MAP{'seasonal_period': '7'});
-- 7. Conformal intervals
CREATE TABLE calibration AS
SELECT * FROM ts_conformal_calibrate('backtest', value, yhat, {'alpha': 0.1});
SELECT * FROM ts_conformal_apply_by(
'forecasts', product_id, yhat,
(SELECT conformity_score FROM calibration)
);
Model Quick Reference (32 Models)
Automatic Selection (6)
| Model | Optional Params | Best For |
|---|---|---|
AutoETS | seasonal_period | Unknown patterns (default pick) |
AutoARIMA | seasonal_period | Unknown patterns, ARIMA family |
AutoTheta | seasonal_period | Unknown patterns, Theta family |
AutoMFLES | seasonal_periods[] | Multiple seasonalities |
AutoMSTL | seasonal_periods[] | Multiple seasonalities |
AutoTBATS | seasonal_periods[] | Multiple seasonalities |
Basic (6)
| Model | Required | Optional | Best For |
|---|---|---|---|
Naive | — | — | Baseline benchmark |
SMA | — | window (def: 5) | Smoothed baseline |
SeasonalNaive | seasonal_period | — | Seasonal baseline |
SES | — | alpha (def: 0.3) | No trend, no seasonality |
SESOptimized | — | — | Optimized SES |
RandomWalkDrift | — | — | Trend without seasonality |
Exponential Smoothing (4)
| Model | Required | Optional |
|---|---|---|
Holt | — | alpha, beta |
HoltWinters | seasonal_period | alpha, beta, gamma |
SeasonalES | seasonal_period | alpha, gamma |
SeasonalESOptimized | seasonal_period | — |
Theta Methods (5)
| Model | Optional |
|---|---|
Theta | seasonal_period, theta |
OptimizedTheta | seasonal_period |
DynamicTheta | `sea |