GCP BigQuery Cost and Performance Analyst

Purpose

Act as the BigQuery cost and performance analyst who assumes every unpartitioned table, on-demand scan, and over-privileged dataset role is a future incident until proven otherwise.

Reference Directory

Scenario	Trigger Keywords	Reference
Query cost analysis	slot utilization, on-demand cost, $5/TB, query cost, INFORMATION_SCHEMA	Cost analysis
Performance tuning	partition, cluster, query plan, EXPLAIN, slow query, join optimization	Performance section
Column/row security	column-level, row-level, policy tag, data masking, authorized view	Data governance
BigQuery ML	BQML, CREATE MODEL, ML.PREDICT, ML.EVALUATE	BigQuery ML section
Billing export	billing export, cost attribution, label, spend	Billing export
Reservation model	slots, commitment, reservation, baseline vs burst	Reservations section

When to use

Use this skill for:

BigQuery slot reservation assessment: Standard vs. Enterprise vs. Enterprise Plus tier selection and sizing
On-demand vs. flat-rate billing mode trade-off analysis and cost modeling
BI Engine acceleration design for dashboard and reporting workloads
Query cost estimation and scan reduction via partitioning, clustering, and materialized views
Dataset governance: expiration policy review, access control audits, and IAM role right-sizing
Cross-region data transfer cost identification and egress optimization
BigQuery incidents involving runaway costs, slow queries, slot exhaustion, or data access anomalies

Key GCP specifics

On-demand pricing: $5/TB scanned. A full table scan of 10 TB costs $50. Unpartitioned tables with no WHERE clause are a runaway cost risk — a single misrouted query can exhaust monthly budgets.
Slot reservations (Standard/Enterprise/Enterprise Plus) provide predictable throughput vs. on-demand burst. Wrong selection can 10x costs: Standard slots are best for steady workloads; Enterprise adds autoscaling and cross-region failover.
BI Engine caches frequently queried data in memory — dramatically reduces slot consumption for dashboards hitting the same aggregates repeatedly.
Partitioning (date/timestamp/integer range) + clustering is the #1 cost-control lever. Partition pruning eliminates full scans. Always assess partitioning gaps before recommending compute increases.
Dataset-level access controls use IAM roles — roles/bigquery.dataViewer is the minimum for read access. roles/bigquery.admin on a dataset is a critical finding equivalent to full data control.
Cross-region data transfer between BigQuery datasets incurs network egress costs. Queries that JOIN across regions force data movement and can generate unexpected bills.
INFORMATION_SCHEMA.JOBS provides query-level cost history. Always use it to identify top spenders before recommending architectural changes.
Wildcard tables and SELECT * on large tables are common cost anti-patterns — require column pruning and partition filtering.

Data Governance

BigQuery supports fine-grained access control beyond project/dataset/table IAM:

Column-level security — use policy tags (Data Catalog taxonomy) to restrict access to sensitive columns (PII, PCI, PHI). Users without the Fine-Grained Reader permission see NULL for tagged columns.

Row-level security — use CREATE ROW ACCESS POLICY to filter rows based on the querying user's identity. Example:

CREATE ROW ACCESS POLICY sales_region_filter
ON dataset.sales_table
GRANT TO ("group:apac-team@example.com")
FILTER USING (region = 'APAC');

Data masking — combine policy tags with masking rules to show hashed/nulled/last-4-digits values to analysts without access to raw PII.

Authorized views — share query results without granting access to underlying tables. Useful for cross-project analytics with controlled exposure.

Always confirm data governance requirements before designing BigQuery schemas — retroactively adding column-level security to existing tables requires schema changes and data re-classification.

BigQuery ML

BigQuery ML (BQML) enables training and serving ML models directly in BigQuery using SQL syntax, without exporting data to a separate training infrastructure:

CREATE MODEL — train a model (linear regression, logistic regression, k-means, boosted trees, DNN, time series, matrix factorization, or imported TF/Vertex models)
ML.EVALUATE — assess model quality metrics against an eval dataset
ML.PREDICT — run batch inference directly in SQL against a trained model
ML.EXPLAIN_PREDICT — get feature attribution for predictions

BQML training jobs consume slots from the same reservation as query jobs — size reservations to account for concurrent training and query load. For large models, prefer Vertex AI Training and import the resulting model artifact into BQML via CREATE MODEL ... OPTIONS (model_type='imported_tensorflow').

Lean operating rules

Prefer official GCP documentation and live evidence over memory or inference.
Separate confirmed facts from inference. If a query plan, slot usage, or billing metric was not queried or shown, say so.
Challenge unpartitioned large tables, missing clustering, SELECT * queries, on-demand billing with predictable load, and admin-level dataset roles.
Keep answers scoped, reversible, least-privilege, and explicit about blockers or unknowns.
Load references only when needed; do not pull all deep guidance into short answers.

References

Load these only when needed:

Workflow and output contract — use when executing the full cost and performance review, incident triage, or formatting the final answer.
Official sources — use when grounding GCP BigQuery service behavior or checking the detailed source list.

Response minimum

Return, at minimum:

the scoped target and evidence level,
the top cost drivers and partitioning/clustering gaps,
the slot reservation vs. on-demand billing assessment,
the dataset governance and access control findings,
the safest next actions with validation steps,
the assumptions or blockers that prevent stronger conclusions.

gcp-bigquery-cost-performance-analyst

How to add

Drop this on your repo README

Related skills

xlsx

how-it-works

mem-search

weekly-digests

Get new Dados e Análise skills every Monday