DataRobot Model Explainability Skill
This skill covers SHAP insights, XEMP prediction explanations, anomaly explanations, and model diagnostics.
SDK version: Use datarobot>=3.6.0 for the full API set in this skill (ShapDistributions was added in 3.6; ShapMatrix, ShapImpact, and ShapPreview are available in datarobot>=3.4.0). Use from datarobot.insights import ShapMatrix, ... with entity_id=model_id, not the legacy datarobot.models.ShapMatrix (project_id/dataset_id). ShapMatrix, ShapImpact, ShapPreview, and ShapDistributions are the canonical SHAP API. The older dr.PredictionExplanations (XEMP-based) remains available but is the secondary path.
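Because the SHAP classes arrived in different SDK releases, a script may want to check which ones it can import. The sketch below uses a hypothetical helper, available_shap_apis, which is not part of the datarobot SDK. It encodes only the version floors stated above. Reading the installed version (for example with importlib.metadata) is left to the caller.

```python
# Hypothetical helper (not part of the datarobot SDK): decide which
# datarobot.insights SHAP classes are usable for a given SDK version string.
# Version floors come from this skill: 3.4.0 for ShapMatrix, ShapImpact, and
# ShapPreview; 3.6.0 for ShapDistributions.
from typing import List, Tuple

SHAP_API_FLOORS = {
    "ShapMatrix": (3, 4, 0),
    "ShapImpact": (3, 4, 0),
    "ShapPreview": (3, 4, 0),
    "ShapDistributions": (3, 6, 0),
}


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' into a tuple, ignoring suffixes such as 'rc1'."""
    parts: List[int] = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    parts += [0] * (3 - len(parts))  # pad "3.6" to (3, 6, 0)
    return (parts[0], parts[1], parts[2])


def available_shap_apis(version: str) -> List[str]:
    """Return the SHAP insight classes available at the given SDK version."""
    current = parse_version(version)
    return sorted(name for name, floor in SHAP_API_FLOORS.items() if current >= floor)


# In a live environment the installed version could be read with
# importlib.metadata.version("datarobot").
print(available_shap_apis("3.5.1"))
```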
Quick Start
| Goal | API to use | Prerequisites |
|---|---|---|
| SHAP values for all features, all rows | ShapMatrix.create(entity_id=model_id) | None - universal SHAP |
| Per-row top-feature explanations | ShapPreview.create(entity_id=model_id) | None |
| Aggregated feature importance via SHAP | ShapImpact.create(entity_id=model_id) | None |
| SHAP value distributions across features | ShapDistributions.create(entity_id=model_id) | None |
| SHAP for a filtered segment | dr.DataSlice.create(...) + ShapMatrix.create(..., data_slice_id=...) | Data slice definition |
| XEMP-based prediction explanations | dr.PredictionExplanations.create(...) | Feature Impact; PE initialization; dataset uploaded |
| Anomaly explanations (time series) | AnomalyAssessmentRecord.compute(project_id, model_id, ...) | Anomaly model |
| ROC / lift / confusion (insights) | RocCurve.create(...) / LiftChart.create(...) / ConfusionMatrix.create(...) | Validation data |
| ROC / lift / confusion (Model helpers) | model.get_roc_curve() / model.get_lift_chart() / model.get_confusion_chart() | Validation data |
Universal SHAP is the preferred path: it needs no dataset pre-upload and no Feature Impact step.
When to use this skill
Use this skill when you need to explain leaderboard model behavior, compute SHAP insights, use XEMP prediction explanations, analyze anomaly explanations, or retrieve model diagnostics.
Key capabilities
1. SHAP insights
- Compute ShapMatrix, ShapPreview, ShapImpact, and ShapDistributions
- Filter insights with dr.DataSlice
2. XEMP and anomaly explanations
- Use XEMP dr.PredictionExplanations when specifically required
- Retrieve time series anomaly assessment records and explanations
3. Diagnostics
- Retrieve ROC, lift, and confusion insights
- Use Model helpers for ROC, lift, confusion, and feature effects
Setup
import os
import datarobot as dr
from datarobot.insights import ShapMatrix, ShapImpact, ShapPreview, ShapDistributions
dr.Client(
token=os.environ["DATAROBOT_API_TOKEN"],
endpoint=os.environ.get("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2"),
)
Core API: datarobot.insights
import pandas as pd
from datarobot.insights import ShapMatrix, ShapImpact, ShapPreview, ShapDistributions
model_id = "YOUR_MODEL_ID"
matrix = ShapMatrix.create(entity_id=model_id)
df = pd.DataFrame(matrix.matrix, columns=matrix.columns)
impact = ShapImpact.create(entity_id=model_id)
preview = ShapPreview.create(entity_id=model_id)
distributions = ShapDistributions.create(entity_id=model_id)
Use ShapMatrix for full row-by-feature SHAP values, ShapPreview for compact top-driver rows,
ShapImpact for aggregated SHAP importance, and ShapDistributions for per-feature SHAP
distributions. Use source="externalTestSet" plus external_dataset_id for external datasets.
See references/shap_api_reference.md for parameters, exports, and limitations.
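The relationship between the full SHAP matrix and its summaries can be shown with plain Python. This is a conceptual illustration, not DataRobot's computation. It uses mean absolute SHAP as the aggregate, which is an assumption: ShapImpact may normalize or label its output differently. The helper names are hypothetical, and the numbers are toy values.

```python
# Illustrative only: relate a row-by-feature SHAP matrix (the kind of data
# ShapMatrix returns) to an aggregate ranking and to per-row top drivers.
# Assumption: mean |SHAP| is the aggregate; DataRobot's ShapImpact may differ.
from typing import Dict, List, Sequence, Tuple


def mean_abs_shap(matrix: Sequence[Sequence[float]], columns: Sequence[str]) -> Dict[str, float]:
    """Average absolute SHAP value per feature across all rows."""
    totals = [0.0] * len(columns)
    for row in matrix:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    n_rows = max(len(matrix), 1)  # avoid division by zero on an empty matrix
    return {col: totals[j] / n_rows for j, col in enumerate(columns)}


def top_drivers(row: Sequence[float], columns: Sequence[str], n: int = 3) -> List[Tuple[str, float]]:
    """The n features with the largest |SHAP| in one row, keeping the sign."""
    ranked = sorted(zip(columns, row), key=lambda pair: abs(pair[1]), reverse=True)
    return ranked[:n]


# Toy values for illustration, not real model output.
columns = ["income", "age", "tenure"]
matrix = [
    [0.40, -0.10, 0.05],
    [-0.20, 0.30, 0.00],
]
impact = mean_abs_shap(matrix, columns)
print(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))
print(top_drivers(matrix[1], columns, n=2))
```

With real output, the same functions could take df.values.tolist() and list(df.columns) from the DataFrame built in the Core API example.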
Secondary path: XEMP Prediction Explanations
Use dr.PredictionExplanations when XEMP explanations are specifically required (e.g., certain
regulatory contexts, or when SHAP is unavailable for the model type).
Prerequisites (all required before calling .create()):
- Feature Impact must be computed: call model.request_feature_impact() and wait for the job
- Prediction explanations must be initialized: dr.PredictionExplanationsInitialization.create(...)
- The scoring dataset must be uploaded to the project as a prediction dataset (project.upload_dataset(...)), and predictions must be computed for it
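import datarobot as dr
project = dr.Project.get(project_id)
model = dr.Model.get(project=project_id, model_id=model_id)
# 1. Feature Impact must exist before XEMP can be initialized
model.request_feature_impact().wait_for_completion()
# 2. Initialize prediction explanations for this model and wait for the job
dr.PredictionExplanationsInitialization.create(project_id, model_id).get_result_when_complete()
# 3. Upload scoring data as a project prediction dataset, then compute predictions
dataset = project.upload_dataset("./data/scoring_data.csv")
model.request_predictions(dataset.id).wait_for_completion()
# 4. Request XEMP explanations for that prediction dataset
pe_job = dr.PredictionExplanations.create(
project_id=project_id,
model_id=model_id,
dataset_id=dataset.id,
max_explanations=5, # top N features per row, up to 50
threshold_high=0.5, # explain rows with prediction >= threshold_high ...
threshold_low=0.1, # ... or with prediction <= threshold_low
)
pe_obj = pe_job.get_result_when_complete()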
Use pe_obj.get_rows(), pe_obj.get_all_as_dataframe(), or pe_obj.download_to_csv(...) to
retrieve results. For parameters, multiclass modes, and exposure-adjusted predictions, see
references/xemp_pe_reference.md.
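The two thresholds act as a row filter, and a local sketch makes the combination explicit. It assumes three things:
- A row is explained when its prediction is at or above threshold_high, or at or below threshold_low.
- Every row is explained when neither threshold is set.
- The boundaries are inclusive (>= and <=); this choice is also an assumption.

DataRobot applies the filter server-side. rows_to_explain is a hypothetical helper for reasoning about which rows will come back.

```python
# Hypothetical local illustration of the threshold_high / threshold_low rule.
# Assumptions: OR-combination of the two thresholds, inclusive boundaries,
# and "explain every row" when no threshold is given.
from typing import List, Optional, Sequence


def rows_to_explain(
    predictions: Sequence[float],
    threshold_high: Optional[float] = None,
    threshold_low: Optional[float] = None,
) -> List[int]:
    """Return indices of rows that would receive XEMP explanations."""
    if threshold_high is None and threshold_low is None:
        return list(range(len(predictions)))
    selected = []
    for i, p in enumerate(predictions):
        high_hit = threshold_high is not None and p >= threshold_high
        low_hit = threshold_low is not None and p <= threshold_low
        if high_hit or low_hit:
            selected.append(i)
    return selected


# Toy predictions; with the settings above, middle-of-the-road rows are skipped.
preds = [0.05, 0.30, 0.55, 0.90]
print(rows_to_explain(preds, threshold_high=0.5, threshold_low=0.1))
```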
Data slices for filtered insights
Use dr.DataSlice when the user asks to explain model behavior for a segment, such as a
region, product line, target class, or high-risk cohort. Pass the resulting data_slice_id into
the datarobot.insights SHAP APIs.
import datarobot as dr
from datarobot.insights import ShapMatrix
data_slice = dr.DataSlice.create(
name="high_income_customers",
filters=[{"operand": "income", "operator": ">", "values": [100000]}], # values is a list
project=project_id,
)
shap_matrix = ShapMatrix.create(
entity_id=model_id,
source="validation",
data_slice_id=data_slice.id,
)
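Before creating a slice, it can help to estimate locally how many rows a filter list would keep. The sketch below uses hypothetical helpers (preview_slice, _matches); it is not the DataRobot API. It makes these assumptions:
- Filters use the same dict shape as above, with values given as a list.
- Multiple filters are combined with AND.
- Only ">", "<", "eq", and "in" are handled.

The operators DataRobot actually accepts are defined by the SDK, not by this sketch.

```python
# Hypothetical local preview of DataSlice-style filters (not the DataRobot API).
# Assumptions: filters are ANDed, and only ">", "<", "eq", "in" are handled.
from typing import Any, Dict, List, Sequence

Filter = Dict[str, Any]


def _matches(row: Dict[str, Any], flt: Filter) -> bool:
    """Check one row against one {"operand", "operator", "values"} filter."""
    value = row.get(flt["operand"])
    targets = flt["values"]
    op = flt["operator"]
    if value is None:
        return False
    if op == ">":
        return value > targets[0]
    if op == "<":
        return value < targets[0]
    if op == "eq":
        return value == targets[0]
    if op == "in":
        return value in targets
    raise ValueError(f"operator not handled by this sketch: {op!r}")


def preview_slice(rows: Sequence[Dict[str, Any]], filters: Sequence[Filter]) -> List[Dict[str, Any]]:
    """Return the rows that satisfy every filter."""
    return [row for row in rows if all(_matches(row, f) for f in filters)]


# Toy rows for illustration.
rows = [
    {"income": 150000, "region": "west"},
    {"income": 80000, "region": "west"},
    {"income": 120000, "region": "east"},
]
filters = [{"operand": "income", "operator": ">", "values": [100000]}]
print(len(preview_slice(rows, filters)))
```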
Anomaly assessment (time series models)
For time series anomaly detection models, use AnomalyAssessmentRecord.
from datarobot.models.anomaly_assessment import AnomalyAssessmentRecord
record = AnomalyAssessmentRecord.compute(
project_id=project_id,
model_id=model_id,
backtest=0, # backtest index (int) or "holdout"
source="validation", # "training" or "validation" only
series_id=None, # required for multiseries projects
)
records = AnomalyAssessmentRecord.list(project_id=project_id, model_id=model_id)
latest = record.get_latest_explanations()
regions = record.get_predictions_preview().find_anomalous_regions()
explanations = record.get_explanations_data_in_regions(regions=regions)
ranged = record.get_explanations(
start_date="2024-01-01T00:00:00.000000Z",
end_date="2024-06-01T00:00:00.000000Z",
)
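The idea behind turning a series of anomaly scores into regions can be sketched locally. It groups consecutive time steps whose score exceeds a threshold. This is an illustration, not the implementation of find_anomalous_regions in the SDK. The strict ">" comparison, the threshold value, and the (timestamp, score) input shape are all assumptions.

```python
# Illustrative sketch, not the SDK's find_anomalous_regions implementation:
# group consecutive time steps whose anomaly score exceeds a threshold into
# (start, end) regions. The strict ">" rule and input shape are assumptions.
from typing import List, Optional, Sequence, Tuple

Point = Tuple[str, float]  # (timestamp, anomaly score)


def anomalous_regions(points: Sequence[Point], threshold: float) -> List[Tuple[str, str]]:
    """Return (first_timestamp, last_timestamp) for each run of scores above threshold."""
    regions: List[Tuple[str, str]] = []
    start: Optional[str] = None
    end: Optional[str] = None
    for ts, score in points:
        if score > threshold:
            if start is None:
                start = ts  # a new run begins
            end = ts        # extend the current run
        elif start is not None and end is not None:
            regions.append((start, end))  # the run just ended
            start = end = None
    if start is not None and end is not None:
        regions.append((start, end))  # close a run that reaches the last point
    return regions


# Toy scores, assumed sorted by timestamp.
points = [
    ("2024-01-01", 0.1),
    ("2024-01-02", 0.8),
    ("2024-01-03", 0.9),
    ("2024-01-04", 0.2),
    ("2024-01-05", 0.7),
]
print(anomalous_regions(points, threshold=0.5))
```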
Model diagnostics
Use the same entity_id=model_id pattern as SHAP insights. FeatureEffects / partial dependence
is still retrieved through Model helpers (not in datarobot.insights).
Insights diagnostics (preferred — matches SHAP API)
from datarobot.insights import RocCurve, LiftChart, ConfusionMatrix
roc = RocCurve.create(entity_id=model_id)
lift = LiftChart.create(entity_id=model_id)
confusion = ConfusionMatrix.create(entity_id=model_id)
Model helpers (alternative)
model = dr.Model.get(project=project_id, model_id=model_id)
roc = model.get_roc_curve(source="validation")
lift = model.get_lift_chart(source="validation")
confusion = model.get_confusion_chart(source="validation")
# Feature Impact (non-SHAP) and Feature Effects (partial dependence for top features)
fi = model.get_feature_impact()
feature_effects = model.get_feature_effect(source="validation")
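To read the ROC and confusion outputs, it helps to recall that each ROC point is a confusion matrix at one classification threshold. The local sketch below computes the counts and the true/false positive rates for one threshold. It uses a hypothetical helper, confusion_at, and toy labels and scores. It illustrates what those diagnostics summarize, not how DataRobot computes them. Predicting positive when score >= threshold is the assumed convention.

```python
# Illustrative only: confusion counts and ROC rates at a single threshold.
# Assumed convention: predict positive when score >= threshold.
from typing import Dict, Sequence


def confusion_at(labels: Sequence[int], scores: Sequence[float], threshold: float) -> Dict[str, float]:
    """Return tp/fp/tn/fn plus TPR (recall) and FPR for one threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0  # one ROC y-coordinate
    fpr = fp / (fp + tn) if fp + tn else 0.0  # one ROC x-coordinate
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "tpr": tpr, "fpr": fpr}


# Toy labels and scores for illustration.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.6, 0.4, 0.2, 0.8]
print(confusion_at(labels, scores, threshold=0.5))
```

Sweeping the threshold from high to low and collecting (fpr, tpr) pairs traces out the curve.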
Interpreting SHAP values
- Positive value: feature pushes prediction higher than baseline
- Negative value: feature pushes prediction lower than baseline
- Magnitude: size of influence; larger absolute value = stronger effect
- Sum: all SHAP values for a row sum to prediction - base_value in the link-function space
- base_value: the model's mean prediction (the "no information" basel