DataRobot Predictions Skill
This skill provides comprehensive guidance for working with DataRobot predictions, including real-time predictions, batch scoring, and generating prediction datasets.
Quick Start
Most common use case: Generate predictions for a deployment
- Get deployment features: get_deployment_features(deployment_id) to understand required columns
- Generate template: generate_prediction_data_template(deployment_id, n_rows) to create CSV structure
- Make predictions: Use deployment.predict_batch(...) (works for both single-row “real-time” and batch scoring)
Example: "Generate a prediction dataset template for deployment abc123 with 10 rows"
To also explain predictions: pass --max-explanations N to make_prediction.py (or the
max_explanations=N kwarg in code). See Prediction Explanations below.
When to use this skill
Use this skill when you need to:
- Make predictions from deployed DataRobot models
- Explain individual predictions from a deployment (SHAP or XEMP, per-row)
- Generate prediction dataset templates
- Validate prediction data before scoring
- Understand deployment feature requirements
- Perform batch predictions on large datasets
- Get sample training data to understand expected formats
For post-hoc explanations against a training project / leaderboard model (not a deployment), use the
datarobot-model-explainability skill instead. This skill covers deployment-time explanations returned alongside scoring.
Key capabilities
1. Understanding Deployment Requirements
Before making predictions, you need to understand what features a deployment requires:
- Feature names and types: Know which columns are needed (numeric, categorical, text, date)
- Feature importance: Understand which features matter most
- Target information: Know what you're predicting
- Time series configuration: If applicable, understand datetime columns and series IDs
2. Generating Prediction Datasets
Create properly formatted prediction datasets:
- Generate CSV templates with all required columns
- Include sample values appropriate for each feature type
- Add metadata comments explaining the model structure
- Ensure correct column ordering
3. Validating Prediction Data
Validate datasets before making predictions:
- Check for missing required features
- Verify data types match expected types
- Identify missing low-importance features (warnings)
- Note extra columns that will be ignored
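The checks listed above might look like the following sketch. It assumes rows are plain dicts and that each feature is a dict with `name`, `feature_type`, and `importance` keys; the `validate_rows` helper, the `low_importance` cutoff of 0.1, and the report layout are all hypothetical choices for illustration. Only a numeric type check is shown.

```python
def _is_number(value):
    """Return True if value can be parsed as a float."""
    try:
        float(value)
    except (TypeError, ValueError):
        return False
    return True


def validate_rows(rows, features, low_importance=0.1):
    """Compare the columns in rows against the deployment's features.

    errors:      missing features that are important (or of unknown importance)
    warnings:    missing features whose importance is below low_importance
    ignored:     extra columns the deployment does not use
    type_errors: (row_index, feature_name) pairs where a Numeric value
                 could not be parsed as a number
    """
    present = set()
    for row in rows:
        present.update(row.keys())

    report = {"errors": [], "warnings": [], "ignored": [], "type_errors": []}
    known = set()
    for feature in features:
        name = feature["name"]
        known.add(name)
        if name not in present:
            importance = feature.get("importance")
            if importance is not None and importance < low_importance:
                report["warnings"].append(name)
            else:
                report["errors"].append(name)
            continue
        if feature.get("feature_type") == "Numeric":
            for index, row in enumerate(rows):
                if name in row and not _is_number(row[name]):
                    report["type_errors"].append((index, name))

    report["ignored"] = sorted(present - known)
    return report


# Hypothetical features and data for illustration only.
features = [
    {"name": "temperature", "feature_type": "Numeric", "importance": 0.9},
    {"name": "promotion", "feature_type": "Numeric", "importance": 0.05},
    {"name": "store_id", "feature_type": "Categorical", "importance": 0.6},
]
rows = [{"temperature": "warm", "store_id": "store_A", "notes": "x"}]
report = validate_rows(rows, features)
print(report)
```

A feature with no importance value is treated as required, so a missing score never downgrades an error to a warning.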
4. Making Predictions
Execute predictions using various methods:
- Real-time predictions: Fast, synchronous predictions for individual records
- Batch predictions: Process large datasets efficiently
- Time series predictions: Handle forecasting scenarios with proper datetime handling
Workflow examples
Example 1: Generate prediction dataset for a new scenario
User request: "I want to predict sales for next week for store_A with temperatures of 75°F each day and no promotions."
Agent workflow:
- Get deployment features to understand required columns
- Generate a prediction data template with 7 rows (one week)
- Fill in the template with the user's specific values:
  - Set temperature = 75 for all rows
  - Set promotion = 0 for all rows
  - Set store_id = "store_A" for all rows
  - Set dates for the next 7 days
- Validate the data to ensure it's correct
- Make predictions using the validated dataset
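The fill-in step of this workflow can be sketched as below. The column names `temperature`, `promotion`, and `store_id` come from the example; the `date` column name, the start date, and the `build_week_rows` helper are assumptions, and a real deployment may expect different names or date formats.

```python
from datetime import date, timedelta


def build_week_rows(start, store_id, temperature, promotion):
    """Build seven daily rows starting at start, with constant scenario values."""
    return [
        {
            "date": (start + timedelta(days=offset)).isoformat(),
            "store_id": store_id,
            "temperature": temperature,
            "promotion": promotion,
        }
        for offset in range(7)
    ]


# The start date is an arbitrary example; "next week" depends on when this runs.
rows = build_week_rows(date(2024, 7, 1), "store_A", temperature=75, promotion=0)
for row in rows:
    print(row)
```

The resulting list of dicts can be written to CSV or turned into a DataFrame before validation and scoring.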
Example 2: Batch scoring a CSV file
User request: "Score all records in my prediction_data.csv file using deployment abc123."
Agent workflow:
- Validate the CSV file structure matches deployment requirements
- Upload the file or provide file path
- Submit batch prediction job
- Monitor job status
- Retrieve and return prediction results
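For a file that fits the convenience API, the submit, monitor, and retrieve steps can collapse into one call. The sketch below uses only `dr.Deployment.get` and `deployment.predict_batch`, both named in this document. It assumes the client has already been configured and that `predict_batch` accepts a CSV path as described here. The `score_csv` wrapper is hypothetical.

```python
def score_csv(deployment_id, csv_path):
    """Score a CSV file against a deployment and return the predictions."""
    # Imported lazily so this module can be loaded without the SDK installed.
    import datarobot as dr

    deployment = dr.Deployment.get(deployment_id)
    # Per this document, predict_batch accepts a CSV path, a file object,
    # or a pandas DataFrame.
    return deployment.predict_batch(csv_path)


# Example call (requires a configured DataRobot client and a real deployment):
# predictions = score_csv("abc123", "prediction_data.csv")
```

When finer control over intake, output, or job monitoring is needed, the `dr.BatchPredictionJob.score(...)` route listed under Key SDK Operations is the alternative.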
Using DataRobot SDK
This skill guides you to use the DataRobot Python SDK directly. Install the SDK if needed:
pip install datarobot
Key SDK Operations
Use these DataRobot SDK methods to work with predictions:
Deployment Information:
- dr.Deployment.get(deployment_id) - Get deployment details
- deployment.get_features() - Get required features (name/type/importance)
Predictions:
- deployment.predict_batch(source) - Convenience batch prediction API (CSV path, file object, or pandas DataFrame)
- dr.BatchPredictionJob.score(deployment=deployment, ...) - Advanced batch prediction control
- job.get_result_when_complete() - Wait for batch scoring to finish and download results
Data Management:
- dr.Dataset.create_from_file(file_path) - Upload dataset
- dr.Dataset.get(dataset_id) - Get dataset info
See the Common Patterns section below for complete examples.
Prediction Explanations
Deployments can return per-row explanations (top feature contributions) alongside predictions. Two algorithms are available depending on how the deployment was configured:
- SHAP (shap): SHapley Additive exPlanations. Available on tree-based models when SHAP was enabled at deployment time. Returns signed contributions in the model's score space.
- XEMP (xemp): eXemplar-based Explanations of Model Predictions, DataRobot's own explanation method. Default when SHAP is not enabled. Returns the top-N strongest features with a qualitative strength (+++, --, etc.).
If you omit explanation_algorithm, the deployment's default is used.
How to request explanations
Pass max_explanations=N (and any optional filters) when calling datarobot_predict.deployment.predict:
import datarobot as dr
import pandas as pd
from datarobot_predict.deployment import predict as dr_predict
dr.Client(token=..., endpoint=...)
deployment = dr.Deployment.get("abc123")
result = dr_predict(
    deployment=deployment,
    data_frame=pd.DataFrame([{"feature1": 10, "feature2": 20}]),
    max_explanations=3,              # top 3 contributors per row
    explanation_algorithm="shap",    # or "xemp"; omit for deployment default
    # threshold_high=0.8,            # optional: only explain rows scoring > 0.8
    # threshold_low=0.2,             # optional: only explain rows scoring < 0.2
    # passthrough_columns="all",     # optional: echo input columns through to output
)
print(result.dataframe.to_dict(orient="records"))
The result DataFrame includes columns like EXPLANATION_1_FEATURE_NAME,
EXPLANATION_1_ACTUAL_VALUE, EXPLANATION_1_STRENGTH, EXPLANATION_1_QUALITATIVE_STRENGTH for
each of the top-N contributors.
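To work with these columns per row, they can be regrouped into a list of explanations. This sketch assumes the column naming shown above and a record produced by something like `to_dict(orient="records")`; the `extract_explanations` helper is hypothetical, and it treats a missing or `None` feature name as "no explanation in that slot" (rows coming from pandas may carry NaN instead, which would need an extra check).

```python
EXPLANATION_FIELDS = (
    "FEATURE_NAME",
    "ACTUAL_VALUE",
    "STRENGTH",
    "QUALITATIVE_STRENGTH",
)


def extract_explanations(record, max_explanations):
    """Collect the EXPLANATION_<i>_* columns of one result row into dicts."""
    explanations = []
    for index in range(1, max_explanations + 1):
        prefix = f"EXPLANATION_{index}_"
        if record.get(prefix + "FEATURE_NAME") is None:
            continue  # this row has fewer explanations than requested
        explanations.append(
            {field.lower(): record.get(prefix + field) for field in EXPLANATION_FIELDS}
        )
    return explanations


# Hypothetical result row for illustration only.
record = {
    "prediction": 0.91,
    "EXPLANATION_1_FEATURE_NAME": "feature1",
    "EXPLANATION_1_ACTUAL_VALUE": 10,
    "EXPLANATION_1_STRENGTH": 0.42,
    "EXPLANATION_1_QUALITATIVE_STRENGTH": "+++",
}
explanations = extract_explanations(record, max_explanations=3)
print(explanations)
```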
Parameter reference
| Parameter | Purpose |
|---|---|
| max_explanations | Top-N contributors per row. 0 (default) disables explanations. |
| max_ngram_explanations | Text models only: cap text-segment explanations per row. |
| threshold_high | Only explain rows with prediction probability above this (0–1). |
| threshold_low | Only explain rows with prediction probability below this (0–1). |
| explanation_algorithm | "shap" or "xemp"; omit to use deployment default. |
| passthrough_columns | "all" or set of input column names to echo through to output. |
CLI shortcut
python scripts/make_prediction.py abc123 '{"feature1": 10, "feature2": 20}' \
--max-explanations 3 --explanation-algorithm shap
When to use which threshold
- threshold_high is useful when only positive (high-risk / fraud / churn-likely) predictions need explaining, which saves compute on a large batch.
- threshold_low is the mirror image for low-probability rows.
- Setting both restricts explanations to rows outside the [low, high] band.
Common errors
- "Prediction explanations not enabled": the deployment was created without explanations support. Re-deploy the model with explanations enabled, or use a deployment that has them.
- max_explanations ignored / no explanation columns in output: confirm you're calling `datarobot_predict.deployment.pre