DataRobot Predictions Skill

This skill covers working with DataRobot predictions: real-time predictions, batch scoring, prediction explanations, and generating prediction datasets.

Quick Start

Most common use case: Generate predictions for a deployment

Get deployment features: get_deployment_features(deployment_id) to understand required columns
Generate template: generate_prediction_data_template(deployment_id, n_rows) to create CSV structure
Make predictions: Use deployment.predict_batch(...) (works for both single-row “real-time” and batch scoring)

Example: "Generate a prediction dataset template for deployment abc123 with 10 rows"

To also explain predictions: pass --max-explanations N to make_prediction.py (or the max_explanations=N kwarg in code). See Prediction Explanations below.

When to use this skill

Use this skill when you need to:

Make predictions from deployed DataRobot models
Explain individual predictions from a deployment (SHAP or XEMP, per-row)
Generate prediction dataset templates
Validate prediction data before scoring
Understand deployment feature requirements
Perform batch predictions on large datasets
Get sample training data to understand expected formats

For post-hoc explanations against a training project / leaderboard model (not a deployment), use the datarobot-model-explainability skill instead. This skill covers deployment-time explanations returned alongside scoring.

Key capabilities

1. Understanding Deployment Requirements

Before making predictions, you need to understand what features a deployment requires:

Feature names and types: Know which columns are needed (numeric, categorical, text, date)
Feature importance: Understand which features matter most
Target information: Know what you're predicting
Time series configuration: If applicable, understand datetime columns and series IDs
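As a rough illustration of the checklist above, the sketch below groups a feature list by type and ranks it by importance. It assumes each feature is a dict with name, feature_type, and importance keys (the shape deployment.get_features() is expected to return); summarize_features and the sample rows are hypothetical.

```python
from collections import defaultdict


def summarize_features(features):
    """Group feature names by type and rank all names by importance (highest first)."""
    by_type = defaultdict(list)
    for feat in features:
        by_type[feat["feature_type"]].append(feat["name"])
    # Treat a missing importance as 0.0 so such features sort last.
    ranked = sorted(features, key=lambda f: f.get("importance") or 0.0, reverse=True)
    return dict(by_type), [f["name"] for f in ranked]


# Hypothetical feature list; in practice this would come from deployment.get_features().
sample_features = [
    {"name": "temperature", "feature_type": "Numeric", "importance": 0.62},
    {"name": "store_id", "feature_type": "Categorical", "importance": 1.0},
    {"name": "promotion", "feature_type": "Numeric", "importance": 0.35},
    {"name": "date", "feature_type": "Date", "importance": None},
]

by_type, ranked_names = summarize_features(sample_features)
print(by_type)
print(ranked_names)
```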

2. Generating Prediction Datasets

Create properly formatted prediction datasets:

Generate CSV templates with all required columns
Include sample values appropriate for each feature type
Add metadata comments explaining the model structure
Ensure correct column ordering
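A minimal sketch of template generation using only the standard library is shown below. It is not the skill's generate_prediction_data_template helper; build_template_csv and the placeholder values are assumptions chosen for illustration, and columns are written in the order the feature list provides them.

```python
import csv
import io

# Placeholder value per feature type. These defaults are illustrative assumptions,
# not values DataRobot prescribes.
PLACEHOLDERS = {
    "Numeric": 0,
    "Categorical": "category_value",
    "Text": "sample text",
    "Date": "2024-01-01",
}


def build_template_csv(features, n_rows):
    """Return CSV text with one column per feature and n_rows placeholder rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f["name"] for f in features])
    for _ in range(n_rows):
        # Unknown feature types get an empty cell to be filled in by hand.
        writer.writerow([PLACEHOLDERS.get(f["feature_type"], "") for f in features])
    return buffer.getvalue()


# Hypothetical feature list in the assumed name/feature_type shape.
sample_features = [
    {"name": "store_id", "feature_type": "Categorical"},
    {"name": "temperature", "feature_type": "Numeric"},
]
template = build_template_csv(sample_features, n_rows=2)
print(template)
```

The sketch omits the metadata comments mentioned above; many CSV readers do not skip comment lines, so that kind of metadata may be safer in a separate file.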

3. Validating Prediction Data

Validate datasets before making predictions:

Check for missing required features
Verify data types match expected types
Identify missing low-importance features (warnings)
Note extra columns that will be ignored
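The checks above could be sketched roughly as follows. validate_rows is a hypothetical helper, the 0.1 importance cutoff separating "required" from "low-importance" features is an assumption, and only numeric types are checked to keep the example short.

```python
def _is_number(value):
    """Return True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def validate_rows(rows, features, importance_threshold=0.1):
    """Compare row columns against a feature list; return (errors, warnings, notes)."""
    present = set()
    for row in rows:
        present.update(row.keys())
    expected = {f["name"]: f for f in features}

    errors, warnings = [], []
    for name, feat in expected.items():
        if name in present:
            continue
        # Assumed rule: features at or above the cutoff are treated as required.
        if (feat.get("importance") or 0.0) >= importance_threshold:
            errors.append(f"missing required feature: {name}")
        else:
            warnings.append(f"missing low-importance feature: {name}")

    for index, row in enumerate(rows):
        for name, value in row.items():
            feat = expected.get(name)
            if value in (None, ""):
                continue  # empty cells are treated as missing values, not type errors
            if feat and feat["feature_type"] == "Numeric" and not _is_number(value):
                errors.append(f"row {index}: {name} is not numeric: {value!r}")

    notes = [f"extra column ignored: {name}" for name in sorted(present - set(expected))]
    return errors, warnings, notes


# Hypothetical inputs for illustration.
features = [
    {"name": "temperature", "feature_type": "Numeric", "importance": 0.6},
    {"name": "price", "feature_type": "Numeric", "importance": 0.5},
    {"name": "promotion", "feature_type": "Numeric", "importance": 0.02},
]
rows = [{"temperature": "75", "store_note": "n/a"}]
errors, warnings, notes = validate_rows(rows, features)
print(errors, warnings, notes)
```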

4. Making Predictions

Execute predictions using various methods:

Real-time predictions: Fast, synchronous predictions for individual records
Batch predictions: Process large datasets efficiently
Time series predictions: Handle forecasting scenarios with proper datetime handling

Workflow examples

Example 1: Generate prediction dataset for a new scenario

User request: "I want to predict sales for next week for store_A with temperatures of 75°F each day and no promotions."

Agent workflow:

Get deployment features to understand required columns
Generate a prediction data template with 7 rows (one week)
Fill in the template with user's specific values:
- Set temperature = 75 for all rows
- Set promotion = 0 for all rows
- Set store_id = "store_A" for all rows
- Set dates for next 7 days
Validate the data to ensure it's correct
Make predictions using the validated dataset
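The fill-in step of this workflow might look like the sketch below. The column names temperature, promotion, and store_id come from the example; the date column name and the build_week_rows helper are assumptions, and a real time series deployment may need additional columns.

```python
from datetime import date, timedelta


def build_week_rows(start, store_id="store_A", temperature=75, promotion=0, days=7):
    """Build one row per day starting at `start`, with the same scenario values in every row."""
    return [
        {
            "date": (start + timedelta(days=offset)).isoformat(),  # assumed column name
            "store_id": store_id,
            "temperature": temperature,
            "promotion": promotion,
        }
        for offset in range(days)
    ]


# Fixed start date so the output is reproducible; use the first day of the
# forecast week in practice.
rows = build_week_rows(date(2024, 7, 1))
for row in rows:
    print(row)
```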

Example 2: Batch scoring a CSV file

User request: "Score all records in my prediction_data.csv file using deployment abc123."

Agent workflow:

Validate the CSV file structure matches deployment requirements
Upload the file or provide file path
Submit batch prediction job
Monitor job status
Retrieve and return prediction results
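For the simple case, the submit, monitor, and retrieve steps can be handled by deployment.predict_batch, as in this minimal sketch. score_csv and main are hypothetical wrappers; the sketch assumes credentials are available in the DATAROBOT_ENDPOINT and DATAROBOT_API_TOKEN environment variables.

```python
def score_csv(deployment, csv_path):
    """Score a CSV file with a deployment and return whatever predict_batch returns.

    predict_batch runs the batch job and hands back the results, so no separate
    polling loop is written here.
    """
    return deployment.predict_batch(source=csv_path)


def main():
    # Imported inside the function so score_csv can be exercised without the SDK installed.
    import datarobot as dr

    dr.Client()  # assumes endpoint and token are set in the environment
    deployment = dr.Deployment.get("abc123")  # placeholder deployment ID from the example
    predictions = score_csv(deployment, "prediction_data.csv")
    print(predictions.head())
```

Call main() from a script once credentials are configured. For finer control over intake and output, dr.BatchPredictionJob.score is the lower-level route.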

Using DataRobot SDK

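This skill guides you to use the DataRobot Python SDK directly. Install the SDK if needed (the Prediction Explanations examples also import datarobot_predict, which is distributed as the separate datarobot-predict package):

pip install datarobot datarobot-predict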

Key SDK Operations

Use these DataRobot SDK methods to work with predictions:

Deployment Information:

dr.Deployment.get(deployment_id) - Get deployment details
deployment.get_features() - Get required features (name/type/importance)

Predictions:

deployment.predict_batch(source) - Convenience batch prediction API (CSV path, file object, or pandas DataFrame)
dr.BatchPredictionJob.score(deployment=deployment, ...) - Advanced batch prediction control
job.get_result_when_complete() - Wait for batch scoring to finish and download results

Data Management:

dr.Dataset.create_from_file(file_path) - Upload dataset
dr.Dataset.get(dataset_id) - Get dataset info

See the Common Patterns section below for complete examples.

Prediction Explanations

Deployments can return per-row explanations (top feature contributions) alongside predictions. Two algorithms are available depending on how the deployment was configured:

SHAP (shap): SHapley Additive exPlanations. Available on tree-based models when SHAP was enabled at deployment time. Returns signed contributions in the model's score space.
XEMP (xemp): eXemplar-based Explanations of Model Predictions, DataRobot's own explanation method. Default when SHAP is not enabled. Returns top-N strongest features with a qualitative strength (+++, --, etc.).

If you omit explanation_algorithm, the deployment's default is used.

How to request explanations

Pass max_explanations=N (and any optional filters) when calling datarobot_predict.deployment.predict:

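import os

import datarobot as dr
import pandas as pd
from datarobot_predict.deployment import predict as dr_predict

# Credentials are read from the environment here; substitute your own source if needed.
dr.Client(
    token=os.environ["DATAROBOT_API_TOKEN"],
    endpoint=os.environ["DATAROBOT_ENDPOINT"],
)
deployment = dr.Deployment.get("abc123")  # placeholder deployment ID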

result = dr_predict(
    deployment=deployment,
    data_frame=pd.DataFrame([{"feature1": 10, "feature2": 20}]),
    max_explanations=3,                  # top 3 contributors per row
    explanation_algorithm="shap",        # or "xemp"; omit for deployment default
    # threshold_high=0.8,                # optional: only explain rows scoring > 0.8
    # threshold_low=0.2,                 # optional: only explain rows scoring < 0.2
    # passthrough_columns="all",         # optional: echo input columns through to output
)
print(result.dataframe.to_dict(orient="records"))

The result DataFrame includes columns like EXPLANATION_1_FEATURE_NAME, EXPLANATION_1_ACTUAL_VALUE, EXPLANATION_1_STRENGTH, EXPLANATION_1_QUALITATIVE_STRENGTH for each of the top-N contributors.
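To work with these wide columns, one option is to reshape each output record into a list of explanations. The sketch below is a hypothetical helper (extract_explanations) that relies only on the column naming pattern described above and operates on one record from result.dataframe.to_dict(orient="records"); the sample record is invented.

```python
def extract_explanations(record, max_explanations):
    """Collect EXPLANATION_<n>_* columns from one output record into a list of dicts."""
    explanations = []
    for rank in range(1, max_explanations + 1):
        prefix = f"EXPLANATION_{rank}_"
        name = record.get(prefix + "FEATURE_NAME")
        # Skip slots that are absent or NaN (NaN is the only value not equal to itself).
        if name is None or name != name:
            continue
        explanations.append(
            {
                "rank": rank,
                "feature": name,
                "actual_value": record.get(prefix + "ACTUAL_VALUE"),
                "strength": record.get(prefix + "STRENGTH"),
                "qualitative_strength": record.get(prefix + "QUALITATIVE_STRENGTH"),
            }
        )
    return explanations


# Invented record for illustration; real values come from the prediction output.
sample_record = {
    "EXPLANATION_1_FEATURE_NAME": "feature1",
    "EXPLANATION_1_ACTUAL_VALUE": 10,
    "EXPLANATION_1_STRENGTH": 0.42,
    "EXPLANATION_1_QUALITATIVE_STRENGTH": "+++",
    "EXPLANATION_2_FEATURE_NAME": "feature2",
    "EXPLANATION_2_ACTUAL_VALUE": 20,
    "EXPLANATION_2_STRENGTH": -0.11,
    "EXPLANATION_2_QUALITATIVE_STRENGTH": "--",
}
explanations = extract_explanations(sample_record, max_explanations=3)
print(explanations)
```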

Parameter reference

Parameter	Purpose
`max_explanations`	Top-N contributors per row. `0` (default) disables explanations.
`max_ngram_explanations`	Text models only: cap text-segment explanations per row.
`threshold_high`	Only explain rows with prediction probability above this (0–1).
`threshold_low`	Only explain rows with prediction probability below this (0–1).
`explanation_algorithm`	`"shap"` or `"xemp"`; omit to use deployment default.
`passthrough_columns`	`"all"` or set of input column names to echo through to output.

CLI shortcut

python scripts/make_prediction.py abc123 '{"feature1": 10, "feature2": 20}' \
    --max-explanations 3 --explanation-algorithm shap

When to use which threshold

threshold_high is useful when only positive (high-risk / fraud / churn-likely) predictions need explaining; skipping the other rows saves compute on a large batch.
threshold_low is the mirror image for low-probability rows.
Setting both restricts explanations to rows outside the [low, high] band.
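The threshold rules above can be restated as a small predicate. This is only an illustration of the semantics as described in this section, not the service's implementation; should_explain is a hypothetical name, and the strict comparisons follow the "above" and "below" wording in the parameter reference.

```python
def should_explain(probability, threshold_low=None, threshold_high=None):
    """Return True if a row with this prediction probability would be explained."""
    if threshold_low is None and threshold_high is None:
        return True  # no thresholds set: every row is explained
    below = threshold_low is not None and probability < threshold_low
    above = threshold_high is not None and probability > threshold_high
    return below or above


# With both thresholds set, only rows outside the [0.2, 0.8] band are explained.
for p in (0.1, 0.5, 0.9):
    print(p, should_explain(p, threshold_low=0.2, threshold_high=0.8))
```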

Common errors

"Prediction explanations not enabled": the deployment was created without explanations support. Re-deploy the model with explanations enabled, or use a deployment that has them.
max_explanations ignored / no explanation columns in output: confirm you're calling `datarobot_predict.deployment.pre
