DataRobot Model Training Skill
This skill provides guidance for the complete model training workflow in DataRobot, from project creation through model selection and validation.
Quick Start
Most common use case: Create a project and train models
- Upload dataset: upload_dataset(file_path, dataset_name) to upload training data
- Create project: create_project(dataset_id, project_name) to create a new project
- Start training: start_automl(project_id, mode) to begin AutoML training
Example: "Create a new project with sales_data.csv, set 'revenue' as target, and start Quick AutoML training"
When to use this skill
Use this skill when you need to:
- Create new DataRobot projects
- Upload training datasets
- Configure AutoML experiments
- Monitor training progress
- Select and compare models
- Understand feature engineering results
- Export trained models
Key capabilities
1. Project Management
- Create new projects with appropriate settings
- Upload datasets (CSV, Parquet, database connections)
- Configure project settings (target, partitioning, time series)
- Manage multiple projects and experiments
2. AutoML Configuration
- Set training modes (Quick, Manual, Comprehensive)
- Configure feature engineering options
- Set time limits and resource constraints
- Choose algorithms and model types
3. Training Execution
- Start AutoML training runs
- Monitor training progress
- Handle training errors and warnings
- Pause/resume training if needed
4. Model Analysis
- Compare model performance metrics
- Review feature importance
- Analyze model insights and explanations
- Select best models for deployment
Workflow examples
Example 1: Create and train a new project
User request: "Create a new project using my sales_data.csv file, predict 'revenue' as the target, and start AutoML training."
Agent workflow:
- Upload the dataset to DataRobot
- Create a new project with the dataset
- Set 'revenue' as the target variable
- Configure project settings (detect partitioning, handle time series if needed)
- Start AutoML training with appropriate mode
- Monitor training progress
- Report when training completes with top model metrics
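The monitoring step in the workflow above can be sketched as a polling loop with a timeout. This is a minimal sketch, assuming `project` behaves like a `datarobot.Project` whose `get_status()` returns a dict containing an `autopilot_done` flag. The helper name `wait_until_autopilot_done` and its parameters are illustrative and not part of the SDK; the SDK's own `project.wait_for_autopilot()` covers the same need.

```python
import time


def wait_until_autopilot_done(project, poll_seconds=30, timeout_seconds=3600,
                              sleep=time.sleep, clock=time.monotonic):
    """Poll project.get_status() until Autopilot finishes or the timeout expires.

    Returns the final status dict. Raises TimeoutError if Autopilot is still
    running once timeout_seconds have elapsed.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = project.get_status()
        # Assumption: the status dict carries a boolean 'autopilot_done' key.
        if status.get("autopilot_done"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Autopilot still running after {timeout_seconds}s")
        sleep(poll_seconds)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are enough.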
Example 2: Configure advanced training options
User request: "Train a model with time series settings: datetime column 'date', series ID 'store_id', forecast window 1-7 days."
Agent workflow:
- Create project with time series configuration
- Set datetime column and series ID columns
- Configure forecast window (1-7 days)
- Set appropriate time series validation
- Start training with time series-aware algorithms
- Monitor progress and report results
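The time series settings from the request above can be collected and sanity-checked before any project is created. This is a minimal sketch: the helper `build_time_series_settings` is hypothetical, and the dictionary keys are chosen on the assumption that they will be passed to the SDK's `DatetimePartitioningSpecification`. The forecast window is expressed in the time unit of the data, assumed here to be days.

```python
def build_time_series_settings(datetime_column, series_id_columns,
                               forecast_window_start, forecast_window_end):
    """Validate the forecast window and return keyword arguments for a
    datetime partitioning specification."""
    if not datetime_column:
        raise ValueError("a datetime column is required for time series projects")
    if forecast_window_start > forecast_window_end:
        raise ValueError("forecast_window_start must not exceed forecast_window_end")
    return {
        "datetime_partition_column": datetime_column,
        "use_time_series": True,
        "multiseries_id_columns": list(series_id_columns),
        "forecast_window_start": forecast_window_start,
        "forecast_window_end": forecast_window_end,
    }


settings = build_time_series_settings("date", ["store_id"], 1, 7)
# With the SDK installed and a client configured, these could be passed on as:
#   spec = dr.DatetimePartitioningSpecification(**settings)
#   project.set_target(target="sales", partitioning_method=spec)
```

Validating locally catches an inverted forecast window before it turns into a server-side error.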
Using DataRobot SDK
This skill guides you to use the DataRobot Python SDK directly. Install the SDK if needed:
pip install datarobot
Key SDK Operations
Use these DataRobot SDK methods for model training:
Projects:
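- dr.Project.create_from_dataset(dataset_id, project_name=project_name) - Create project (pass project_name by keyword; the second positional argument is a dataset version ID)
- dr.Project.get(project_id) - Get project details
- dr.Project.list() - List all projects
- project.set_target(target=target_column) - Set target variable (this also starts modeling in the chosen mode)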
Training:
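- project.set_target(target=target_column, mode=dr.AUTOPILOT_MODE.QUICK) - Start AutoML training (setting the target launches Autopilot; newer SDK releases also offer project.analyze_and_model for the same step)
- project.wait_for_autopilot() - Block until Autopilot finishes
- project.get_status() - Check training status
- project.get_models() - List trained models
- dr.Model.get(project_id, model_id) - Get model details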
Model Analysis:
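- model.metrics - Get performance metrics (a dict attribute keyed by metric name)
- model.get_or_request_feature_impact() - Get feature importance, computing it first if it has not been requested yet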
See the Common Patterns section below for complete examples.
Helper Scripts
This skill includes executable helper scripts that Claude can run directly:
- scripts/create_project.py - Create a new project from a dataset
- scripts/start_training.py - Start AutoML training
- scripts/list_models.py - List trained models with metrics
Usage example:
# Create project and set target
python scripts/create_project.py dataset_123 "Sales Prediction" revenue
# Start training
python scripts/start_training.py project_456 Quick
# List models
python scripts/list_models.py project_456 AUC
Claude can run these scripts directly or use them as reference when writing code.
Best practices
- Data preparation: Ensure data is clean and properly formatted before upload
- Target selection: Choose an appropriate target variable and exclude features that would not be available at prediction time (target leakage)
- Partitioning: Use proper partitioning for time-aware or grouped data
- Feature engineering: Let AutoML handle feature engineering, but review results
- Model selection: Compare multiple models, not just the top performer
- Validation: Review validation strategy and ensure it matches your use case
Common patterns
Pattern 1: Standard classification/regression
import os
import datarobot as dr
# Initialize client (the endpoint is the API base URL, e.g. https://app.datarobot.com/api/v2)
client = dr.Client(
    token=os.getenv("DATAROBOT_API_TOKEN"),
    endpoint=os.getenv("DATAROBOT_ENDPOINT"),
)
# Upload dataset; create_from_file takes no name argument, so rename afterwards
dataset = dr.Dataset.create_from_file(file_path="training_data.csv")
dataset.modify(name="Sales Data")
# Create project
project = dr.Project.create_from_dataset(
    dataset_id=dataset.id,
    project_name="Sales Prediction",
)
# Set target and start AutoML (Quick mode); set_target launches Autopilot
# itself, so no separate start call is needed
project.set_target(
    target="revenue",
    mode=dr.AUTOPILOT_MODE.QUICK,
)
# Monitor training: block until Autopilot finishes or the timeout (seconds) expires
project.wait_for_autopilot(check_interval=30, timeout=3600)
# Get trained models
models = project.get_models()
# 'revenue' is assumed to be numeric (a regression target), so rank by
# validation RMSE (lower is better) rather than AUC
scored = [m for m in models if (m.metrics.get("RMSE") or {}).get("validation") is not None]
best_model = min(scored, key=lambda m: m.metrics["RMSE"]["validation"])
print(f"Best model: {best_model.id}, RMSE: {best_model.metrics['RMSE']['validation']}")
Pattern 2: Time series forecasting
import datarobot as dr
# Upload dataset; create_from_file takes no name argument, so rename afterwards
dataset = dr.Dataset.create_from_file(file_path="sales_data.csv")
dataset.modify(name="Sales Forecast Data")
# Create project
project = dr.Project.create_from_dataset(
    dataset_id=dataset.id,
    project_name="Sales Forecast",
)
# Configure time series settings in a datetime partitioning specification
time_partition = dr.DatetimePartitioningSpecification(
    datetime_partition_column="date",
    use_time_series=True,
    multiseries_id_columns=["store_id"],
    forecast_window_start=1,
    forecast_window_end=7,
)
# Set target and start training; set_target launches Autopilot itself
project.set_target(
    target="sales",
    mode=dr.AUTOPILOT_MODE.COMPREHENSIVE,
    partitioning_method=time_partition,
)
# Wait for completion (timeout in seconds) and get results
project.wait_for_autopilot(timeout=7200)
models = project.get_models()
Model selection criteria
When selecting models, consider:
- Performance metrics: Accuracy, AUC, RMSE, MAPE (depending on problem type)
- Prediction speed: Important for real-time deployments
- Interpretability: Some models are more explainable
- Feature requirements: Some models need specific feature types
- Deployment constraints: Consider model size and resource requirements
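The first criterion above, ranking by a performance metric, depends on whether the metric should be maximized or minimized. The following is a minimal sketch, assuming each model object exposes a `metrics` dict shaped like `{"RMSE": {"validation": 9.8, ...}}`. The helper `pick_best_model` and the `LOWER_IS_BETTER` set are illustrative names rather than SDK features, and the sample scores are made up for the demonstration.

```python
from types import SimpleNamespace

# Metrics where a smaller value is better; extend to match your project.
LOWER_IS_BETTER = {"RMSE", "MAE", "MAPE", "LogLoss"}


def pick_best_model(models, metric, partition="validation"):
    """Return the model with the best score for `metric` on `partition`.

    Models that do not report the metric on that partition are skipped.
    """
    scored = [
        m for m in models
        if (m.metrics.get(metric) or {}).get(partition) is not None
    ]
    if not scored:
        raise ValueError(f"no model reports {metric} on the {partition} partition")
    choose = min if metric in LOWER_IS_BETTER else max
    return choose(scored, key=lambda m: m.metrics[metric][partition])


# Stand-in objects for illustration; real code would pass SDK model objects.
candidates = [
    SimpleNamespace(id="m1", metrics={"RMSE": {"validation": 12.4}}),
    SimpleNamespace(id="m2", metrics={"RMSE": {"validation": 9.8}}),
    SimpleNamespace(id="m3", metrics={"RMSE": {"validation": None}}),
]
best = pick_best_model(candidates, "RMSE")
print(best.id)  # prints m2
```

Metric score is only one input; the remaining criteria (speed, interpretability, deployment constraints) still need a manual review of the shortlisted models.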
Error handling
Common errors and solutions:
- Dataset upload failures: Check file format, size limits, encoding
- Target errors: Ensure target column exists and has appropriate values
- Training failures: Check data quality, feature types, missing values
- Timeout errors: Adjust time limits or use Quick mode for initial exploration
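Several of the errors above (missing file, wrong encoding, missing or empty target) can be caught locally before an upload is attempted. The following is a minimal sketch using only the standard library; the function `precheck_training_csv` is hypothetical, and `max_bytes` is left for the caller to set because the applicable size limit depends on the DataRobot installation.

```python
import csv
import os


def precheck_training_csv(path, target, max_bytes=None, encoding="utf-8"):
    """Return a list of human-readable problems; an empty list means no issue found."""
    if not os.path.isfile(path):
        return [f"file not found: {path}"]
    problems = []
    if max_bytes is not None and os.path.getsize(path) > max_bytes:
        problems.append(f"file exceeds {max_bytes} bytes")
    try:
        with open(path, newline="", encoding=encoding) as handle:
            reader = csv.DictReader(handle)
            header = reader.fieldnames or []
            if target not in header:
                return problems + [f"target column '{target}' not found in header"]
            rows = 0
            missing = 0
            for row in reader:
                rows += 1
                # Short rows yield None for absent fields, so guard before strip().
                if not (row[target] or "").strip():
                    missing += 1
    except UnicodeDecodeError:
        return problems + [f"file is not valid {encoding}"]
    if rows == 0:
        problems.append("file has a header but no data rows")
    elif missing:
        problems.append(f"{missing} of {rows} rows have an empty target value")
    return problems
```

A call such as `precheck_training_csv("sales_data.csv", "revenue")` returning an empty list does not guarantee a successful upload, but it rules out the most common local causes.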
SDK Setup
Install DataRobot SDK
pip install datarobot
Initialize Client
import datarobot as dr
import os
client = dr.Client(
token=os.getenv("DATAROBOT_API_TOKEN"),
endpoint=os.getenv("DATAROBOT_ENDPOINT", "https://app.datarobot.com/api/v2")
)
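Client initialization fails in confusing ways when the token is unset or the endpoint lacks the API path. The following is a minimal sketch that resolves both values up front; the helper `resolve_client_settings` is hypothetical, and it assumes the API base path is `/api/v2` on the target installation.

```python
import os

# Assumption: the managed cloud API base URL; self-managed installs differ.
DEFAULT_ENDPOINT = "https://app.datarobot.com/api/v2"


def resolve_client_settings(env=None):
    """Read token and endpoint from the environment and normalize the endpoint."""
    env = os.environ if env is None else env
    token = env.get("DATAROBOT_API_TOKEN")
    if not token:
        raise RuntimeError("DATAROBOT_API_TOKEN is not set")
    endpoint = (env.get("DATAROBOT_ENDPOINT") or DEFAULT_ENDPOINT).rstrip("/")
    # Append the API path when only the host was supplied.
    if not endpoint.endswith("/api/v2"):
        endpoint += "/api/v2"
    return {"token": token, "endpoint": endpoint}
```

With the SDK installed, the result can be passed straight through as `dr.Client(**resolve_client_settings())`.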
Resources
- DataRobot Python SDK Documentation
- DataRobot AutoML Documentation
- General Modeling Documentation – Time Series
- [General Modeling Documentation – Feature Engineering](https://docs.datarobot.com/en/docs/modeling/index.h