MLOps Patterns

Operationalize machine learning models from experimentation to production deployment and monitoring.

Purpose

Provide strategic guidance for ML engineers and platform teams to build production-grade ML infrastructure. Cover the complete lifecycle: experiment tracking, model registry, feature stores, deployment patterns, pipeline orchestration, and monitoring.

When to Use This Skill

Use this skill when:

Designing MLOps infrastructure for production ML systems
Selecting experiment tracking platforms (MLflow, Weights & Biases, Neptune)
Implementing feature stores for online/offline feature serving
Choosing model serving solutions (Seldon Core, KServe, BentoML, TorchServe)
Building ML pipelines for training, evaluation, and deployment
Setting up model monitoring and drift detection
Establishing model governance and compliance frameworks
Optimizing ML inference costs and performance
Migrating from notebooks to production ML systems
Implementing continuous training and automated retraining

Core Concepts

1. Experiment Tracking

Track experiments systematically to ensure reproducibility and collaboration.

Key Components:

Parameters: Hyperparameters logged for each training run
Metrics: Performance measures tracked over time (accuracy, loss, F1)
Artifacts: Model weights, plots, datasets, configuration files
Metadata: Tags, descriptions, Git commit SHA, environment details
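The four components above map onto a simple per-run record. The sketch below is a stdlib-only illustration built on a hypothetical RunRecord class and current_git_sha helper (neither belongs to any tracking platform). MLflow exposes comparable calls such as mlflow.log_param, mlflow.log_metric and mlflow.log_artifact. A real tracker would persist the record to a backend server rather than serializing it to a JSON string.

```python
import json
import platform
import subprocess
from dataclasses import asdict, dataclass, field


def current_git_sha() -> str:
    """Return the current Git commit SHA, or "unknown" outside a Git checkout."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


@dataclass
class RunRecord:
    """Hypothetical record of one training run (illustration only)."""
    run_name: str
    params: dict = field(default_factory=dict)     # hyperparameters
    metrics: dict = field(default_factory=dict)    # name -> [(step, value), ...]
    artifacts: list = field(default_factory=list)  # paths to weights, plots, configs
    metadata: dict = field(default_factory=dict)   # tags, Git SHA, environment

    def log_param(self, key: str, value) -> None:
        self.params[key] = value

    def log_metric(self, key: str, value: float, step: int) -> None:
        self.metrics.setdefault(key, []).append((step, value))

    def log_artifact(self, path: str) -> None:
        self.artifacts.append(path)

    def capture_environment(self) -> None:
        # Record the code version and interpreter used for this run.
        self.metadata.update(
            git_sha=current_git_sha(),
            python_version=platform.python_version(),
        )

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Metrics are stored as (step, value) pairs so curves such as loss can be plotted over time. Storing the Git SHA alongside the parameters is what allows a run to be reproduced later.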

Platform Comparison:

MLflow (Open-source standard):

Framework-agnostic (PyTorch, TensorFlow, scikit-learn, XGBoost)
Self-hosted or cloud-agnostic deployment
Integrated model registry
Basic UI, adequate for most use cases
Free, requires infrastructure management

Weights & Biases (SaaS, collaboration-focused):

Advanced visualization and dashboards
Integrated hyperparameter optimization (Sweeps)
Excellent team collaboration features
SaaS pricing scales with usage
Best-in-class UI

Neptune.ai (Enterprise-grade):

Enterprise features (RBAC, audit logs, compliance)
Integrated production monitoring
Higher cost than W&B
Good for regulated industries

Selection Criteria:

Open-source requirement → MLflow
Team collaboration critical → Weights & Biases
Enterprise compliance (RBAC, audits) → Neptune.ai
Hyperparameter optimization primary → Weights & Biases (Sweeps)

For detailed comparison and decision framework, see references/experiment-tracking.md.

2. Model Registry and Versioning

Centralize model artifacts with version control and stage management.

Model Registry Components:

Model artifacts (weights, serialized models)
Training metrics (accuracy, F1, AUC)
Hyperparameters used during training
Training dataset version
Feature schema (input/output signatures)
Model cards (documentation, use cases, limitations)

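Stage Management (classic MLflow registry stages; recent MLflow releases deprecate fixed stages in favor of model version aliases and tags):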

None: Newly registered model
Staging: Testing in pre-production environment
Production: Serving live traffic
Archived: Deprecated, retained for compliance
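The stage lifecycle above can be modeled as a small state machine. This is a hypothetical in-memory sketch: the ModelRegistry class and its transition policy are assumptions, not MLflow's implementation. It enforces allowed transitions, and when a new version is promoted it archives the previous Production version. That is one common policy, not a universal rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assumed transition policy; real registries may allow other moves.
ALLOWED_TRANSITIONS = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}


@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "None"
    metrics: dict = field(default_factory=dict)


class ModelRegistry:
    """Hypothetical in-memory registry illustrating stage management."""

    def __init__(self) -> None:
        self._versions: Dict[Tuple[str, int], ModelVersion] = {}

    def register(self, name: str, metrics: Optional[dict] = None) -> ModelVersion:
        # Versions auto-increment per model name, starting at 1.
        latest = max((v for (n, v) in self._versions if n == name), default=0)
        mv = ModelVersion(name, latest + 1, metrics=dict(metrics or {}))
        self._versions[(name, mv.version)] = mv
        return mv

    def transition(self, name: str, version: int, new_stage: str) -> ModelVersion:
        mv = self._versions[(name, version)]
        if new_stage not in ALLOWED_TRANSITIONS[mv.stage]:
            raise ValueError(f"{name} v{version}: {mv.stage} -> {new_stage} not allowed")
        if new_stage == "Production":
            # Keep exactly one Production version per model: archive the old one.
            for other in self._versions.values():
                if other.name == name and other.stage == "Production":
                    other.stage = "Archived"
        mv.stage = new_stage
        return mv

    def production_version(self, name: str) -> Optional[ModelVersion]:
        return next(
            (mv for mv in self._versions.values()
             if mv.name == name and mv.stage == "Production"),
            None,
        )
```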

Versioning Strategies:

Semantic Versioning for Models:

Major version (v2.0.0): Breaking change in input/output schema
Minor version (v1.1.0): New feature, backward-compatible
Patch version (v1.0.1): Bug fix, model retrained on new data
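The three bump rules can be expressed as a small helper. ChangeKind and bump_model_version are hypothetical names used for illustration. The caller decides which kind of change a new model represents, for example by diffing the old and new input/output signatures.

```python
from enum import Enum


class ChangeKind(Enum):
    BREAKING_SCHEMA = "major"      # input/output schema changed incompatibly
    BACKWARD_COMPATIBLE = "minor"  # new capability, existing callers unaffected
    FIX_OR_RETRAIN = "patch"       # bug fix or retrain on new data, same schema


def bump_model_version(current: str, change: ChangeKind) -> str:
    """Return the next "vMAJOR.MINOR.PATCH" string for the given change."""
    major, minor, patch = (int(part) for part in current.lstrip("v").split("."))
    if change is ChangeKind.BREAKING_SCHEMA:
        return f"v{major + 1}.0.0"
    if change is ChangeKind.BACKWARD_COMPATIBLE:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```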

Git-Based Versioning:

Model code in Git (training scripts, configuration)
Model weights in DVC (Data Version Control) or Git-LFS
Reproducibility via commit SHA + data version hash
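Combining the commit SHA with a content hash of the training data yields a single key identifying what produced a model. A minimal sketch with hypothetical helper names: DVC computes and stores its own file hashes, whereas this version hashes the files directly with SHA-256 from hashlib.

```python
import hashlib
from typing import Iterable


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def data_version_hash(paths: Iterable[str]) -> str:
    """Content-only dataset hash, independent of file order and location."""
    combined = hashlib.sha256()
    for file_digest in sorted(file_sha256(p) for p in paths):
        combined.update(file_digest.encode())
    return combined.hexdigest()


def reproducibility_key(commit_sha: str, data_paths: Iterable[str]) -> str:
    # Truncated to 12 hex chars each for readability; keep full hashes if collisions matter.
    return f"{commit_sha[:12]}-{data_version_hash(data_paths)[:12]}"
```

Only file contents are hashed. Renaming or moving a file therefore leaves the key unchanged, while editing a single row changes it.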

For model lineage tracking and registry patterns, see references/model-registry.md.

3. Feature Stores

Centralize feature engineering to ensure consistency between training and inference.

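Problem Addressed: Training/serving skew, where a feature is computed differently at training time than at inference time. A common example: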

Training: Features computed with future knowledge (data leakage)
Inference: Features computed with only past data
Result: Model performs well in training but fails in production

Feature Store Solution:

Online Feature Store:

Purpose: Low-latency feature retrieval for real-time inference
Storage: Redis, DynamoDB, Cassandra (key-value stores)
Latency: Sub-10ms for feature lookup
Use Case: Real-time predictions (fraud detection, recommendations)

Offline Feature Store:

Purpose: Historical feature data for training and batch inference
Storage: Parquet files (S3/GCS), data warehouses (Snowflake, BigQuery)
Latency: Seconds to minutes (batch retrieval)
Use Case: Model training, backtesting, batch predictions
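A materialization step typically connects the two stores by copying the latest feature values from offline history into the online key-value store. Feast exposes this as the `feast materialize` command. The sketch below uses plain Python dicts in place of Parquet and Redis, and the function names and row layout are assumptions for illustration. A production job would upsert into the existing online store rather than rebuild it.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Offline rows: (entity_id, event_timestamp, {feature_name: value})
OfflineRow = Tuple[str, int, Dict[str, float]]


def materialize(rows: Iterable[OfflineRow], start: int,
                end: int) -> Dict[str, Dict[str, float]]:
    """Build an online store holding, per entity, the newest row with start <= ts < end."""
    online: Dict[str, Dict[str, float]] = {}
    newest_ts: Dict[str, int] = {}
    for entity_id, ts, features in rows:
        if not (start <= ts < end):
            continue
        if entity_id not in newest_ts or ts > newest_ts[entity_id]:
            newest_ts[entity_id] = ts
            online[entity_id] = dict(features)
    return online


def get_online_features(online: Dict[str, Dict[str, float]], entity_id: str,
                        names: List[str]) -> Dict[str, Optional[float]]:
    """Single-key lookup: the low-latency path used at inference time."""
    row = online.get(entity_id, {})
    return {name: row.get(name) for name in names}
```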

Point-in-Time Correctness:

Ensures no future data leakage during training
Feature values at time T only use data available before time T
Critical for avoiding overly optimistic training metrics
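A point-in-time join can be sketched as a binary search over each entity's sorted feature history. This single-machine illustration uses hypothetical names; feature stores implement the same idea as an as-of join over large tables. Using bisect_left excludes a feature row stamped exactly at T, matching "before time T" above. Switching to bisect_right would include it.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

# entity_id -> [(timestamp, value), ...] sorted by timestamp
History = Dict[str, List[Tuple[int, float]]]


def point_in_time_join(labels: List[Tuple[str, int, int]],
                       history: History) -> List[Tuple[str, int, Optional[float], int]]:
    """Attach to each (entity_id, label_ts, label) the latest feature value
    observed strictly before label_ts, or None if there is none."""
    joined = []
    for entity_id, label_ts, label in labels:
        rows = history.get(entity_id, [])
        timestamps = [ts for ts, _ in rows]
        i = bisect_left(timestamps, label_ts)  # rows[:i] all have ts < label_ts
        value = rows[i - 1][1] if i > 0 else None
        joined.append((entity_id, label_ts, value, label))
    return joined
```

Consider a history for u1 of [(1, 10.0), (5, 20.0), (9, 30.0)]. A naive join that takes each entity's latest value would attach 30.0 to every label, including labels observed at t=5 or t=6. That is exactly the future leakage described above.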

Platform Comparison:

Feast (Open-source, cloud-agnostic):

Most popular open-source feature store
Supports Redis, DynamoDB, Datastore (online) and Parquet, BigQuery, Snowflake (offline)
Cloud-agnostic, no vendor lock-in
Active community, growing adoption

Tecton (Managed, production-grade):

Feast-compatible API
Fully managed service
Integrated monitoring and governance
Higher cost, enterprise-focused

SageMaker Feature Store (AWS):

Integrated with AWS ecosystem
Managed online/offline stores
AWS lock-in

Databricks Feature Store (Databricks):

Unity Catalog integration
Delta Lake for offline storage
Databricks ecosystem lock-in

Selection Criteria:

Open-source, cloud-agnostic → Feast
Managed solution, production-grade → Tecton
AWS ecosystem → SageMaker Feature Store
Databricks users → Databricks Feature Store

For feature engineering patterns and implementation, see references/feature-stores.md.

4. Model Serving Patterns

Deploy models for synchronous, asynchronous, batch, or streaming inference.

Serving Patterns:

REST API Deployment:

Pattern: HTTP endpoint for synchronous predictions
Latency: <100ms acceptable
Use Case: Request-response applications
Tools: Flask, FastAPI, BentoML, Seldon Core
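The request-response pattern reduces to three steps: parse a JSON payload, call the model, and return JSON. To stay dependency-free, this sketch uses Python's built-in http.server, which the Python documentation does not recommend for production. In practice FastAPI, BentoML, or Seldon Core would fill this role. The /predict route, the payload shape, and the linear predict function are hypothetical stand-ins for a real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features: dict) -> dict:
    """Hypothetical stand-in model: a fixed linear score with a 0.5 threshold."""
    weights = {"amount": 0.002, "num_items": 0.1}
    score = sum(weights.get(name, 0.0) * float(value) for name, value in features.items())
    return {"score": score, "label": int(score > 0.5)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            body, status = predict(payload["features"]), 200
        except (ValueError, KeyError, TypeError, AttributeError) as exc:
            # Malformed JSON or missing/invalid features -> client error
            body, status = {"error": str(exc)}, 400
        data = json.dumps(body).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args) -> None:
        pass  # silence default per-request logging to stderr


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    HTTPServer((host, port), PredictHandler).serve_forever()
```

Once serve() is running, a client POSTs a body such as {"features": {"amount": 100, "num_items": 2}} to /predict and receives the score and label as JSON.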

gRPC Deployment:

Pattern: High-performance RPC for low-latency inference
Latency: <10ms target
Use Case: Microservices, latency-critical applications
Tools: TensorFlow Serving, TorchServe, Seldon Core

Batch Inference:

Pattern: Process large datasets offline
Latency: Minutes to hours acceptable
Use Case: Daily/hourly predictions for millions of records
Tools: Spark, Dask, Ray
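Batch inference streams records through the model in fixed-size chunks, so each model call is vectorized and memory use stays bounded. Spark, Dask, and Ray distribute those chunks across workers; Ray Data's map_batches is one example. The single-process sketch below treats the record layout and model_fn as assumptions.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def chunked(iterable: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def batch_score(records: Iterable[dict],
                model_fn: Callable[[List[list]], List[float]],
                batch_size: int = 1000) -> Iterator[Tuple[int, float]]:
    """Score records chunk by chunk, yielding (record_id, prediction) pairs."""
    for chunk in chunked(records, batch_size):
        ids = [r["id"] for r in chunk]
        preds = model_fn([r["features"] for r in chunk])  # one vectorized call per chunk
        yield from zip(ids, preds)
```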

Streaming Inference:

Pattern: Real-time predictions on streaming data
Latency: Milliseconds
Use Case: Fraud detection, anomaly detection, real-time recommendations
Tools: Kafka + Flink/Spark Streaming
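In streaming inference, features are computed from recent events as each event arrives. The sketch below stands in for a Flink or Spark Streaming job. A Python iterable plays the role of the Kafka topic, and the feature is a per-card sliding-window transaction count. A fixed threshold serves as a hypothetical stand-in for a fraud model. The sketch assumes each card's events arrive in timestamp order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, Iterator


class StreamingScorer:
    """Scores each event on arrival using a per-entity sliding-window count."""

    def __init__(self, window_seconds: int = 60, threshold: int = 3) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.recent: Dict[str, Deque[int]] = defaultdict(deque)

    def score(self, event: dict) -> dict:
        ts, card = event["ts"], event["card_id"]
        window = self.recent[card]
        window.append(ts)
        # Evict timestamps outside the (ts - window_seconds, ts] interval.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        count = len(window)
        return {"card_id": card, "ts": ts, "tx_in_window": count,
                "flag": count > self.threshold}


def run_stream(events: Iterable[dict], scorer: StreamingScorer) -> Iterator[dict]:
    for event in events:  # in production: a Kafka consumer loop
        yield scorer.score(event)
```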

Platform Comparison:

Seldon Core (Kubernetes-native, advanced):

Advanced deployment strategies (canary, A/B testing, multi-armed bandits)
Multi-framework support
Integrated explainability (Alibi)
High complexity, steep learning curve
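Canary and A/B deployments rest on splitting traffic between model variants by weight. Seldon Core and KServe configure this declaratively; KServe, for example, provides a canaryTrafficPercent field on its InferenceService. The function below only illustrates the routing logic, and the variant names are hypothetical. It hashes a stable request or user ID instead of drawing a random number, which keeps each caller on the same variant and makes A/B comparisons cleaner.

```python
import hashlib
from typing import Dict


def route_request(request_id: str, weights: Dict[str, int]) -> str:
    """Deterministically pick a model variant for a request.

    weights maps variant name -> integer traffic percentage; values must sum to 100.
    """
    if sum(weights.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    # Map the ID to a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for variant in sorted(weights):  # sorted so the mapping ignores dict order
        cumulative += weights[variant]
        if bucket < cumulative:
            return variant
    raise RuntimeError("unreachable when weights sum to 100")
```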

KServe (CNCF standard):

Standardized InferenceService API
Serverless scaling (scale-to-zero with Knative)
Kubernetes-native
Growing adoption, CNCF backing

BentoML (Python-first, simplicity):

Easiest to get started
Excellent developer experience
Local testing → cloud deployment
Lower complexity than Seldon/KServe

TorchServe (PyTorch official):

PyTorch-specific serving
Production-grade, optimized for PyTorch models
Less flexible for multi-framework use

TensorFlow Serving (TensorFlow official):

TensorFlow-specific serving
Production-grade, optimized for TensorFlow models
Less flexible for multi-framework use

Selection Criteria:

Kubernetes, advanced deployments → Seldon Core or KServe
Python-first, simplicity → BentoML
PyTorch-specific → TorchServe
TensorFlow-specific → TensorFlow Serving
Managed solution → SageMaker/Vertex AI/Azure ML

For model optimization and serving infrastructure, see references/model-serving.md.
