MLOps Patterns
Operationalize machine learning models from experimentation to production deployment and monitoring.
Purpose
Provide strategic guidance for ML engineers and platform teams to build production-grade ML infrastructure. Cover the complete lifecycle: experiment tracking, model registry, feature stores, deployment patterns, pipeline orchestration, and monitoring.
When to Use This Skill
Use this skill when:
- Designing MLOps infrastructure for production ML systems
- Selecting experiment tracking platforms (MLflow, Weights & Biases, Neptune)
- Implementing feature stores for online/offline feature serving
- Choosing model serving solutions (Seldon Core, KServe, BentoML, TorchServe)
- Building ML pipelines for training, evaluation, and deployment
- Setting up model monitoring and drift detection
- Establishing model governance and compliance frameworks
- Optimizing ML inference costs and performance
- Migrating from notebooks to production ML systems
- Implementing continuous training and automated retraining
Core Concepts
1. Experiment Tracking
Track experiments systematically to ensure reproducibility and collaboration.
Key Components:
- Parameters: Hyperparameters logged for each training run
- Metrics: Performance measures tracked over time (accuracy, loss, F1)
- Artifacts: Model weights, plots, datasets, configuration files
- Metadata: Tags, descriptions, Git commit SHA, environment details
Platform Comparison:
MLflow (Open-source standard):
- Framework-agnostic (PyTorch, TensorFlow, scikit-learn, XGBoost)
- Self-hosted or cloud-agnostic deployment
- Integrated model registry
- Basic UI, adequate for most use cases
- Free, requires infrastructure management
Weights & Biases (SaaS, collaboration-focused):
- Advanced visualization and dashboards
- Integrated hyperparameter optimization (Sweeps)
- Excellent team collaboration features
- SaaS pricing scales with usage
- Best-in-class UI
Neptune.ai (Enterprise-grade):
- Enterprise features (RBAC, audit logs, compliance)
- Integrated production monitoring
- Higher cost than W&B
- Good for regulated industries
Selection Criteria:
- Open-source requirement → MLflow
- Team collaboration critical → Weights & Biases
- Enterprise compliance (RBAC, audits) → Neptune.ai
- Hyperparameter optimization primary → Weights & Biases (Sweeps)
For detailed comparison and decision framework, see references/experiment-tracking.md.
2. Model Registry and Versioning
Centralize model artifacts with version control and stage management.
Model Registry Components:
- Model artifacts (weights, serialized models)
- Training metrics (accuracy, F1, AUC)
- Hyperparameters used during training
- Training dataset version
- Feature schema (input/output signatures)
- Model cards (documentation, use cases, limitations)
Stage Management:
- None: Newly registered model
- Staging: Testing in pre-production environment
- Production: Serving live traffic
- Archived: Deprecated, retained for compliance
Versioning Strategies:
Semantic Versioning for Models:
- Major version (v2.0.0): Breaking change in input/output schema
- Minor version (v1.1.0): New feature, backward-compatible
- Patch version (v1.0.1): Bug fix, model retrained on new data
Git-Based Versioning:
- Model code in Git (training scripts, configuration)
- Model weights in DVC (Data Version Control) or Git-LFS
- Reproducibility via commit SHA + data version hash
For model lineage tracking and registry patterns, see references/model-registry.md.
3. Feature Stores
Centralize feature engineering to ensure consistency between training and inference.
Problem Addressed: Training/serving skew
- Training: Features computed with future knowledge (data leakage)
- Inference: Features computed with only past data
- Result: Model performs well in training but fails in production
Feature Store Solution:
Online Feature Store:
- Purpose: Low-latency feature retrieval for real-time inference
- Storage: Redis, DynamoDB, Cassandra (key-value stores)
- Latency: Sub-10ms for feature lookup
- Use Case: Real-time predictions (fraud detection, recommendations)
Offline Feature Store:
- Purpose: Historical feature data for training and batch inference
- Storage: Parquet files (S3/GCS), data warehouses (Snowflake, BigQuery)
- Latency: Seconds to minutes (batch retrieval)
- Use Case: Model training, backtesting, batch predictions
Point-in-Time Correctness:
- Ensures no future data leakage during training
- Feature values at time T only use data available before time T
- Critical for avoiding overly optimistic training metrics
Platform Comparison:
Feast (Open-source, cloud-agnostic):
- Most popular open-source feature store
- Supports Redis, DynamoDB, Datastore (online) and Parquet, BigQuery, Snowflake (offline)
- Cloud-agnostic, no vendor lock-in
- Active community, growing adoption
Tecton (Managed, production-grade):
- Feast-compatible API
- Fully managed service
- Integrated monitoring and governance
- Higher cost, enterprise-focused
SageMaker Feature Store (AWS):
- Integrated with AWS ecosystem
- Managed online/offline stores
- AWS lock-in
Databricks Feature Store (Databricks):
- Unity Catalog integration
- Delta Lake for offline storage
- Databricks ecosystem lock-in
Selection Criteria:
- Open-source, cloud-agnostic → Feast
- Managed solution, production-grade → Tecton
- AWS ecosystem → SageMaker Feature Store
- Databricks users → Databricks Feature Store
For feature engineering patterns and implementation, see references/feature-stores.md.
4. Model Serving Patterns
Deploy models for synchronous, asynchronous, batch, or streaming inference.
Serving Patterns:
REST API Deployment:
- Pattern: HTTP endpoint for synchronous predictions
- Latency: <100ms acceptable
- Use Case: Request-response applications
- Tools: Flask, FastAPI, BentoML, Seldon Core
gRPC Deployment:
- Pattern: High-performance RPC for low-latency inference
- Latency: <10ms target
- Use Case: Microservices, latency-critical applications
- Tools: TensorFlow Serving, TorchServe, Seldon Core
Batch Inference:
- Pattern: Process large datasets offline
- Latency: Minutes to hours acceptable
- Use Case: Daily/hourly predictions for millions of records
- Tools: Spark, Dask, Ray
Streaming Inference:
- Pattern: Real-time predictions on streaming data
- Latency: Milliseconds
- Use Case: Fraud detection, anomaly detection, real-time recommendations
- Tools: Kafka + Flink/Spark Streaming
Platform Comparison:
Seldon Core (Kubernetes-native, advanced):
- Advanced deployment strategies (canary, A/B testing, multi-armed bandits)
- Multi-framework support
- Integrated explainability (Alibi)
- High complexity, steep learning curve
KServe (CNCF standard):
- Standardized InferenceService API
- Serverless scaling (scale-to-zero with Knative)
- Kubernetes-native
- Growing adoption, CNCF backing
BentoML (Python-first, simplicity):
- Easiest to get started
- Excellent developer experience
- Local testing → cloud deployment
- Lower complexity than Seldon/KServe
TorchServe (PyTorch official):
- PyTorch-specific serving
- Production-grade, optimized for PyTorch models
- Less flexible for multi-framework use
TensorFlow Serving (TensorFlow official):
- TensorFlow-specific serving
- Production-grade, optimized for TensorFlow models
- Less flexible for multi-framework use
Selection Criteria:
- Kubernetes, advanced deployments → Seldon Core or KServe
- Python-first, simplicity → BentoML
- PyTorch-specific → TorchServe
- TensorFlow-specific → TensorFlow Serving
- Managed solution → SageMaker/Vertex AI/Azure ML
For model optimization and serving infrastructure, see references/model-serving.md.