Migration Architect

Tier: POWERFUL
Category: Engineering - Migration Strategy
Purpose: Zero-downtime migration planning, compatibility validation, and rollback strategy generation

Overview

The Migration Architect skill provides tools and methodologies for planning, executing, and validating complex system migrations with minimal business impact. It combines established migration patterns with automated planning tools to support transitions between systems, databases, and infrastructure.

Core Capabilities

1. Migration Strategy Planning

Phased Migration Planning: Break complex migrations into manageable phases with clear validation gates
Risk Assessment: Identify potential failure points and mitigation strategies before execution
Timeline Estimation: Generate realistic timelines based on migration complexity and resource constraints
Stakeholder Communication: Create communication templates and progress dashboards

2. Compatibility Analysis

Schema Evolution: Analyze database schema changes for backward compatibility issues
API Versioning: Detect breaking changes in REST/GraphQL APIs and microservice interfaces
Data Type Validation: Identify data format mismatches and conversion requirements
Constraint Analysis: Validate referential integrity and business rule changes
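
As a rough illustration of the schema evolution and data type checks listed above, the following sketch compares two column lists and reports changes that could break existing readers or writers. The `Column` type and `find_breaking_changes` function are hypothetical names for this example, not part of the skill's tooling, and the rules are deliberately minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical, simplified description of a table column."""
    name: str
    type: str
    nullable: bool = True


def find_breaking_changes(old, new):
    """Return descriptions of changes likely to break existing clients."""
    new_by_name = {c.name: c for c in new}
    old_names = {c.name for c in old}
    issues = []
    for col in old:
        target = new_by_name.get(col.name)
        if target is None:
            issues.append(f"column removed: {col.name}")
            continue
        if target.type != col.type:
            issues.append(f"type changed: {col.name} {col.type} -> {target.type}")
        if col.nullable and not target.nullable:
            issues.append(f"became NOT NULL: {col.name}")
    for col in new:
        # A new required column breaks writers that do not know about it
        # (assuming no default value is defined).
        if col.name not in old_names and not col.nullable:
            issues.append(f"new required column: {col.name}")
    return issues


old_schema = [Column("id", "int", False), Column("email", "text"), Column("age", "int")]
new_schema = [Column("id", "int", False), Column("email", "varchar"), Column("tenant", "text", False)]
for issue in find_breaking_changes(old_schema, new_schema):
    print(issue)
```

Adding a nullable column is treated as compatible here; a real analysis would also need to consider defaults, indexes, and constraints.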

3. Rollback Strategy Generation

Automated Rollback Plans: Generate comprehensive rollback procedures for each migration phase
Data Recovery Scripts: Create point-in-time data restoration procedures
Service Rollback: Plan service version rollbacks with traffic management
Validation Checkpoints: Define success criteria and rollback triggers
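
One simple way to derive a per-phase rollback procedure is to undo completed phases in reverse order. The sketch below assumes each phase carries a single rollback step; `Phase` and `build_rollback_plan` are illustrative names, and the phase contents are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """Hypothetical migration phase with its matching rollback step."""
    name: str
    rollback_step: str


def build_rollback_plan(phases, failed_phase):
    """Return rollback steps for the failed phase and all earlier phases, newest first."""
    names = [p.name for p in phases]
    if failed_phase not in names:
        raise ValueError(f"unknown phase: {failed_phase}")
    started = phases[: names.index(failed_phase) + 1]
    return [p.rollback_step for p in reversed(started)]


phases = [
    Phase("expand", "drop new columns"),
    Phase("dual_write", "disable dual-write flag"),
    Phase("backfill", "clear backfilled columns"),
]
print(build_rollback_plan(phases, "dual_write"))
```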

Migration Patterns

Database Migrations

Schema Evolution Patterns

Expand-Contract Pattern
- Expand: Add new columns/tables alongside existing schema
- Dual Write: Application writes to both old and new schema
- Migration: Backfill historical data to new schema
- Contract: Remove old columns/tables after validation
Parallel Schema Pattern
- Run new schema in parallel with existing schema
- Use feature flags to route traffic between schemas
- Validate data consistency between parallel systems
- Cutover when confidence is high
Event Sourcing Migration
- Capture all changes as events during migration window
- Apply events to new schema for consistency
- Enable replay capability for rollback scenarios
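
The four expand-contract steps can be walked through end to end on an in-memory SQLite database. This is a minimal sketch: the `users` table, the name-splitting rule, and the helper functions are assumptions made for the example, and a production backfill would run in batches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("INSERT INTO users (full_name) VALUES ('Ada Lovelace'), ('Alan Turing')")

# Expand: add the new columns alongside the old one.
conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def split_name(full_name):
    first, _, last = full_name.partition(" ")
    return first, last


def create_user(conn, full_name):
    # Dual write: new rows populate both the old and the new columns.
    first, last = split_name(full_name)
    conn.execute(
        "INSERT INTO users (full_name, first_name, last_name) VALUES (?, ?, ?)",
        (full_name, first, last),
    )


create_user(conn, "Grace Hopper")

# Migration: backfill rows written before dual write was enabled.
pending = conn.execute("SELECT id, full_name FROM users WHERE first_name IS NULL").fetchall()
for row_id, full_name in pending:
    first, last = split_name(full_name)
    conn.execute(
        "UPDATE users SET first_name = ?, last_name = ? WHERE id = ?",
        (first, last, row_id),
    )

# Validate before contracting: every row must round-trip to the old value.
mismatches = conn.execute(
    "SELECT COUNT(*) FROM users "
    "WHERE first_name IS NULL OR last_name IS NULL "
    "OR first_name || ' ' || last_name <> full_name"
).fetchone()[0]

# Contract: drop the old column only after validation passes.
# ALTER TABLE ... DROP COLUMN needs SQLite 3.35 or newer.
if mismatches == 0 and sqlite3.sqlite_version_info >= (3, 35, 0):
    conn.execute("ALTER TABLE users DROP COLUMN full_name")
```

Keeping the contract step behind an explicit validation check is what makes the earlier phases safe to roll back: until the old column is dropped, the application can return to reading it.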

Data Migration Strategies

Bulk Data Migration
- Snapshot Approach: Full data copy during maintenance window
- Incremental Sync: Continuous data synchronization with change tracking
- Stream Processing: Real-time data transformation pipelines
Dual-Write Pattern
- Write to both source and target systems during migration
- Implement compensation patterns for write failures
- Use distributed transactions where consistency is critical
Change Data Capture (CDC)
- Stream database changes to target system
- Maintain eventual consistency during migration
- Enable zero-downtime migrations for large datasets
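
A dual write with a compensation path might look like the sketch below. It assumes the source system stays the source of truth and that failed target writes can be repaired later from the source; `DualWriter` and `FlakyStore` are hypothetical stand-ins for real storage clients.

```python
class FlakyStore(dict):
    """Hypothetical target store that can be switched to fail on writes."""

    def __init__(self):
        super().__init__()
        self.available = True

    def __setitem__(self, key, value):
        if not self.available:
            raise ConnectionError("target unavailable")
        super().__setitem__(key, value)


class DualWriter:
    def __init__(self, source, target):
        self.source = source
        self.target = target
        self.repair_queue = []  # keys whose target write failed

    def write(self, key, value):
        # The source write must succeed; its errors propagate to the caller.
        self.source[key] = value
        try:
            self.target[key] = value
        except Exception:
            # Compensation: remember the key and repair it later.
            self.repair_queue.append(key)

    def repair(self):
        """Retry failed target writes using the current source value."""
        remaining = []
        for key in self.repair_queue:
            try:
                self.target[key] = self.source[key]
            except Exception:
                remaining.append(key)
        self.repair_queue = remaining


source, target = {}, FlakyStore()
writer = DualWriter(source, target)
writer.write("a", 1)
target.available = False
writer.write("b", 2)      # target write fails and is queued
target.available = True
writer.repair()           # target catches up from the source
```

Queueing keys rather than values means a repair always copies the latest source value, which avoids replaying a stale write.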

Service Migrations

Strangler Fig Pattern

Intercept Requests: Route traffic through proxy/gateway
Gradually Replace: Implement new service functionality incrementally
Legacy Retirement: Remove old service components as new ones prove stable
Monitoring: Track performance and error rates throughout transition

graph TD
    A[Client Requests] --> B[API Gateway]
    B --> C{Route Decision}
    C -->|Legacy Path| D[Legacy Service]
    C -->|New Path| E[New Service]
    D --> F[Legacy Database]
    E --> G[New Database]
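
The route decision in the diagram can be as simple as a prefix table that grows as functionality is moved over. The sketch below is illustrative only: the `MIGRATED_PREFIXES` values and the `route` function are assumptions, and a real gateway would express this in its own routing configuration.

```python
# Hypothetical list of path prefixes already served by the new service.
MIGRATED_PREFIXES = ["/orders", "/invoices"]


def route(path, migrated_prefixes=MIGRATED_PREFIXES):
    """Return "new" for migrated paths and "legacy" for everything else."""
    for prefix in migrated_prefixes:
        # Match the prefix itself or a sub-path, but not "/ordersummary".
        if path == prefix or path.startswith(prefix + "/"):
            return "new"
    return "legacy"


print(route("/orders/42"))   # new
print(route("/users/7"))     # legacy
```

Retiring a legacy component then amounts to adding its prefix to the list, and rolling back to removing it.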

Parallel Run Pattern

Dual Execution: Run both old and new services simultaneously
Shadow Traffic: Mirror production traffic to the new system while the old system continues to serve responses
Result Comparison: Compare outputs to validate correctness
Gradual Cutover: Shift traffic percentage based on confidence
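
The dual execution and result comparison steps can be sketched as a wrapper that always answers from the legacy handler and records disagreements. The handler functions and the `shadow_call` name are hypothetical; in practice the new system's side effects would also need to be isolated.

```python
def shadow_call(request, legacy_handler, new_handler, mismatches):
    """Serve from legacy, run the new handler in the shadow, record differences."""
    legacy_response = legacy_handler(request)
    try:
        new_response = new_handler(request)
        if new_response != legacy_response:
            mismatches.append((request, legacy_response, new_response))
    except Exception as exc:
        # A crash in the new system must never affect the caller.
        mismatches.append((request, legacy_response, exc))
    return legacy_response


def legacy_price(quantity):
    return quantity * 2


def new_price(quantity):
    # Deliberate bug for quantity 3 to show a recorded mismatch.
    return 7 if quantity == 3 else quantity * 2


mismatches = []
responses = [shadow_call(q, legacy_price, new_price, mismatches) for q in (1, 2, 3, 4)]
```

The mismatch rate over a representative traffic sample is one possible input to the confidence threshold for gradual cutover.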

Canary Deployment Pattern

Limited Rollout: Deploy new service to small percentage of users
Monitoring: Track key metrics (latency, errors, business KPIs)
Gradual Increase: Increase traffic percentage as confidence grows
Full Rollout: Complete migration once validation passes
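
A canary promotion rule can be reduced to a comparison between canary and baseline metrics. The sketch below uses error rate only; the function name, the step size, and the tolerance are assumptions chosen for illustration, not recommended values.

```python
def next_canary_step(current_pct, canary_error_rate, baseline_error_rate,
                     step=10, tolerance=0.01):
    """Return the next traffic percentage, or 0 to roll the canary back."""
    if canary_error_rate > baseline_error_rate + tolerance:
        return 0  # canary is measurably worse than baseline: roll back
    return min(current_pct + step, 100)


print(next_canary_step(10, canary_error_rate=0.020, baseline_error_rate=0.015))  # 20
print(next_canary_step(50, canary_error_rate=0.080, baseline_error_rate=0.010))  # 0
```

A fuller gate would apply the same comparison to latency and business KPIs and require a minimum sample size before promoting.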

Infrastructure Migrations

Cloud-to-Cloud Migration

Assessment Phase
- Inventory existing resources and dependencies
- Map services to target cloud equivalents
- Identify vendor-specific features requiring refactoring
Pilot Migration
- Migrate non-critical workloads first
- Validate performance and cost models
- Refine migration procedures
Production Migration
- Use infrastructure as code for consistency
- Implement cross-cloud networking during transition
- Maintain disaster recovery capabilities

On-Premises to Cloud Migration

Lift and Shift
- Minimal changes to existing applications
- Quick migration with optimization later
- Use cloud migration tools and services
Re-architecture
- Redesign applications for cloud-native patterns
- Adopt microservices, containers, and serverless
- Implement cloud security and scaling practices
Hybrid Approach
- Keep sensitive data on-premises
- Migrate compute workloads to cloud
- Implement secure connectivity between environments

Feature Flags for Migrations

Progressive Feature Rollout

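# Example feature flag implementation
import hashlib


class MigrationFeatureFlag:
    def __init__(self, flag_name, rollout_percentage=0):
        self.flag_name = flag_name
        self.rollout_percentage = rollout_percentage

    def is_enabled_for_user(self, user_id):
        # Built-in hash() is salted per process for strings, so a stable
        # digest is used to keep each user in the same bucket across restarts.
        key = f"{self.flag_name}:{user_id}".encode("utf-8")
        bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 100
        return bucket < self.rollout_percentage

    def gradual_rollout(self, target_percentage, step_size=10):
        # Generator: yields each new percentage so the caller can check
        # metrics between steps before advancing.
        while self.rollout_percentage < target_percentage:
            self.rollout_percentage = min(
                self.rollout_percentage + step_size,
                target_percentage
            )
            yield self.rollout_percentage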

Circuit Breaker Pattern

Implement automatic fallback to legacy systems when new systems show degraded performance:

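import time

class MigrationCircuitBreaker:
    def __init__(self, new_service, legacy_service, failure_threshold=5, timeout=60):
        self.new_service = new_service
        self.legacy_service = legacy_service  # assumed to expose process(request) too
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout  # seconds to stay OPEN before a trial call
        self.last_failure_time = None
        self.state = 'CLOSED'  # CLOSED, OPEN, HALF_OPEN

    def should_attempt_reset(self):
        return time.monotonic() - self.last_failure_time >= self.timeout

    def on_success(self):
        self.failure_count = 0
        self.state = 'CLOSED'

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.monotonic()
        # A failed trial call in HALF_OPEN reopens the circuit immediately.
        if self.state == 'HALF_OPEN' or self.failure_count >= self.failure_threshold:
            self.state = 'OPEN'

    def fallback_to_legacy(self, request):
        return self.legacy_service.process(request)

    def call_new_service(self, request):
        if self.state == 'OPEN':
            if self.should_attempt_reset():
                self.state = 'HALF_OPEN'
            else:
                return self.fallback_to_legacy(request)

        try:
            response = self.new_service.process(request)
            self.on_success()
            return response
        except Exception:
            self.on_failure()
            return self.fallback_to_legacy(request)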

Data Validation and Reconciliation

Validation Strategies

Row Count Validation
- Compare record counts between source and target
- Account for soft deletes and filtered records
- Implement threshold-based alerting
Checksums and Hashing
- Generate checksums for critical data subsets
- Compare hash values to detect data drift
- Use sampling for large datasets
Business Logic Validation
- Run critical business queries on both systems
- Compare aggregate results (sums, counts, averages)
- Validate derived data and calculations
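
Row count and checksum validation can be combined in a single check. The sketch below assumes rows have already been fetched as tuples with matching Python types on both sides; `table_checksum` and `validate` are hypothetical helpers, and for large tables the same idea would be applied to samples or key ranges.

```python
import hashlib


def table_checksum(rows):
    """Order-independent checksum: hash each row, then hash the sorted digests."""
    digests = sorted(
        hashlib.sha256(repr(row).encode("utf-8")).hexdigest() for row in rows
    )
    return hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()


def validate(source_rows, target_rows, max_count_drift=0):
    """Compare counts (with an allowed drift threshold) and content checksums."""
    count_drift = abs(len(source_rows) - len(target_rows))
    return {
        "count_ok": count_drift <= max_count_drift,
        "checksum_ok": table_checksum(source_rows) == table_checksum(target_rows),
    }


source_rows = [(1, "a"), (2, "b")]
print(validate(source_rows, [(2, "b"), (1, "a")]))  # same data, different order
print(validate(source_rows, [(1, "a"), (2, "B")]))  # same count, drifted content
```

The second call shows why counts alone are insufficient: the counts match while the checksum detects the changed value.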

Reconciliation Patterns

Delta Detection

-- Example delta query for reconciliation
SELECT 'missing_in_target' as issue_type, source_id
FROM source_table s
WHERE NOT EXISTS (
    SELEC
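
Delta detection can also be sketched outside the database as a set difference over primary keys. This is a minimal Python illustration, assuming the IDs from both systems fit in memory; `detect_deltas` is a hypothetical helper, and the `missing_in_source` label is added here by analogy with `missing_in_target`.

```python
def detect_deltas(source_ids, target_ids):
    """Return IDs present on only one side of the migration."""
    source_set, target_set = set(source_ids), set(target_ids)
    return {
        "missing_in_target": sorted(source_set - target_set),
        "missing_in_source": sorted(target_set - source_set),
    }


print(detect_deltas([1, 2, 3, 4], [2, 3, 5]))
# {'missing_in_target': [1, 4], 'missing_in_source': [5]}
```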
