Platform Engineering
Purpose
Build Internal Developer Platforms (IDPs) that provide self-service infrastructure, reduce cognitive load, and accelerate developer productivity through golden paths and platform-as-product thinking.
Platform engineering extends traditional DevOps by building product-quality internal platforms that treat developers as customers. The discipline targets the developer productivity problem: engineers spend an estimated 30-40% of their time on infrastructure and tooling instead of features.
When to Use This Skill
Trigger this skill when:
- Building or improving an internal developer platform
- Designing a developer portal (Backstage, Port, or commercial IDP)
- Implementing golden paths and software templates
- Establishing or restructuring a platform engineering team
- Measuring and improving developer experience (DevEx)
- Integrating IDP with infrastructure, CI/CD, observability, or security tools
- Driving platform adoption across an engineering organization
- Assessing platform maturity and identifying capability gaps
Core Concepts
Platform as Product
Treat internal platforms with the same rigor as customer-facing products:
Product Management Approach:
- Define platform vision, strategy, and roadmap
- Identify developer "customers" and their pain points
- Measure success via adoption metrics, satisfaction surveys, and business impact
- Iterate based on feedback loops and usage analytics
- Balance new capabilities with platform reliability and support
Key Differences from Traditional DevOps:
- DevOps focuses on delivery pipelines; platform engineering builds comprehensive developer experiences
- Platform teams operate as product teams, staffed with product managers, UX designers, and engineers
- Success measured by developer productivity and satisfaction, not just infrastructure metrics
- Self-service is the primary interface, not ticket queues
Internal Developer Platform (IDP) Architecture
Three-Layer Architecture:
1. Developer Portal (Frontend)
- Service catalog: Inventory of services with ownership, dependencies, health status
- Software templates: Project scaffolding with best practices baked in
- Documentation hub: Centralized, searchable, version-controlled docs
- Self-service workflows: Environment provisioning, deployments, access requests
2. Platform Orchestration (Backend)
- Infrastructure provisioning: Multi-cloud resource management
- Environment management: Dev, staging, production lifecycle
- Deployment automation: GitOps-based continuous delivery
- Configuration management: Separation of app and infrastructure concerns
3. Integration Layer (Glue)
- CI/CD integration: Pipeline visibility and triggering
- Observability: Metrics, logs, traces surfaced in portal
- Security: Vulnerability scanning, policy enforcement, secrets management
- FinOps: Cost visibility, budgets, optimization recommendations
For detailed architecture patterns and component breakdowns, see references/idp-architecture.md.
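As a rough illustration of the service catalog in the portal layer, the sketch below models catalog entries with ownership, dependencies, and health status, then walks the dependency graph to find unhealthy upstream services. The `CatalogEntry` fields, the `unhealthy_dependencies` helper, and the service and team names are all hypothetical; a real portal such as Backstage defines its own entity schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One service in the catalog (field names are illustrative assumptions)."""
    name: str
    owner: str  # owning team
    depends_on: list[str] = field(default_factory=list)
    healthy: bool = True


def unhealthy_dependencies(catalog: dict[str, CatalogEntry], service: str) -> list[str]:
    """Return direct and transitive dependencies of `service` that are unhealthy."""
    seen: set[str] = set()
    stack = list(catalog[service].depends_on)
    while stack:
        name = stack.pop()
        if name in seen or name not in catalog:
            continue  # skip already-visited or unregistered services
        seen.add(name)
        stack.extend(catalog[name].depends_on)
    return sorted(n for n in seen if not catalog[n].healthy)


# Hypothetical catalog contents for demonstration only.
catalog = {
    entry.name: entry
    for entry in [
        CatalogEntry("checkout", "team-payments", depends_on=["cart", "billing"]),
        CatalogEntry("cart", "team-storefront", depends_on=["inventory"]),
        CatalogEntry("billing", "team-payments"),
        CatalogEntry("inventory", "team-supply", healthy=False),
    ]
}

print(unhealthy_dependencies(catalog, "checkout"))  # ['inventory']
```

Keeping ownership and dependencies on the same record is what lets the portal answer "who do I contact when my upstream is down" without a separate lookup.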
Golden Paths and Scaffolding
Golden Path Principle: Provide opinionated templates that handle 80% of use cases while allowing escape hatches for the remaining 20%.
Template Components:
- Repository structure and boilerplate code
- Infrastructure as code (Kubernetes manifests, Terraform)
- CI/CD pipeline configurations
- Observability instrumentation (metrics, logging, tracing)
- Security configurations (RBAC, network policies, secrets)
- Documentation templates (README, runbooks, architecture diagrams)
Constraint Mechanisms:
- Policy-as-code enforcement (OPA, Kyverno) for security and compliance
- Resource limits and quotas to prevent over-provisioning
- Required health checks and observability instrumentation
- Approved base images and dependency scanning
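The constraint mechanisms above can be sketched as a plain validation function. This is not OPA or Kyverno policy syntax; it only illustrates the kind of checks such a policy would encode, applied to a Kubernetes-style container spec held in a dictionary. The registry prefix in `APPROVED_IMAGE_PREFIXES` and the `check_container` helper are assumptions made for the example, and matching on an image prefix is only a stand-in for a real approved-base-image check.

```python
# Hypothetical allow-list; a real platform would load this from policy config.
APPROVED_IMAGE_PREFIXES = ("registry.example.com/base/",)


def check_container(container: dict) -> list[str]:
    """Return a list of policy violations for one container spec."""
    violations = []
    name = container.get("name", "<unnamed>")

    # Approved images: the image must come from an allow-listed prefix.
    if not container.get("image", "").startswith(APPROVED_IMAGE_PREFIXES):
        violations.append(f"{name}: image is not from an approved registry")

    # Resource limits: both CPU and memory limits must be set.
    limits = container.get("resources", {}).get("limits", {})
    for resource in ("cpu", "memory"):
        if resource not in limits:
            violations.append(f"{name}: missing {resource} limit")

    # Required health checks: liveness and readiness probes must exist.
    for probe in ("livenessProbe", "readinessProbe"):
        if probe not in container:
            violations.append(f"{name}: missing {probe}")

    return violations


compliant = {
    "name": "api",
    "image": "registry.example.com/base/python:3.12",
    "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
    "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
    "readinessProbe": {"httpGet": {"path": "/ready", "port": 8080}},
}
noncompliant = {"name": "worker", "image": "docker.io/library/python:3.12"}

print(check_container(compliant))          # []
print(len(check_container(noncompliant)))  # 5
```

Returning every violation at once, rather than failing on the first, gives developers a complete fix list in a single pass.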
For template design patterns and examples, see references/golden-paths.md.
Developer Experience (DevEx) Optimization
Cognitive Load Reduction:
- Abstract infrastructure complexity without hiding necessary details
- Provide sensible defaults with clear override mechanisms
- Use progressive disclosure (simple for common cases, advanced options available)
- Consolidate tooling (single developer portal vs. 15+ separate tools)
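A minimal sketch of "sensible defaults with clear override mechanisms": the platform ships a default configuration and a team overrides only the keys it cares about. The `GOLDEN_PATH_DEFAULTS` values and the `resolve_config` helper are illustrative assumptions, not part of any specific platform.

```python
import copy

# Hypothetical platform defaults for a service created from a golden path.
GOLDEN_PATH_DEFAULTS = {
    "replicas": 2,
    "resources": {"cpu": "250m", "memory": "256Mi"},
    "observability": {"metrics": True, "tracing": True},
}


def resolve_config(defaults: dict, overrides: dict) -> dict:
    """Deep-merge overrides onto defaults without mutating either input."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections key by key so untouched defaults survive.
            result[key] = resolve_config(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# A team raises memory only; every other default stays in place.
config = resolve_config(GOLDEN_PATH_DEFAULTS, {"resources": {"memory": "1Gi"}})
print(config["resources"])  # {'cpu': '250m', 'memory': '1Gi'}
print(config["replicas"])   # 2
```

The merge is deliberately per-key: a team that overrides one setting does not have to restate, or accidentally drop, the rest of the defaults.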
Key Metrics:
DORA Metrics:
- Deployment frequency (how often code reaches production)
- Lead time for changes (commit to production duration)
- Mean time to recovery (MTTR: time to restore service after an incident)
- Change failure rate (percentage of deployments causing incidents)
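The four DORA metrics can be derived from deployment records. The sketch below assumes a hypothetical `Deployment` record with commit, deploy, and restore timestamps; real data would come from CI/CD and incident tooling, and teams differ on details such as using the median versus the mean for lead time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are assumptions."""
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    restored_at: Optional[datetime] = None  # set once service is restored


def dora_summary(deployments: list, window_days: int) -> dict:
    """Compute the four DORA metrics over a reporting window."""
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }


t0 = datetime(2024, 1, 1, 9, 0)
day, hour = timedelta(days=1), timedelta(hours=1)
deployments = [
    Deployment(t0, t0 + 2 * hour),
    Deployment(t0 + day, t0 + day + 4 * hour, caused_incident=True,
               restored_at=t0 + day + 5 * hour),
    Deployment(t0 + 2 * day, t0 + 2 * day + 6 * hour),
    Deployment(t0 + 3 * day, t0 + 3 * day + 8 * hour),
]

summary = dora_summary(deployments, window_days=7)
print(summary["median_lead_time"])       # 5:00:00
print(summary["change_failure_rate"])    # 0.25
print(summary["mean_time_to_recovery"])  # 1:00:00
```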
SPACE Framework:
- Satisfaction and well-being: Developer happiness via surveys and NPS
- Performance: Throughput and efficiency of work completed
- Activity: Code commits, PRs, deployments (interpreted in context, not as raw counts)
- Communication and collaboration: Collaboration quality, discoverability
- Efficiency and flow: Minimize interruptions, reduce toil
Platform-Specific Metrics:
- Platform adoption rate (percentage of teams using platform)
- Self-service rate (actions completed without platform team tickets)
- Onboarding time (new developer to first production deployment)
- Template usage (which golden paths are adopted)
- Support ticket volume and resolution time
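Two of these metrics reduce to simple ratios. The sketch below shows one way to compute them; the function names and sample counts are hypothetical, and what counts as a "team on the platform" or a "ticketed action" has to be defined per organization.

```python
def platform_adoption_rate(teams_on_platform: int, total_teams: int) -> float:
    """Share of teams using the platform (0.0 when there are no teams)."""
    return teams_on_platform / total_teams if total_teams else 0.0


def self_service_rate(self_service_actions: int, ticketed_actions: int) -> float:
    """Share of platform actions completed without a platform team ticket."""
    total = self_service_actions + ticketed_actions
    return self_service_actions / total if total else 0.0


# Hypothetical quarter: 18 of 24 teams onboarded; 850 self-service actions
# versus 150 that still went through a ticket.
adoption = platform_adoption_rate(18, 24)
self_service = self_service_rate(850, 150)
print(adoption)      # 0.75
print(self_service)  # 0.85
```

Tracking both matters: high adoption with a low self-service rate suggests teams are on the platform but still depend on the platform team for routine work.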
Platform Maturity Assessment
Assess current platform capabilities using a six-level maturity model (Levels 0-5):
- Level 0: Ad-Hoc - Manual provisioning, no standardization
- Level 1: Basic Automation - Some IaC and CI/CD, limited self-service
- Level 2: Paved Paths - Golden path templates, early portal, limited coverage
- Level 3: Self-Service Platform - Comprehensive portal, 80%+ self-service
- Level 4: Product-Driven Platform - Data-driven, product team structure, FinOps integration
- Level 5: AI-Augmented Platform - AI-assisted troubleshooting, predictive optimization
For detailed assessment framework, gap analysis, and improvement roadmap, see references/maturity-model.md.
Decision Frameworks
Build vs. Buy IDP
Choose Open Source (Backstage) when:
- Large enterprise (1000+ engineers)
- Dedicated platform team available (5-10 engineers)
- Deep customization required
- Open-source ecosystem preferred
- Long-term investment (3+ year horizon)
Choose Commercial IDP (Port, Humanitec, Cortex) when:
- Mid-size organization (100-1000 engineers)
- Faster time-to-value needed (3-6 months vs. 6-12 months for the open-source route)
- Prefer managed solution with vendor support
- Limited platform engineering resources (<5 engineers)
- Standard use cases (web apps, microservices, CI/CD)
Choose Hybrid Approach when:
- Large organization needing both flexibility and speed
- Complex infrastructure requiring orchestration backend
- Want best-in-class portal + orchestration components
- Willing to integrate multiple systems (e.g., Backstage + Humanitec)
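The criteria above can be turned into a rough fit score per option. This is a sketch under stated assumptions, not a definitive decision procedure: the `OrgProfile` fields and the `idp_fit` function are hypothetical, every criterion is weighted equally, and an organization of exactly 1000 engineers is treated as large because the source ranges overlap at that boundary.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical inputs mirroring the build-vs-buy criteria."""
    engineers: int
    platform_engineers: int
    horizon_years: int
    needs_deep_customization: bool = False
    needs_fast_time_to_value: bool = False
    standard_use_cases: bool = False
    complex_infrastructure: bool = False
    willing_to_integrate: bool = False


def idp_fit(org: OrgProfile) -> dict:
    """Count how many listed criteria each option satisfies (equal weights assumed)."""
    large = org.engineers >= 1000
    return {
        "open_source": sum([
            large,
            org.platform_engineers >= 5,
            org.needs_deep_customization,
            org.horizon_years >= 3,
        ]),
        "commercial": sum([
            100 <= org.engineers < 1000,
            org.platform_engineers < 5,
            org.needs_fast_time_to_value,
            org.standard_use_cases,
        ]),
        "hybrid": sum([
            large,
            org.needs_deep_customization and org.needs_fast_time_to_value,
            org.complex_infrastructure,
            org.willing_to_integrate,
        ]),
    }


# Hypothetical mid-size organization with a small platform team.
org = OrgProfile(
    engineers=300,
    platform_engineers=3,
    horizon_years=2,
    needs_fast_time_to_value=True,
    standard_use_cases=True,
)
fit = idp_fit(org)
print(fit)                    # {'open_source': 0, 'commercial': 4, 'hybrid': 0}
print(max(fit, key=fit.get))  # commercial
```

The scores are a conversation starter; close scores between options are a signal to work through the full decision tree rather than pick the maximum.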
For complete decision tree, selection criteria, and ROI calculations, see references/decision-frameworks.md.
Golden Path Design: Flexibility vs. Standardization
Spectrum of Control:
High Standardization (Regulated Industries):
- Limited technology choices, mandatory templates
- Policy enforcement via admission controllers (OPA, Kyverno)
- Escape hatches require approval process
Balanced Approach (Recommended for Most):
- Recommended golden paths (easy, well-documented, supported)
- Alternatives allowed with documentation
- Soft enforcement (defaults + education, not hard blocks)
- Clear ownership for deviations ("deviate and own")
High Flexibility (Innovative Organizations):
- Golden paths as suggestions (not requirements)
- Minimal policy enforcement (only critical security)
- "Build it, run it" ownership model
For detailed guidance on choosing the right balance and enforcement strategies, see references/decision-frameworks.md.
Platform Team Structure
Centralized Model:
- Single platform team (5-20 engineers) serving entire organization
- Best for: Small to mid-size orgs (100-500 engineers)
Federated Model:
- Central team (5-10 engineers) + embedded engineers (1-2 per business unit)
- Best