NW-DEVOPS: Platform Readiness and Infrastructure Design
Wave: DEVOPS (wave 4 of 6) | Agent: Apex (nw-platform-architect) | Command: /nw-devops
Overview
Execute DEVOPS wave: platform readiness|CI/CD pipeline setup|observability design|infrastructure preparation. Positioned between DESIGN and DISTILL (DISCOVER > DISCUSS > SPIKE > DESIGN > DEVOPS > DISTILL > DELIVER), this wave ensures infrastructure is ready before acceptance tests and code are written.
Apex translates DESIGN architecture decisions into operational infrastructure: CI/CD pipelines|logging|monitoring|alerting|observability.
Output Tiers (per D2)
Provenance: feature lean-wave-documentation — D2 (schema-typed sections), D10 (one-line expansion descriptions). Tier-1 [REF] sections (always emitted) + Tier-2 EXPANSION CATALOG items (lazy, on-demand) are the two output bands. Full contract: nWave/skills/nw-density-resolution-contract/SKILL.md.
Tier-1 [REF] — always emitted
Under ## Wave: DEVOPS / [REF] <Section> headings:
- Environment matrix — table of target environments with platform + preconditions
- CI/CD pipeline outline — stage list with trigger rules per branch
- Monitoring contracts — KPI-to-instrument mapping (one row per outcome KPI)
- Deployment strategy — chosen strategy + rollback contract (one paragraph)
- Mutation testing strategy — selected mode (per-feature/nightly-delta/pre-release/disabled)
- Observability stack — chosen tools per signal class (logs/metrics/traces)
- Branching strategy — selected model + CI trigger alignment
- Coexistence matrix — tools that must continue to work alongside deployment
- Pre-requisites — DESIGN constraints the platform must satisfy
Tier-2 EXPANSION CATALOG — lazy, on-demand (per D10)
Rendered under ## Wave: DEVOPS / [WHY|HOW] <Section> only when requested via --expand <id> (DDD-2), the wave-end menu (expansion_prompt = "ask"), mode = "full" auto-expansion, or an ad-hoc user request mid-session.
| Expansion ID | Tier label | One-line description |
|---|---|---|
| infra-cost-analysis | [WHY] | Per-environment monthly cost estimate with vendor pricing assumptions |
| alternative-deploy-targets | [WHY] | Cloud/on-prem/hybrid options weighed and rejected with one-paragraph reason |
| observability-deep-dive | [HOW] | Detailed metric/log/trace schemas, alert thresholds, dashboard layouts |
| runbook-drafts | [HOW] | Incident response runbooks for the top failure modes |
| kpi-instrumentation-recipes | [HOW] | Per-KPI data collection recipe (event names, log fields, metric labels) |
| ci-pipeline-yaml | [HOW] | Full CI/CD pipeline YAML with comments per stage |
| disaster-recovery-plan | [HOW] | Backup, restore, and DR procedures with RPO/RTO targets |
| expansion-catalog-rationale | [WHY] | Why this set of expansions, why these defaults, why D10 enforces one-line descriptions |
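The four rendering triggers for Tier-2 items can be pictured as a single predicate. This is a rough sketch only: `should_render_expansion` and its parameter names are hypothetical and not part of the nWave codebase, and it assumes that mode = "full" expands every catalog item. The expansion IDs and tier labels are copied from the catalog table.

```python
from typing import Iterable

# Expansion IDs and tier labels as listed in the catalog table.
EXPANSION_CATALOG = {
    "infra-cost-analysis": "WHY",
    "alternative-deploy-targets": "WHY",
    "observability-deep-dive": "HOW",
    "runbook-drafts": "HOW",
    "kpi-instrumentation-recipes": "HOW",
    "ci-pipeline-yaml": "HOW",
    "disaster-recovery-plan": "HOW",
    "expansion-catalog-rationale": "WHY",
}


def should_render_expansion(
    expansion_id: str,
    *,
    mode: str,
    expand_flags: Iterable[str] = (),
    menu_choices: Iterable[str] = (),
    adhoc_requests: Iterable[str] = (),
) -> bool:
    """Hypothetical helper: True if any of the four triggers applies."""
    if expansion_id not in EXPANSION_CATALOG:
        raise KeyError(f"unknown expansion id: {expansion_id}")
    if mode == "full":
        # Assumption: full mode auto-expands every catalog item.
        return True
    # --expand <id>, wave-end menu selection, or ad-hoc mid-session request.
    requested = set(expand_flags) | set(menu_choices) | set(adhoc_requests)
    return expansion_id in requested
```

In lean mode with no requests the predicate stays false, which matches the "lazy, on-demand" intent of Tier 2.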
Density resolution (per D12)
Read ~/.nwave/global-config.json (treat a missing or malformed file as an empty dict), then call resolve_density(global_config) from scripts/shared/density_config.py. It returns mode ("lean" | "full") + expansion_prompt ("ask" | "always-skip" | "always-expand" | "smart") per the D12 cascade (resolver-internal, DDD-5; do NOT replicate locally). Branch on density.mode for what to emit; branch on density.expansion_prompt at wave end for menu behaviour. Full cascade detail, branch semantics, ad-hoc override workflow: nWave/skills/nw-density-resolution-contract/SKILL.md.
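The "missing/malformed = empty dict" rule for the config read might look like the sketch below. `load_global_config` is a hypothetical helper name, and treating a non-object JSON document (for example a top-level list) as malformed is an assumption. The sketch deliberately stops before the cascade: the returned dict is what would be handed to resolve_density(global_config).

```python
import json
from pathlib import Path
from typing import Optional


def load_global_config(path: Optional[Path] = None) -> dict:
    """Hypothetical helper: read the global config, falling back to {}."""
    if path is None:
        path = Path.home() / ".nwave" / "global-config.json"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # OSError: file missing or unreadable.
        # ValueError: invalid JSON or undecodable bytes.
        return {}
    # Assumption: anything other than a JSON object counts as malformed.
    return data if isinstance(data, dict) else {}


# The result is then passed to the resolver; the D12 cascade stays inside it:
#   density = resolve_density(load_global_config())
```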
Telemetry (per D4 + DDD-6)
Every expansion choice emits a DocumentationDensityEvent (dataclass at src/des/domain/telemetry/documentation_density_event.py) via event.to_audit_event() → JsonlAuditLogWriter().log_event(...). Schema fields per D4: feature_id, wave, expansion_id, choice, timestamp. For this wave the schema declares "wave": "DEVOPS". Use helper scripts/shared/telemetry.py:write_density_event(...) — do NOT write JSONL directly.
Wave-specific signal: DISTILL consuming a lean DEVOPS environment matrix — downstream --expand requests for runbook drafts or alternative deploy targets indicate the [REF] baseline was insufficient. Full emission rules: nWave/skills/nw-density-resolution-contract/SKILL.md.
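To make the D4 field list concrete, the record shape can be pictured with a stand-in dataclass. This is not the real DocumentationDensityEvent: `DensityEventSketch` and `make_devops_event` are hypothetical names, the `"expand"` choice value is an invented example, and the ISO 8601 UTC timestamp format is an assumption. Actual emission goes through the write_density_event helper, never through hand-written JSONL.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class DensityEventSketch:
    """Stand-in carrying only the five schema fields named in D4."""
    feature_id: str
    wave: str
    expansion_id: str
    choice: str
    timestamp: str


def make_devops_event(
    feature_id: str,
    expansion_id: str,
    choice: str,
    now: Optional[datetime] = None,
) -> DensityEventSketch:
    """Hypothetical builder that pins wave to "DEVOPS" for this wave."""
    moment = now or datetime.now(timezone.utc)
    return DensityEventSketch(
        feature_id=feature_id,
        wave="DEVOPS",
        expansion_id=expansion_id,
        choice=choice,
        timestamp=moment.isoformat(),  # assumed format
    )


# Illustrative values only.
event = make_devops_event("lean-wave-documentation", "runbook-drafts", "expand")
payload = asdict(event)
```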
Interactive Decision Points
Before proceeding, the orchestrator asks:
Decision 1: Deployment Target
Question: What is the deployment target? Options:
- Cloud-native -- AWS, GCP, Azure managed services
- On-premise -- self-hosted infrastructure
- Hybrid -- mix of cloud and on-premise
- Edge -- distributed edge deployment
- Other -- user provides custom input
Decision 2: Container Orchestration
Question: Container orchestration approach? Options:
- Kubernetes -- full orchestration
- Docker Compose -- lightweight container management
- Serverless -- function-as-a-service, no containers
- None -- bare metal or VM-based deployment
Decision 3: CI/CD Platform
Question: CI/CD platform preference? Options:
- GitHub Actions
- GitLab CI
- Jenkins
- Azure DevOps
- Other -- user provides custom input
Decision 4: Existing Infrastructure
Question: Is there existing infrastructure or CI/CD to integrate with? Options:
- Yes, both -- describe existing infrastructure and CI/CD (user provides details)
- Existing infra only -- infrastructure exists, CI/CD is greenfield
- Existing CI/CD only -- CI/CD exists, infrastructure is greenfield
- No -- greenfield, design everything from scratch
Decision 5: Observability and Logging
Question: What observability and logging approach? Options:
- Prometheus + Grafana (metrics) with structured JSON logs
- Datadog (full-stack observability including logs)
- ELK stack (Elasticsearch, Logstash, Kibana for logs and metrics)
- OpenTelemetry (vendor-agnostic telemetry) with provider of choice
- CloudWatch (AWS-native metrics and logging)
- Custom -- user provides details
- None -- defer observability setup
Decision 6: Deployment Strategy
Question: What deployment strategy? Options:
- Blue-green -- zero-downtime with environment swap
- Canary -- gradual traffic shifting
- Rolling -- incremental pod/instance replacement
- Recreate -- simple stop-and-replace
Decision 7: Continuous Learning (conditional)
Question: Is there existing monitoring/alerting infrastructure in place? Options:
- Yes -- include continuous learning and experimentation capabilities
- No -- focus on foundational monitoring setup first
If the answer to Decision 7 is Yes, ask a follow-up question: Which continuous learning capabilities should be included? Options:
- A/B testing framework
- Feature flags (LaunchDarkly, Unleash, custom)
- Canary analysis (automated rollback on metrics)
- Progressive rollout (percentage-based deployment)
- All of the above
Decision 8: Git Branching Strategy
Question: What Git branching strategy should the project follow? Options:
- Trunk-Based Development -- single main branch, short-lived feature branches (<1 day), continuous integration. Requires robust CI gates on every commit.
- GitHub Flow -- feature branches from main, pull requests, merge to main after review. Balanced CI with PR-triggered pipelines.
- GitFlow -- develop/main branches, feature/release/hotfix branches, formal release process. Requires branch-specific pipelines (develop CI, release candidate, hotfix fast-track).
- Release Branching -- long-lived release branches, cherry-pick fixes between branches. Requires per-branch pipelines and cross-branch validation.
- Other -- user provides custom strategy
This directly influences CI/CD pipeline design: trigger rules|branch protection|environment promotion|release automation.
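The link between branching model and pipeline shape can be captured as a lookup. A minimal sketch follows, assuming hypothetical names (`PIPELINE_REQUIREMENTS`, `pipeline_requirements`) and lowercase keys chosen for illustration; the requirement strings paraphrase the option descriptions above and add nothing beyond them.

```python
# Pipeline requirements per branching strategy, taken from the option text.
PIPELINE_REQUIREMENTS = {
    "trunk-based development": ["CI gates on every commit"],
    "github flow": ["PR-triggered pipelines"],
    "gitflow": [
        "develop CI",
        "release candidate pipeline",
        "hotfix fast-track pipeline",
    ],
    "release branching": [
        "per-branch pipelines",
        "cross-branch validation",
    ],
}


def pipeline_requirements(strategy: str) -> list:
    """Hypothetical lookup: return the pipeline requirements for a strategy."""
    key = strategy.strip().lower()
    if key not in PIPELINE_REQUIREMENTS:
        # "Other" has no predefined shape; the user supplies it.
        raise ValueError(f"custom strategy {strategy!r}: ask the user for trigger rules")
    return list(PIPELINE_REQUIREMENTS[key])
```

For example, `pipeline_requirements("GitFlow")` yields the three branch-specific pipelines, while an "Other" answer raises so the orchestrator can collect custom input.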
Decision 9: Mutation Testing Strategy
Question: When should mutation testing run? Options:
- per-feature (default) -- Runs after each feature delivery (refactoring + review), scoped to modified files. Best for small/medium projects where per-feature overhead