NW-DEVOPS: Platform Readiness and Infrastructure Design

Wave: DEVOPS (wave 4 of 6) | Agent: Apex (nw-platform-architect) | Command: /nw-devops

Overview

Execute the DEVOPS wave: platform readiness|CI/CD pipeline setup|observability design|infrastructure preparation. Positioned between DESIGN and DISTILL (DISCOVER > DISCUSS > SPIKE > DESIGN > DEVOPS > DISTILL > DELIVER), it ensures infrastructure is ready before acceptance tests and code are written.

Apex translates DESIGN architecture decisions into operational infrastructure: CI/CD pipelines|logging|monitoring|alerting|observability.

Output Tiers (per D2)

Provenance: feature lean-wave-documentation — D2 (schema-typed sections), D10 (one-line expansion descriptions). Tier-1 [REF] sections (always emitted) + Tier-2 EXPANSION CATALOG items (lazy, on-demand) are the two output bands. Full contract: nWave/skills/nw-density-resolution-contract/SKILL.md.

Tier-1 [REF] — always emitted

Under ## Wave: DEVOPS / [REF] <Section> headings:

Environment matrix — table of target environments with platform + preconditions
CI/CD pipeline outline — stage list with trigger rules per branch
Monitoring contracts — KPI-to-instrument mapping (one row per outcome KPI)
Deployment strategy — chosen strategy + rollback contract (one paragraph)
Mutation testing strategy — selected mode (per-feature/nightly-delta/pre-release/disabled)
Observability stack — chosen tools per signal class (logs/metrics/traces)
Branching strategy — selected model + CI trigger alignment
Coexistence matrix — tools that must continue to work alongside deployment
Pre-requisites — DESIGN constraints the platform must satisfy
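
The "one row per outcome KPI" rule for monitoring contracts can be checked mechanically. The sketch below is illustrative only: the row shape (`kpi` and `instrument` keys), the function name `check_monitoring_contracts`, and the sample KPI names are all assumptions made for the example, not part of the nWave contract.

```python
from collections import Counter


def check_monitoring_contracts(outcome_kpis, rows):
    """Return (missing, duplicated) KPI names for a KPI-to-instrument table.

    rows is a list of dicts with the hypothetical keys "kpi" and "instrument".
    """
    counts = Counter(row["kpi"] for row in rows)
    missing = [kpi for kpi in outcome_kpis if counts[kpi] == 0]
    duplicated = [kpi for kpi in outcome_kpis if counts[kpi] > 1]
    return missing, duplicated


# Made-up sample data: one KPI has two rows, one has none.
rows = [
    {"kpi": "checkout_latency_p95", "instrument": "histogram metric"},
    {"kpi": "signup_conversion", "instrument": "structured log event"},
    {"kpi": "signup_conversion", "instrument": "counter metric"},
]
missing, duplicated = check_monitoring_contracts(
    ["checkout_latency_p95", "signup_conversion", "error_rate"], rows
)
print(missing)     # ['error_rate']
print(duplicated)  # ['signup_conversion']
```

A table passes when both lists come back empty.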

Tier-2 EXPANSION CATALOG — lazy, on-demand (per D10)

Rendered under ## Wave: DEVOPS / [WHY|HOW] <Section> only when requested via --expand <id> (DDD-2), the wave-end menu (expansion_prompt = "ask"), mode = "full" auto-expansion, or an ad-hoc user request mid-session.

| Expansion ID | Tier label | One-line description |
|---|---|---|
| `infra-cost-analysis` | [WHY] | Per-environment monthly cost estimate with vendor pricing assumptions |
| `alternative-deploy-targets` | [WHY] | Cloud/on-prem/hybrid options weighed and rejected with one-paragraph reason |
| `observability-deep-dive` | [HOW] | Detailed metric/log/trace schemas, alert thresholds, dashboard layouts |
| `runbook-drafts` | [HOW] | Incident response runbooks for the top failure modes |
| `kpi-instrumentation-recipes` | [HOW] | Per-KPI data collection recipe (event names, log fields, metric labels) |
| `ci-pipeline-yaml` | [HOW] | Full CI/CD pipeline YAML with comments per stage |
| `disaster-recovery-plan` | [HOW] | Backup, restore, and DR procedures with RPO/RTO targets |
| `expansion-catalog-rationale` | [WHY] | Why this set of expansions, why these defaults, why D10 enforces one-line descriptions |
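
The catalog lends itself to a small lookup structure. The following is a sketch under stated assumptions: the IDs and tier labels are copied from the catalog above, but `select_expansions`, `expansion_heading`, the `section_title` argument, and the choice to raise `KeyError` on unknown IDs are hypothetical and do not describe the actual nWave implementation.

```python
# Expansion id -> tier label, as listed in the DEVOPS expansion catalog.
EXPANSION_CATALOG = {
    "infra-cost-analysis": "WHY",
    "alternative-deploy-targets": "WHY",
    "observability-deep-dive": "HOW",
    "runbook-drafts": "HOW",
    "kpi-instrumentation-recipes": "HOW",
    "ci-pipeline-yaml": "HOW",
    "disaster-recovery-plan": "HOW",
    "expansion-catalog-rationale": "WHY",
}


def select_expansions(requested_ids, mode):
    """Pick the expansions to render.

    mode == "full" auto-expands the whole catalog; otherwise only the
    explicitly requested ids are rendered (assumed behaviour for this sketch).
    """
    if mode == "full":
        return list(EXPANSION_CATALOG)
    unknown = [i for i in requested_ids if i not in EXPANSION_CATALOG]
    if unknown:
        raise KeyError(f"unknown expansion id(s): {unknown}")
    return list(requested_ids)


def expansion_heading(expansion_id, section_title):
    """Build the '## Wave: DEVOPS / [WHY|HOW] <Section>' heading for one item."""
    tier = EXPANSION_CATALOG[expansion_id]
    return f"## Wave: DEVOPS / [{tier}] {section_title}"


print(expansion_heading("runbook-drafts", "Runbook drafts"))
# ## Wave: DEVOPS / [HOW] Runbook drafts
```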

Density resolution (per D12)

Call resolve_density(global_config) from scripts/shared/density_config.py after reading ~/.nwave/global-config.json (missing/malformed = empty dict). Returns mode ("lean" | "full") + expansion_prompt ("ask" | "always-skip" | "always-expand" | "smart") per the D12 cascade (resolver-internal, DDD-5 — do NOT replicate locally). Branch on density.mode for what to emit; branch on density.expansion_prompt at wave end for menu behaviour. Full cascade detail, branch semantics, ad-hoc override workflow: nWave/skills/nw-density-resolution-contract/SKILL.md.
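
A minimal sketch of the caller side only: reading the config under the "missing/malformed = empty dict" rule and branching on the resolved values. `Density` is a hypothetical stand-in for whatever `resolve_density` returns, `load_global_config` and `plan_wave_output` are illustrative names, and treating non-object JSON as malformed is an assumption. The D12 cascade itself is deliberately not reproduced here (DDD-5), and only the "ask" menu behaviour is modelled because the other expansion_prompt values are defined in the contract skill.

```python
import json
from dataclasses import dataclass
from pathlib import Path


def load_global_config(path: Path) -> dict:
    """Read a global config file; a missing or malformed file yields {}."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable file, bad encoding, or invalid JSON
        return {}
    # Assumption: anything other than a JSON object counts as malformed.
    return data if isinstance(data, dict) else {}


@dataclass(frozen=True)
class Density:
    """Hypothetical stand-in for the value returned by resolve_density."""
    mode: str              # "lean" | "full"
    expansion_prompt: str  # "ask" | "always-skip" | "always-expand" | "smart"


def plan_wave_output(density: Density) -> dict:
    """Branch on mode for what to emit and on expansion_prompt at wave end."""
    return {
        "emit_tier1_ref": True,  # Tier-1 [REF] sections are always emitted
        "auto_expand_tier2": density.mode == "full",
        "show_wave_end_menu": density.expansion_prompt == "ask",
    }


config = load_global_config(Path.home() / ".nwave" / "global-config.json")
plan = plan_wave_output(Density(mode="lean", expansion_prompt="ask"))
```

In real use the `Density` value would come from `resolve_density(config)` rather than being constructed by hand.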

Telemetry (per D4 + DDD-6)

Every expansion choice emits a DocumentationDensityEvent (dataclass at src/des/domain/telemetry/documentation_density_event.py) via event.to_audit_event() → JsonlAuditLogWriter().log_event(...). Schema fields per D4: feature_id, wave, expansion_id, choice, timestamp. For this wave the schema declares "wave": "DEVOPS". Use helper scripts/shared/telemetry.py:write_density_event(...) — do NOT write JSONL directly.
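
To make the D4 field list concrete, here is a stand-in for the event payload. It is not the real DocumentationDensityEvent: `DensityEventSketch` and `build_devops_event` are hypothetical names, the ISO-8601 UTC timestamp format and the example `choice` value are assumptions, and nothing is written to disk because real emission goes through the `write_density_event` helper.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DensityEventSketch:
    """Illustrative mirror of the five D4 schema fields."""
    feature_id: str
    wave: str
    expansion_id: str
    choice: str
    timestamp: str


def build_devops_event(feature_id, expansion_id, choice, now=None):
    """Build an event for this wave; the wave field is always "DEVOPS"."""
    moment = now or datetime.now(timezone.utc)
    return DensityEventSketch(
        feature_id=feature_id,
        wave="DEVOPS",
        expansion_id=expansion_id,
        choice=choice,
        timestamp=moment.isoformat(),  # assumed format, not confirmed by D4
    )


event = build_devops_event(
    "lean-wave-documentation",
    "runbook-drafts",
    "expand",  # hypothetical choice value
    now=datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
print(asdict(event)["timestamp"])  # 2025-01-02T03:04:05+00:00
```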

Wave-specific signal: when DISTILL consumes a lean DEVOPS environment matrix, downstream --expand requests for runbook drafts or alternative deploy targets indicate that the [REF] baseline was insufficient. Full emission rules: nWave/skills/nw-density-resolution-contract/SKILL.md.

Interactive Decision Points

Before proceeding, the orchestrator asks:

Decision 1: Deployment Target

Question: What is the deployment target? Options:

Cloud-native -- AWS, GCP, Azure managed services
On-premise -- self-hosted infrastructure
Hybrid -- mix of cloud and on-premise
Edge -- distributed edge deployment
Other -- user provides custom input

Decision 2: Container Orchestration

Question: Container orchestration approach? Options:

Kubernetes -- full orchestration
Docker Compose -- lightweight container management
Serverless -- function-as-a-service, no containers
None -- bare metal or VM-based deployment

Decision 3: CI/CD Platform

Question: CI/CD platform preference? Options:

GitHub Actions
GitLab CI
Jenkins
Azure DevOps
Other -- user provides custom input

Decision 4: Existing Infrastructure

Question: Is there existing infrastructure or CI/CD to integrate with? Options:

Yes, both -- describe existing infrastructure and CI/CD (user provides details)
Existing infra only -- infrastructure exists, CI/CD is greenfield
Existing CI/CD only -- CI/CD exists, infrastructure is greenfield
No -- greenfield, design everything from scratch

Decision 5: Observability and Logging

Question: What observability and logging approach? Options:

Prometheus + Grafana (metrics) with structured JSON logs
Datadog (full-stack observability including logs)
ELK stack (Elasticsearch, Logstash, Kibana for logs and metrics)
OpenTelemetry (vendor-agnostic telemetry) with provider of choice
CloudWatch (AWS-native metrics and logging)
Custom -- user provides details
None -- defer observability setup

Decision 6: Deployment Strategy

Question: What deployment strategy? Options:

Blue-green -- zero-downtime with environment swap
Canary -- gradual traffic shifting
Rolling -- incremental pod/instance replacement
Recreate -- simple stop-and-replace

Decision 7: Continuous Learning (conditional)

Question: Is there existing monitoring/alerting infrastructure in place? Options:

Yes -- include continuous learning and experimentation capabilities
No -- focus on foundational monitoring setup first

If Yes to Decision 7, follow-up question: Which continuous learning capabilities should be included? Options:

A/B testing framework
Feature flags (LaunchDarkly, Unleash, custom)
Canary analysis (automated rollback on metrics)
Progressive rollout (percentage-based deployment)
All of the above

Decision 8: Git Branching Strategy

Question: What Git branching strategy should the project follow? Options:

Trunk-Based Development -- single main branch, short-lived feature branches (<1 day), continuous integration. Requires robust CI gates on every commit.
GitHub Flow -- feature branches from main, pull requests, merge to main after review. Balanced CI with PR-triggered pipelines.
GitFlow -- develop/main branches, feature/release/hotfix branches, formal release process. Requires branch-specific pipelines (develop CI, release candidate, hotfix fast-track).
Release Branching -- long-lived release branches, cherry-pick fixes between branches. Requires per-branch pipelines and cross-branch validation.
Other -- user provides custom strategy

This directly influences CI/CD pipeline design: trigger rules|branch protection|environment promotion|release automation.
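
The coupling between branching strategy and pipeline design can be captured as a lookup. This is a sketch only: the requirement strings paraphrase the Decision 8 option descriptions, while `PIPELINE_REQUIREMENTS` and `pipeline_requirements` are hypothetical names, and returning an empty list for "Other" or unrecognised strategies is an assumption.

```python
# Branching strategy -> pipeline requirements, paraphrased from Decision 8.
PIPELINE_REQUIREMENTS = {
    "Trunk-Based Development": ["robust CI gates on every commit"],
    "GitHub Flow": ["PR-triggered pipelines"],
    "GitFlow": ["develop CI", "release candidate pipeline", "hotfix fast-track"],
    "Release Branching": ["per-branch pipelines", "cross-branch validation"],
}


def pipeline_requirements(strategy):
    """Return the CI/CD implications of a strategy.

    "Other" and unknown strategies return an empty list, leaving the
    requirements to be gathered from the user's custom input.
    """
    return list(PIPELINE_REQUIREMENTS.get(strategy, []))


print(pipeline_requirements("GitFlow"))
# ['develop CI', 'release candidate pipeline', 'hotfix fast-track']
```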

Decision 9: Mutation Testing Strategy

Question: When should mutation testing run? Options:

per-feature (default) -- Runs after each feature delivery (refactoring + review), scoped to modified files. Best for small/medium projects where per-feature overhead
