CI/CD Architecture
Framework-agnostic CI/CD principles. The body presents trade-offs and common defaults; concrete pipelines live in RECIPES.md (GitHub Actions). See STACK.md for pinned action/tool versions used in the recipes.
This skill is suggestion-mode: most CI/CD decisions depend on team size, deployment target, risk tolerance, and existing infra. Each section names the choice, the trade-off, and a common default — not a mandate. Override locally with an ADR when a decision diverges from the suggestion.
Image-level rules (Dockerfile, multi-arch, scanning) live in docker-architect; this skill covers only the workflow shape around them.
1. Pipeline taxonomy
Most projects need four pipeline shapes. Keeping them in separate workflow files is the common default — it makes "what triggers what" obvious and lets each evolve independently.
- CI — runs on every push and PR. Lint, type-check, test, build. Fast feedback (target under ~10 min).
- Release — runs on tag or main-branch merge. Produces versioned artifacts (binaries, images, packages). Must be idempotent.
- Deploy — promotes an existing artifact to an environment. Triggered manually or by release. Never rebuilds.
- Scheduled — periodic jobs: dependency scans, SBOM refresh, dead-link checks. Decoupled from the change cycle.
Trade-off: one mega-workflow is simpler at first but couples "what was built" to "where it ran" — rolling back gets harder. Splitting them lets you redeploy yesterday's image without re-running CI. Reach for the split once the project ships to more than one environment.
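To make the release/deploy split concrete, here is a minimal Python sketch, not tied to any CI system, of the two properties above: release is idempotent, and deploy only promotes an artifact that already exists. ArtifactRegistry, release, and deploy are hypothetical names. A real registry would be an OCI registry or package index keyed by version and content digest.

```python
from dataclasses import dataclass, field


@dataclass
class ArtifactRegistry:
    """Hypothetical stand-in for an image registry or package index."""

    # version -> content digest of the artifact published under that version
    published: dict[str, str] = field(default_factory=dict)


def release(registry: ArtifactRegistry, version: str, digest: str) -> str:
    """Publish `digest` as `version`; re-running for the same bytes is a no-op."""
    existing = registry.published.get(version)
    if existing is None:
        registry.published[version] = digest
        return "published"
    if existing == digest:
        return "already-published"
    # Same version, different bytes: refuse instead of silently overwriting.
    raise ValueError(f"{version} already points at {existing}, not {digest}")


def deploy(registry: ArtifactRegistry, environments: dict[str, str], env: str, version: str) -> str:
    """Promote an already-released artifact to `env`. Nothing is rebuilt."""
    digest = registry.published[version]  # KeyError: the version was never released
    environments[env] = digest
    return digest
```

Rolling back is just another deploy with an older version: nothing is rebuilt, so the bytes that reach prod are the bytes that passed CI.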
2. Trigger design
- push to main: CI + release. Path filters skip docs-only changes.
- pull_request: CI only. Required status checks live here.
- workflow_dispatch: deploys, ad-hoc reruns. Expose it on every workflow you might ever need to retrigger manually.
- schedule: dependency / security scans. Use cron at off-peak times (GitHub evaluates schedule cron expressions in UTC).
- Concurrency: every workflow should set a concurrency: group. For PRs, cancel in progress (don't waste CI on stale commits); for main/release, queue (don't race deploys).
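The concurrency rule and the docs-only path filter each reduce to a small decision. This is a framework-agnostic Python sketch with hypothetical docs patterns (docs/*, *.md). In GitHub Actions the same concurrency policy is commonly written as group: ${{ github.workflow }}-${{ github.ref }} with cancel-in-progress: ${{ github.event_name == 'pull_request' }}.

```python
from fnmatch import fnmatchcase

# Hypothetical docs-only patterns; adjust to the repo's layout.
# Note: fnmatch's "*" also matches "/", unlike Actions path filters (which need "**").
DOCS_PATTERNS = ("docs/*", "*.md")


def is_docs_only(changed_files: list[str]) -> bool:
    """True when there are changes and every changed path matches a docs pattern."""
    return bool(changed_files) and all(
        any(fnmatchcase(path, pattern) for pattern in DOCS_PATTERNS)
        for path in changed_files
    )


def concurrency_policy(workflow: str, event: str, ref: str) -> tuple[str, bool]:
    """Return (group, cancel_in_progress) for a workflow run.

    Runs of the same workflow on the same ref share a group. PR runs cancel
    stale runs; pushes to main/release wait their turn so deploys never race.
    """
    return f"{workflow}-{ref}", event == "pull_request"
```

Grouping by workflow and ref keeps PR #42's CI from cancelling PR #43's, while still replacing an outdated run on the same PR.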
GitHub Actions concurrency + path-filter examples in RECIPES.md §2.
3. Pipeline composition
DRY without overcomplicating. Three composition levels, in order of cost:
- Single workflow file: fine for small repos. Inline duplication is cheaper than abstraction here.
- Composite actions: bundle a sequence of steps into a single reusable step that any job or workflow can call. Reach for this when ≥3 jobs need the same setup steps.
- Reusable workflows (workflow_call): share a whole multi-job pipeline across workflows or repos. Reach for this when ≥3 repos need the same shape.
Trade-off: premature abstraction makes the pipeline harder to read and debug. Factor out at the third copy, not the second. Reusable workflows in particular have non-trivial debugging overhead — stay inline until the duplication is clearly painful.
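The "third copy" rule can be checked mechanically. This Python sketch models each job as an ordered list of step labels (a simplification: real steps also carry inputs and env) and reports any leading setup block shared by at least three jobs, which is roughly the point where a composite action starts paying for itself. duplicated_setup is a hypothetical helper, not part of any tool.

```python
from collections import Counter


def duplicated_setup(
    jobs: dict[str, list[str]], prefix_len: int, threshold: int = 3
) -> list[tuple[str, ...]]:
    """Leading step blocks of length `prefix_len` shared by >= `threshold` jobs."""
    counts = Counter(
        tuple(steps[:prefix_len])  # the job's first `prefix_len` steps
        for steps in jobs.values()
        if len(steps) >= prefix_len
    )
    return sorted(block for block, n in counts.items() if n >= threshold)
```

With only two jobs sharing the block the function returns nothing, matching "factor out at the third copy, not the second".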
4. Supply-chain hygiene
CI runners produce everything that gets deployed, which makes them high-value targets. Defaults that meaningfully reduce risk:
- Pin third-party actions by full commit SHA, not tag. Tags are mutable; SHAs aren't. Write the reference as actions/checkout@<sha> with a # v5.0.0 comment for humans; Renovate auto-bumps SHAs while preserving the comment.
- First-party and major-vendor actions (actions/*, docker/*): pinning by major tag (@v5) is a common compromise; SHA-pin if your threat model warrants it.
- permissions: set at workflow level to the most restrictive scope (contents: read), and elevate per job as needed (packages: write, id-token: write). Depending on repo and org settings, the default GITHUB_TOKEN grant can be broader than any single job needs; declaring it explicitly removes the guesswork.
- Dependency Review action on PRs catches known-vulnerable additions before merge.
SHA-pinning + Renovate config snippet in RECIPES.md §4.
5. Secrets & cloud auth
- Prefer OIDC (workload identity federation) over long-lived credentials when the target supports it; AWS, GCP, Azure, HashiCorp Vault, and npm + PyPI trusted publishing all do. The runner exchanges a short-lived, GitHub-signed JWT for a scoped token, so no AWS_ACCESS_KEY_ID lives in repo secrets.
- When long-lived secrets are unavoidable, scope them to GitHub environments: with required reviewers, not to the repo.
- Never echo a secret in logs. Masking catches literal prints; derivative leaks (a token shaped into a URL, a hash printed for "debugging") happen anyway. Treat anything derived from ${{ secrets.X }} as sensitive and keep it out of logs.
- Rotate long-lived secrets on a schedule via a dedicated workflow; track the rotation cadence in an ADR.
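What a cloud provider checks before honoring the OIDC exchange can be sketched in a few lines of Python. This assumes the JWT's signature and expiry were already verified and only models the claim conditions a trust policy usually pins: the GitHub Actions issuer, the audience (sts.amazonaws.com is the value AWS setups commonly use), and a sub pattern such as repo:<owner>/<repo>:environment:<name>. trust_policy_allows is hypothetical; real providers express this as policy conditions, not code.

```python
from fnmatch import fnmatchcase

GITHUB_OIDC_ISSUER = "https://token.actions.githubusercontent.com"


def trust_policy_allows(claims: dict[str, str], audience: str, subject_pattern: str) -> bool:
    """Claim checks a trust policy typically pins (signature/expiry assumed verified)."""
    return (
        claims.get("iss") == GITHUB_OIDC_ISSUER
        and claims.get("aud") == audience
        # sub encodes repo plus environment or ref, e.g. repo:acme/api:environment:prod
        and fnmatchcase(claims.get("sub", ""), subject_pattern)
    )
```

Keep the sub pattern as narrow as the deployment allows: a pattern like repo:acme/* would let any repository in the org assume the role.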
OIDC trust-policy snippets (AWS + GCP) in RECIPES.md §5.
6. Caching
Cache what's expensive to fetch and safe to reuse:
- Language deps: setup-* actions have built-in cache inputs (actions/setup-go with cache: true, actions/setup-python with cache: 'pip'; for uv, astral-sh/setup-uv offers enable-cache: true). Prefer those over hand-rolled actions/cache blocks.
- Build artifacts: only cache compiled outputs when builds are deterministic. Non-deterministic build → stale cache → mystery failures that waste hours.
- Docker layer cache: docker/build-push-action with cache-from: type=gha + cache-to: type=gha,mode=max is the GitHub-native option. A registry cache (type=registry,ref=<image>:cache) is not bound by the Actions cache's per-repository size limit and eviction, and can be shared outside GitHub Actions, which makes it the better default for prod images.
- Key the cache on the lockfile hash. A key that doesn't change when dependencies do restores stale dependencies and silently degrades builds.
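Keying on the lockfile hash means the key changes exactly when resolved dependencies change. A minimal Python sketch, similar in spirit to the hashFiles() expression Actions provides but not its exact algorithm; cache_key and its arguments are hypothetical.

```python
import hashlib


def cache_key(runner_os: str, tool: str, lockfiles: dict[str, bytes]) -> str:
    """Cache key that changes whenever any lockfile's path or content changes.

    `lockfiles` maps a repo-relative path to that file's raw bytes.
    """
    digest = hashlib.sha256()
    for path in sorted(lockfiles):  # stable order, so the key is deterministic
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(lockfiles[path])
        digest.update(b"\0")
    return f"{runner_os}-{tool}-{digest.hexdigest()}"
```

In a workflow step you would pass the real bytes, e.g. {"uv.lock": Path("uv.lock").read_bytes()}; including the OS and tool in the key keeps a Linux pip cache from being restored on macOS or by npm.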
Per-language caching examples in RECIPES.md §6.
7. Matrix strategy
Fan out only when the dimension genuinely matters:
- OS matrix — only if you ship cross-platform binaries or have OS-specific code paths.
- Language version matrix — only if you support multiple. A library targeting "Python 3.12+" tests 3.12, 3.13, 3.14; an app pinning 3.14 doesn't need a matrix.
- fail-fast: false when you want to see all failures (compatibility surveys); the default, true, cancels the remaining cells on the first failure and saves CI minutes when any failure is a blocker.
Trade-off: every matrix cell costs runner time + log volume. A 3×3×2 matrix is 18 jobs — each dimension should justify itself or come out.
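Matrix cost is a Cartesian product, so it is easy to compute before committing to it. A Python sketch of how the cells expand, with exclusions matched the way strategy.matrix.exclude entries are (a partial entry drops every cell it matches); expand_matrix and the axis values are illustrative.

```python
from itertools import product
from typing import Optional


def expand_matrix(
    axes: dict[str, list[str]], exclude: Optional[list[dict[str, str]]] = None
) -> list[dict[str, str]]:
    """Every combination of axis values, minus cells matched by an exclude rule."""
    names = list(axes)
    cells = [dict(zip(names, values)) for values in product(*axes.values())]
    rules = exclude or []
    return [
        cell
        for cell in cells
        # A rule matches when every key it names has the same value in the cell.
        if not any(all(cell.get(k) == v for k, v in rule.items()) for rule in rules)
    ]
```

Three OSes × three Python versions × two architectures yields 18 cells; excluding Windows on arm64 removes three of them, leaving 15.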
8. Test gates + required status checks
- Required checks live on the branch protection rule, not in workflow YAML. The workflow runs the test; the rule decides whether main can merge.
- Name jobs explicitly and stably, because required checks reference the job name. Renaming a job leaves the rule waiting on a check that never reports, which blocks every PR until someone updates the rule.
- Soft gates (coverage delta, performance regression) post a PR comment; hard gates fail the job.
- Flaky tests: quarantine fast (mark skip with a tracking issue) rather than normalizing retries. Retry-as-default trains the team to ignore real flakiness.
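Because a required check is matched by name, a cheap guard is to compare the branch rule's required names with what the workflow will report. In this Python sketch each job reports under its name, falling back to its job id; matrix and reusable-workflow jobs report composite names that the sketch does not model. Both functions are hypothetical helpers, and the required set would come from the branch protection settings.

```python
def reported_checks(jobs: dict[str, dict[str, str]]) -> set[str]:
    """Each job reports a check under its `name`, falling back to its job id."""
    return {job.get("name", job_id) for job_id, job in jobs.items()}


def unsatisfiable_required_checks(required: set[str], jobs: dict[str, dict[str, str]]) -> set[str]:
    """Required checks that no job reports; a PR would wait on these forever."""
    return required - reported_checks(jobs)
```

Running this in CI on changes to workflow files surfaces a rename before it blocks merges.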
9. Build & artifact publishing
This skill covers the workflow shape, not the Dockerfile — see docker-architect for image rules.
- Multi-arch images: docker/setup-qemu-action + docker/setup-buildx-action + docker/build-push-action with platforms: linux/amd64,linux/arm64.
- Build once, push many: build the image once and push it with multiple tags (:<short-sha>, :<semver>, :latest) from a single build-push step. Parallel rebuilds per tag are an antipattern.
- Sign prod-bound images with Cosign + OIDC (keyless, no key material), and verify signatures on pull in the deployment platform.
- SBOM: generate it alongside the image via docker/build-push-action attestations (provenance: true, sbom: true).
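"Build once, push many" comes down to computing every tag up front and handing the whole list to a single build-push step. A Python sketch, assuming a 7-character short SHA and v-prefixed semver release tags; image_tags is hypothetical, and in practice docker/metadata-action is a common way to derive the same list.

```python
import re
from typing import Optional

# Plain release tags only, e.g. v1.4.2 or 1.4.2; pre-releases are not handled here.
SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def image_tags(image: str, commit_sha: str, git_tag: Optional[str] = None) -> list[str]:
    """All tags for one build: short SHA always, plus semver and latest on releases."""
    tags = [f"{image}:{commit_sha[:7]}"]
    if git_tag:
        match = SEMVER_TAG.match(git_tag)
        if match is None:
            raise ValueError(f"not a release tag: {git_tag}")
        tags.append(f"{image}:{'.'.join(match.groups())}")
        tags.append(f"{image}:latest")
    return tags
```

Every tag in the list points at the same digest, so :latest and :1.4.2 can never drift apart the way they can when each tag gets its own build.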
Full build + push workflow in RECIPES.md §9.
10. Release automation
- Conventional Commits → version + changelog is the cheapest workable pattern. Tools: release-please (GitHub-native, manifest-driven), semantic-release (Node ecosystem), git-cliff (Rust-based, language-agnostic). Pick one and own it; switching costs