When this skill is activated, always start your first response with the 🧢 emoji.
Docker & Kubernetes
A practical guide to containerizing applications and running them reliably in Kubernetes. This skill covers the full lifecycle from writing a production-ready Dockerfile to deploying with Helm, configuring traffic with Ingress, and debugging cluster issues. The emphasis is on correctness and operability - containers that are small, secure, and observable; Kubernetes workloads that self-heal, scale, and fail gracefully. Designed for engineers who know the basics and need opinionated guidance on production patterns.
When to use this skill
Trigger this skill when the user:
- Writes or reviews a Dockerfile (any language or runtime)
- Deploys or configures a Kubernetes workload (Deployment, StatefulSet, DaemonSet)
- Sets up Kubernetes networking (Services, Ingress, NetworkPolicy)
- Creates or maintains a Helm chart or values file
- Configures health probes, resource limits, or autoscaling (HPA/VPA)
- Debugs a failing pod (CrashLoopBackOff, OOMKilled, ImagePullBackOff)
- Configures a service mesh (Istio, Linkerd) or needs mTLS between services
Do NOT trigger this skill for:
- Cloud-provider infrastructure provisioning (use a Terraform/IaC skill instead)
- CI/CD pipeline authoring (use a CI/CD skill - container builds are a small part)
Key principles
- One process per container - A container should do exactly one thing. Sidecar patterns (logging agents, proxies) are valid, but the main container must not run multiple application processes. This preserves independent restartability and clean signal handling.
- Immutable infrastructure - Never patch a running container. Update the image tag and redeploy. Mutations to running pods are invisible to version control and create snowflakes. Pin image tags in production; never use the latest tag.
- Declarative configuration - All cluster state lives in YAML checked into git. kubectl apply is the only allowed mutation path. kubectl edit on a live cluster is a debugging tool, not a deployment method.
- Minimal base images - Use alpine, distroless, or language-specific slim images. Fewer packages = smaller attack surface = faster pulls. Multi-stage builds eliminate build tooling from the final image.
- Health checks always - Every Deployment must define liveness and readiness probes. Without them, Kubernetes cannot distinguish a booting pod from a hung one, and will route traffic to pods that cannot serve it.
Core concepts
Docker layers and caching
Each RUN, COPY, and ADD instruction creates a layer. The build cache keys COPY and
ADD on a checksum of the files they copy, and RUN on the instruction text. The cache is
invalidated at the first changed instruction and for every instruction after it.
Ordering matters: put rarely-changing instructions (installing OS packages) before
frequently-changing ones (copying application source). Copy dependency manifests and
install dependencies before copying source code.
Kubernetes object model
Pod -> smallest schedulable unit (one or more containers sharing network/storage)
|
Deployment -> manages ReplicaSets; handles rollouts and rollbacks
|
Service -> stable virtual IP and DNS name that routes to healthy pod IPs
|
Ingress -> HTTP/HTTPS routing rules from outside the cluster into Services
Namespaces provide soft isolation within a cluster. Use them to separate environments (staging, production) or teams. ResourceQuotas and NetworkPolicies scope to namespaces.
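As a rough illustration of namespace-scoped controls, the sketch below pairs a ResourceQuota with a default-deny NetworkPolicy for the production namespace. The object names and every quota figure are hypothetical placeholders, and the NetworkPolicy only takes effect if the cluster's network plugin enforces NetworkPolicy.

```yaml
# Hypothetical quota for the production namespace; all numbers are placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
    pods: "50"
---
# Default-deny: the empty podSelector matches every pod in the namespace,
# and no ingress is allowed until another NetworkPolicy permits it.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```

Both objects apply only to the namespace named in their metadata, which is what makes namespaces a practical boundary for per-team or per-environment limits.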
ConfigMaps and Secrets
- ConfigMap: non-sensitive configuration (feature flags, URLs, log levels). Mount as env vars or volume files.
- Secret: sensitive values (passwords, tokens, TLS certs). Values are base64-encoded, not encrypted, and are stored that way in etcd by default, so enable encryption at rest in production. Never bake secrets into images.
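A minimal sketch of the two objects might look like the following. The names api-config and api-secrets match the references in the Deployment later in this guide; every key and value shown is a hypothetical placeholder.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
  namespace: production
data:
  # Hypothetical non-sensitive settings; values must be strings.
  LOG_LEVEL: "info"
  FEATURE_NEW_CHECKOUT: "false"
  UPSTREAM_URL: "http://billing.production.svc.cluster.local"
---
apiVersion: v1
kind: Secret
metadata:
  name: api-secrets
  namespace: production
type: Opaque
# stringData accepts plain text; the API server stores it base64-encoded under data.
stringData:
  DATABASE_PASSWORD: "replace-me"   # placeholder only, never commit a real value
```

When both are consumed through envFrom, each key becomes an environment variable of the same name inside the container.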
Common tasks
Write a production Dockerfile (multi-stage, Node.js)
# ---- build stage ----
FROM node:20-alpine AS builder
WORKDIR /app
# Copy manifests first - cached until dependencies change
COPY package.json package-lock.json ./
RUN npm ci --ignore-scripts
COPY . .
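RUN npm run build
# Drop devDependencies so only production packages are copied to the runtime stage
RUN npm prune --omit=dev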
# ---- runtime stage ----
FROM node:20-alpine AS runtime
ENV NODE_ENV=production
WORKDIR /app
# Non-root user for security
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json ./
USER appuser
EXPOSE 3000
# Use exec form to receive signals correctly
CMD ["node", "dist/server.js"]
Key decisions: alpine base, non-root user, npm ci (reproducible installs),
multi-stage to exclude dev dependencies, exec-form CMD for proper PID 1 signal
handling.
Create a Kubernetes Deployment + Service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  namespace: production
  labels:
    app: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api-server
          image: registry.example.com/api-server:1.4.2 # pinned tag, never latest
          ports:
            - containerPort: 3000
          envFrom:
            - configMapRef:
                name: api-config
            - secretRef:
                name: api-secrets
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
          readinessProbe:
            httpGet:
              path: /healthz/ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /healthz/live
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
---
apiVersion: v1
kind: Service
metadata:
  name: api-server
  namespace: production
spec:
  selector:
    app: api-server
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
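If the Deployment above should scale on load, a HorizontalPodAutoscaler can target it. The sketch below is one possible shape, assuming metrics-server (or another resource metrics provider) is installed; the replica bounds and the 70% CPU target are borrowed from the autoscaling values in the Helm section of this guide rather than from any measured workload.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          # Percentage of the container's CPU request (100m above), not of its limit.
          averageUtilization: 70
```

Once an HPA owns the replica count, it is usually safer to drop spec.replicas from the Deployment manifest so that kubectl apply does not reset the count the autoscaler has chosen.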
Configure Ingress with TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls-cert # cert-manager populates this
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-server
                port:
                  number: 80
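The cert-manager.io/cluster-issuer annotation above assumes a ClusterIssuer named letsencrypt-prod already exists in the cluster. A minimal sketch of such an issuer, assuming cert-manager is installed and HTTP-01 challenges are solved through the same nginx ingress class, could look like this. The email address and the account-key Secret name are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod   # must match the annotation on the Ingress
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com                  # hypothetical contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key    # hypothetical Secret for the ACME account key
    solvers:
      - http01:
          ingress:
            ingressClassName: nginx
```

ClusterIssuer is cluster-scoped, so it carries no namespace and can serve Ingress objects in any namespace.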
Write a Helm chart
Minimal chart structure and key files:
Chart.yaml
apiVersion: v2
name: api-server
description: API server Helm chart
type: application
version: 0.1.0 # chart version
appVersion: "1.4.2" # application image version
values.yaml
replicaCount: 3
image:
  repository: registry.example.com/api-server
  tag: "" # defaults to .Chart.AppVersion
  pullPolicy: IfNotPresent
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: true
  host: api.example.com
  tlsSecretName: api-tls-cert
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
autoscaling:
  enabled: false
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
templates/deployment.yaml (excerpt)
image: "{{ .Values.image.repository }}:{{ .Values.image.tag |