When this skill is activated, always start your first response with the 🧢 emoji.

Load Testing

A practitioner's guide to load testing production services. This skill covers test design, k6 implementation, CI integration, results analysis, and capacity planning with an emphasis on when each test type is appropriate and what to measure. Designed for engineers who need to validate performance before and after launches.

When to use this skill

Trigger this skill when the user:

Writes a k6, Artillery, JMeter, or Gatling test script
Plans a load, stress, soak, or spike test campaign
Benchmarks API throughput or latency
Defines performance SLOs or pass/fail thresholds
Integrates load tests into CI/CD pipelines
Analyzes load test results to find bottlenecks
Plans capacity for an upcoming traffic event (launch, sale, campaign)

Do NOT trigger this skill for:

Unit or integration tests that don't involve concurrent load (use a testing skill)
Frontend performance (Lighthouse, Core Web Vitals - use a frontend performance skill)

Key principles

Test in production-like environments - A load test against a single-instance staging box with seeded data tells you nothing about your production fleet. Match CPU/memory ratios, replica counts, and dataset sizes. Synthetic data that doesn't reflect production cardinality produces misleading results.
Define pass/fail criteria before testing - Decide what "passing" means before you run the first request. "P95 latency < 300ms, error rate < 0.1%, RPS >= 500" is a pass/fail criterion. "It felt fast" is not. Set thresholds in code so tests fail automatically in CI.
Ramp up gradually - Never go from 0 to peak load instantly. A sudden spike obscures whether failure was caused by the ramp itself or sustained load. Use stages: warm up, ramp to target, hold steady, ramp down. A gradual ramp mirrors real traffic and gives infrastructure time to autoscale.
Test with realistic data and scenarios - A test that hits a single cached endpoint with the same user ID is not a load test; it is a cache benchmark. Use parameterized data (real user IDs, varied payloads), model the full user journey, and include think time between requests to simulate realistic concurrency.
Automate load tests in CI - Load tests only provide value if they run consistently. Gate every deployment with a smoke-level load test. Run full stress and soak tests on a schedule (nightly or pre-release). Fail the build on threshold violations. Trends over time catch regressions earlier than one-off runs.
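
The example criterion from the second principle ("P95 latency < 300ms, error rate < 0.1%, RPS >= 500") could be encoded as k6 thresholds along these lines. This is a minimal sketch: buildThresholds and its parameter names are hypothetical and not part of k6, while http_req_duration, http_req_failed, and http_reqs are k6's built-in metric names.

```javascript
// Hypothetical helper (not a k6 API): turns numeric targets into a k6 `thresholds` object.
function buildThresholds({ p95Ms, maxErrorRate, minRps }) {
  return {
    http_req_duration: [`p(95)<${p95Ms}`],     // 95th percentile latency, in ms
    http_req_failed: [`rate<${maxErrorRate}`], // fraction of failed requests
    http_reqs: [`rate>=${minRps}`],            // requests per second
  };
}

// "P95 latency < 300ms, error rate < 0.1%, RPS >= 500"
const thresholds = buildThresholds({ p95Ms: 300, maxErrorRate: 0.001, minRps: 500 });
console.log(JSON.stringify(thresholds, null, 2));
```

In a k6 script the result would be assigned to options.thresholds. k6 exits with a non-zero status when any threshold fails, which is what lets a CI job fail the build without extra scripting.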

Core concepts

Test types

Type	Goal	Duration	VU shape
Smoke	Verify the test script works; baseline sanity	1-2 min	1-5 VUs, constant
Load	Validate behavior at expected production traffic	15-30 min	Ramp to target, hold
Stress	Find the breaking point; measure degradation curve	30-60 min	Ramp beyond expected until failure
Soak	Detect memory leaks, connection pool exhaustion, drift	2-24 hours	Hold at 70-80% capacity
Spike	Simulate sudden traffic surge (marketing event, viral post)	10-20 min	Instant jump to 5-10x, then drop

Choose the test type based on what question you're trying to answer - not habit. Most teams only run load tests and miss soak and spike scenarios where real incidents happen.
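
The table above can be turned into concrete k6 load shapes. The sketch below is one possible mapping, not a recommendation from the k6 project: profileFor, expectedVUs, and capacityVUs are hypothetical names, and the durations and multipliers are illustrative picks from the ranges in the table.

```javascript
// Hypothetical helper: returns a k6 options fragment for each test type in the table.
// expectedVUs = VUs at expected production traffic; capacityVUs = VUs at known capacity.
function profileFor(type, expectedVUs, capacityVUs) {
  switch (type) {
    case 'smoke': // 1-5 VUs, constant
      return { vus: 3, duration: '1m' };
    case 'load': // ramp to target, hold
      return { stages: [
        { duration: '5m', target: expectedVUs },
        { duration: '20m', target: expectedVUs },
        { duration: '5m', target: 0 },
      ] };
    case 'stress': // ramp beyond expected until failure
      return { stages: [
        { duration: '10m', target: expectedVUs },
        { duration: '10m', target: expectedVUs * 2 },
        { duration: '10m', target: expectedVUs * 3 },
        { duration: '5m', target: 0 },
      ] };
    case 'soak': { // hold at 70-80% of capacity
      const hold = Math.round(capacityVUs * 0.75);
      return { stages: [
        { duration: '10m', target: hold },
        { duration: '4h', target: hold },
        { duration: '10m', target: 0 },
      ] };
    }
    case 'spike': // near-instant jump to 5x, then drop
      return { stages: [
        { duration: '2m', target: expectedVUs },
        { duration: '10s', target: expectedVUs * 5 },
        { duration: '5m', target: expectedVUs * 5 },
        { duration: '10s', target: expectedVUs },
        { duration: '3m', target: 0 },
      ] };
    default:
      throw new Error(`unknown test type: ${type}`);
  }
}

console.log(profileFor('spike', 100, 400).stages);
```

Keeping the shapes in one place makes it harder to fall back on a single habitual profile: the same scenario code can be run with whichever shape answers the question at hand.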

Key metrics

Metric	What it measures	Typical target
RPS / throughput	Requests per second the system handles	Depends on expected traffic
P50 / P95 / P99 latency	Response time distribution	P99 < 2x your SLO
Error rate	% of requests returning 4xx/5xx	< 0.1% under load
Time to first byte (TTFB)	Server processing latency	Proxy for backend work
Checks passed %	Business logic assertions in the test	100% expected

Always track percentiles (p95, p99), not averages. An average of 100ms with a p99 of 5000ms means 1 in 100 requests takes 5 seconds or longer - that is a bad service, and the average hides it completely.
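
A small worked example shows how far the average and the tail can diverge. The sample below is made up for illustration, and the nearest-rank percentile used here is only one of several definitions; k6's own percentile calculation may differ slightly on small samples.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it. `p` is an integer from 1 to 100.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Made-up sample: 98 requests at 50 ms and 2 requests at 5000 ms.
const latenciesMs = [...Array(98).fill(50), 5000, 5000];

console.log('mean:', mean(latenciesMs));           // 149
console.log('p50: ', percentile(latenciesMs, 50)); // 50
console.log('p99: ', percentile(latenciesMs, 99)); // 5000
```

The mean of 149 ms looks healthy, and the median looks even better, yet the p99 exposes the 5-second tail that some users actually experience.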

Think time

Think time (or "sleep") is the pause a virtual user takes between requests to simulate a real user reading a page or filling in a form. Without think time, virtual users fire requests as fast as possible, which does not reflect real traffic patterns and saturates the system unrealistically. Add variance with a randomized pause, for example sleep(Math.random() * 2 + 1) for 1-3 seconds, or sleep(randomIntBetween(1, 3)) using the k6-utils jslib.
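
A randomized pause like the one described above can be sketched in plain JavaScript. thinkTime is a hypothetical helper name, not a k6 function; a uniform distribution is assumed here for simplicity, although real user pauses are often more skewed.

```javascript
// Hypothetical helper: uniform random think time in seconds, in [minSec, maxSec).
function thinkTime(minSec, maxSec) {
  return minSec + Math.random() * (maxSec - minSec);
}

// In a k6 script this would be passed to sleep(), which accepts fractional seconds:
//   sleep(thinkTime(1, 3));
const pause = thinkTime(1, 3);
console.log(`next pause: ${pause.toFixed(2)}s`);
```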

Virtual users vs RPS

Virtual users (VUs) model concurrent users - each VU executes the full scenario loop. RPS is a result of VU count, think time, and iteration duration.
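
The relationship can be approximated with simple arithmetic: each VU completes one iteration every (request time + think time) seconds. The sketch below assumes steady state and sequential requests within an iteration; estimateRps, vusForRps, and their parameter names are hypothetical.

```javascript
// Rough steady-state estimate for a closed (VU-based) workload.
// thinkTimeSec is the total think time per iteration.
function estimateRps({ vus, requestsPerIteration, avgResponseSec, thinkTimeSec }) {
  const iterationSec = requestsPerIteration * avgResponseSec + thinkTimeSec;
  return (vus * requestsPerIteration) / iterationSec;
}

// Inverse: how many VUs are needed to reach a target RPS under the same assumptions.
function vusForRps({ targetRps, requestsPerIteration, avgResponseSec, thinkTimeSec }) {
  const iterationSec = requestsPerIteration * avgResponseSec + thinkTimeSec;
  return Math.ceil((targetRps * iterationSec) / requestsPerIteration);
}

// 50 VUs, 2 requests per iteration at 0.25 s each, 2 s of think time per iteration:
// 50 * 2 / (0.5 + 2) = 40 requests per second.
const scenario = { requestsPerIteration: 2, avgResponseSec: 0.25, thinkTimeSec: 2 };
console.log(estimateRps({ vus: 50, ...scenario }));      // 40
console.log(vusForRps({ targetRps: 40, ...scenario }));  // 50
```

Note that the estimate drops as response time grows: if the service slows down under load, the same 50 VUs produce fewer requests per second.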

Open vs closed workload models:

Closed (VU-based): A fixed pool of VUs, each finishing one iteration before starting the next. When the system slows down, the VUs slow down with it, so throughput is naturally capped by response time. Best for session-based applications.
Open (arrival rate): New requests arrive at a fixed rate regardless of system state. Queues build under saturation. Best for stateless APIs and microservices.

k6 supports both: vus/duration (or stages) for closed, and the constant-arrival-rate and ramping-arrival-rate executors for open.
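
An open-model test is configured through a scenario that uses an arrival-rate executor. The sketch below builds such a configuration; arrivalRateOptions, the scenario name open_model, and the 4x headroom factor for maxVUs are assumptions made for illustration, while executor, rate, timeUnit, duration, preAllocatedVUs, and maxVUs are the option names k6 uses for this executor.

```javascript
// Hypothetical helper: builds a k6 options object for a constant-arrival-rate test.
function arrivalRateOptions({ ratePerSec, duration, expectedIterationSec }) {
  // About rate x iteration time VUs are busy at any moment (Little's law),
  // so pre-allocate at least that many.
  const preAllocatedVUs = Math.ceil(ratePerSec * expectedIterationSec);
  return {
    scenarios: {
      open_model: {                    // scenario name is arbitrary
        executor: 'constant-arrival-rate',
        rate: ratePerSec,              // iterations started per timeUnit
        timeUnit: '1s',
        duration,
        preAllocatedVUs,
        maxVUs: preAllocatedVUs * 4,   // headroom for slow responses (assumed factor)
      },
    },
  };
}

const options = arrivalRateOptions({ ratePerSec: 200, duration: '5m', expectedIterationSec: 0.5 });
console.log(JSON.stringify(options, null, 2));
```

In a k6 script this object would be exported (export const options = ...). If responses slow down so much that maxVUs is exhausted, k6 cannot start new iterations on schedule, which is itself a signal that the system is saturated at that arrival rate.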

Common tasks

Write a basic load test

// k6 basic load test - ramp up, hold, ramp down
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 10 },  // ramp up
    { duration: '1m',  target: 10 },  // hold
    { duration: '15s', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<300'],   // 95% of requests under 300ms
    http_req_failed:   ['rate<0.01'],   // less than 1% errors
  },
};

export default function () {
  const res = http.get('https://api.example.com/health');

  check(res, {
    'status is 200':       (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
  });

  sleep(1);
}

Run with: k6 run script.js. Add --out json=results.json to export raw data.
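
The exported file can then be post-processed outside k6. The sketch below assumes the line-delimited layout of k6's JSON output, in which each sample is an object with type "Point", a metric name, and the measurement in data.value; pointsFor is a hypothetical helper and the inline sample is made up.

```javascript
// Extracts the raw values of one metric from k6's line-delimited JSON output.
function pointsFor(ndjson, metric = 'http_req_duration') {
  return ndjson
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line))
    .filter((entry) => entry.type === 'Point' && entry.metric === metric)
    .map((entry) => entry.data.value);
}

// Made-up sample lines; a real run would read the file instead, e.g.
//   const raw = require('fs').readFileSync('results.json', 'utf8');
const raw = [
  '{"type":"Metric","metric":"http_req_duration","data":{"type":"trend"}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:00Z","value":120.5,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"http_reqs","data":{"time":"2024-01-01T00:00:00Z","value":1,"tags":{"status":"200"}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T00:00:01Z","value":310.2,"tags":{"status":"200"}}}',
].join('\n');

const durations = pointsFor(raw);
console.log(`samples: ${durations.length}, slowest: ${Math.max(...durations)} ms`);
```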

Implement ramping scenarios - stages

// k6 staged ramp - warm up, load, stress, cool down
import http from 'k6/http';
import { sleep, check } from 'k6';

export const options = {
  stages: [
    { duration: '2m',  target: 20  },  // warm up to expected load
    { duration: '5m',  target: 20  },  // hold at expected load
    { duration: '2m',  target: 100 },  // ramp to stress level
    { duration: '5m',  target: 100 },  // hold under stress
    { duration: '2m',  target: 200 },  // push further
    { duration: '3m',  target: 200 },  // hold to find saturation point
    { duration: '2m',  target: 0   },  // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(99)<1000'],
    http_req_failed:   ['rate<0.05'],
  },
};

export default function () {
  http.get('https://api.example.com/products');
  sleep(Math.random() * 2 + 1);  // think time: 1-3s
}

Watch metrics during the stress phase. The point where p99 latency inflects upward or error rate climbs is your saturation point.
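
Spotting the inflection can also be done after the run by comparing per-stage summaries. The heuristic below is only a sketch: findSaturation, the 2x latency growth factor, and the 5% error ceiling are assumptions, and the numbers are made up for illustration.

```javascript
// Hypothetical heuristic: return the first stage whose p99 more than doubles
// (by default) relative to the previous stage, or whose error rate exceeds the ceiling.
function findSaturation(stages, { factor = 2, maxErrorRate = 0.05 } = {}) {
  for (let i = 1; i < stages.length; i++) {
    const prev = stages[i - 1];
    const cur = stages[i];
    if (cur.p99Ms > prev.p99Ms * factor || cur.errorRate > maxErrorRate) {
      return cur;
    }
  }
  return null; // no saturation observed within the tested range
}

// Made-up per-stage results for the three hold phases (20, 100, 200 VUs).
const observed = [
  { vus: 20, p99Ms: 180, errorRate: 0.0 },
  { vus: 100, p99Ms: 260, errorRate: 0.001 },
  { vus: 200, p99Ms: 1400, errorRate: 0.03 },
];

console.log(findSaturation(observed)); // the 200-VU stage
```

A null result means the test never pushed the system hard enough, in which case the ramp should be extended rather than the result read as "no limit".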

Test API endpoints with checks and thresholds

// k6 with structured checks and per-endpoint thresholds
import http from 'k6/http';
import { check, group, sleep } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    'http_req_duration{endpoint:list}':   ['p(95)<200'],
    'http_req_duration{endpoint:detail}': ['p(95)<400'],
    'http_req_failed':                    ['rate<0.01'],
    'checks':                             ['rate>0.99'],
  },
};

const BASE_URL = 'https://api.example.com';

export default function () {
  group('list products', () => {
    const res = http.get(`${BASE_URL}/products`, {
      tags: { endpoint: 'list' },
    });
    check(res, {
      'list: status 200':    (r) => r.status === 200,
      'list: has items':     (r) => JSON.parse(r.body).items.length > 0,
    });
  });

  sleep(1);

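  group('product detail', () => {
    const res = http.get(`${BASE_URL}/products/42`, {
      tags: { endpoint: 'detail' },
    });
    // The source is truncated at this point. The lines below are an assumed
    // completion that mirrors the 'list products' group above.
    check(res, {
      'detail: status 200': (r) => r.status === 200,
    });
  });

  sleep(1);
}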
