k6 Load Testing: Performance Testing Your APIs Before Users Find the Problems

k6 is a developer-friendly load testing tool with JavaScript scripting, CI integration, and clear metrics. Here's how to write meaningful load tests, interpret the results, and catch performance regressions before they reach production.

Anurag Verma

Most APIs get tested for correctness. Fewer get tested for what happens when a hundred users hit them at the same time. Usually that gap is discovered by a traffic spike in production: slow responses, 502 errors from an overwhelmed upstream, a database connection pool exhausted under load.

k6 is a tool for closing that gap before deployment. It’s open source, maintained by Grafana Labs, and uses JavaScript for test scripts. Unlike JMeter with its XML configuration or Gatling with Scala, k6 feels like writing a test suite: code that’s readable, version-controlled, and runnable in CI.

Installing k6

# macOS
brew install k6

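# Linux (Debian/Ubuntu); apt-key is deprecated, so the key goes into a dedicated keyring
sudo gpg -k
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
sudo apt-get update && sudo apt-get install k6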

# Docker
docker pull grafana/k6

Or download the binary from k6.io/docs/get-started/installation.

A Minimal Load Test

Every k6 script exports a default function. Each virtual user (VU) runs that function in a loop, and one pass through it is one iteration:

import http from "k6/http";
import { check, sleep } from "k6";

export const options = {
  vus: 10,          // 10 concurrent virtual users
  duration: "30s",  // run for 30 seconds
};

export default function () {
  const res = http.get("https://api.example.com/products");

  check(res, {
    "status 200": (r) => r.status === 200,
    "response under 500ms": (r) => r.timings.duration < 500,
  });

  sleep(1); // Wait 1 second between iterations (simulates real user think time)
}

Run it:

k6 run script.js

k6 prints a summary when the test finishes:

✓ status 200
✓ response under 500ms

checks.........................: 100.00% ✓ 600  ✗ 0
data_received..................: 2.3 MB 76 kB/s
data_sent......................: 47 kB  1.6 kB/s
http_req_duration..............: avg=45ms  min=28ms  med=41ms  max=387ms  p(90)=68ms  p(95)=89ms
http_req_failed................: 0.00%  ✓ 0    ✗ 600
http_reqs......................: 600    20.02/s
vus............................: 10     min=10    max=10

The line to focus on is http_req_duration. Its avg value is the mean latency. The p(90) and p(95) values are the latencies that 90% and 95% of requests came in at or under. An average can look healthy while the slowest requests are very slow, and the percentiles expose that tail.
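The sketch below makes the average-versus-percentile point concrete. It is plain JavaScript that runs under Node, not inside k6, and it works on a made-up set of latencies. The percentile helper interpolates linearly between neighbouring samples. That method is an assumption: k6’s own calculation may differ slightly in edge cases.

```javascript
// Latency summary sketch: average vs. percentiles.
// Percentiles use linear interpolation between the two nearest ranks
// (an assumption; k6's internal method may differ in edge cases).

function percentile(values, p) {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1); // fractional index into sorted
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  // Interpolate between the two neighbouring samples.
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

function average(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Made-up sample: 90 fast requests and 10 slow ones (milliseconds).
const latencies = [
  ...Array.from({ length: 90 }, () => 40),
  ...Array.from({ length: 10 }, () => 900),
];

console.log("avg  ", average(latencies).toFixed(1));        // 126.0
console.log("p(50)", percentile(latencies, 50).toFixed(1)); // 40.0
console.log("p(95)", percentile(latencies, 95).toFixed(1)); // 900.0
```

The 126ms average looks acceptable, yet one request in ten takes 900ms. Only p(95) reveals that.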

Ramping Load

A flat VU count is rarely realistic, because real traffic ramps up and down. k6’s stages option models this:

export const options = {
  stages: [
    { duration: "2m", target: 20 },   // ramp up to 20 VUs over 2 minutes
    { duration: "5m", target: 20 },   // hold at 20 VUs for 5 minutes
    { duration: "2m", target: 50 },   // ramp up to 50 VUs
    { duration: "5m", target: 50 },   // hold at 50 VUs
    { duration: "2m", target: 0 },    // ramp down
  ],
};

Watch where latency and error rate start climbing as VUs increase. The point where p(95) latency spikes or the error rate rises above 0% is a reasonable estimate of your current throughput ceiling.
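A stages list describes a piecewise-linear VU target: each stage moves linearly from the previous target to its own target over its duration. The sketch below models that shape so you can check what a stage list implies at a given second. Some parts are assumptions:

- vusAt and parseDuration are hypothetical helpers.
- The ramp starts from 0 VUs.
- Only simple durations like "2m" are parsed.

k6’s real scheduler also handles details this ignores, such as graceful ramp-down of in-flight iterations.

```javascript
// Model of a k6-style stages list as a piecewise-linear VU target.
// Hypothetical helpers for illustration; not part of k6's API.

function parseDuration(d) {
  // Supports simple "<n>s", "<n>m", "<n>h" strings only (assumption).
  const match = /^(\d+)(s|m|h)$/.exec(d);
  if (!match) throw new Error(`unsupported duration: ${d}`);
  const unitSeconds = { s: 1, m: 60, h: 3600 }[match[2]];
  return Number(match[1]) * unitSeconds;
}

function vusAt(stages, tSeconds, startVUs = 0) {
  let from = startVUs; // VU target at the start of the current stage
  let elapsed = 0;     // seconds covered by earlier stages
  for (const stage of stages) {
    const len = parseDuration(stage.duration);
    if (tSeconds <= elapsed + len) {
      const fraction = len === 0 ? 1 : (tSeconds - elapsed) / len;
      // Linear ramp from the previous target to this stage's target.
      return Math.round(from + (stage.target - from) * fraction);
    }
    from = stage.target;
    elapsed += len;
  }
  return from; // after the last stage: the final target
}

const stages = [
  { duration: "2m", target: 20 },
  { duration: "5m", target: 20 },
  { duration: "2m", target: 50 },
  { duration: "5m", target: 50 },
  { duration: "2m", target: 0 },
];

console.log(vusAt(stages, 60));  // halfway up the first ramp: 10
console.log(vusAt(stages, 480)); // halfway from 20 to 50: 35
console.log(vusAt(stages, 960)); // end of the test: 0
```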

Testing a Full User Flow

Load testing a single endpoint is useful but incomplete. Real users follow sequences: login, browse, add to cart, checkout. k6 can model this:

import http from "k6/http";
import { check, sleep } from "k6";

const BASE_URL = "https://api.example.com";

export default function () {
  // Step 1: Login
  const loginRes = http.post(`${BASE_URL}/auth/login`, JSON.stringify({
    email: "testuser@example.com",
    password: "testpassword123",
  }), {
    headers: { "Content-Type": "application/json" },
  });

  const loggedIn = check(loginRes, { "login succeeded": (r) => r.status === 200 });
  if (!loggedIn) return; // without a token, the remaining steps would only fail

  const token = loginRes.json("token");
  const authHeaders = {
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
  };

  sleep(1);

  // Step 2: List products
  const productsRes = http.get(`${BASE_URL}/products`, authHeaders);
  check(productsRes, { "products loaded": (r) => r.status === 200 });

  sleep(2);

  // Step 3: Get a specific product
  const products = productsRes.json("items");
  if (!products || products.length === 0) return;
  const productId = products[0].id;

  const productRes = http.get(`${BASE_URL}/products/${productId}`, authHeaders);
  check(productRes, { "product detail loaded": (r) => r.status === 200 });

  sleep(1);

  // Step 4: Add to cart
  const cartRes = http.post(`${BASE_URL}/cart`, JSON.stringify({
    product_id: productId,
    quantity: 1,
  }), authHeaders);

  check(cartRes, { "added to cart": (r) => r.status === 201 });

  sleep(3);
}

This tells you whether your entire user flow holds up under concurrent load, not just individual endpoints.
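Because every VU sleeps between steps, the VU count is not the request rate. A rough closed-model estimate, in the spirit of Little’s law, is: iterations per second ≈ VUs ÷ iteration duration. The sketch below applies that to the flow above. The 50 VUs and the 100ms average latency are assumed inputs, not measurements, and estimateThroughput is a hypothetical helper.

```javascript
// Rough throughput estimate for a closed workload (each VU loops forever).
// Iteration duration = think time (sleep) + time spent waiting on requests.

function estimateThroughput({ vus, thinkTimeSeconds, requestsPerIteration, avgLatencySeconds }) {
  const iterationSeconds = thinkTimeSeconds + requestsPerIteration * avgLatencySeconds;
  const iterationsPerSecond = vus / iterationSeconds;
  return {
    iterationSeconds,
    iterationsPerSecond,
    requestsPerSecond: iterationsPerSecond * requestsPerIteration,
  };
}

// The user flow above: 4 requests and 1 + 2 + 1 + 3 = 7 s of sleep per iteration.
const est = estimateThroughput({
  vus: 50,                // assumed VU count
  thinkTimeSeconds: 7,
  requestsPerIteration: 4,
  avgLatencySeconds: 0.1, // assumed 100 ms average latency
});

console.log(est.iterationSeconds.toFixed(1));  // 7.4
console.log(est.requestsPerSecond.toFixed(1)); // 27.0
```

Each VU waits for its own responses, so rising latency lengthens every iteration and lowers the request rate. A VU-based test therefore backs off as the system slows. If you need a fixed arrival rate instead, k6’s constant-arrival-rate executor is designed for that.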

Thresholds: Pass/Fail in CI

CI needs a pass/fail signal, not raw numbers in a terminal. Thresholds make k6 exit with a non-zero status code when performance criteria aren’t met:

export const options = {
  stages: [
    { duration: "1m", target: 30 },
    { duration: "3m", target: 30 },
    { duration: "1m", target: 0 },
  ],
  thresholds: {
    // Fail if more than 1% of requests fail
    http_req_failed: ["rate<0.01"],

    // Fail if 95th percentile latency exceeds 800ms
    http_req_duration: ["p(95)<800"],

    // Fail if fewer than 99% of checks pass
    checks: ["rate>0.99"],
  },
};

k6 exits with code 99 if any threshold is breached. In a CI pipeline:

# GitHub Actions example (assumes k6 is already installed on the runner)
- name: Run load test
  run: k6 run --quiet tests/load/api-flow.js
  # Step fails if k6 exits non-zero (threshold breached)

This catches performance regressions before deployment, the same way unit tests catch correctness regressions.
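Conceptually, each threshold combines an aggregation (rate, avg, p(95), and so on), a comparison operator, and a limit. The run fails if any expression evaluates to false. The plain-JavaScript sketch below mimics that evaluation against already-aggregated numbers. evaluateThresholds, evaluateExpression, and the summary object shape are hypothetical. k6’s real parser supports more operators and options than this, such as abortOnFail.

```javascript
// Minimal threshold evaluator sketch (hypothetical, for illustration).
// Expressions look like "rate<0.01" or "p(95)<800".

const OPERATORS = {
  "<": (a, b) => a < b,
  "<=": (a, b) => a <= b,
  ">": (a, b) => a > b,
  ">=": (a, b) => a >= b,
};

function evaluateExpression(expr, aggregates) {
  // Capture: aggregation name (optionally "p(N)"), operator, numeric limit.
  const match = /^\s*([a-z]+(?:\(\d+(?:\.\d+)?\))?)\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)\s*$/.exec(expr);
  if (!match) throw new Error(`cannot parse threshold: ${expr}`);
  const [, aggregation, op, limit] = match;
  if (!(aggregation in aggregates)) throw new Error(`missing aggregate: ${aggregation}`);
  return OPERATORS[op](aggregates[aggregation], Number(limit));
}

function evaluateThresholds(thresholds, summary) {
  const failures = [];
  for (const [metric, expressions] of Object.entries(thresholds)) {
    for (const expr of expressions) {
      if (!evaluateExpression(expr, summary[metric] ?? {})) {
        failures.push(`${metric}: ${expr}`);
      }
    }
  }
  // k6 exits with code 99 when a threshold is breached.
  return { failures, exitCode: failures.length > 0 ? 99 : 0 };
}

const thresholds = {
  http_req_failed: ["rate<0.01"],
  http_req_duration: ["p(95)<800"],
  checks: ["rate>0.99"],
};

// Assumed aggregates from a hypothetical run.
const summary = {
  http_req_failed: { rate: 0.002 },
  http_req_duration: { "p(95)": 912 },
  checks: { rate: 0.998 },
};

const result = evaluateThresholds(thresholds, summary);
console.log(result.exitCode);            // 99
console.log(result.failures.join("\n")); // http_req_duration: p(95)<800
```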

Custom Metrics

The built-in metrics describe HTTP requests and the test run itself. For application-specific measurements, k6 provides custom metric types:

import http from "k6/http";
import { Counter, Trend, Rate } from "k6/metrics";

const BASE_URL = "https://api.example.com";

const successfulLogins = new Counter("successful_logins");
const loginDuration = new Trend("login_duration", true);  // true = values are times, shown in ms
const loginSuccessRate = new Rate("login_success_rate");

export default function () {
  const res = http.post(`${BASE_URL}/auth/login`, /* ... */);

  const success = res.status === 200;
  loginDuration.add(res.timings.duration); // request duration (ms) as measured by k6
  loginSuccessRate.add(success);

  if (success) {
    successfulLogins.add(1);
  }
}

Custom metrics appear in the summary alongside built-in metrics. Thresholds can reference them too:

thresholds: {
  login_success_rate: ["rate>0.95"],
  login_duration: ["p(95)<1000"],
},
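The three metric types aggregate differently:

- A Counter sums the values you add.
- A Rate reports the fraction of added values that are non-zero (true).
- A Trend keeps samples so it can report min, max, average, and percentiles.

The sketch below imitates those semantics in plain JavaScript to make the summary numbers easier to reason about. The Mini* classes are hypothetical stand-ins, not the k6/metrics implementation.

```javascript
// Plain-JS stand-ins that mimic how k6's Counter, Rate, and Trend aggregate.
// Hypothetical classes for illustration only.

class MiniCounter {
  constructor() { this.total = 0; }
  add(value) { this.total += value; } // cumulative sum
}

class MiniRate {
  constructor() { this.nonZero = 0; this.count = 0; }
  add(value) {
    if (value) this.nonZero += 1; // true / non-zero counts as a hit
    this.count += 1;
  }
  get rate() { return this.count === 0 ? 0 : this.nonZero / this.count; }
}

class MiniTrend {
  constructor() { this.samples = []; }
  add(value) { this.samples.push(value); } // keep every sample for statistics
  get avg() { return this.samples.reduce((s, v) => s + v, 0) / this.samples.length; }
  get max() { return Math.max(...this.samples); }
}

// Simulate four login attempts: three succeed, one fails.
const attempts = [
  { ok: true, ms: 120 },
  { ok: true, ms: 180 },
  { ok: false, ms: 950 },
  { ok: true, ms: 150 },
];

const successfulLogins = new MiniCounter();
const loginSuccessRate = new MiniRate();
const loginDuration = new MiniTrend();

for (const { ok, ms } of attempts) {
  loginDuration.add(ms);
  loginSuccessRate.add(ok);
  if (ok) successfulLogins.add(1);
}

console.log(successfulLogins.total); // 3
console.log(loginSuccessRate.rate);  // 0.75
console.log(loginDuration.avg);      // 350
console.log(loginDuration.max);      // 950
```

With the thresholds above, a 0.75 success rate would fail login_success_rate: ["rate>0.95"].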

Environment Configuration

Test scripts shouldn’t hardcode URLs. Pass them in as environment variables, which k6 exposes on the __ENV object:

const BASE_URL = __ENV.API_URL || "http://localhost:3000";

Run with:

k6 run -e API_URL=https://staging.api.example.com script.js

This makes the same script usable against local, staging, and production environments without code changes.

Interpreting Results

When reviewing a load test run, look at these in order:

Error rate first: if http_req_failed is above 0%, the system is already failing. Latency figures that include failed requests can mislead, because a fast error response drags the numbers down while a timeout inflates them.

p(95) and p(99) latency: average latency hides the long tail, and the users experiencing p(99) latency are the ones who will complain. If p(99) is several times p(50) (say, 5x), a subset of requests is likely hitting something slow, such as queue saturation or slow database queries.

Latency over time: k6’s default summary shows whole-run statistics. For a time-series view, send results to InfluxDB and chart them in Grafana, or write raw data points to a file with the --out json flag and parse them:

k6 run --out json=results.json script.js
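The JSON output is newline-delimited: each line is one JSON object. Individual measurements appear as objects with "type": "Point", a metric name, and a data field holding time, value, and tags. Assuming that shape, a small Node script can group http_req_duration points into fixed time windows and report p(95) per window, which shows when latency started to climb. A few pieces are illustrative choices:

- p95ByWindow is a hypothetical helper.
- The 10-second window is an arbitrary default.
- The sample lines are inline so the sketch runs on its own. With a real run you would split the contents of results.json on newlines.

```javascript
// Bucket http_req_duration points from k6's --out json (NDJSON) output
// into fixed windows and report p(95) per window.
// Assumes Point lines shaped like:
//   {"type":"Point","metric":"http_req_duration","data":{"time":"...","value":123.4,"tags":{}}}

function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

function p95ByWindow(ndjsonLines, windowSeconds = 10, metric = "http_req_duration") {
  const buckets = new Map(); // window start (epoch ms) -> latency values
  for (const line of ndjsonLines) {
    if (!line.trim()) continue;
    const entry = JSON.parse(line);
    if (entry.type !== "Point" || entry.metric !== metric) continue;
    // Trim sub-millisecond digits so Date.parse reliably handles the timestamp.
    const ms = Date.parse(entry.data.time.replace(/(\.\d{3})\d+/, "$1"));
    const windowStart = Math.floor(ms / (windowSeconds * 1000)) * windowSeconds * 1000;
    if (!buckets.has(windowStart)) buckets.set(windowStart, []);
    buckets.get(windowStart).push(entry.data.value);
  }
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([start, values]) => ({
      window: new Date(start).toISOString(),
      count: values.length,
      p95: percentile(values, 95),
    }));
}

// Illustrative sample lines (not real measurements).
const sample = [
  '{"type":"Metric","metric":"http_req_duration","data":{"type":"trend"}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T12:00:01.123456+00:00","value":40,"tags":{}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T12:00:05.000+00:00","value":60,"tags":{}}}',
  '{"type":"Point","metric":"http_reqs","data":{"time":"2024-01-01T12:00:05.000+00:00","value":1,"tags":{}}}',
  '{"type":"Point","metric":"http_req_duration","data":{"time":"2024-01-01T12:00:12.000+00:00","value":900,"tags":{}}}',
];

for (const row of p95ByWindow(sample)) {
  console.log(row.window, row.count, row.p95);
}
```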

Where it breaks: Run increasing VU stages and note where error rate or latency starts degrading. That’s your system’s current capacity for this workload pattern. Whether that’s acceptable depends on your expected traffic.
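Suppose you record the p(95) latency and error rate for each hold stage, for example by post-processing the --out json file. Finding the breaking point is then a simple scan. The firstDegradedStage helper and the numbers below are hypothetical. The limits are inputs you choose from your own latency budget.

```javascript
// Scan per-stage results and report the first VU level that breaches
// a latency budget or an error-rate limit. Hypothetical helper and data.

function firstDegradedStage(stageResults, { p95LimitMs, maxErrorRate }) {
  for (const stage of stageResults) {
    const reasons = [];
    if (stage.p95Ms > p95LimitMs) reasons.push(`p(95) ${stage.p95Ms}ms > ${p95LimitMs}ms`);
    if (stage.errorRate > maxErrorRate) reasons.push(`error rate ${stage.errorRate} > ${maxErrorRate}`);
    if (reasons.length > 0) return { vus: stage.vus, reasons };
  }
  return null; // no stage degraded within the tested range
}

// Illustrative numbers, not measurements.
const results = [
  { vus: 20, p95Ms: 120, errorRate: 0 },
  { vus: 50, p95Ms: 310, errorRate: 0 },
  { vus: 100, p95Ms: 1450, errorRate: 0.004 },
  { vus: 150, p95Ms: 4200, errorRate: 0.07 },
];

const degraded = firstDegradedStage(results, { p95LimitMs: 800, maxErrorRate: 0 });
console.log(degraded.vus);                // 100
console.log(degraded.reasons.join("; ")); // p(95) 1450ms > 800ms; error rate 0.004 > 0
```

The last healthy level (50 VUs here) is a lower bound. The real ceiling lies somewhere between the two tested levels, so a finer-grained run between them narrows it down.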

What k6 Doesn’t Do

k6 is a black-box tool. It tests the API from the outside. It won’t tell you why things slowed down, only that they did. Diagnosing the root cause requires correlating the k6 run timestamps with APM data (Datadog, New Relic, OpenTelemetry), database slow query logs, and infrastructure metrics.

k6 also doesn’t vary test data for you. Scripts that always use the same credentials, product IDs, and endpoints may not reflect production behavior, especially if your system caches responses (the test may only ever hit the cache) or partitions data per user.
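One common remedy is to keep a pool of test accounts and product IDs and spread them across VUs and iterations. In a k6 script, the __VU global holds the current VU number (starting at 1) and __ITER the iteration number (starting at 0). For larger pools, k6’s SharedArray from k6/data is designed to share one loaded copy across VUs. The selection logic itself is plain JavaScript. It is sketched below with a hypothetical pickTestData helper and made-up accounts so that it runs under Node.

```javascript
// Spread test data across VUs and iterations instead of reusing one account.
// In k6 you would pass __VU (1-based) and __ITER (0-based); here they are plain numbers.
// pickTestData and the data below are hypothetical.

const users = [
  { email: "loadtest+1@example.com", password: "pw-1" },
  { email: "loadtest+2@example.com", password: "pw-2" },
  { email: "loadtest+3@example.com", password: "pw-3" },
];
const productIds = [101, 102, 103, 104, 105];

function pickTestData(vu, iter) {
  // One account per VU, wrapping around when there are more VUs than accounts;
  // with at least as many accounts as VUs, no two VUs share an account.
  const user = users[(vu - 1) % users.length];
  // Rotate products per iteration, offset by VU, so requests spread across cache keys.
  const productId = productIds[(vu + iter) % productIds.length];
  return { user, productId };
}

for (let vu = 1; vu <= 4; vu++) {
  const { user, productId } = pickTestData(vu, 0);
  console.log(vu, user.email, productId);
}
```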

Use k6 to verify your API can handle the load you expect. Use application observability tools to understand what’s limiting performance when it can’t.

The best time to run a load test is before the feature ships, not after the on-call page fires at 2am.
