# Load Testing: Validating System Performance at Scale
Load testing is the practice of simulating real-world traffic against your system to measure performance, identify bottlenecks, and validate that your infrastructure can handle expected (and unexpected) load. It is a critical practice before any major launch, scaling change, or architecture migration. This guide covers the major load testing tools, methodologies, key metrics, and practical examples.
## Types of Performance Tests
| Test Type | Purpose | Duration | Load Pattern |
|---|---|---|---|
| Load Test | Validate expected traffic levels | 15-60 minutes | Steady at expected peak |
| Stress Test | Find the breaking point | Until failure | Gradually increasing |
| Spike Test | Test sudden traffic surges | 5-15 minutes | Sudden jump to high load |
| Soak Test | Find memory leaks, resource exhaustion | 4-24 hours | Sustained moderate load |
| Breakpoint Test | Determine maximum capacity | 30-60 minutes | Step increase until failure |
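The step-increase pattern used by a breakpoint test can be sketched as a function of elapsed time. The step size and duration below are illustrative assumptions, not fixed values:

```python
def step_load_vus(elapsed_s: int, step_vus: int = 50, step_duration_s: int = 300) -> int:
    """Virtual-user count for a step-increase (breakpoint) load pattern.

    Load starts at step_vus and rises by step_vus every step_duration_s
    seconds until the system under test fails.
    """
    return step_vus * (elapsed_s // step_duration_s + 1)

# First five minutes run at 50 VUs, the next five at 100, and so on.
print(step_load_vus(0))     # 50
print(step_load_vus(300))   # 100
print(step_load_vus(1500))  # 300
```

Most tools express this pattern natively (k6 `stages`, Locust `LoadTestShape`); the function above is just the shape they generate.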
## Tool Comparison
| Tool | Language | Protocol Support | Best For |
|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration |
| Locust | Python | HTTP, custom protocols | Python teams, distributed testing |
| JMeter | Java (GUI) | HTTP, JDBC, JMS, FTP | Enterprise, database testing |
| Gatling | Scala/Java | HTTP, WebSocket | High-performance, detailed reports |
## k6 Load Test Example
```javascript
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("errors");
const latency = new Trend("api_latency");

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // Ramp up to 100 users
    { duration: "5m", target: 100 }, // Hold at 100 users
    { duration: "2m", target: 500 }, // Ramp up to 500 users
    { duration: "5m", target: 500 }, // Hold at 500 users
    { duration: "2m", target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<1000"],
    errors: ["rate<0.01"], // Error rate under 1%
  },
};

export default function () {
  // Log in as a unique user per VU; send JSON, not form-encoded data
  const loginRes = http.post(
    "https://api.example.com/login",
    JSON.stringify({ username: `user_${__VU}`, password: "test123" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(loginRes, { "login success": (r) => r.status === 200 });
  errorRate.add(loginRes.status !== 200);

  const token = loginRes.json("token");
  const headers = { Authorization: `Bearer ${token}` };
  sleep(Math.random() * 3); // Think time between actions

  const productsRes = http.get("https://api.example.com/products?page=1", { headers });
  latency.add(productsRes.timings.duration);
  check(productsRes, { "products loaded": (r) => r.status === 200 });
  sleep(Math.random() * 2);

  const productId = productsRes.json("products.0.id");
  const detailRes = http.get(`https://api.example.com/products/${productId}`, { headers });
  check(detailRes, { "detail loaded": (r) => r.status === 200 });
  sleep(Math.random() * 5);
}
```
## Locust Load Test Example
```python
from locust import HttpUser, task, between
import random

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    host = "https://api.example.com"

    def on_start(self):
        response = self.client.post("/login", json={
            "username": f"user_{self.environment.runner.user_count}",
            "password": "test123",
        })
        self.token = response.json()["token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task(5)
    def browse_products(self):
        self.client.get("/products?page=1", headers=self.headers)

    @task(3)
    def view_product(self):
        product_id = random.randint(1, 1000)
        self.client.get(f"/products/{product_id}", headers=self.headers)

    @task(1)
    def add_to_cart(self):
        product_id = random.randint(1, 1000)
        self.client.post("/cart", json={
            "product_id": product_id,
            "quantity": 1,
        }, headers=self.headers)

    @task(1)
    def search(self):
        query = random.choice(["laptop", "phone", "tablet", "headphones"])
        self.client.get(f"/search?q={query}", headers=self.headers)
```
## Key Metrics to Track
| Metric | What It Measures | Good Target |
|---|---|---|
| p50 Latency | Median response time | <100ms for APIs |
| p95 Latency | 95th percentile response time | <500ms |
| p99 Latency | 99th percentile (tail latency) | <1000ms |
| Throughput (RPS) | Requests per second handled | Depends on requirements |
| Error Rate | Percentage of failed requests | <0.1% under normal load |
| Concurrent Users | Active simultaneous users | Based on capacity plan |
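These percentiles are straightforward to compute from raw latency samples. A minimal sketch using the nearest-rank method (load testing tools use various approximations, especially for streaming data):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value >= p% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# One slow outlier barely moves the median but dominates the tail
latencies_ms = [12, 15, 18, 22, 25, 30, 45, 80, 120, 950]
print(percentile(latencies_ms, 50))  # 25
print(percentile(latencies_ms, 95))  # 950
```

This is why p95 and p99 matter: the median hides the slow requests that a meaningful fraction of users actually experience.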
## Bottleneck Identification
When load tests reveal performance problems, investigate these common bottlenecks:
- Database: Slow queries, missing indexes, connection pool exhaustion, lock contention
- Network: Bandwidth limits, DNS resolution delays, TLS handshake overhead
- Application: Memory leaks, thread starvation, garbage collection pauses, inefficient algorithms
- Infrastructure: CPU limits, disk I/O saturation, insufficient instances
During a load test, correlate the load test results with system metrics. On the database side (PostgreSQL shown here):

```sql
-- Active queries and their wait states
SELECT * FROM pg_stat_activity WHERE state = 'active';

-- Tables with the most sequential scans (candidates for missing indexes)
SELECT * FROM pg_stat_user_tables ORDER BY seq_scan DESC;

-- Slowest queries by mean execution time
-- (column is mean_exec_time on PostgreSQL 13+, mean_time on older versions)
SELECT query, calls, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
```

On the application side, if you expose Prometheus metrics, watch queries like:

```promql
rate(http_requests_total[5m])                                             # request rate
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) # p95 latency
process_resident_memory_bytes                                             # memory usage
go_goroutines                                                             # goroutine count (Go services)
```
Load testing validates your auto-scaling configuration, identifies performance optimization opportunities, and ensures your system handles high traffic events. Use the System Design Calculator to estimate your target RPS and concurrent user requirements.
## Frequently Asked Questions
Q: How often should I run load tests?
Run load tests before every major release, after significant architecture changes, and on a regular schedule (weekly or monthly). Tools like k6 integrate easily into CI/CD pipelines, letting you run automated performance regression tests on each deploy.
Q: How do I determine the right load level to test?
Start with your current peak traffic (from analytics/monitoring). Test at 1x, 2x, and 3x that level. For new systems, estimate based on expected user count, pages per session, and session duration. A good formula: target RPS = (daily active users x actions per session) / (seconds in peak hours).
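Plugging illustrative numbers into that formula (all figures below are assumptions for the sake of the example, not recommendations):

```python
def target_rps(daily_active_users: int, actions_per_session: int, peak_hours: float) -> float:
    """target RPS = (daily active users x actions per session) / (seconds in peak hours)."""
    return daily_active_users * actions_per_session / (peak_hours * 3600)

# 100k DAU, 20 actions per session, traffic concentrated in a 4-hour peak window
rps = target_rps(100_000, 20, 4)
print(round(rps))  # 139
```

A system serving those users would need to sustain roughly 139 RPS at peak, so the load, stress, and spike tests would target about 139, 278, and 417 RPS respectively.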
Q: Should I load test in production or staging?
Test in an environment that matches production as closely as possible. Staging is safer but may have different hardware or data volumes. Some teams do production load testing during off-peak hours using shadow traffic or canary deployments. Never load test a shared production environment without coordination.
Q: What is the difference between virtual users and requests per second?
Virtual users (VUs) simulate concurrent users, each executing a scenario with think time between requests. RPS is the total request throughput. If 100 VUs each make 2 requests per second with think time, you get ~200 RPS. RPS is more useful for API testing; VUs are better for simulating realistic user behavior.
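The relationship in that example can be sketched as a small calculation; the think-time and response-time values here are illustrative:

```python
def expected_rps(vus: int, think_time_s: float, response_time_s: float) -> float:
    """Approximate throughput: each VU completes one request every
    (think time + response time) seconds."""
    return vus / (think_time_s + response_time_s)

# 100 VUs, each making ~2 requests/second (0.4s think time + 0.1s response time)
print(expected_rps(100, 0.4, 0.1))  # 200.0
```

Note the corollary: if response times degrade under load, throughput per VU drops, so a fixed VU count does not guarantee a fixed RPS.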