
Load Testing: Validating System Performance at Scale

Load testing is the practice of simulating real-world traffic against your system to measure performance, identify bottlenecks, and validate that your infrastructure can handle expected (and unexpected) load. It is a critical practice before any major launch, scaling change, or architecture migration. This guide covers the major load testing tools, methodologies, key metrics, and practical examples.

Types of Performance Tests

| Test Type | Purpose | Duration | Load Pattern |
|---|---|---|---|
| Load Test | Validate expected traffic levels | 15-60 minutes | Steady at expected peak |
| Stress Test | Find the breaking point | Until failure | Gradually increasing |
| Spike Test | Test sudden traffic surges | 5-15 minutes | Sudden jump to high load |
| Soak Test | Find memory leaks, resource exhaustion | 4-24 hours | Sustained moderate load |
| Breakpoint Test | Determine maximum capacity | 30-60 minutes | Step increase until failure |
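Each load pattern above is just a target user count as a function of elapsed time. As a minimal sketch with illustrative numbers, the spike pattern might look like this (in Locust, the same logic would live in a `LoadTestShape.tick()` method):

```python
def spike_user_count(elapsed_s):
    """Target virtual users over time for a spike test:
    baseline load with a sudden surge in the middle.
    All durations and user counts here are illustrative."""
    if elapsed_s < 120:    # 2 min warm-up at baseline
        return 50
    if elapsed_s < 180:    # 1 min sudden surge
        return 500
    if elapsed_s < 600:    # recovery period back at baseline
        return 50
    return 0               # test finished

print(spike_user_count(150))  # 500 (inside the surge window)
```

The interesting signal in a spike test is not just behavior at peak but how quickly latency and error rate recover once the surge ends.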

Tool Comparison

| Tool | Language | Protocol Support | Best For |
|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration |
| Locust | Python | HTTP, custom protocols | Python teams, distributed testing |
| JMeter | Java (GUI) | HTTP, JDBC, JMS, FTP | Enterprise, database testing |
| Gatling | Scala/Java | HTTP, WebSocket | High-performance, detailed reports |

k6 Load Test Example

import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("errors");
const latency = new Trend("api_latency");

export const options = {
  stages: [
    { duration: "2m", target: 100 },   // Ramp up to 100 users
    { duration: "5m", target: 100 },   // Hold at 100 users
    { duration: "2m", target: 500 },   // Ramp up to 500 users
    { duration: "5m", target: 500 },   // Hold at 500 users
    { duration: "2m", target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<1000"],
    errors: ["rate<0.01"],  // Error rate under 1%
  },
};

export default function () {
  // Simulate realistic user behavior
  const loginRes = http.post("https://api.example.com/login", {
    username: `user_${__VU}`,
    password: "test123",
  });
  check(loginRes, { "login success": (r) => r.status === 200 });
  errorRate.add(loginRes.status !== 200);

  const token = loginRes.json("token");
  const headers = { Authorization: `Bearer ${token}` };

  sleep(Math.random() * 3);

  const productsRes = http.get(
    "https://api.example.com/products?page=1",
    { headers }
  );
  latency.add(productsRes.timings.duration);
  check(productsRes, { "products loaded": (r) => r.status === 200 });

  sleep(Math.random() * 2);

  const productId = productsRes.json("products.0.id");
  const detailRes = http.get(
    `https://api.example.com/products/${productId}`,
    { headers }
  );
  check(detailRes, { "detail loaded": (r) => r.status === 200 });

  sleep(Math.random() * 5);
}

Locust Load Test Example

from locust import HttpUser, task, between, events
import random

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    host = "https://api.example.com"

    def on_start(self):
        response = self.client.post("/login", json={
            "username": f"user_{self.environment.runner.user_count}",
            "password": "test123"
        })
        self.token = response.json()["token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task(5)
    def browse_products(self):
        self.client.get("/products?page=1", headers=self.headers)

    @task(3)
    def view_product(self):
        product_id = random.randint(1, 1000)
        self.client.get(f"/products/{product_id}", headers=self.headers)

    @task(1)
    def add_to_cart(self):
        product_id = random.randint(1, 1000)
        self.client.post("/cart", json={
            "product_id": product_id,
            "quantity": 1
        }, headers=self.headers)

    @task(1)
    def search(self):
        query = random.choice(["laptop", "phone", "tablet", "headphones"])
        self.client.get(f"/search?q={query}", headers=self.headers)

Key Metrics to Track

| Metric | What It Measures | Good Target |
|---|---|---|
| p50 Latency | Median response time | <100ms for APIs |
| p95 Latency | 95th percentile response time | <500ms |
| p99 Latency | 99th percentile (tail latency) | <1000ms |
| Throughput (RPS) | Requests per second handled | Depends on requirements |
| Error Rate | Percentage of failed requests | <0.1% under normal load |
| Concurrent Users | Active simultaneous users | Based on capacity plan |
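If your tool exports raw latency samples rather than precomputed percentiles, the p50/p95/p99 metrics above can be derived with the standard library. A minimal sketch (the sample data is illustrative):

```python
from statistics import quantiles

def latency_percentiles(samples_ms):
    """Compute p50/p95/p99 from raw latency samples (milliseconds)."""
    qs = quantiles(samples_ms, n=100, method="inclusive")  # 99 cut points
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

# Uniform 1..1000 ms samples, purely for illustration
pcts = latency_percentiles(list(range(1, 1001)))
print(pcts["p50"])  # 500.5
```

Averages hide tail behavior, which is why the table tracks percentiles: a healthy p50 can coexist with a p99 that is unacceptable for users.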

Bottleneck Identification

When load tests reveal performance problems, investigate these common bottlenecks:

  • Database: Slow queries, missing indexes, connection pool exhaustion, lock contention
  • Network: Bandwidth limits, DNS resolution delays, TLS handshake overhead
  • Application: Memory leaks, thread starvation, garbage collection pauses, inefficient algorithms
  • Infrastructure: CPU limits, disk I/O saturation, insufficient instances
# Correlate load test with system metrics
# During load test, monitor:

# Database
SELECT * FROM pg_stat_activity WHERE state = 'active';
SELECT * FROM pg_stat_user_tables ORDER BY seq_scan DESC;
SELECT query, calls, mean_exec_time FROM pg_stat_statements
  ORDER BY mean_exec_time DESC LIMIT 10;  -- column is mean_time on PostgreSQL 12 and earlier

# Application (if using Prometheus)
# rate(http_requests_total[5m])           -- request rate
# histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# process_resident_memory_bytes           -- memory usage
# go_goroutines                           -- goroutine count
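k6 enforces its thresholds natively; for tools that don't, the same pass/fail gating can be scripted over whatever metrics you collect. A minimal sketch, with assumed metric names and limits:

```python
def check_thresholds(metrics, thresholds):
    """Return the names of metrics that breach their limits.

    metrics:    measured values, e.g. {"p95_ms": 420, "error_rate": 0.002}
    thresholds: maximum allowed values (exclusive, like k6's "p(95)<500")
    Missing metrics count as failures.
    """
    return [name for name, limit in thresholds.items()
            if metrics.get(name, float("inf")) >= limit]

failures = check_thresholds(
    {"p95_ms": 420, "error_rate": 0.002},   # illustrative measurements
    {"p95_ms": 500, "error_rate": 0.01},    # limits from your SLOs
)
print(failures)  # [] -> all thresholds passed
```

Failing the CI job when this list is non-empty turns load tests into automated performance regression checks.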

Load testing validates your auto-scaling configuration, identifies performance optimization opportunities, and ensures your system handles high traffic events. Use the System Design Calculator to estimate your target RPS and concurrent user requirements.

Frequently Asked Questions

Q: How often should I run load tests?

Run load tests before every major release, after significant architecture changes, and on a regular schedule (weekly or monthly). Tools like k6 integrate easily into CI/CD pipelines for automated performance regression testing.

Q: How do I determine the right load level to test?

Start with your current peak traffic (from analytics/monitoring). Test at 1x, 2x, and 3x that level. For new systems, estimate based on expected user count, pages per session, and session duration. A good formula: target RPS = (daily active users x actions per session) / (seconds in peak hours).
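Plugging illustrative numbers into that formula (all figures below are assumptions, not benchmarks):

```python
# Assumed example figures for a mid-sized consumer app
daily_active_users = 1_000_000
actions_per_session = 20
peak_window_seconds = 4 * 3600   # traffic concentrated in a 4-hour peak

target_rps = daily_active_users * actions_per_session / peak_window_seconds
print(f"target RPS ~ {target_rps:.0f}")  # target RPS ~ 1389
```

You would then size your load test stages around 1x, 2x, and 3x that figure (roughly 1,400 / 2,800 / 4,200 RPS here).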

Q: Should I load test in production or staging?

Test in an environment that matches production as closely as possible. Staging is safer but may have different hardware or data volumes. Some teams do production load testing during off-peak hours using shadow traffic or canary deployments. Never load test a shared production environment without coordination.

Q: What is the difference between virtual users and requests per second?

Virtual users (VUs) simulate concurrent users, each executing a scenario with think time between requests. RPS is the total request throughput. If 100 VUs each make 2 requests per second with think time, you get ~200 RPS. RPS is more useful for API testing; VUs are better for simulating realistic user behavior.
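The VU-to-RPS relationship is an application of Little's Law: throughput equals concurrency divided by the time each iteration takes (response time plus think time). The worked example above, in code:

```python
vus = 100               # concurrent virtual users
response_time_s = 0.1   # time the server takes per request
think_time_s = 0.4      # pause between requests per VU

# Each VU completes one iteration every 0.5s -> 2 requests/second
rps = vus / (response_time_s + think_time_s)
print(rps)  # 200.0
```

Note the feedback loop this implies: if the server slows down under load, each iteration takes longer, so a fixed VU count generates *less* RPS. That is why k6 also offers arrival-rate executors that hold RPS constant regardless of response time.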
