# Load Testing: Validating System Performance at Scale
Load testing is the practice of simulating real-world traffic against your system to measure performance, identify bottlenecks, and validate that your infrastructure can handle expected (and unexpected) load. It is a critical practice before any major launch, scaling change, or architecture migration. This guide covers the major load testing tools, methodologies, key metrics, and practical examples.
## Types of Performance Tests
| Test Type | Purpose | Duration | Load Pattern |
|---|---|---|---|
| Load Test | Validate expected traffic levels | 15-60 minutes | Steady at expected peak |
| Stress Test | Find the breaking point | Until failure | Gradually increasing |
| Spike Test | Test sudden traffic surges | 5-15 minutes | Sudden jump to high load |
| Soak Test | Find memory leaks, resource exhaustion | 4-24 hours | Sustained moderate load |
| Breakpoint Test | Determine maximum capacity | 30-60 minutes | Step increase until failure |
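The step-increase pattern used by a breakpoint test can be sketched as a function of elapsed time. The step size and duration below are illustrative assumptions, not fixed values:

```python
def step_load_vus(elapsed_s: int, step_vus: int = 50, step_duration_s: int = 300) -> int:
    """Virtual-user count for a step-increase (breakpoint) load pattern.

    Load starts at step_vus and rises by step_vus every step_duration_s
    seconds until the system under test fails.
    """
    return step_vus * (elapsed_s // step_duration_s + 1)

# First five minutes run at 50 VUs, the next five at 100, and so on.
print(step_load_vus(0))     # 50
print(step_load_vus(300))   # 100
print(step_load_vus(1500))  # 300
```

Most tools express this pattern natively (k6 `stages`, Locust `LoadTestShape`); the function above is just the shape they generate.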
## Tool Comparison
| Tool | Language | Protocol Support | Best For |
|---|---|---|---|
| k6 | JavaScript | HTTP, WebSocket, gRPC | Developer-friendly, CI/CD integration |
| Locust | Python | HTTP, custom protocols | Python teams, distributed testing |
| JMeter | Java (GUI) | HTTP, JDBC, JMS, FTP | Enterprise, database testing |
| Gatling | Scala/Java | HTTP, WebSocket | High-performance, detailed reports |
## k6 Load Test Example
```javascript
import http from "k6/http";
import { check, sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

const errorRate = new Rate("errors");
const latency = new Trend("api_latency");

export const options = {
  stages: [
    { duration: "2m", target: 100 }, // Ramp up to 100 users
    { duration: "5m", target: 100 }, // Hold at 100 users
    { duration: "2m", target: 500 }, // Ramp up to 500 users
    { duration: "5m", target: 500 }, // Hold at 500 users
    { duration: "2m", target: 0 },   // Ramp down
  ],
  thresholds: {
    http_req_duration: ["p(95)<500", "p(99)<1000"],
    errors: ["rate<0.01"], // Error rate under 1%
  },
};

export default function () {
  // Log in as a unique user per VU; send JSON, not form-encoded data
  const loginRes = http.post(
    "https://api.example.com/login",
    JSON.stringify({ username: `user_${__VU}`, password: "test123" }),
    { headers: { "Content-Type": "application/json" } }
  );
  check(loginRes, { "login success": (r) => r.status === 200 });
  errorRate.add(loginRes.status !== 200);

  const token = loginRes.json("token");
  const headers = { Authorization: `Bearer ${token}` };
  sleep(Math.random() * 3); // Think time between actions

  const productsRes = http.get("https://api.example.com/products?page=1", { headers });
  latency.add(productsRes.timings.duration);
  check(productsRes, { "products loaded": (r) => r.status === 200 });
  sleep(Math.random() * 2);

  const productId = productsRes.json("products.0.id");
  const detailRes = http.get(`https://api.example.com/products/${productId}`, { headers });
  check(detailRes, { "detail loaded": (r) => r.status === 200 });
  sleep(Math.random() * 5);
}
```
## Locust Load Test Example
```python
from locust import HttpUser, task, between
import random

class WebsiteUser(HttpUser):
    wait_time = between(1, 5)
    host = "https://api.example.com"

    def on_start(self):
        response = self.client.post("/login", json={
            "username": f"user_{self.environment.runner.user_count}",
            "password": "test123",
        })
        self.token = response.json()["token"]
        self.headers = {"Authorization": f"Bearer {self.token}"}

    @task(5)
    def browse_products(self):
        self.client.get("/products?page=1", headers=self.headers)

    @task(3)
    def view_product(self):
        product_id = random.randint(1, 1000)
        self.client.get(f"/products/{product_id}", headers=self.headers)

    @task(1)
    def add_to_cart(self):
        product_id = random.randint(1, 1000)
        self.client.post("/cart", json={
            "product_id": product_id,
            "quantity": 1,
        }, headers=self.headers)

    @task(1)
    def search(self):
        query = random.choice(["laptop", "phone", "tablet", "headphones"])
        self.client.get(f"/search?q={query}", headers=self.headers)
```
## Key Metrics to Track
| Metric | What It Measures | Good Target |
|---|---|---|
| p50 Latency | Median response time | <100ms for APIs |
| p95 Latency | 95th percentile response time | <500ms |
| p99 Latency | 99th percentile (tail latency) | <1000ms |
| Throughput (RPS) | Requests per second handled | Depends on requirements |
| Error Rate | Percentage of failed requests | <0.1% under normal load |
| Concurrent Users | Active simultaneous users | Based on capacity plan |
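These percentiles are straightforward to compute from raw latency samples. A minimal sketch using the nearest-rank method (load testing tools use various approximations, especially for streaming data):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value >= p% of the samples."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank - 1, 0)]

# One slow outlier barely moves the median but dominates the tail
latencies_ms = [12, 15, 18, 22, 25, 30, 45, 80, 120, 950]
print(percentile(latencies_ms, 50))  # 25
print(percentile(latencies_ms, 95))  # 950
```

This is why p95 and p99 matter: the median hides the slow requests that a meaningful fraction of users actually experience.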
## Bottleneck Identification
When load tests reveal performance problems, investigate these common bottlenecks:
- Database: Slow queries, missing indexes, connection pool exhaustion, lock contention
- Network: Bandwidth limits, DNS resolution delays, TLS handshake overhead
- Application: Memory leaks, thread starvation, garbage collection pauses, inefficient algorithms
- Infrastructure: CPU limits, disk I/O saturation, insufficient instances
During a load test, correlate the load test results with system metrics. On the database side (PostgreSQL shown here):

```sql
-- Active queries and their wait states
SELECT * FROM pg_stat_activity WHERE state = 'active';

-- Tables with the most sequential scans (candidates for missing indexes)
SELECT * FROM pg_stat_user_tables ORDER BY seq_scan DESC;

-- Slowest queries by mean execution time
-- (column is mean_exec_time on PostgreSQL 13+, mean_time on older versions)
SELECT query, calls, mean_exec_time FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;
```

On the application side, if you expose Prometheus metrics, watch queries like:

```promql
rate(http_requests_total[5m])                                             # request rate
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) # p95 latency
process_resident_memory_bytes                                             # memory usage
go_goroutines                                                             # goroutine count (Go services)
```
Load testing validates your auto-scaling configuration, identifies performance optimization opportunities, and ensures your system handles high traffic events. Use the System Design Calculator to estimate your target RPS and concurrent user requirements.
## Frequently Asked Questions
Q: How often should I run load tests?
Run load tests before every major release, after significant architecture changes, and on a regular schedule (weekly or monthly). Tools like k6 integrate easily into CI/CD pipelines, letting you run automated performance regression tests on each deploy.
Q: How do I determine the right load level to test?
Start with your current peak traffic (from analytics/monitoring). Test at 1x, 2x, and 3x that level. For new systems, estimate based on expected user count, pages per session, and session duration. A good formula: target RPS = (daily active users x actions per session) / (seconds in peak hours).
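Plugging illustrative numbers into that formula (all figures below are assumptions for the sake of the example, not recommendations):

```python
def target_rps(daily_active_users: int, actions_per_session: int, peak_hours: float) -> float:
    """target RPS = (daily active users x actions per session) / (seconds in peak hours)."""
    return daily_active_users * actions_per_session / (peak_hours * 3600)

# 100k DAU, 20 actions per session, traffic concentrated in a 4-hour peak window
rps = target_rps(100_000, 20, 4)
print(round(rps))  # 139
```

A system serving those users would need to sustain roughly 139 RPS at peak, so the load, stress, and spike tests would target about 139, 278, and 417 RPS respectively.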
Q: Should I load test in production or staging?
Test in an environment that matches production as closely as possible. Staging is safer but may have different hardware or data volumes. Some teams do production load testing during off-peak hours using shadow traffic or canary deployments. Never load test a shared production environment without coordination.
Q: What is the difference between virtual users and requests per second?
Virtual users (VUs) simulate concurrent users, each executing a scenario with think time between requests. RPS is the total request throughput. If 100 VUs each make 2 requests per second with think time, you get ~200 RPS. RPS is more useful for API testing; VUs are better for simulating realistic user behavior.
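The relationship in that example can be sketched as a small calculation; the think-time and response-time values here are illustrative:

```python
def expected_rps(vus: int, think_time_s: float, response_time_s: float) -> float:
    """Approximate throughput: each VU completes one request every
    (think time + response time) seconds."""
    return vus / (think_time_s + response_time_s)

# 100 VUs, each making ~2 requests/second (0.4s think time + 0.1s response time)
print(expected_rps(100, 0.4, 0.1))  # 200.0
```

Note the corollary: if response times degrade under load, throughput per VU drops, so a fixed VU count does not guarantee a fixed RPS.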