Rate Limiting for Security: Protecting Your Systems from Abuse
Rate limiting is one of the most effective first lines of defense against a wide range of security threats. Whether the threat is brute-force login guessing, credential stuffing, API scraping, or an application-layer DDoS, controlling the rate of incoming requests prevents resource exhaustion and protects both your infrastructure and your users. This guide covers algorithms, implementation strategies, and real-world deployment patterns.
Why Rate Limiting Matters for Security
| Threat | Without Rate Limiting | With Rate Limiting |
|---|---|---|
| Brute Force Login | Millions of password guesses per hour | 5 attempts per 15 minutes |
| Credential Stuffing | Automated testing of leaked credentials | Slowed to useless pace |
| API Scraping | Entire database exfiltrated in minutes | Limited to acceptable data access rate |
| DDoS (Application Layer) | Server overloaded and crashes | Excess requests rejected early |
| Resource Exhaustion | Expensive queries drain CPU/memory | Request throughput capped per user |
Rate Limiting Algorithms
1. Token Bucket
The most widely used algorithm. A bucket holds tokens that are consumed per request. Tokens refill at a steady rate. If the bucket is empty, requests are rejected. This allows short bursts while enforcing an average rate.
```javascript
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (burst size)
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true; // Request allowed
    }
    return false; // Rate limited
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Usage: 10 requests/second with a burst of 20
const bucket = new TokenBucket(20, 10);
if (!bucket.tryConsume()) {
  res.status(429).json({ error: 'Rate limit exceeded' });
}
```
2. Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. This smooths out bursts, producing a steady output rate.
```javascript
class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity; // Queue size
    this.queue = [];
    this.leakRate = leakRate; // Requests processed per second
    this.lastLeak = Date.now();
  }

  tryAdd(request) {
    this.leak();
    if (this.queue.length < this.capacity) {
      this.queue.push(request);
      return true;
    }
    return false; // Queue full, reject
  }

  leak() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    const leaked = Math.floor(elapsed * this.leakRate);
    this.queue.splice(0, leaked);
    if (leaked > 0) this.lastLeak = now;
  }
}
```
3. Fixed Window Counter
Counts requests in fixed time windows (e.g., per minute). Simple but has a boundary problem: a user can send 2x the limit by timing requests at the window boundary.
```javascript
// Redis-based fixed window counter
async function fixedWindowRateLimit(userId, limit, windowSec) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${userId}:${window}`;
  const current = await redis.incr(key);
  if (current === 1) {
    await redis.expire(key, windowSec);
  }
  return current <= limit;
}
```
4. Sliding Window Log
Stores the timestamp of each request and counts how many fall within the sliding window. Most accurate but requires more memory.
```javascript
async function slidingWindowLog(userId, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `ratelimit:${userId}`;

  // Remove expired entries and add the current request in one round trip
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);
  pipeline.zadd(key, now, `${now}:${Math.random()}`);
  pipeline.zcard(key);
  pipeline.expire(key, Math.ceil(windowMs / 1000));
  const results = await pipeline.exec();

  const count = results[2][1]; // Result of the ZCARD command
  return count <= limit;
}
```
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | Low (O(1)) | Good | Allows controlled bursts | Low |
| Leaky Bucket | Medium (queue) | Good | Smooths output rate | Low |
| Fixed Window | Very Low | Low (boundary issue) | 2x burst at boundary | Very Low |
| Sliding Window Log | High (per request) | Very High | Precise enforcement | Medium |
| Sliding Window Counter | Low | High | Good approximation | Low |
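The sliding window counter (last row above) has no example earlier in this guide. A minimal in-memory sketch, which approximates a true sliding window by weighting the previous fixed window's count by how much of it still overlaps the current sliding window (the class name and structure are illustrative):

```javascript
// Sliding window counter: estimate requests in the trailing window
// as previous_window_count * remaining_overlap + current_window_count.
// In-memory sketch for illustration.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // window index -> request count
  }

  tryConsume(now = Date.now()) {
    const window = Math.floor(now / this.windowMs);
    const elapsed = (now % this.windowMs) / this.windowMs; // fraction into current window
    const current = this.counts.get(window) || 0;
    const previous = this.counts.get(window - 1) || 0;
    // Weighted estimate of requests in the trailing window
    const estimated = previous * (1 - elapsed) + current;
    if (estimated >= this.limit) return false; // Rate limited
    this.counts.set(window, current + 1);
    return true;
  }
}
```

Because a boundary burst is counted against the weighted sum of both windows, this closes the 2x loophole of the plain fixed window counter while storing only two counters per key.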
IP-Based vs User-Based Rate Limiting
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| IP-Based | Works for unauthenticated requests | Shared IPs (NAT, corporate) punish all users | Login pages, public endpoints |
| User-Based | Fair per-user limits, unaffected by shared IPs | Requires authentication first | Authenticated API endpoints |
| API Key-Based | Per-application limits, tiered plans | Key sharing, stolen keys | Public APIs, developer platforms |
| Combined | Multiple layers of protection | More complex to manage | Production systems |
```javascript
// Multi-layer rate limiting
function rateLimitKey(req) {
  // Layer 1: IP-based (catches unauthenticated abuse)
  const ipKey = `ip:${req.ip}`;
  // Layer 2: User-based (fair per-user limits)
  const userKey = req.user ? `user:${req.user.id}` : null;
  // Layer 3: Endpoint-specific (protect expensive operations)
  const endpointKey = `endpoint:${req.method}:${req.path}`;
  return { ipKey, userKey, endpointKey };
}
```
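A request should then pass only when every applicable layer allows it. A minimal in-memory sketch; the `checkLimit` helper and the specific limits are illustrative, not from a library:

```javascript
// A request is allowed only if every applicable key is under its own limit.
// In-memory fixed-window counters for illustration; production systems
// would back this with a shared store such as Redis.
const counters = new Map(); // "key:window" -> request count

function checkLimit(key, limit, windowSec, now = Date.now()) {
  const window = Math.floor(now / 1000 / windowSec);
  const slot = `${key}:${window}`;
  const count = (counters.get(slot) || 0) + 1;
  counters.set(slot, count);
  return count <= limit;
}

function isAllowed({ ipKey, userKey, endpointKey }) {
  if (!checkLimit(ipKey, 1000, 60)) return false;             // coarse per-IP cap
  if (userKey && !checkLimit(userKey, 100, 60)) return false; // fair per-user limit
  return checkLimit(endpointKey, 5000, 60);                   // global endpoint cap
}
```

The IP cap is deliberately looser than the per-user limit so that users behind a shared NAT are not punished for each other's traffic, while raw unauthenticated abuse is still capped.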
Rate Limiting at Different Layers
Layer Architecture
| Layer | Technology | Purpose |
|---|---|---|
| CDN / Edge | Cloudflare, AWS CloudFront | Block volumetric attacks before they reach origin |
| WAF | AWS WAF, Cloudflare WAF | Rule-based blocking, bot detection |
| Load Balancer | NGINX, AWS ALB | Connection limits, request rate per IP |
| API Gateway | Kong, AWS API Gateway | Per-key limits, throttling policies |
| Application | Express middleware, custom code | User-based limits, business logic |
NGINX Rate Limiting
```nginx
# nginx.conf
http {
    # Define rate limit zone: 10 requests/second per IP
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    # Stricter zone for login
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }

        location /api/auth/login {
            limit_req zone=login burst=5;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}
```
Response Headers
Always communicate rate limit status to clients via standard headers:
```javascript
// Standard rate limit response headers
res.setHeader('X-RateLimit-Limit', '100');        // Max requests per window
res.setHeader('X-RateLimit-Remaining', '87');     // Remaining requests
res.setHeader('X-RateLimit-Reset', '1700000060'); // Window reset (Unix timestamp)

// When rate limited (429 status)
res.setHeader('Retry-After', '60'); // Seconds until retry is allowed
res.status(429).json({
  error: 'rate_limit_exceeded',
  message: 'Too many requests. Please retry after 60 seconds.',
  retryAfter: 60
});
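On the client side, these headers enable well-behaved retries. A sketch that honors `Retry-After` instead of hammering the server; the URL and retry cap are illustrative:

```javascript
// Client side: on 429, wait for the server-specified Retry-After
// before trying again. Sketch; the retry cap is an arbitrary choice.
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Fall back to 1 second if the header is missing or unparsable
    const retryAfter = parseInt(res.headers.get('Retry-After'), 10) || 1;
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error('Rate limit retries exhausted');
}
```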
WAF Integration
Web Application Firewalls add intelligent rate limiting with bot detection, geographic filtering, and behavioral analysis. They integrate with DDoS protection systems.
AWS WAF rate-based rule (CloudFormation snippet):

```json
{
  "Type": "AWS::WAFv2::WebACL",
  "Properties": {
    "Rules": [{
      "Name": "RateLimitRule",
      "Priority": 1,
      "Statement": {
        "RateBasedStatement": {
          "Limit": 2000,
          "AggregateKeyType": "IP"
        }
      },
      "Action": { "Block": {} }
    }]
  }
}
```
Circuit Breaker Pattern
When downstream services are overwhelmed, the circuit breaker prevents cascading failures by temporarily stopping requests. This is rate limiting applied to outbound calls.
```javascript
class CircuitBreaker {
  constructor(options) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}
```
Graceful Degradation
Instead of hard-blocking rate-limited requests, consider graceful degradation strategies:
- Throttle response quality — Return cached or simplified responses instead of full real-time data
- Queue requests — Accept the request but process it with lower priority
- Progressive delays — Add increasing delays to repeated requests (tarpit pattern)
- CAPTCHA challenges — Require human verification before allowing continued access
- Tiered limits — Allow higher limits for premium users
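The progressive-delay (tarpit) idea above can be sketched in a few lines. An in-memory version; the backoff schedule, cap, and window are illustrative:

```javascript
// Tarpit: instead of hard-rejecting, make each repeated request in the
// window wait a little longer. In-memory sketch; use a shared store
// in production.
const hits = new Map(); // key -> { count, windowStart }

function progressiveDelayMs(key, now = Date.now(), windowMs = 60000) {
  const entry = hits.get(key);
  if (!entry || now - entry.windowStart > windowMs) {
    hits.set(key, { count: 1, windowStart: now });
    return 0; // first request in the window: no delay
  }
  entry.count++;
  // Exponential backoff, capped at 10 seconds
  return Math.min(10000, 100 * 2 ** (entry.count - 1));
}
```

Middleware would `await new Promise(r => setTimeout(r, delay))` before handling the request, so repeated requests get progressively slower rather than being cut off outright.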
Frequently Asked Questions
What rate limits should I set for login endpoints?
A common practice is 5 failed attempts per 15 minutes per username and 20 failed attempts per 15 minutes per IP address. After exceeding limits, require a CAPTCHA or enforce a cooldown period. Also implement account lockout after 10 consecutive failures, requiring email verification to unlock. This balances security against legitimate users mistyping passwords.
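Those numbers translate into two independent checks. A minimal in-memory sliding-log sketch mirroring the limits above; the helper names are illustrative:

```javascript
// Track failed login attempts per username and per IP (sliding log).
// In-memory sketch; production would persist this in a shared store.
const failedAttempts = new Map(); // key -> array of failure timestamps

function underLimit(key, limit, windowMs, now = Date.now()) {
  const recent = (failedAttempts.get(key) || []).filter((t) => now - t < windowMs);
  failedAttempts.set(key, recent); // drop expired entries
  return recent.length < limit;
}

function recordFailure(key, now = Date.now()) {
  const list = failedAttempts.get(key) || [];
  list.push(now);
  failedAttempts.set(key, list);
}

function allowLoginAttempt(username, ip, now = Date.now()) {
  const FIFTEEN_MIN = 15 * 60 * 1000;
  return underLimit(`user:${username}`, 5, FIFTEEN_MIN, now) &&
         underLimit(`ip:${ip}`, 20, FIFTEEN_MIN, now);
}
```

A denied attempt would then trigger the CAPTCHA or cooldown described above rather than a silent block, so legitimate users who mistyped a password can still recover.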
How do I rate limit behind a load balancer without Redis?
Without a centralized store, each server tracks limits independently. This means a user gets N times the limit across N servers. For accurate distributed rate limiting, use Redis or a similar shared data store. Alternatively, use sticky sessions at the load balancer level so each user hits the same server, but this reduces balancing effectiveness.
Can attackers bypass IP-based rate limiting?
Yes. Attackers use rotating proxies, VPNs, botnets, and cloud instances to cycle through thousands of IP addresses. IP-based limiting is a necessary first defense but not sufficient alone. Combine with user-based limits, behavioral analysis, device fingerprinting, and DDoS protection services for comprehensive defense.
What is the difference between rate limiting and throttling?
Rate limiting rejects requests that exceed the limit (returning 429). Throttling slows down requests — adding delays or queuing them for later processing. Both control request rates but with different user experiences. Throttling is gentler but uses more server resources (holding connections open). Rate limiting is more resource-efficient but provides a harder cutoff. Most production systems use rate limiting with clear Retry-After headers.
How do I handle rate limiting in microservices?
Apply rate limiting at the API gateway for external-facing limits (per client or API key). For internal service-to-service calls, use circuit breakers and bulkhead patterns instead of hard rate limits. This prevents cascading failures while allowing internal services to communicate freely.