Rate Limiting for Security: Protecting Your Systems from Abuse
Rate limiting is one of the most effective first lines of defense against a wide range of security threats. Whether the threat is brute-force login guessing, credential stuffing, API scraping, or an application-layer DDoS, controlling the rate of incoming requests prevents resource exhaustion and protects both your infrastructure and your users. This guide covers algorithms, implementation strategies, and real-world deployment patterns.
Why Rate Limiting Matters for Security
| Threat | Without Rate Limiting | With Rate Limiting |
|---|---|---|
| Brute Force Login | Millions of password guesses per hour | 5 attempts per 15 minutes |
| Credential Stuffing | Automated testing of leaked credentials | Slowed to useless pace |
| API Scraping | Entire database exfiltrated in minutes | Limited to acceptable data access rate |
| DDoS (Application Layer) | Server overloaded and crashes | Excess requests rejected early |
| Resource Exhaustion | Expensive queries drain CPU/memory | Request throughput capped per user |
Rate Limiting Algorithms
1. Token Bucket
The most widely used algorithm. A bucket holds tokens that are consumed per request. Tokens refill at a steady rate. If the bucket is empty, requests are rejected. This allows short bursts while enforcing an average rate.
```javascript
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens (burst size)
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true; // Request allowed
    }
    return false; // Rate limited
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Usage: 10 requests/second with a burst of 20
const bucket = new TokenBucket(20, 10);
if (!bucket.tryConsume()) {
  res.status(429).json({ error: 'Rate limit exceeded' });
}
```
2. Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. This smooths out bursts, producing a steady output rate.
```javascript
class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity; // Queue size
    this.queue = [];
    this.leakRate = leakRate; // Requests processed per second
    this.lastLeak = Date.now();
  }

  tryAdd(request) {
    this.leak();
    if (this.queue.length < this.capacity) {
      this.queue.push(request);
      return true;
    }
    return false; // Queue full, reject
  }

  leak() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    const leaked = Math.floor(elapsed * this.leakRate);
    this.queue.splice(0, leaked);
    if (leaked > 0) this.lastLeak = now;
  }
}
```
3. Fixed Window Counter
Counts requests in fixed time windows (e.g., per minute). Simple but has a boundary problem: a user can send 2x the limit by timing requests at the window boundary.
```javascript
// Redis-based fixed window counter
async function fixedWindowRateLimit(userId, limit, windowSec) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${userId}:${window}`;
  const current = await redis.incr(key);
  if (current === 1) {
    await redis.expire(key, windowSec);
  }
  return current <= limit;
}
```
4. Sliding Window Log
Stores the timestamp of each request and counts how many fall within the sliding window. Most accurate but requires more memory.
```javascript
async function slidingWindowLog(userId, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `ratelimit:${userId}`;

  // Remove expired entries and add the current request in one round trip
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);
  pipeline.zadd(key, now, `${now}:${Math.random()}`);
  pipeline.zcard(key);
  pipeline.expire(key, Math.ceil(windowMs / 1000));
  const results = await pipeline.exec();

  const count = results[2][1]; // Result of the ZCARD command
  return count <= limit;
}
```
Algorithm Comparison
| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
|---|---|---|---|---|
| Token Bucket | Low (O(1)) | Good | Allows controlled bursts | Low |
| Leaky Bucket | Medium (queue) | Good | Smooths output rate | Low |
| Fixed Window | Very Low | Low (boundary issue) | 2x burst at boundary | Very Low |
| Sliding Window Log | High (per request) | Very High | Precise enforcement | Medium |
| Sliding Window Counter | Low | High | Good approximation | Low |
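The sliding window counter (last row above) has no example earlier in this guide. A minimal in-memory sketch, which approximates a true sliding window by weighting the previous fixed window's count by how much of it still overlaps the current sliding window (the class name and structure are illustrative):

```javascript
// Sliding window counter: estimate requests in the trailing window
// as previous_window_count * remaining_overlap + current_window_count.
// In-memory sketch for illustration.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.counts = new Map(); // window index -> request count
  }

  tryConsume(now = Date.now()) {
    const window = Math.floor(now / this.windowMs);
    const elapsed = (now % this.windowMs) / this.windowMs; // fraction into current window
    const current = this.counts.get(window) || 0;
    const previous = this.counts.get(window - 1) || 0;
    // Weighted estimate of requests in the trailing window
    const estimated = previous * (1 - elapsed) + current;
    if (estimated >= this.limit) return false; // Rate limited
    this.counts.set(window, current + 1);
    return true;
  }
}
```

Because a boundary burst is counted against the weighted sum of both windows, this closes the 2x loophole of the plain fixed window counter while storing only two counters per key.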
IP-Based vs User-Based Rate Limiting
| Strategy | Pros | Cons | Best For |
|---|---|---|---|
| IP-Based | Works for unauthenticated requests | Shared IPs (NAT, corporate) punish all users | Login pages, public endpoints |
| User-Based | Fair per-user limits, unaffected by shared IPs | Requires authentication first | Authenticated API endpoints |
| API Key-Based | Per-application limits, tiered plans | Key sharing, stolen keys | Public APIs, developer platforms |
| Combined | Multiple layers of protection | More complex to manage | Production systems |
```javascript
// Multi-layer rate limiting
function rateLimitKey(req) {
  // Layer 1: IP-based (catches unauthenticated abuse)
  const ipKey = `ip:${req.ip}`;
  // Layer 2: User-based (fair per-user limits)
  const userKey = req.user ? `user:${req.user.id}` : null;
  // Layer 3: Endpoint-specific (protect expensive operations)
  const endpointKey = `endpoint:${req.method}:${req.path}`;
  return { ipKey, userKey, endpointKey };
}
```
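A request should then pass only when every applicable layer allows it. A minimal in-memory sketch; the `checkLimit` helper and the specific limits are illustrative, not from a library:

```javascript
// A request is allowed only if every applicable key is under its own limit.
// In-memory fixed-window counters for illustration; production systems
// would back this with a shared store such as Redis.
const counters = new Map(); // "key:window" -> request count

function checkLimit(key, limit, windowSec, now = Date.now()) {
  const window = Math.floor(now / 1000 / windowSec);
  const slot = `${key}:${window}`;
  const count = (counters.get(slot) || 0) + 1;
  counters.set(slot, count);
  return count <= limit;
}

function isAllowed({ ipKey, userKey, endpointKey }) {
  if (!checkLimit(ipKey, 1000, 60)) return false;             // coarse per-IP cap
  if (userKey && !checkLimit(userKey, 100, 60)) return false; // fair per-user limit
  return checkLimit(endpointKey, 5000, 60);                   // global endpoint cap
}
```

The IP cap is deliberately looser than the per-user limit so that users behind a shared NAT are not punished for each other's traffic, while raw unauthenticated abuse is still capped.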
Rate Limiting at Different Layers
Layer Architecture
| Layer | Technology | Purpose |
|---|---|---|
| CDN / Edge | Cloudflare, AWS CloudFront | Block volumetric attacks before they reach origin |
| WAF | AWS WAF, Cloudflare WAF | Rule-based blocking, bot detection |
| Load Balancer | NGINX, AWS ALB | Connection limits, request rate per IP |
| API Gateway | Kong, AWS API Gateway | Per-key limits, throttling policies |
| Application | Express middleware, custom code | User-based limits, business logic |
NGINX Rate Limiting
```nginx
# nginx.conf
http {
    # Define rate limit zone: 10 requests/second per IP
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    # Stricter zone for login
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }

        location /api/auth/login {
            limit_req zone=login burst=5;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}
```
Response Headers
Always communicate rate limit status to clients via standard headers:
```javascript
// Standard rate limit response headers
res.setHeader('X-RateLimit-Limit', '100');        // Max requests per window
res.setHeader('X-RateLimit-Remaining', '87');     // Remaining requests
res.setHeader('X-RateLimit-Reset', '1700000060'); // Window reset (Unix timestamp)

// When rate limited (429 status)
res.setHeader('Retry-After', '60'); // Seconds until retry is allowed
res.status(429).json({
  error: 'rate_limit_exceeded',
  message: 'Too many requests. Please retry after 60 seconds.',
  retryAfter: 60
});
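On the client side, these headers enable well-behaved retries. A sketch that honors `Retry-After` instead of hammering the server; the URL and retry cap are illustrative:

```javascript
// Client side: on 429, wait for the server-specified Retry-After
// before trying again. Sketch; the retry cap is an arbitrary choice.
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const res = await fetch(url, options);
    if (res.status !== 429) return res;
    // Fall back to 1 second if the header is missing or unparsable
    const retryAfter = parseInt(res.headers.get('Retry-After'), 10) || 1;
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error('Rate limit retries exhausted');
}
```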
WAF Integration
Web Application Firewalls add intelligent rate limiting with bot detection, geographic filtering, and behavioral analysis. They integrate with DDoS protection systems.
AWS WAF rate-based rule (CloudFormation snippet):

```json
{
  "Type": "AWS::WAFv2::WebACL",
  "Properties": {
    "Rules": [{
      "Name": "RateLimitRule",
      "Priority": 1,
      "Statement": {
        "RateBasedStatement": {
          "Limit": 2000,
          "AggregateKeyType": "IP"
        }
      },
      "Action": { "Block": {} }
    }]
  }
}
```
Circuit Breaker Pattern
When downstream services are overwhelmed, the circuit breaker prevents cascading failures by temporarily stopping requests. This is rate limiting applied to outbound calls.
```javascript
class CircuitBreaker {
  constructor(options) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}
```
Graceful Degradation
Instead of hard-blocking rate-limited requests, consider graceful degradation strategies:
- Throttle response quality — Return cached or simplified responses instead of full real-time data
- Queue requests — Accept the request but process it with lower priority
- Progressive delays — Add increasing delays to repeated requests (tarpit pattern)
- CAPTCHA challenges — Require human verification before allowing continued access
- Tiered limits — Allow higher limits for premium users
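The progressive-delay (tarpit) idea above can be sketched in a few lines. An in-memory version; the backoff schedule, cap, and window are illustrative:

```javascript
// Tarpit: instead of hard-rejecting, make each repeated request in the
// window wait a little longer. In-memory sketch; use a shared store
// in production.
const hits = new Map(); // key -> { count, windowStart }

function progressiveDelayMs(key, now = Date.now(), windowMs = 60000) {
  const entry = hits.get(key);
  if (!entry || now - entry.windowStart > windowMs) {
    hits.set(key, { count: 1, windowStart: now });
    return 0; // first request in the window: no delay
  }
  entry.count++;
  // Exponential backoff, capped at 10 seconds
  return Math.min(10000, 100 * 2 ** (entry.count - 1));
}
```

Middleware would `await new Promise(r => setTimeout(r, delay))` before handling the request, so repeated requests get progressively slower rather than being cut off outright.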
Frequently Asked Questions
What rate limits should I set for login endpoints?
A common practice is 5 failed attempts per 15 minutes per username and 20 failed attempts per 15 minutes per IP address. After exceeding limits, require a CAPTCHA or enforce a cooldown period. Also implement account lockout after 10 consecutive failures, requiring email verification to unlock. This balances security against legitimate users mistyping passwords.
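Those numbers translate into two independent checks. A minimal in-memory sliding-log sketch mirroring the limits above; the helper names are illustrative:

```javascript
// Track failed login attempts per username and per IP (sliding log).
// In-memory sketch; production would persist this in a shared store.
const failedAttempts = new Map(); // key -> array of failure timestamps

function underLimit(key, limit, windowMs, now = Date.now()) {
  const recent = (failedAttempts.get(key) || []).filter((t) => now - t < windowMs);
  failedAttempts.set(key, recent); // drop expired entries
  return recent.length < limit;
}

function recordFailure(key, now = Date.now()) {
  const list = failedAttempts.get(key) || [];
  list.push(now);
  failedAttempts.set(key, list);
}

function allowLoginAttempt(username, ip, now = Date.now()) {
  const FIFTEEN_MIN = 15 * 60 * 1000;
  return underLimit(`user:${username}`, 5, FIFTEEN_MIN, now) &&
         underLimit(`ip:${ip}`, 20, FIFTEEN_MIN, now);
}
```

A denied attempt would then trigger the CAPTCHA or cooldown described above rather than a silent block, so legitimate users who mistyped a password can still recover.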
How do I rate limit behind a load balancer without Redis?
Without a centralized store, each server tracks limits independently. This means a user gets N times the limit across N servers. For accurate distributed rate limiting, use Redis or a similar shared data store. Alternatively, use sticky sessions at the load balancer level so each user hits the same server, but this reduces balancing effectiveness.
Can attackers bypass IP-based rate limiting?
Yes. Attackers use rotating proxies, VPNs, botnets, and cloud instances to cycle through thousands of IP addresses. IP-based limiting is a necessary first defense but not sufficient alone. Combine with user-based limits, behavioral analysis, device fingerprinting, and DDoS protection services for comprehensive defense.
What is the difference between rate limiting and throttling?
Rate limiting rejects requests that exceed the limit (returning 429). Throttling slows down requests — adding delays or queuing them for later processing. Both control request rates but with different user experiences. Throttling is gentler but uses more server resources (holding connections open). Rate limiting is more resource-efficient but provides a harder cutoff. Most production systems use rate limiting with clear Retry-After headers.
How do I handle rate limiting in microservices?
Apply rate limiting at the API gateway for external-facing limits (per client or API key). For internal service-to-service calls, use circuit breakers and bulkhead patterns instead of hard rate limits. This prevents cascading failures while allowing internal services to communicate freely.