
Rate Limiting for Security: Protecting Your Systems from Abuse

Rate limiting is one of the most effective first lines of defense against a wide range of security threats. From brute force login attacks to DDoS attacks and API scraping, controlling the rate of incoming requests prevents resource exhaustion and protects both your infrastructure and your users. This guide covers algorithms, implementation strategies, and real-world deployment patterns.

Why Rate Limiting Matters for Security

| Threat | Without Rate Limiting | With Rate Limiting |
| --- | --- | --- |
| Brute Force Login | Millions of password guesses per hour | 5 attempts per 15 minutes |
| Credential Stuffing | Automated testing of leaked credentials | Slowed to a useless pace |
| API Scraping | Entire database exfiltrated in minutes | Limited to an acceptable data access rate |
| DDoS (Application Layer) | Server overloaded and crashes | Excess requests rejected early |
| Resource Exhaustion | Expensive queries drain CPU/memory | Request throughput capped per user |

Rate Limiting Algorithms

1. Token Bucket

The most widely used algorithm. A bucket holds tokens that are consumed per request. Tokens refill at a steady rate. If the bucket is empty, requests are rejected. This allows short bursts while enforcing an average rate.

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;      // Max tokens (burst size)
    this.tokens = capacity;        // Current tokens
    this.refillRate = refillRate;  // Tokens per second
    this.lastRefill = Date.now();
  }

  tryConsume(tokens = 1) {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;  // Request allowed
    }
    return false;   // Rate limited
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Usage: 10 requests/second with a burst of 20
// (inside an Express-style request handler, where `res` is the response)
const bucket = new TokenBucket(20, 10);
if (!bucket.tryConsume()) {
  res.status(429).json({ error: 'Rate limit exceeded' });
}

2. Leaky Bucket

Requests enter a queue (bucket) and are processed at a fixed rate. If the queue is full, new requests are dropped. This smooths out bursts, producing a steady output rate.

class LeakyBucket {
  constructor(capacity, leakRate) {
    this.capacity = capacity;   // Queue size
    this.queue = [];
    this.leakRate = leakRate;   // Requests processed per second
    this.lastLeak = Date.now();
  }

  tryAdd(request) {
    this.leak();
    if (this.queue.length < this.capacity) {
      this.queue.push(request);
      return true;
    }
    return false; // Queue full, reject
  }

  leak() {
    const now = Date.now();
    const elapsed = (now - this.lastLeak) / 1000;
    const leaked = Math.floor(elapsed * this.leakRate);
    this.queue.splice(0, leaked);
    if (leaked > 0) this.lastLeak = now;
  }
}

3. Fixed Window Counter

Counts requests in fixed time windows (e.g., per minute). Simple but has a boundary problem: a user can send 2x the limit by timing requests at the window boundary.

// Redis-based fixed window counter
async function fixedWindowRateLimit(userId, limit, windowSec) {
  const window = Math.floor(Date.now() / 1000 / windowSec);
  const key = `ratelimit:${userId}:${window}`;

  const current = await redis.incr(key);
  if (current === 1) {
    // Set the TTL on the first hit. Note INCR + EXPIRE is not atomic;
    // if the process dies in between, the key lingers until it is
    // cleaned up. Since the window index is part of the key, a stale
    // key only wastes memory rather than miscounting future windows.
    await redis.expire(key, windowSec);
  }

  return current <= limit;
}

4. Sliding Window Log

Stores the timestamp of each request and counts how many fall within the sliding window. Most accurate but requires more memory.

async function slidingWindowLog(userId, limit, windowMs) {
  const now = Date.now();
  const windowStart = now - windowMs;
  const key = `ratelimit:${userId}`;

  // Remove expired entries and add current request
  const pipeline = redis.pipeline();
  pipeline.zremrangebyscore(key, 0, windowStart);
  pipeline.zadd(key, now, `${now}:${Math.random()}`);
  pipeline.zcard(key);
  pipeline.expire(key, Math.ceil(windowMs / 1000));

  const results = await pipeline.exec();
  const count = results[2][1];

  return count <= limit;
}

Algorithm Comparison

| Algorithm | Memory | Accuracy | Burst Handling | Complexity |
| --- | --- | --- | --- | --- |
| Token Bucket | Low (O(1)) | Good | Allows controlled bursts | Low |
| Leaky Bucket | Medium (queue) | Good | Smooths output rate | Low |
| Fixed Window | Very Low | Low (boundary issue) | 2x burst at boundary | Very Low |
| Sliding Window Log | High (per request) | Very High | Precise enforcement | Medium |
| Sliding Window Counter | Low | High | Good approximation | Low |
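The sliding window counter from the table is the one variant not shown above. It approximates the sliding log with O(1) memory by keeping only two counters and weighting the previous window by how much of it still overlaps the sliding window. A minimal in-memory sketch (class and method names are illustrative):

```javascript
// Sliding window counter: O(1)-memory approximation of the sliding log.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.currentWindow = 0;  // Index of the current fixed window
    this.currentCount = 0;   // Requests seen in the current window
    this.previousCount = 0;  // Requests seen in the previous window
  }

  tryConsume(now = Date.now()) {
    const window = Math.floor(now / this.windowMs);
    if (window !== this.currentWindow) {
      // Roll forward; windows older than one interval count as zero
      this.previousCount =
        window === this.currentWindow + 1 ? this.currentCount : 0;
      this.currentCount = 0;
      this.currentWindow = window;
    }
    // Weight the previous window by its remaining overlap
    const elapsed = (now % this.windowMs) / this.windowMs;
    const weighted = this.previousCount * (1 - elapsed) + this.currentCount;
    if (weighted < this.limit) {
      this.currentCount++;
      return true;
    }
    return false;
  }
}
```

The approximation assumes requests were spread evenly across the previous window, which is why the table rates it "Good approximation" rather than precise.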

IP-Based vs User-Based Rate Limiting

| Strategy | Pros | Cons | Best For |
| --- | --- | --- | --- |
| IP-Based | Works for unauthenticated requests | Shared IPs (NAT, corporate) punish all users | Login pages, public endpoints |
| User-Based | Fair per-user limits, unaffected by shared IPs | Requires authentication first | Authenticated API endpoints |
| API Key-Based | Per-application limits, tiered plans | Key sharing, stolen keys | Public APIs, developer platforms |
| Combined | Multiple layers of protection | More complex to manage | Production systems |

// Multi-layer rate limiting
function rateLimitKey(req) {
  // Layer 1: IP-based (catches unauthenticated abuse)
  const ipKey = `ip:${req.ip}`;

  // Layer 2: User-based (fair per-user limits)
  const userKey = req.user ? `user:${req.user.id}` : null;

  // Layer 3: Endpoint-specific (protect expensive operations)
  const endpointKey = `endpoint:${req.method}:${req.path}`;

  return { ipKey, userKey, endpointKey };
}
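Each key can then feed its own limiter, and a request passes only if every applicable layer allows it. A self-contained sketch using simple fixed-window counters (the per-layer limits are illustrative):

```javascript
// Per-layer limits per minute (illustrative values)
const LIMITS = { ip: 100, user: 1000, endpoint: 20 };
const counters = new Map(); // key -> { window, count }

// Fixed-window check for one key
function allow(key, limit, windowMs = 60000, now = Date.now()) {
  const window = Math.floor(now / windowMs);
  const entry = counters.get(key);
  if (!entry || entry.window !== window) {
    counters.set(key, { window, count: 1 });
    return true;
  }
  entry.count++;
  return entry.count <= limit;
}

// A request passes only if every applicable layer allows it.
// Note: earlier layers still consume a count even when a later
// layer rejects — acceptable for a sketch.
function checkAllLayers(req, now = Date.now()) {
  const checks = [[`ip:${req.ip}`, LIMITS.ip]];
  if (req.user) checks.push([`user:${req.user.id}`, LIMITS.user]);
  checks.push([`endpoint:${req.method}:${req.path}`, LIMITS.endpoint]);
  return checks.every(([key, limit]) => allow(key, limit, 60000, now));
}
```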

Rate Limiting at Different Layers

Layer Architecture

| Layer | Technology | Purpose |
| --- | --- | --- |
| CDN / Edge | Cloudflare, AWS CloudFront | Block volumetric attacks before they reach origin |
| WAF | AWS WAF, Cloudflare WAF | Rule-based blocking, bot detection |
| Load Balancer | NGINX, AWS ALB | Connection limits, request rate per IP |
| API Gateway | Kong, AWS API Gateway | Per-key limits, throttling policies |
| Application | Express middleware, custom code | User-based limits, business logic |

NGINX Rate Limiting

# nginx.conf
http {
    # Define rate limit zone: 10 requests/second per IP
    limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

    # Stricter zone for login
    limit_req_zone $binary_remote_addr zone=login:10m rate=1r/s;

    server {
        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://backend;
        }

        location /api/auth/login {
            limit_req zone=login burst=5;
            limit_req_status 429;
            proxy_pass http://backend;
        }
    }
}

Response Headers

Always communicate rate limit status to clients. The X-RateLimit-* headers are a widely adopted de facto convention (an IETF draft standardizes similar RateLimit-* fields):

// Standard rate limit response headers
res.setHeader('X-RateLimit-Limit', '100');        // Max requests per window
res.setHeader('X-RateLimit-Remaining', '87');     // Remaining requests
res.setHeader('X-RateLimit-Reset', '1700000060'); // Window reset (Unix timestamp)

// When rate limited (429 status)
res.setHeader('Retry-After', '60');  // Seconds until retry is allowed
res.status(429).json({
  error: 'rate_limit_exceeded',
  message: 'Too many requests. Please retry after 60 seconds.',
  retryAfter: 60
});
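On the client side, a 429 should be honored rather than retried immediately. A small helper that derives a wait time from the headers above, falling back to exponential backoff when no Retry-After is present (the fallback base and cap are illustrative):

```javascript
// Compute how long a client should wait before retrying, in milliseconds.
// `headers` is any object with a get(name) method (e.g. a fetch Headers).
function retryDelayMs(headers, attempt = 1) {
  const retryAfter = headers.get('Retry-After');
  if (retryAfter !== null && retryAfter !== undefined) {
    const seconds = Number(retryAfter);
    if (!Number.isNaN(seconds)) return seconds * 1000;
    // Retry-After may also be an HTTP date
    const date = Date.parse(retryAfter);
    if (!Number.isNaN(date)) return Math.max(0, date - Date.now());
  }
  // Fallback: exponential backoff capped at 60 seconds
  return Math.min(1000 * 2 ** (attempt - 1), 60000);
}
```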

WAF Integration

Web Application Firewalls add intelligent rate limiting with bot detection, geographic filtering, and behavioral analysis. They integrate with DDoS protection systems.

// AWS WAF rate-based rule (CloudFormation)
{
  "Type": "AWS::WAFv2::WebACL",
  "Properties": {
    "Rules": [{
      "Name": "RateLimitRule",
      "Priority": 1,
      "Statement": {
        "RateBasedStatement": {
          "Limit": 2000,
          "AggregateKeyType": "IP"
        }
      },
      "Action": { "Block": {} }
    }]
  }
}

Circuit Breaker Pattern

When downstream services are overwhelmed, the circuit breaker prevents cascading failures by temporarily stopping requests. This is rate limiting applied to outbound calls.

class CircuitBreaker {
  constructor(options) {
    this.failureThreshold = options.failureThreshold || 5;
    this.resetTimeout = options.resetTimeout || 30000;
    this.state = 'CLOSED';  // CLOSED, OPEN, HALF_OPEN
    this.failureCount = 0;
    this.lastFailureTime = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailureTime > this.resetTimeout) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailureTime = Date.now();
    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}


Graceful Degradation

Instead of hard-blocking rate-limited requests, consider graceful degradation strategies:

  • Throttle response quality — Return cached or simplified responses instead of full real-time data
  • Queue requests — Accept the request but process it with lower priority
  • Progressive delays — Add increasing delays to repeated requests (tarpit pattern)
  • CAPTCHA challenges — Require human verification before allowing continued access
  • Tiered limits — Allow higher limits for premium users or paid plans
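The progressive-delay (tarpit) strategy from the list above can be as small as an exponentially growing delay per consecutive violation, capped so abusers are slowed rather than hard-blocked. A sketch (the base delay and cap are illustrative):

```javascript
// Tarpit: delay grows exponentially with consecutive over-limit requests.
const violations = new Map(); // clientKey -> consecutive violation count

function tarpitDelayMs(clientKey, baseMs = 250, capMs = 30000) {
  const count = (violations.get(clientKey) || 0) + 1;
  violations.set(clientKey, count);
  return Math.min(baseMs * 2 ** (count - 1), capMs);
}

function clearViolations(clientKey) {
  violations.delete(clientKey); // Call once the client behaves again
}
```

The returned delay would be applied with a timer before responding, which intentionally wastes the attacker's time rather than the server's.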

Frequently Asked Questions

What rate limits should I set for login endpoints?

A common practice is 5 failed attempts per 15 minutes per username and 20 failed attempts per 15 minutes per IP address. After the limit is exceeded, require a CAPTCHA or enforce a cooldown period. Also implement account lockout after 10 consecutive failures, requiring email verification to unlock. This balances security with tolerance for legitimate users who mistype their passwords.
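Those numbers can be enforced with two independent failure counters, one per username and one per IP. A simplified in-memory sketch mirroring the limits above (production code would persist this in a shared store such as Redis):

```javascript
// Track failed login attempts per key over a sliding 15-minute window.
const WINDOW_MS = 15 * 60 * 1000;
const failures = new Map(); // key -> array of failure timestamps

function recordFailure(key, now = Date.now()) {
  // Drop expired entries, then record the new failure
  const list = (failures.get(key) || []).filter(t => now - t < WINDOW_MS);
  list.push(now);
  failures.set(key, list);
  return list.length;
}

// False when either the per-username (5) or per-IP (20) limit is reached.
function loginAllowed(username, ip, now = Date.now()) {
  const byUser = (failures.get(`user:${username}`) || [])
    .filter(t => now - t < WINDOW_MS).length;
  const byIp = (failures.get(`ip:${ip}`) || [])
    .filter(t => now - t < WINDOW_MS).length;
  return byUser < 5 && byIp < 20;
}
```

On each failed attempt, call recordFailure for both the `user:` and `ip:` keys before the next loginAllowed check.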

How do I rate limit behind a load balancer without Redis?

Without a centralized store, each server tracks limits independently. This means a user gets N times the limit across N servers. For accurate distributed rate limiting, use Redis or a similar shared data store. Alternatively, use sticky sessions at the load balancer level so each user hits the same server, but this reduces balancing effectiveness.
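The N-times-the-limit effect is easy to see in a toy simulation: independent per-server counters behind a round-robin balancer collectively admit serverCount × limit requests (all names here are illustrative):

```javascript
// Toy illustration: without a shared store, each server enforces the
// limit independently, so a client spreading requests across servers
// gets serverCount * limit total.
function makeLocalLimiter(limit) {
  let count = 0;
  return () => ++count <= limit;
}

function totalAllowed(serverCount, limit, requests) {
  const servers =
    Array.from({ length: serverCount }, () => makeLocalLimiter(limit));
  let allowed = 0;
  for (let i = 0; i < requests; i++) {
    // Round-robin load balancing across servers
    if (servers[i % serverCount]()) allowed++;
  }
  return allowed;
}
```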

Can attackers bypass IP-based rate limiting?

Yes. Attackers use rotating proxies, VPNs, botnets, and cloud instances to cycle through thousands of IP addresses. IP-based limiting is a necessary first defense but not sufficient alone. Combine with user-based limits, behavioral analysis, device fingerprinting, and DDoS protection services for comprehensive defense.

What is the difference between rate limiting and throttling?

Rate limiting rejects requests that exceed the limit (returning 429). Throttling slows down requests — adding delays or queuing them for later processing. Both control request rates but with different user experiences. Throttling is gentler but uses more server resources (holding connections open). Rate limiting is more resource-efficient but provides a harder cutoff. Most production systems use rate limiting with clear Retry-After headers.
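The difference can be sketched as slot reservation: a throttle computes how long each request must wait so the output rate stays fixed, instead of returning 429 (a simplified sketch; function names are illustrative):

```javascript
// Throttle: instead of rejecting, delay each request so the
// effective processing rate stays at `ratePerSec`.
function makeThrottle(ratePerSec) {
  const intervalMs = 1000 / ratePerSec;
  let nextSlot = 0; // Earliest time the next request may be processed
  return function delayMs(now = Date.now()) {
    // Reserve the next available slot and report how long to wait for it
    const slot = Math.max(now, nextSlot);
    nextSlot = slot + intervalMs;
    return slot - now;
  };
}
```

Each returned delay would be applied with a timer; note this is exactly the "holding connections open" cost mentioned above.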

How do I handle rate limiting in microservices?

Apply rate limiting at the API gateway for external-facing limits (per client or API key). For internal service-to-service calls, use circuit breakers and bulkhead patterns instead of hard rate limits. This prevents cascading failures while allowing internal services to communicate freely.
