📌 API Gateway — The Front Door to Your Microservices

An API Gateway is a server that acts as the single entry point for all client requests into a microservices architecture. Instead of clients communicating directly with dozens of backend services, every request flows through the gateway, which handles routing, security, rate limiting, and request transformation before forwarding traffic to the appropriate service.

Think of it as a smart reverse proxy on steroids. While a traditional reverse proxy simply forwards requests, an API gateway adds cross-cutting concerns like authentication, logging, circuit breaking, and protocol translation — all in one centralized layer. Companies like Netflix, Amazon, and Uber rely heavily on API gateways to manage billions of requests per day across hundreds of microservices.

In this guide, we will cover everything you need to know about API gateways — from core responsibilities and popular tools to the BFF pattern, security best practices, and real-world architecture examples. For a broader understanding of system design fundamentals, check out swehelper.com system design topics.

⚙️ Core Responsibilities of an API Gateway

An API gateway shoulders a wide range of responsibilities that would otherwise be duplicated across every microservice. Here are the key functions:

1. Request Routing

The gateway inspects the incoming request path, headers, or query parameters and routes it to the correct backend service. For instance, /api/users/* might route to the User Service while /api/orders/* goes to the Order Service. This decouples clients from the internal service topology — services can be split, merged, or moved without any client-side changes.
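
As a rough illustration of path-based routing, the sketch below resolves a request path to an upstream by longest matching prefix. The route table and the `resolve_upstream` helper are hypothetical names chosen for this example; real gateways also match on headers, methods, and hosts.

```python
from typing import Optional

# Hypothetical route table: path prefix -> upstream base URL.
ROUTES = {
    "/api/users": "http://user-svc.internal:8080",
    "/api/orders": "http://order-svc.internal:8080",
}


def resolve_upstream(path: str) -> Optional[str]:
    """Return the upstream for the longest matching prefix, or None."""
    best = None
    for prefix in ROUTES:
        # Match whole path segments so /api/users2 does not hit /api/users.
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return ROUTES[best] if best is not None else None
```

Because clients only ever see the prefix, the upstream URL behind it can change without any client-side update.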

2. Authentication and Authorization

Rather than each service implementing its own auth logic, the gateway validates JWT tokens, API keys, or OAuth2 tokens at the edge. It can attach user identity information to downstream requests via headers like X-User-Id or X-User-Roles, so backend services trust the gateway's verification. Learn more about securing distributed systems on the authentication patterns page.
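
To make the edge-validation step concrete, here is a minimal sketch that verifies an HS256-signed JWT using only the Python standard library and derives the identity headers mentioned above. It assumes a shared secret and the claim names `sub`, `roles`, and `exp`; `issue_token` and `identity_headers` are hypothetical helpers, and a production gateway would normally rely on a maintained JWT library or plugin instead.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(claims: dict, secret: bytes) -> str:
    """Demo helper: build an HS256 token so the verifier can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = _b64url_encode(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def identity_headers(token: str, secret: bytes, now: float) -> Optional[dict]:
    """Return headers for the downstream request, or None if the token is rejected."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
        expected = _sign(f"{header_b64}.{payload_b64}", secret)
    except ValueError:
        return None  # wrong segment count, bad base64, or bad JSON
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None  # accept only the algorithm we expect
    if not hmac.compare_digest(signature, expected):
        return None  # signature mismatch
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return None  # missing or expired exp claim
    return {
        "X-User-Id": str(claims.get("sub", "")),
        "X-User-Roles": ",".join(str(role) for role in claims.get("roles", [])),
    }
```

Backend services can only trust these headers if the gateway strips any client-supplied copies and the services are not reachable except through the gateway.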

3. Rate Limiting and Throttling

Rate limiting protects backend services from being overwhelmed. The gateway enforces limits like "100 requests per minute per API key" using algorithms such as token bucket, sliding window, or fixed window. This is essential for both preventing abuse and ensuring fair resource allocation among tenants.
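
Of the algorithms listed, token bucket is probably the easiest to sketch. The version below is a minimal, single-process illustration with an injected clock so it can be tested deterministically; the class and parameter names are assumptions for this example, not any gateway's API.

```python
class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an idle client may burst
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = max(self.last_refill, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A limit such as "100 requests per minute per API key" would map to one bucket per key with `capacity=100` and `refill_per_second=100 / 60`.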

4. Request/Response Transformation

The gateway can modify requests before forwarding them (adding headers, rewriting paths, converting protocols) and transform responses before returning them to the client (stripping internal fields, aggregating results from multiple services, converting XML to JSON).
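
A response transformation can be as small as the sketch below, which drops internal headers and top-level JSON fields before the response leaves the gateway. The field and header names are illustrative assumptions, and `transform_response` is a hypothetical helper.

```python
import json
from typing import Dict, Tuple

# Illustrative names; adjust to whatever your services treat as internal.
INTERNAL_HEADERS = {"x-internal-trace-id"}
INTERNAL_FIELDS = {"internal_metadata"}


def transform_response(headers: Dict[str, str], body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Strip internal headers and top-level JSON fields from a response."""
    # Header names are case-insensitive, so compare in lowercase.
    public_headers = {k: v for k, v in headers.items() if k.lower() not in INTERNAL_HEADERS}
    payload = json.loads(body)
    if isinstance(payload, dict):
        payload = {k: v for k, v in payload.items() if k not in INTERNAL_FIELDS}
    return public_headers, json.dumps(payload).encode("utf-8")
```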

5. Load Balancing

API gateways distribute incoming traffic across multiple instances of a backend service using strategies like round-robin, least connections, or weighted routing. This complements service-level load balancers and is especially useful for canary deployments and A/B testing. For deeper coverage, visit swehelper.com load balancing guide.
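
The two simplest strategies named above can be sketched in a few lines. These classes are hypothetical and ignore health checks and concurrency, but they show the difference in selection logic.

```python
import itertools


class RoundRobinBalancer:
    """Hands out instances in a fixed rotation."""

    def __init__(self, instances):
        self._cycle = itertools.cycle(list(instances))

    def pick(self):
        return next(self._cycle)


class LeastConnectionsBalancer:
    """Picks the instance with the fewest in-flight requests."""

    def __init__(self, instances):
        self.active = {instance: 0 for instance in instances}

    def acquire(self):
        # Ties resolve to the instance listed first.
        instance = min(self.active, key=self.active.get)
        self.active[instance] += 1
        return instance

    def release(self, instance):
        # Call when the proxied request completes.
        self.active[instance] -= 1
```

Round-robin suits uniform requests; least connections tends to behave better when request durations vary widely.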

6. Circuit Breaking and Resilience

When a downstream service is failing, the gateway can open a circuit breaker to fail fast rather than letting requests pile up. This prevents cascading failures across the entire system — a critical concern in distributed architectures.
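
The fail-fast behavior can be sketched as a small state machine. This is a simplified illustration with an injected clock: once the reset timeout elapses it lets trial requests through until one fails or succeeds, whereas production breakers usually limit the half-open state to a single probe. The class and threshold values are assumptions.

```python
class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: reject until the timeout passes, then allow trial requests.
        return now - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the circuit again

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now  # open (or re-open) the circuit
```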

7. Caching

Gateways can cache responses for idempotent GET requests, reducing load on backend services and improving latency. Cache invalidation strategies (TTL-based, event-driven) are configured per route.
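
A TTL-based cache with an explicit invalidation hook might look like the sketch below. It keys on the path only and uses an injected clock; a real gateway cache would also consider query strings, `Vary` headers, and `Cache-Control`. The `TtlCache` class is a hypothetical name.

```python
class TtlCache:
    """Caches GET responses for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._entries = {}  # path -> (stored_at, body)

    def get(self, method: str, path: str, now: float):
        if method != "GET":
            return None  # never serve non-GET requests from cache
        entry = self._entries.get(path)
        if entry is None:
            return None
        stored_at, body = entry
        if now - stored_at >= self.ttl_seconds:
            del self._entries[path]  # expired: evict and report a miss
            return None
        return body

    def put(self, method: str, path: str, body: bytes, now: float) -> None:
        if method == "GET":
            self._entries[path] = (now, body)

    def invalidate(self, path: str) -> None:
        # Event-driven invalidation: call when the underlying resource changes.
        self._entries.pop(path, None)
```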

🧩 Popular API Gateways — A Comparison

Choosing the right API gateway depends on your infrastructure, team expertise, and scale requirements. Here is a comparison of the most widely used gateways:

Feature	Kong	AWS API Gateway	Azure APIM	Nginx
Type	Open-source / Enterprise	Managed (AWS)	Managed (Azure)	Open-source / Commercial
Deployment	Self-hosted / Cloud	Fully managed	Fully managed	Self-hosted
Plugin Ecosystem	Rich (100+ plugins)	Lambda authorizers	Policy expressions	Lua / NJS modules
Protocol Support	REST, gRPC, GraphQL, WebSocket	REST, WebSocket, HTTP	REST, SOAP, GraphQL, WebSocket	REST, gRPC, WebSocket
Rate Limiting	Built-in plugin	Usage plans + API keys	Built-in policies	ngx_http_limit_req
Best For	Multi-cloud, Kubernetes	AWS-native workloads	Azure-native, enterprise	High-perf reverse proxy
Pricing	Free (OSS) / Enterprise $$	Pay per request	Tiered plans	Free (OSS) / Plus $$

Use swehelper.com comparison tools to evaluate these gateways against your specific requirements.

🔍 Code Examples — Configuration in Practice

Kong Gateway — Declarative Configuration

Kong uses a declarative YAML format (or its Admin API) to define services, routes, and plugins. Below is a configuration that sets up a service with JWT authentication and rate limiting:

_format_version: "3.0"

services:
  - name: user-service
    url: http://user-svc.internal:8080
    routes:
      - name: user-routes
        paths:
          - /api/users
        methods:
          - GET
          - POST
          - PUT
        strip_path: true
    plugins:
      - name: jwt
        config:
          claims_to_verify:
            - exp
          header_names:
            - Authorization
      - name: rate-limiting
        config:
          minute: 100
          hour: 5000
          policy: redis
          redis_host: redis.internal
          redis_port: 6379
      - name: correlation-id
        config:
          header_name: X-Request-ID
          generator: uuid

  - name: order-service
    url: http://order-svc.internal:8080
    routes:
      - name: order-routes
        paths:
          - /api/orders
        methods:
          - GET
          - POST
    plugins:
      - name: key-auth
        config:
          key_names:
            - X-API-Key
      - name: response-transformer
        config:
          remove:
            headers:
              - X-Internal-Trace-Id
            json:
              - internal_metadata

AWS API Gateway — CloudFormation / SAM Template

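AWS API Gateway is configured through the AWS Console, CLI, or Infrastructure as Code. Here is a SAM template that defines an HTTP API with CORS settings and a Lambda-backed route, plus a usage plan. Note that the template does not define a Lambda authorizer, and that usage plans and API keys are a REST API feature: the plan shown has no ApiStages entry, so it would need to be attached to a REST API stage before it throttles anything.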

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  ApiGateway:
    Type: AWS::Serverless::HttpApi
    Properties:
      StageName: prod
      CorsConfiguration:
        AllowOrigins:
          - "https://app.example.com"
        AllowMethods:
          - GET
          - POST
          - PUT
          - DELETE
        AllowHeaders:
          - Authorization
          - Content-Type

  GetUsersFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: handlers/users.getAll
      Runtime: nodejs18.x
      MemorySize: 256
      Timeout: 30
      Events:
        GetUsers:
          Type: HttpApi
          Properties:
            ApiId: !Ref ApiGateway
            Path: /api/users
            Method: GET

  UsagePlan:
    Type: AWS::ApiGateway::UsagePlan
    Properties:
      UsagePlanName: StandardPlan
      Throttle:
        BurstLimit: 200
        RateLimit: 100
      Quota:
        Limit: 10000
        Period: DAY

Rate Limiting with Sliding Window (Pseudocode)

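Understanding how rate limiting works under the hood is essential. Here is a sliding window log implementation, a variant commonly used inside gateways: it stores one timestamp per accepted request and counts only those that fall inside the current window.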

import time
from collections import deque


class SlidingWindowRateLimiter:
    """Sliding window log: keeps one timestamp per accepted request."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.request_log = {}  # client_id -> deque of request timestamps

    def is_allowed(self, client_id):
        # A monotonic clock is unaffected by wall-clock adjustments
        now = time.monotonic()
        window_start = now - self.window_seconds

        log = self.request_log.setdefault(client_id, deque())

        # Remove timestamps outside the current window
        while log and log[0] < window_start:
            log.popleft()

        if len(log) < self.max_requests:
            log.append(now)
            return True  # Request allowed

        return False  # Rate limit exceeded


# Usage: 100 requests per 60-second window
limiter = SlidingWindowRateLimiter(max_requests=100, window_seconds=60)


def handle_request(client_id, request, forward_to_backend):
    """Forward the request, or return a 429 (status, body) pair when throttled.

    forward_to_backend is a placeholder for whatever callable proxies the request.
    """
    if limiter.is_allowed(client_id):
        return forward_to_backend(request)
    return 429, "Too Many Requests"

💡 The BFF (Backend for Frontend) Pattern

The Backend for Frontend pattern is an evolution of the API gateway concept where you create separate gateway layers tailored to each client type — one for web, one for mobile, one for third-party integrations, and so on.

Why BFF?

Different clients have different needs. A mobile app on a slow 3G connection needs minimal, compressed payloads. A web dashboard needs rich, nested data. A third-party API consumer needs stable, versioned endpoints. A single monolithic gateway trying to serve all these needs becomes bloated and difficult to maintain.

BFF Architecture

┌──────────┐  ┌──────────┐  ┌──────────────┐
│ Web App  │  │ Mobile   │  │ Partner API  │
│ (React)  │  │ (iOS/And)│  │ (3rd Party)  │
└────┬─────┘  └────┬─────┘  └──────┬───────┘
     │             │               │
     ▼             ▼               ▼
┌─────────┐  ┌──────────┐  ┌─────────────┐
│ Web BFF │  │Mobile BFF│  │ Partner BFF │
│ GraphQL │  │ REST/slim│  │ REST/versnd │
└────┬────┘  └────┬─────┘  └──────┬──────┘
     │             │               │
     └─────────┬───┴───────┬───────┘
               ▼           ▼
        ┌────────────┐ ┌────────────┐
        │ User Svc   │ │ Order Svc  │
        └────────────┘ └────────────┘

Each BFF is owned by the frontend team it serves. The Web BFF might use GraphQL to let the frontend query exactly what it needs, while the Mobile BFF returns pre-shaped, minimal JSON payloads. The Partner BFF provides stable versioned REST endpoints with strict rate limits.
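
The difference between two BFFs often comes down to how they shape the same backend data. The sketch below uses stubbed service calls (`fetch_user` and `fetch_orders` are hypothetical stand-ins for calls to the User and Order services) to contrast a rich web response with a slim mobile one.

```python
def fetch_user(user_id: str) -> dict:
    # Stub standing in for a call to the User Service.
    return {"id": user_id, "name": "Ada", "email": "ada@example.com", "preferences": {"theme": "dark"}}


def fetch_orders(user_id: str) -> list:
    # Stub standing in for a call to the Order Service.
    return [
        {"id": "o-1", "total": 20.0, "items": [{"sku": "A", "qty": 1}]},
        {"id": "o-2", "total": 35.5, "items": [{"sku": "B", "qty": 2}]},
    ]


def web_profile(user_id: str) -> dict:
    """Web BFF: return rich, nested data for a dashboard."""
    return {"user": fetch_user(user_id), "orders": fetch_orders(user_id)}


def mobile_profile(user_id: str) -> dict:
    """Mobile BFF: return only what the small screen renders."""
    user = fetch_user(user_id)
    orders = fetch_orders(user_id)
    return {
        "name": user["name"],
        "order_count": len(orders),
        "last_order_id": orders[-1]["id"] if orders else None,
    }
```

Both functions aggregate the same two services; only the payload shape differs, which is the part each frontend team would own.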

When to Use BFF

Multiple client types with significantly different data needs
Teams organized by client (web team, mobile team) that want autonomy
Performance optimization — mobile BFF can aggressively cache and compress
Avoid if you only have one client type or your gateway logic is simple

For more architectural patterns, explore microservices patterns on swehelper.com.

🔐 Security Features and Best Practices

The API gateway is your first line of defense. Here are critical security capabilities and practices:

TLS Termination: Terminate HTTPS at the gateway and use mTLS for internal service-to-service communication. Never pass unencrypted traffic beyond the gateway.
Input Validation: Validate request schemas (OpenAPI spec enforcement) at the gateway to reject malformed requests before they reach backend services.
IP Whitelisting / Blacklisting: Block known malicious IPs or restrict access to specific CIDR ranges for internal APIs.
CORS Enforcement: Centralize CORS policy at the gateway so individual services do not need to manage it.
Bot Detection: Integrate with WAF (Web Application Firewall) services to detect and block automated attacks, SQL injection, and XSS attempts.
API Key Rotation: Enforce key expiration policies and provide self-service key management portals for consumers.
Payload Size Limits: Set maximum request body sizes to prevent denial-of-service attacks via large payloads.
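
Two of the checks above, CIDR restriction and payload size limits, are simple enough to sketch with the standard library. The network range, the 1 MiB limit, and the `check_request` helper are assumptions for illustration.

```python
import ipaddress

# Illustrative values: an internal range and a 1 MiB body limit.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
MAX_BODY_BYTES = 1_048_576


def check_request(client_ip: str, content_length: int) -> int:
    """Return an HTTP status: 200 to continue, 403 or 413 to reject."""
    address = ipaddress.ip_address(client_ip)
    if not any(address in network for network in ALLOWED_NETWORKS):
        return 403  # caller is outside the allowed CIDR ranges
    if content_length > MAX_BODY_BYTES:
        return 413  # payload too large
    return 200
```

When the gateway sits behind a CDN or load balancer, the client IP has to come from a trusted forwarding header rather than the socket address.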

A common security architecture layers an API gateway behind a CDN/WAF (like CloudFront + AWS WAF or Azure Front Door) for DDoS protection, while the gateway handles application-level security. See more at swehelper.com security patterns.

📊 Monitoring and Observability

Since all traffic flows through the gateway, it is the ideal place to instrument observability. Key metrics and practices include:

Metric	What It Tells You	Alert Threshold Example
Request Rate (RPS)	Traffic volume and trends	Spike above 3x baseline
Error Rate (4xx/5xx)	Client or server-side failures	5xx rate exceeds 1%
P50/P95/P99 Latency	Response time distribution	P99 above 500ms
Rate Limit Hits	Clients hitting throttle limits	Consistent 429 responses
Circuit Breaker State	Downstream service health	Circuit opens for any service
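
To show how two of the example thresholds in the table could be evaluated, the sketch below computes a nearest-rank percentile and a 5xx error rate over a batch of samples. In practice a metrics backend would do this over time windows; the function names and alert labels are hypothetical.

```python
import math


def percentile(samples, pct: int):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fired_alerts(latencies_ms, status_codes):
    """Apply the table's example thresholds to one batch of requests."""
    fired = []
    if percentile(latencies_ms, 99) > 500:
        fired.append("p99_latency")
    server_errors = sum(1 for status in status_codes if 500 <= status <= 599)
    if server_errors / len(status_codes) > 0.01:
        fired.append("5xx_rate")
    return fired
```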

The gateway should inject a correlation ID (e.g., X-Request-ID) into every request so you can trace a single user action across all downstream services. Integrate with distributed tracing systems like Jaeger, Zipkin, or AWS X-Ray. Export metrics to Prometheus/Grafana or your cloud provider's monitoring stack.
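
Correlation ID injection is a small piece of logic: reuse the ID if the caller supplied one, otherwise generate it. A minimal sketch, with `with_correlation_id` as a hypothetical helper:

```python
import uuid


def with_correlation_id(headers: dict) -> dict:
    """Return a copy of the headers that is guaranteed to carry X-Request-ID."""
    out = dict(headers)
    # Header names are case-insensitive, so look for any casing.
    has_id = any(name.lower() == "x-request-id" for name in out)
    if not has_id:
        out["X-Request-ID"] = str(uuid.uuid4())
    return out
```

Whether to trust an ID supplied by an external client is a policy choice; some teams always overwrite it at the edge.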

Use swehelper.com latency calculator to model the impact of gateway overhead on your end-to-end response times.

✅ Patterns and Anti-Patterns

Good Patterns

Gateway as a thin layer: Keep business logic out of the gateway. It should only handle cross-cutting concerns. If your gateway has if/else branches based on business rules, something is wrong.
Declarative configuration: Define routes, plugins, and policies as code (YAML/JSON) in version control. This enables code review, rollback, and reproducibility.
Health check endpoints: Configure the gateway to actively probe backend health (/health or /ready) and remove unhealthy instances from the routing pool automatically.
Graceful degradation: When a non-critical service is down, return cached or default responses instead of propagating errors. For example, if the recommendation service is down, show popular items instead of an error.
API versioning at the gateway: Route /v1/users to the legacy service and /v2/users to the new service. This allows gradual migration without client disruption.
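
The graceful degradation pattern from the list above can be sketched as a wrapper around the call to the non-critical service. The fallback list, the `degraded` flag, and the function names are illustrative assumptions.

```python
# Hypothetical precomputed fallback, e.g. refreshed periodically from a cache.
POPULAR_ITEMS = ["item-1", "item-2", "item-3"]


def recommendations(user_id: str, fetch) -> dict:
    """Call the recommendation service, falling back to popular items on failure."""
    try:
        return {"items": fetch(user_id), "degraded": False}
    except (ConnectionError, TimeoutError):
        # The service is down or slow: serve a default instead of an error.
        return {"items": POPULAR_ITEMS, "degraded": True}


def failing_fetch(user_id: str):
    # Simulates the recommendation service being unreachable.
    raise ConnectionError("recommendation service unavailable")
```

Only transport-level failures trigger the fallback here; other exceptions still propagate so that genuine bugs are not masked.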

Anti-Patterns to Avoid

God Gateway: Stuffing business logic, data transformation, orchestration, and even database calls into the gateway. This creates a monolithic bottleneck that defeats the purpose of microservices.
Single point of failure: Running a single gateway instance without redundancy. Always deploy at least two instances behind a load balancer with health checks.
Tight coupling to gateway vendor: Embedding vendor-specific constructs deeply into your service contracts. Use standard protocols (OpenAPI, gRPC) so you can swap gateways if needed.
Ignoring gateway latency: Every hop adds latency. An API gateway typically adds 5-20ms. If you chain multiple gateways (edge gateway to internal gateway to service mesh sidecar), the cumulative overhead can become significant.
No canary or blue-green strategy: Deploying gateway configuration changes to all traffic at once. Always use traffic splitting to roll out changes gradually.
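
One common way to split traffic for a gradual rollout is to hash a stable request key into a bucket from 0 to 99 and compare it with the canary percentage. The sketch below assumes the key is something like an API key or user ID; `pick_config_version` is a hypothetical name.

```python
import hashlib


def pick_config_version(request_key: str, canary_percent: int) -> str:
    """Deterministically assign a key to the canary or stable configuration."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Because the bucket depends only on the key, a client that lands in the canary at 10% stays in it as the percentage is raised.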

🏗️ Real-World Architecture Example

Here is a production-grade API gateway architecture for an e-commerce platform handling 50,000 requests per second:

Internet Traffic
       │
       ▼
┌──────────────┐
│  CloudFront  │  CDN + DDoS protection
│  + AWS WAF   │  Edge caching for static content
└──────┬───────┘
       │
       ▼
┌──────────────┐
│   AWS ALB    │  Layer 7 load balancer
│ (multi-AZ)   │  SSL termination, health checks
└──────┬───────┘
       │
       ▼
┌──────────────────────────────────────┐
│          Kong Gateway Cluster        │
│  (3 nodes, Kubernetes, PostgreSQL)   │
│                                      │
│  Plugins: JWT auth, rate limiting,   │
│  correlation-id, prometheus,         │
│  response-transformer, bot-detect    │
└──┬───────┬───────┬───────┬───────┬──┘
   │       │       │       │       │
   ▼       ▼       ▼       ▼       ▼
 User    Order  Payment Catalog  Search
 Svc     Svc     Svc     Svc     Svc
 (x4)   (x3)    (x2)    (x6)    (x3)

Key design decisions in this architecture:

Three-tier entry: CDN/WAF handles volumetric attacks and caching, ALB provides high-availability load balancing, Kong handles application-level gateway logic.
Kong on Kubernetes: Auto-scales gateway pods based on CPU and request rate. PostgreSQL stores gateway configuration (routes, plugins, consumers).
Service-level scaling: The Catalog service has 6 replicas because it handles the most read traffic. Payment has only 2 because it processes fewer but more critical requests.
Redis-backed rate limiting: Shared Redis cluster ensures rate limits are enforced consistently across all Kong nodes, not per-node.
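
The reason for the shared store can be shown with a small simulation. Below, a plain dict stands in for Redis (an assumption made to keep the example self-contained; a real deployment would use an atomic increment such as Redis INCR), and a fixed-window limiter is run on three nodes with either shared or separate counters.

```python
class FixedWindowLimiter:
    def __init__(self, store: dict, limit: int, window_seconds: int):
        self.store = store  # shared across nodes, or private to one node
        self.limit = limit
        self.window_seconds = window_seconds

    def is_allowed(self, client_id: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        key = (client_id, window)
        count = self.store.get(key, 0)
        if count >= self.limit:
            return False
        self.store[key] = count + 1
        return True


def count_allowed(nodes, total_requests: int, client_id: str = "key-1", now: float = 0.0) -> int:
    """Spread requests round-robin across gateway nodes and count accepted ones."""
    return sum(nodes[i % len(nodes)].is_allowed(client_id, now) for i in range(total_requests))


shared_store = {}
shared_nodes = [FixedWindowLimiter(shared_store, 100, 60) for _ in range(3)]
isolated_nodes = [FixedWindowLimiter({}, 100, 60) for _ in range(3)]
```

With 500 requests in one window, the shared setup admits 100 while the isolated setup admits 300, since each node enforces the limit on its own.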

For hands-on practice designing architectures like this, try the swehelper.com system design simulator.

🔄 API Gateway vs Service Mesh

A common source of confusion is the overlap between API gateways and service meshes (like Istio or Linkerd). Here is the distinction:

Aspect	API Gateway	Service Mesh
Traffic Scope	North-south (external to internal)	East-west (service to service)
Deployment	Centralized edge proxy	Sidecar per service instance
Primary Focus	External API management	Internal service communication
Use Together?	Yes — API gateway at the edge, service mesh internally. They are complementary.

In mature architectures, both work together. The API gateway handles external client concerns (API keys, public rate limits, request shaping) while the service mesh handles internal concerns (mTLS between services, retry policies, circuit breaking). Explore service mesh patterns on swehelper.com for deeper coverage.

❓ Frequently Asked Questions

Q1: When should I introduce an API gateway?

Introduce a gateway when you have more than 2-3 microservices that clients consume directly. If you are building a monolith or have a single backend, a simple reverse proxy (Nginx) is sufficient. The tipping point is when you find yourself duplicating auth, rate limiting, or CORS configuration across multiple services — that is when a gateway pays for itself.

Q2: Does an API gateway add significant latency?

A well-configured gateway adds roughly 5-20ms of latency in most cases. Managed gateways like AWS API Gateway tend to add slightly more (10-30ms) due to their multi-tenant infrastructure, while self-hosted gateways like Kong or Envoy running close to your services add minimal overhead. Treat these as rough figures and measure in your own environment. The latency tradeoff is almost always worth it for the security, observability, and operational benefits you gain.

Q3: Can I use an API gateway with GraphQL?

Yes. You have two common approaches: (1) place a GraphQL server (like Apollo Gateway) behind the API gateway, so that the gateway handles auth and rate limiting at the HTTP level while the GraphQL server resolves queries, or (2) use a gateway that natively supports GraphQL (Kong has a GraphQL plugin, and Azure APIM supports GraphQL APIs). The BFF pattern works particularly well here: your Web BFF can expose GraphQL while other BFFs expose REST.

Q4: How do I handle API versioning through the gateway?

Three common strategies: URL path versioning (/v1/users, /v2/users) is the simplest — the gateway routes each version to the appropriate backend. Header-based versioning (Accept: application/vnd.api.v2+json) keeps URLs clean but is harder to debug. Query parameter versioning (?version=2) is the least recommended. Whichever you choose, the gateway should handle the routing so backend services only need to serve their own version.
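
The three strategies can be combined into one resolution function, checked in order of preference. This sketch assumes the media type format shown above and a default of version 1; `requested_version` is a hypothetical helper.

```python
import re
from urllib.parse import parse_qs

_PATH_VERSION = re.compile(r"^/v(\d+)/")
_ACCEPT_VERSION = re.compile(r"application/vnd\.api\.v(\d+)\+json")


def requested_version(path: str, headers: dict, query: str = "", default: int = 1) -> int:
    """Resolve the API version from the path, then the Accept header, then the query."""
    match = _PATH_VERSION.match(path)
    if match:
        return int(match.group(1))
    match = _ACCEPT_VERSION.search(headers.get("Accept", ""))
    if match:
        return int(match.group(1))
    values = parse_qs(query).get("version", [])
    if values and values[0].isdigit():
        return int(values[0])
    return default
```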

Q5: What is the difference between an API gateway and a reverse proxy?

A reverse proxy (Nginx, HAProxy) forwards requests to backend servers with basic load balancing. An API gateway does everything a reverse proxy does plus application-aware features: authentication, rate limiting, request transformation, API analytics, developer portal, and more. Think of an API gateway as a reverse proxy with an extensive plugin system designed specifically for API management. Many organizations start with Nginx and evolve to Kong or a managed gateway as their API surface grows.

API gateways are a foundational building block in modern distributed systems. Whether you choose a managed service like AWS API Gateway for simplicity or a self-hosted solution like Kong for flexibility, the key is keeping the gateway thin, observable, and focused on cross-cutting concerns. For more system design topics and interview preparation, visit swehelper.com system design.
