Service Discovery in Distributed Systems
Service discovery is the process by which services in a distributed system find and communicate with each other. In a dynamic environment where services scale up and down, containers are created and destroyed, and IP addresses change constantly, hard-coding service locations is not viable. Service discovery provides the mechanism for services to register themselves and discover other services at runtime.
Why Service Discovery Is Needed
In traditional monolithic applications, components communicate through in-process function calls. In microservices, each service runs as a separate process, often in containers that are dynamically assigned IP addresses. Service discovery solves the fundamental problem: how does Service A know where Service B is running right now?
- Dynamic Scaling: Auto-scaling adds or removes instances constantly
- Container Orchestration: Containers get new IPs each time they start
- Failure Recovery: Failed instances are replaced with new ones at different addresses
- Multi-Environment: Services run at different locations in dev, staging, and production
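The registry pattern underlying all of these tools can be sketched in a few lines. This toy Python `ServiceRegistry` (a hypothetical illustration, not any real product's API) captures the core idea: instances register with a lease and silently drop out of lookups if they stop renewing it:

```python
import time
from collections import defaultdict

class ServiceRegistry:
    """Toy in-memory registry: instances register with a TTL and must
    renew (heartbeat) before the lease expires, Eureka-style."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        # service name -> {(host, port): lease expiry time}
        self._instances = defaultdict(dict)

    def register(self, name, host, port, now=None):
        """Register or renew an instance; renewal just resets the lease."""
        now = now if now is not None else time.monotonic()
        self._instances[name][(host, port)] = now + self.ttl

    def lookup(self, name, now=None):
        """Return live instances, pruning any whose lease has expired."""
        now = now if now is not None else time.monotonic()
        live = {ep: exp for ep, exp in self._instances[name].items() if exp > now}
        self._instances[name] = live
        return sorted(live)

registry = ServiceRegistry(ttl_seconds=30)
registry.register("payment-service", "10.0.1.50", 8080, now=0.0)
registry.register("payment-service", "10.0.1.51", 8080, now=0.0)
print(registry.lookup("payment-service", now=10.0))  # both instances alive
print(registry.lookup("payment-service", now=40.0))  # leases expired: []
```

Real registries add persistence, replication, and change notifications on top of this loop, but the register/renew/expire cycle is the same.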
Client-Side vs Server-Side Discovery
| Aspect | Client-Side Discovery | Server-Side Discovery |
|---|---|---|
| How it works | Client queries registry, picks instance, makes request directly | Client sends request to load balancer, which queries registry and routes |
| Load Balancing | Client-side (e.g., round-robin in client) | Server-side (load balancer decides) |
| Complexity | Client must implement discovery logic | Client is simpler; load balancer is the single point |
| Language Coupling | Discovery library needed per language | Language-agnostic (HTTP to load balancer) |
| Examples | Netflix Eureka + Ribbon, Consul + client library | AWS ALB, Kubernetes Services, Nginx |
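The left column of the table can be made concrete with a minimal client-side sketch. This hypothetical `RoundRobinClient` assumes the instance list comes from some registry query (stubbed here as a callable) and shows the client doing its own load balancing:

```python
import itertools

class RoundRobinClient:
    """Client-side discovery sketch: the client holds the instance list
    (as if fetched from a registry) and balances load itself."""

    def __init__(self, fetch_instances):
        self._fetch = fetch_instances      # callable -> [(host, port), ...]
        self._counter = itertools.count()  # monotonically increasing index

    def next_instance(self):
        """Pick the next instance in round-robin order."""
        instances = self._fetch()
        if not instances:
            raise RuntimeError("no instances available")
        return instances[next(self._counter) % len(instances)]

instances = [("10.0.1.50", 8080), ("10.0.1.51", 8080)]
client = RoundRobinClient(lambda: instances)
print([client.next_instance() for _ in range(3)])
```

In server-side discovery the same round-robin decision happens inside the load balancer instead, and the client only ever sees one stable address.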
Consul by HashiCorp
Consul is a full-featured service mesh solution that includes service discovery, health checking, key-value storage, and multi-datacenter support.
```bash
# Register a service with Consul via the HTTP API
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{
    "ID": "payment-service-1",
    "Name": "payment-service",
    "Tags": ["v2", "primary"],
    "Address": "10.0.1.50",
    "Port": 8080,
    "Check": {
      "HTTP": "http://10.0.1.50:8080/health",
      "Interval": "10s",
      "Timeout": "5s"
    }
  }'

# Discover healthy instances (quote the URL so the shell does not expand "?")
curl "http://localhost:8500/v1/health/service/payment-service?passing=true"
```
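Because both endpoints are plain HTTP, any language can drive them without a Consul client library. The following stdlib-only Python sketch (helper names are our own) builds the registration payload and parses the health response; note that Consul leaves `Service.Address` empty when the service shares the node's address, so a robust parser falls back to `Node.Address`:

```python
import json
import urllib.request

CONSUL = "http://localhost:8500"  # local Consul agent (assumed)

def registration_payload(service_id, name, address, port):
    """Build the body for PUT /v1/agent/service/register (as in the curl above)."""
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {"HTTP": f"http://{address}:{port}/health", "Interval": "10s"},
    }

def healthy_endpoints(health_response):
    """Extract (address, port) pairs from /v1/health/service/<name> output."""
    endpoints = []
    for entry in health_response:
        # Empty Service.Address means "use the node's address"
        addr = entry["Service"]["Address"] or entry["Node"]["Address"]
        endpoints.append((addr, entry["Service"]["Port"]))
    return endpoints

if __name__ == "__main__":
    body = json.dumps(registration_payload(
        "payment-service-1", "payment-service", "10.0.1.50", 8080)).encode()
    req = urllib.request.Request(
        f"{CONSUL}/v1/agent/service/register", data=body, method="PUT")
    urllib.request.urlopen(req)
    with urllib.request.urlopen(
            f"{CONSUL}/v1/health/service/payment-service?passing=true") as resp:
        print(healthy_endpoints(json.load(resp)))
```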
Consul also supports DNS-based discovery, so any application can resolve service names:
```bash
# DNS lookup for payment-service (Consul's DNS interface listens on 8600)
dig @localhost -p 8600 payment-service.service.consul SRV

# The answer section contains SRV records whose target is a node name,
# not an IP (node1 is a placeholder):
#   payment-service.service.consul. 0 IN SRV 1 1 8080 node1.node.dc1.consul.
# with a matching A record in the additional section:
#   node1.node.dc1.consul. 0 IN A 10.0.1.50
```
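SRV records carry a priority and a weight that clients are expected to honor. A simplified RFC 2782-style selection (our own sketch; real resolvers also special-case zero-weight records when mixed with weighted ones) looks like:

```python
import random

def pick_srv_target(records, rng=None):
    """Pick one SRV record: lowest priority wins; within that priority,
    choose proportionally to weight. Records are (priority, weight,
    port, target) tuples, matching the dig output fields above."""
    if not records:
        return None
    rng = rng or random.Random()
    best = min(r[0] for r in records)
    candidates = [r for r in records if r[0] == best]
    total = sum(r[1] for r in candidates)
    if total == 0:
        return rng.choice(candidates)  # all weights zero: uniform pick
    point = rng.uniform(0, total)
    running = 0
    for rec in candidates:
        running += rec[1]
        if point <= running:
            return rec
    return candidates[-1]

records = [(1, 10, 8080, "node1.node.dc1.consul."),
           (2, 100, 8080, "node3.node.dc1.consul.")]
print(pick_srv_target(records, rng=random.Random(0)))
```

Priority 2 records are ignored entirely here: they only come into play when every priority 1 instance is unreachable.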
Netflix Eureka
Eureka is a client-side service discovery system developed by Netflix. It consists of a Eureka Server (the registry) and Eureka Clients (the services).
```java
// Spring Boot Eureka Server
@SpringBootApplication
@EnableEurekaServer
public class EurekaServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(EurekaServerApplication.class, args);
    }
}
```
```yaml
# application.yml for the Eureka Server: a standalone server neither
# registers with itself nor fetches a registry
server:
  port: 8761
eureka:
  client:
    registerWithEureka: false
    fetchRegistry: false
```
```java
// Eureka Client - Payment Service
// (on recent Spring Cloud releases the annotation is optional once
// spring-cloud-starter-netflix-eureka-client is on the classpath)
@SpringBootApplication
@EnableEurekaClient
public class PaymentServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(PaymentServiceApplication.class, args);
    }
}
```
```yaml
# application.yml for the client
spring:
  application:
    name: payment-service    # the name other services will look up
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
  instance:
    preferIpAddress: true
    leaseRenewalIntervalInSeconds: 10
```
Kubernetes Service Discovery
Kubernetes has built-in service discovery through its Service resource. When you create a Service, Kubernetes automatically creates a DNS entry that other pods can use to find it.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: production
spec:
  selector:
    app: payment          # routes to pods labeled app=payment
  ports:
    - protocol: TCP
      port: 80            # port the Service exposes
      targetPort: 8080    # port the pods listen on
  type: ClusterIP
```
Other pods access the service using DNS:
```bash
# From any pod in the same namespace
curl http://payment-service/api/charge

# From a different namespace, use the fully qualified name
curl http://payment-service.production.svc.cluster.local/api/charge

# Kubernetes DNS resolves the name to the ClusterIP;
# kube-proxy then routes to a healthy pod via iptables/IPVS rules
```
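The naming scheme follows a fixed pattern, which a client can construct without any Kubernetes library. A tiny sketch (`service_dns_name` is our own helper; `cluster.local` is the default cluster domain and can differ per cluster):

```python
def service_dns_name(service, namespace=None, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name for a Kubernetes Service.
    Within the caller's own namespace the bare service name resolves;
    across namespaces, use the fully qualified <svc>.<ns>.svc.<domain>."""
    if namespace is None:
        return service
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_dns_name("payment-service"))
# payment-service
print(service_dns_name("payment-service", "production"))
# payment-service.production.svc.cluster.local
```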
DNS-Based Service Discovery
DNS-based discovery uses standard DNS records (A, AAAA, SRV) to resolve service names to IP addresses. This approach is language-agnostic and requires no special client libraries.
| DNS Record Type | Use Case | Limitation |
|---|---|---|
| A Record | Maps hostname to IP address | No port information, TTL caching delays |
| SRV Record | Includes port, priority, and weight | Not all clients support SRV lookups |
| CNAME Record | Alias to another domain | Additional DNS lookup required |
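The A-record limitation in the table can be made concrete with a stdlib-only sketch: the resolver returns only addresses, so the caller must already know the port out of band (hard-coded here; restricted to IPv4 to keep results predictable):

```python
import socket

def resolve_a_records(hostname, port):
    """A-record style discovery via the system resolver: DNS supplies
    addresses only, so `port` must be known out of band."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET,
                               proto=socket.IPPROTO_TCP)
    # Each entry's sockaddr is (address, port); dedupe and sort
    return sorted({(info[4][0], port) for info in infos})

print(resolve_a_records("localhost", 8080))
```

SRV lookups avoid the out-of-band port problem, but as the table notes, many HTTP clients never issue them.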
Health Checking
Service discovery is only useful if it returns healthy instances. Health checks are a critical component:
- Liveness checks: Is the service process running?
- Readiness checks: Is the service ready to accept traffic?
- Deep health checks: Can the service connect to its dependencies (database, cache)?
```python
import os

import psycopg2
import redis
from flask import Flask, jsonify

DATABASE_URL = os.environ["DATABASE_URL"]
# Cache client (Redis used here as an example dependency)
cache_client = redis.Redis.from_url(os.environ["REDIS_URL"])

app = Flask(__name__)

@app.route("/health/live")
def liveness():
    # Liveness: the process is up and able to serve this endpoint
    return jsonify({"status": "alive"}), 200

@app.route("/health/ready")
def readiness():
    # Readiness: verify the dependencies needed to do real work
    try:
        conn = psycopg2.connect(DATABASE_URL)
        with conn.cursor() as cur:   # psycopg2 queries go through a cursor
            cur.execute("SELECT 1")
        conn.close()
        cache_client.ping()
        return jsonify({"status": "ready"}), 200
    except Exception as e:
        return jsonify({"status": "not_ready", "error": str(e)}), 503
```
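On the registry side, those endpoints are polled periodically. This hypothetical `evaluate_health` sweep (our own sketch, loosely modeled on what registry agents do) shows a common refinement: an instance is evicted only after several consecutive failures, so a single transient timeout does not remove it from rotation:

```python
def evaluate_health(check_fn, instances, failures, threshold=3):
    """One health-check sweep. `check_fn(instance)` returns True if the
    probe succeeded; `failures` maps instance -> consecutive failure
    count and is updated in place. Returns instances still in rotation."""
    healthy = []
    for inst in instances:
        if check_fn(inst):
            failures[inst] = 0          # success resets the streak
            healthy.append(inst)
        else:
            failures[inst] = failures.get(inst, 0) + 1
            if failures[inst] < threshold:
                healthy.append(inst)    # failing, but under the threshold

    return healthy

# Usage with a stubbed probe: "b" starts failing but survives two sweeps
failures = {}
probe = lambda inst: inst == "a"
for sweep in range(3):
    print(sweep, evaluate_health(probe, ["a", "b"], failures))
```

The same hysteresis usually applies in reverse: a recovered instance should pass a few consecutive checks before re-entering rotation.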
Comparison of Service Discovery Solutions
| Feature | Consul | Eureka | Kubernetes | etcd |
|---|---|---|---|---|
| Discovery Type | Both | Client-side | Server-side | Client-side |
| Health Checking | Built-in | Heartbeat | Probe-based | TTL-based |
| DNS Support | Yes | No | Yes (CoreDNS) | No |
| Multi-DC | Native | Federation | Multi-cluster | No |
| Consistency | CP (Raft) | AP | CP (etcd) | CP (Raft) |
Service discovery is closely related to consistent hashing for request routing and leader election for registry high availability. For more on building resilient service-to-service communication, see our circuit breaker guide.
Frequently Asked Questions
Q: Should I use client-side or server-side discovery?
If you are running on Kubernetes, use its built-in server-side discovery via Services. For non-Kubernetes environments, Consul provides excellent flexibility with both approaches. Client-side discovery (Eureka) gives you more control over load balancing but couples your client code to the discovery mechanism.
Q: How does service discovery work with service meshes?
Service meshes like Istio use the sidecar proxy pattern where a proxy (Envoy) handles discovery automatically. Your application code makes requests to localhost, and the proxy resolves the target service via the control plane. This is the most transparent approach for service discovery.
Q: What happens if the service registry goes down?
Most discovery clients cache the last known service locations. Eureka clients, for example, maintain a local cache refreshed every 30 seconds. If the registry goes down, services continue using cached data. Consul achieves high availability through its Raft consensus protocol. The registry itself should be deployed as a highly available cluster.
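That caching fallback can be sketched as a thin wrapper (hypothetical names; real clients also bound how stale a cached entry may get before refusing to serve it):

```python
class CachingDiscoveryClient:
    """Serve the last successful lookup when the registry is unreachable."""

    def __init__(self, query_registry):
        self._query = query_registry  # callable; raises if the registry is down
        self._cache = {}

    def instances(self, name):
        try:
            result = self._query(name)
            self._cache[name] = result    # refresh cache on success
        except Exception:
            if name not in self._cache:
                raise                     # never seen this service: can't help
            result = self._cache[name]    # registry down: serve stale data
        return result

# Usage: simulate the registry going down between two lookups
state = {"up": True}
def query(name):
    if not state["up"]:
        raise ConnectionError("registry unreachable")
    return [("10.0.1.50", 8080)]

client = CachingDiscoveryClient(query)
print(client.instances("payment-service"))  # fresh from the registry
state["up"] = False
print(client.instances("payment-service"))  # served from cache
```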
Q: How do I handle service discovery across multiple regions?
Consul natively supports multi-datacenter deployments: each datacenter runs its own Consul cluster, and the clusters federate over the WAN. Services can query other datacenters by including the datacenter name in the DNS name, e.g. payment-service.service.dc2.consul. Kubernetes supports multi-cluster service discovery through projects like Submariner or an Istio multi-cluster mesh. See our geo-distribution guide for more patterns.