Service Discovery in Distributed Systems
Service discovery is the process by which services in a distributed system find and communicate with each other. In a dynamic environment where services scale up and down, containers are created and destroyed, and IP addresses change constantly, hard-coding service locations is not viable. Service discovery provides the mechanism for services to register themselves and discover other services at runtime.
Why Service Discovery Is Needed
In traditional monolithic applications, components communicate through in-process function calls. In microservices, each service runs as a separate process, often in containers that are dynamically assigned IP addresses. Service discovery solves the fundamental problem: how does Service A know where Service B is running right now?
- Dynamic Scaling: Auto-scaling adds or removes instances constantly
- Container Orchestration: Containers get new IPs each time they start
- Failure Recovery: Failed instances are replaced with new ones at different addresses
- Multi-Environment: Services run at different locations in dev, staging, and production
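The registry pattern underlying all of these tools can be sketched in a few lines. This toy Python `ServiceRegistry` (a hypothetical illustration, not any real product's API) captures the core idea: instances register with a lease and silently drop out of lookups if they stop renewing it:

```python
import time
from collections import defaultdict

class ServiceRegistry:
    """Toy in-memory registry: instances register with a TTL and must
    renew (heartbeat) before the lease expires, Eureka-style."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        # service name -> {(host, port): lease expiry time}
        self._instances = defaultdict(dict)

    def register(self, name, host, port, now=None):
        """Register or renew an instance; renewal just resets the lease."""
        now = now if now is not None else time.monotonic()
        self._instances[name][(host, port)] = now + self.ttl

    def lookup(self, name, now=None):
        """Return live instances, pruning any whose lease has expired."""
        now = now if now is not None else time.monotonic()
        live = {ep: exp for ep, exp in self._instances[name].items() if exp > now}
        self._instances[name] = live
        return sorted(live)

registry = ServiceRegistry(ttl_seconds=30)
registry.register("payment-service", "10.0.1.50", 8080, now=0.0)
registry.register("payment-service", "10.0.1.51", 8080, now=0.0)
print(registry.lookup("payment-service", now=10.0))  # both instances alive
print(registry.lookup("payment-service", now=40.0))  # leases expired: []
```

Real registries add persistence, replication, and change notifications on top of this loop, but the register/renew/expire cycle is the same.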
Client-Side vs Server-Side Discovery
| Aspect | Client-Side Discovery | Server-Side Discovery |
|---|---|---|
| How it works | Client queries registry, picks instance, makes request directly | Client sends request to load balancer, which queries registry and routes |
| Load Balancing | Client-side (e.g., round-robin in client) | Server-side (load balancer decides) |
| Complexity | Client must implement discovery logic | Client is simpler; load balancer is the single point |
| Language Coupling | Discovery library needed per language | Language-agnostic (HTTP to load balancer) |
| Examples | Netflix Eureka + Ribbon, Consul + client library | AWS ALB, Kubernetes Services, Nginx |
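The left column of the table can be made concrete with a minimal client-side sketch. This hypothetical `RoundRobinClient` assumes the instance list comes from some registry query (stubbed here as a callable) and shows the client doing its own load balancing:

```python
import itertools

class RoundRobinClient:
    """Client-side discovery sketch: the client holds the instance list
    (as if fetched from a registry) and balances load itself."""

    def __init__(self, fetch_instances):
        self._fetch = fetch_instances      # callable -> [(host, port), ...]
        self._counter = itertools.count()  # monotonically increasing index

    def next_instance(self):
        """Pick the next instance in round-robin order."""
        instances = self._fetch()
        if not instances:
            raise RuntimeError("no instances available")
        return instances[next(self._counter) % len(instances)]

instances = [("10.0.1.50", 8080), ("10.0.1.51", 8080)]
client = RoundRobinClient(lambda: instances)
print([client.next_instance() for _ in range(3)])
```

In server-side discovery the same round-robin decision happens inside the load balancer instead, and the client only ever sees one stable address.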
Consul by HashiCorp
Consul is a full-featured service mesh solution that includes service discovery, health checking, key-value storage, and multi-datacenter support.
```bash
# Register a service with Consul via the HTTP API
curl -X PUT http://localhost:8500/v1/agent/service/register \
  -d '{
    "ID": "payment-service-1",
    "Name": "payment-service",
    "Tags": ["v2", "primary"],
    "Address": "10.0.1.50",
    "Port": 8080,
    "Check": {
      "HTTP": "http://10.0.1.50:8080/health",
      "Interval": "10s",
      "Timeout": "5s"
    }
  }'

# Discover healthy instances (quote the URL so the shell does not expand "?")
curl "http://localhost:8500/v1/health/service/payment-service?passing=true"
```
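Because both endpoints are plain HTTP, any language can drive them without a Consul client library. The following stdlib-only Python sketch (helper names are our own) builds the registration payload and parses the health response; note that Consul leaves `Service.Address` empty when the service shares the node's address, so a robust parser falls back to `Node.Address`:

```python
import json
import urllib.request

CONSUL = "http://localhost:8500"  # local Consul agent (assumed)

def registration_payload(service_id, name, address, port):
    """Build the body for PUT /v1/agent/service/register (as in the curl above)."""
    return {
        "ID": service_id,
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {"HTTP": f"http://{address}:{port}/health", "Interval": "10s"},
    }

def healthy_endpoints(health_response):
    """Extract (address, port) pairs from /v1/health/service/<name> output."""
    endpoints = []
    for entry in health_response:
        # Empty Service.Address means "use the node's address"
        addr = entry["Service"]["Address"] or entry["Node"]["Address"]
        endpoints.append((addr, entry["Service"]["Port"]))
    return endpoints

if __name__ == "__main__":
    body = json.dumps(registration_payload(
        "payment-service-1", "payment-service", "10.0.1.50", 8080)).encode()
    req = urllib.request.Request(
        f"{CONSUL}/v1/agent/service/register", data=body, method="PUT")
    urllib.request.urlopen(req)
    with urllib.request.urlopen(
            f"{CONSUL}/v1/health/service/payment-service?passing=true") as resp:
        print(healthy_endpoints(json.load(resp)))
```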
Consul also supports DNS-based discovery, so any application can resolve service names:
```bash
# DNS lookup for payment-service (Consul's DNS interface listens on 8600)
dig @localhost -p 8600 payment-service.service.consul SRV

# The answer section contains SRV records whose target is a node name,
# not an IP (node1 is a placeholder):
#   payment-service.service.consul. 0 IN SRV 1 1 8080 node1.node.dc1.consul.
# with a matching A record in the additional section:
#   node1.node.dc1.consul. 0 IN A 10.0.1.50
```
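SRV records carry a priority and a weight that clients are expected to honor. A simplified RFC 2782-style selection (our own sketch; real resolvers also special-case zero-weight records when mixed with weighted ones) looks like:

```python
import random

def pick_srv_target(records, rng=None):
    """Pick one SRV record: lowest priority wins; within that priority,
    choose proportionally to weight. Records are (priority, weight,
    port, target) tuples, matching the dig output fields above."""
    if not records:
        return None
    rng = rng or random.Random()
    best = min(r[0] for r in records)
    candidates = [r for r in records if r[0] == best]
    total = sum(r[1] for r in candidates)
    if total == 0:
        return rng.choice(candidates)  # all weights zero: uniform pick
    point = rng.uniform(0, total)
    running = 0
    for rec in candidates:
        running += rec[1]
        if point <= running:
            return rec
    return candidates[-1]

records = [(1, 10, 8080, "node1.node.dc1.consul."),
           (2, 100, 8080, "node3.node.dc1.consul.")]
print(pick_srv_target(records, rng=random.Random(0)))
```

Priority 2 records are ignored entirely here: they only come into play when every priority 1 instance is unreachable.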
Netflix Eureka
Eureka is a client-side service discovery system developed by Netflix. It consists of a Eureka Server (the registry) and Eureka Clients (the services).
```java
// Spring Boot Eureka Server
@SpringBootApplication
@EnableEurekaServer
public class EurekaServerApplication {
    public static void main(String[] args) {
        SpringApplication.run(EurekaServerApplication.class, args);
    }
}
```
```yaml
# application.yml for the Eureka Server: a standalone server neither
# registers with itself nor fetches a registry
server:
  port: 8761
eureka:
  client:
    registerWithEureka: false
    fetchRegistry: false
```
```java
// Eureka Client - Payment Service
// (on recent Spring Cloud releases the annotation is optional once
// spring-cloud-starter-netflix-eureka-client is on the classpath)
@SpringBootApplication
@EnableEurekaClient
public class PaymentServiceApplication {
    public static void main(String[] args) {
        SpringApplication.run(PaymentServiceApplication.class, args);
    }
}
```
```yaml
# application.yml for the client
spring:
  application:
    name: payment-service    # the name other services will look up
eureka:
  client:
    serviceUrl:
      defaultZone: http://localhost:8761/eureka/
  instance:
    preferIpAddress: true
    leaseRenewalIntervalInSeconds: 10
```
Kubernetes Service Discovery
Kubernetes has built-in service discovery through its Service resource. When you create a Service, Kubernetes automatically creates a DNS entry that other pods can use to find it.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: production
spec:
  selector:
    app: payment          # routes to pods labeled app=payment
  ports:
    - protocol: TCP
      port: 80            # port the Service exposes
      targetPort: 8080    # port the pods listen on
  type: ClusterIP
```
Other pods access the service using DNS:
```bash
# From any pod in the same namespace
curl http://payment-service/api/charge

# From a different namespace, use the fully qualified name
curl http://payment-service.production.svc.cluster.local/api/charge

# Kubernetes DNS resolves the name to the ClusterIP;
# kube-proxy then routes to a healthy pod via iptables/IPVS rules
```
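The naming scheme follows a fixed pattern, which a client can construct without any Kubernetes library. A tiny sketch (`service_dns_name` is our own helper; `cluster.local` is the default cluster domain and can differ per cluster):

```python
def service_dns_name(service, namespace=None, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name for a Kubernetes Service.
    Within the caller's own namespace the bare service name resolves;
    across namespaces, use the fully qualified <svc>.<ns>.svc.<domain>."""
    if namespace is None:
        return service
    return f"{service}.{namespace}.svc.{cluster_domain}"

print(service_dns_name("payment-service"))
# payment-service
print(service_dns_name("payment-service", "production"))
# payment-service.production.svc.cluster.local
```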
DNS-Based Service Discovery
DNS-based discovery uses standard DNS records (A, AAAA, SRV) to resolve service names to IP addresses. This approach is language-agnostic and requires no special client libraries.
| DNS Record Type | Use Case | Limitation |
|---|---|---|
| A Record | Maps hostname to IP address | No port information, TTL caching delays |
| SRV Record | Includes port, priority, and weight | Not all clients support SRV lookups |
| CNAME Record | Alias to another domain | Additional DNS lookup required |
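The A-record limitation in the table can be made concrete with a stdlib-only sketch: the resolver returns only addresses, so the caller must already know the port out of band (hard-coded here; restricted to IPv4 to keep results predictable):

```python
import socket

def resolve_a_records(hostname, port):
    """A-record style discovery via the system resolver: DNS supplies
    addresses only, so `port` must be known out of band."""
    infos = socket.getaddrinfo(hostname, port,
                               family=socket.AF_INET,
                               proto=socket.IPPROTO_TCP)
    # Each entry's sockaddr is (address, port); dedupe and sort
    return sorted({(info[4][0], port) for info in infos})

print(resolve_a_records("localhost", 8080))
```

SRV lookups avoid the out-of-band port problem, but as the table notes, many HTTP clients never issue them.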
Health Checking
Service discovery is only useful if it returns healthy instances. Health checks are a critical component:
- Liveness checks: Is the service process running?
- Readiness checks: Is the service ready to accept traffic?
- Deep health checks: Can the service connect to its dependencies (database, cache)?
```python
import os

import psycopg2
import redis
from flask import Flask, jsonify

DATABASE_URL = os.environ["DATABASE_URL"]
# Cache client (Redis used here as an example dependency)
cache_client = redis.Redis.from_url(os.environ["REDIS_URL"])

app = Flask(__name__)

@app.route("/health/live")
def liveness():
    # Liveness: the process is up and able to serve this endpoint
    return jsonify({"status": "alive"}), 200

@app.route("/health/ready")
def readiness():
    # Readiness: verify the dependencies needed to do real work
    try:
        conn = psycopg2.connect(DATABASE_URL)
        with conn.cursor() as cur:   # psycopg2 queries go through a cursor
            cur.execute("SELECT 1")
        conn.close()
        cache_client.ping()
        return jsonify({"status": "ready"}), 200
    except Exception as e:
        return jsonify({"status": "not_ready", "error": str(e)}), 503
```
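On the registry side, those endpoints are polled periodically. This hypothetical `evaluate_health` sweep (our own sketch, loosely modeled on what registry agents do) shows a common refinement: an instance is evicted only after several consecutive failures, so a single transient timeout does not remove it from rotation:

```python
def evaluate_health(check_fn, instances, failures, threshold=3):
    """One health-check sweep. `check_fn(instance)` returns True if the
    probe succeeded; `failures` maps instance -> consecutive failure
    count and is updated in place. Returns instances still in rotation."""
    healthy = []
    for inst in instances:
        if check_fn(inst):
            failures[inst] = 0          # success resets the streak
            healthy.append(inst)
        else:
            failures[inst] = failures.get(inst, 0) + 1
            if failures[inst] < threshold:
                healthy.append(inst)    # failing, but under the threshold

    return healthy

# Usage with a stubbed probe: "b" starts failing but survives two sweeps
failures = {}
probe = lambda inst: inst == "a"
for sweep in range(3):
    print(sweep, evaluate_health(probe, ["a", "b"], failures))
```

The same hysteresis usually applies in reverse: a recovered instance should pass a few consecutive checks before re-entering rotation.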
Comparison of Service Discovery Solutions
| Feature | Consul | Eureka | Kubernetes | etcd |
|---|---|---|---|---|
| Discovery Type | Both | Client-side | Server-side | Client-side |
| Health Checking | Built-in | Heartbeat | Probe-based | TTL-based |
| DNS Support | Yes | No | Yes (CoreDNS) | No |
| Multi-DC | Native | Federation | Multi-cluster | No |
| Consistency | CP (Raft) | AP | CP (etcd) | CP (Raft) |
Service discovery is closely related to consistent hashing for request routing and leader election for registry high availability. For more on building resilient service-to-service communication, see our circuit breaker guide.
Frequently Asked Questions
Q: Should I use client-side or server-side discovery?
If you are running on Kubernetes, use its built-in server-side discovery via Services. For non-Kubernetes environments, Consul provides excellent flexibility with both approaches. Client-side discovery (Eureka) gives you more control over load balancing but couples your client code to the discovery mechanism.
Q: How does service discovery work with service meshes?
Service meshes like Istio use the sidecar proxy pattern where a proxy (Envoy) handles discovery automatically. Your application code makes requests to localhost, and the proxy resolves the target service via the control plane. This is the most transparent approach for service discovery.
Q: What happens if the service registry goes down?
Most discovery clients cache the last known service locations. Eureka clients, for example, maintain a local cache refreshed every 30 seconds. If the registry goes down, services continue using cached data. Consul achieves high availability through its Raft consensus protocol. The registry itself should be deployed as a highly available cluster.
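That caching fallback can be sketched as a thin wrapper (hypothetical names; real clients also bound how stale a cached entry may get before refusing to serve it):

```python
class CachingDiscoveryClient:
    """Serve the last successful lookup when the registry is unreachable."""

    def __init__(self, query_registry):
        self._query = query_registry  # callable; raises if the registry is down
        self._cache = {}

    def instances(self, name):
        try:
            result = self._query(name)
            self._cache[name] = result    # refresh cache on success
        except Exception:
            if name not in self._cache:
                raise                     # never seen this service: can't help
            result = self._cache[name]    # registry down: serve stale data
        return result

# Usage: simulate the registry going down between two lookups
state = {"up": True}
def query(name):
    if not state["up"]:
        raise ConnectionError("registry unreachable")
    return [("10.0.1.50", 8080)]

client = CachingDiscoveryClient(query)
print(client.instances("payment-service"))  # fresh from the registry
state["up"] = False
print(client.instances("payment-service"))  # served from cache
```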
Q: How do I handle service discovery across multiple regions?
Consul natively supports multi-datacenter deployments: each datacenter runs its own Consul cluster, and the clusters federate over the WAN. Services can query other datacenters by including the datacenter name in the DNS name, e.g. payment-service.service.dc2.consul. Kubernetes supports multi-cluster service discovery through projects like Submariner or an Istio multi-cluster mesh. See our geo-distribution guide for more patterns.