Delivery Semantics: At-Most-Once, At-Least-Once, and Exactly-Once Explained
When a message travels from producer to consumer through a message queue or event stream, what guarantees do you have about delivery? Will the message arrive? Could it arrive more than once? Understanding delivery semantics is crucial for building reliable distributed systems — choosing the wrong guarantee can lead to lost data, duplicate processing, or unnecessary complexity.
The Three Delivery Guarantees
At-Most-Once Delivery
The message is delivered zero or one times. The system makes no attempt to retry if delivery fails. Messages may be lost, but they are never duplicated.
```python
# At-most-once: fire and forget
def send_at_most_once(message):
    try:
        broker.send(message)
        # No retry on failure -- the message may be lost
    except SendError:
        log.warning(f"Message lost: {message.id}")

# Consumer: auto-commit offset BEFORE processing
def consume_at_most_once(consumer):
    message = consumer.poll()
    consumer.commit()   # commit the offset immediately
    process(message)    # if process() fails, the message is lost
    # Offset already committed -- the message will not be redelivered
```
How it fails: Producer sends a message, network drops the packet, no retry. Or consumer commits the offset, then crashes before processing — the message is considered consumed but was never processed.
Use cases: Metrics collection, logging, real-time analytics where occasional data loss is acceptable. UDP-based protocols use at-most-once semantics.
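To make the UDP analogy concrete, here is a minimal fire-and-forget sketch. The host, port, and metric format are placeholders; no listener needs to exist for the send to succeed, which is exactly the point:

```python
import socket

def send_metric_udp(payload: bytes, host: str = "127.0.0.1", port: int = 9999) -> int:
    """Fire-and-forget: UDP provides no ACK and no retry."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # sendto() returns the number of bytes handed to the OS,
        # which is not proof of delivery -- the datagram may still be dropped.
        return sock.sendto(payload, (host, port))
    finally:
        sock.close()

sent = send_metric_udp(b"cpu.load 0.42")
```

The caller learns only that the datagram left the process, never whether it arrived, which is precisely at-most-once semantics.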
At-Least-Once Delivery
The message is guaranteed to be delivered one or more times. The system retries on failure until the message is acknowledged. Duplicates are possible.
```python
# At-least-once: retry until acknowledged
def send_at_least_once(message, max_retries=3):
    for attempt in range(max_retries):
        try:
            ack = broker.send(message)
            if ack:
                return True  # delivery confirmed
        except SendError:
            log.warning(f"Retry {attempt + 1} for {message.id}")
    raise DeliveryFailed(f"Failed after {max_retries} attempts")

# Consumer: commit offset AFTER processing
def consume_at_least_once(consumer):
    message = consumer.poll()
    process(message)    # process first
    consumer.commit()   # then commit the offset
    # If the consumer crashes between process() and commit(),
    # the message will be redelivered -- a possible duplicate
```
How duplicates happen: Consumer processes a message, then crashes before committing the offset. When it restarts, it re-reads and re-processes the same message. Or a producer sends a message, the broker stores it, but the ACK is lost in the network. The producer retries, creating a duplicate.
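The lost-ACK scenario can be simulated in a few lines. This is a toy broker, not any real client library: it durably stores the message before the ACK is lost, so the producer's retry creates a duplicate:

```python
class ToyBroker:
    """Stores every message it receives; ACK loss is simulated via a flag."""
    def __init__(self):
        self.stored = []

    def send(self, message, ack_lost=False):
        self.stored.append(message)         # broker durably stores the message
        if ack_lost:
            raise TimeoutError("ACK lost")  # producer never learns of success
        return True

def produce_with_retry(broker, message, ack_lost_on_first_try=False):
    try:
        return broker.send(message, ack_lost=ack_lost_on_first_try)
    except TimeoutError:
        return broker.send(message)         # retry: broker stores a duplicate

broker = ToyBroker()
produce_with_retry(broker, "order-123", ack_lost_on_first_try=True)
print(broker.stored)  # ['order-123', 'order-123'] -- stored twice
```

From the producer's point of view the retry was correct; only the broker can see that the message now exists twice.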
Use cases: This is the most common guarantee in production systems. Used for order processing, payment handling, email sending — any case where losing a message is worse than processing it twice, as long as consumers are idempotent.
Exactly-Once Delivery
The message is delivered and processed exactly one time. No loss, no duplicates. This is the holy grail of messaging but is extremely difficult (some argue impossible) to achieve in distributed systems.
Why it is hard: True exactly-once requires coordinating state across producer, broker, and consumer atomically. Network partitions, crashes, and retries can all break this coordination. What most systems call "exactly-once" is actually "effectively-once" — at-least-once delivery combined with deduplication to achieve the effect of exactly-once processing.
```python
# Kafka exactly-once: transactional consume-transform-produce.
# Sketch modeled on Kafka's transactional API as exposed by the Java
# client and confluent-kafka; the pure-Python kafka-python client does
# not support transactions, so treat this as illustrative pseudocode.
producer = KafkaProducer(
    bootstrap_servers=['kafka:9092'],
    enable_idempotence=True,            # prevents duplicate produces
    transactional_id='order-processor'  # enables transactions
)
producer.init_transactions()

# Auto-commit must be off: offsets are committed inside the transaction
consumer = KafkaConsumer('input-topic', group_id='processor',
                         enable_auto_commit=False)

for message in consumer:
    producer.begin_transaction()
    try:
        result = transform(message.value)
        producer.send('output-topic', value=result)
        # Commit the consumer offset AND the produce atomically:
        # the offset map is keyed by topic-partition, valued by next offset
        producer.send_offsets_to_transaction(
            {TopicPartition(message.topic, message.partition):
                 OffsetAndMetadata(message.offset + 1, None)},
            consumer.group_id
        )
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
```
Comparison
| Aspect | At-Most-Once | At-Least-Once | Exactly-Once |
|---|---|---|---|
| Message loss | Possible | No | No |
| Duplicates | No | Possible | No |
| Complexity | Low | Medium | Very High |
| Performance | Fastest | Good | Slowest |
| Typical implementation | Fire-and-forget | Retry + ACK | Transactions + deduplication |
Idempotency: The Practical Solution
Since at-least-once delivery is the most practical guarantee, the key to reliable messaging is making consumers idempotent — processing the same message multiple times produces the same result as processing it once.
```python
# Idempotent consumer using message-ID deduplication
class IdempotentConsumer:
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def process(self, message):
        message_id = message["id"]
        # Check whether this message was already processed
        if self.cache.exists(f"processed:{message_id}"):
            log.info(f"Duplicate skipped: {message_id}")
            return  # already processed -- skip

        self._handle(message)

        # Mark as processed (TTL matching the broker's retention period)
        self.cache.setex(f"processed:{message_id}", 86400, "1")

    def _handle(self, message):
        # Conflict-aware insert instead of a plain INSERT,
        # so replaying the same message is a no-op
        self.db.execute("""
            INSERT INTO orders (id, customer_id, total, status)
            VALUES (%(id)s, %(customer_id)s, %(total)s, 'placed')
            ON CONFLICT (id) DO NOTHING
        """, message)
```
Idempotency Strategies
| Strategy | Implementation | Best For |
|---|---|---|
| Deduplication table | Store processed message IDs in DB/Redis | General purpose |
| Upsert operations | INSERT ON CONFLICT DO UPDATE/NOTHING | Database writes |
| Conditional updates | UPDATE WHERE version = expected_version | State machines |
| Natural idempotency | SET balance = 500 (not += 100) | Absolute value updates |
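The conditional-update row of the table can be sketched without a database. Here an in-memory dict stands in for the row and a version check stands in for `UPDATE ... WHERE version = expected_version`; the record shape is illustrative:

```python
def apply_update(record: dict, new_status: str, expected_version: int) -> bool:
    """Apply the update only if the version matches. A redelivered
    message carries a stale expected_version, so it becomes a no-op."""
    if record["version"] != expected_version:
        return False                 # duplicate or stale update: rejected
    record["status"] = new_status
    record["version"] += 1
    return True

order = {"id": 1, "status": "placed", "version": 1}
first = apply_update(order, "shipped", expected_version=1)   # True: applied
# Redelivery of the same message: version is now 2, so it is rejected
second = apply_update(order, "shipped", expected_version=1)  # False: no-op
```

The same pattern in SQL would check the affected-row count of the conditional `UPDATE` to distinguish first delivery from replay.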
Deduplication at Scale
# Scalable deduplication with Bloom filter
from pybloom_live import BloomFilter
class ScalableDeduplicator:
def __init__(self, capacity=1000000, error_rate=0.001):
self.bloom = BloomFilter(capacity=capacity, error_rate=error_rate)
self.exact_store = redis_client # For false-positive verification
def is_duplicate(self, message_id):
# Fast check with Bloom filter (no false negatives)
if message_id not in self.bloom:
return False
# Bloom says maybe — verify with exact store
return self.exact_store.exists(f"msg:{message_id}")
def mark_processed(self, message_id):
self.bloom.add(message_id)
self.exact_store.setex(f"msg:{message_id}", 86400, "1")
How Different Systems Implement Delivery Semantics
- Apache Kafka: Supports all three. Idempotent producers (prevent duplicate sends), transactions (exactly-once consume-produce), configurable via `acks`, `enable.idempotence`, and `transactional.id`.
- RabbitMQ: At-most-once (auto-ack) or at-least-once (manual ack with requeue). No built-in exactly-once; requires application-level deduplication.
- Amazon SQS: Standard queues provide at-least-once. FIFO queues provide exactly-once (via a deduplication ID within a 5-minute window).
- Google Cloud Pub/Sub: At-least-once delivery. Exactly-once via Dataflow integration.
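For SQS FIFO queues, the deduplication ID can be derived from the message body the same way SQS's content-based deduplication does (a SHA-256 hash of the body). A sketch follows; the `boto3` call is commented out since it requires a real queue, and the queue/group names are placeholders:

```python
import hashlib

def fifo_dedup_id(body: str) -> str:
    # SQS content-based deduplication hashes the message body with SHA-256;
    # sending the same ID within the 5-minute window suppresses duplicates.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

body = '{"order_id": 42, "total": 99.5}'
dedup_id = fifo_dedup_id(body)
# sqs.send_message(QueueUrl=queue_url, MessageBody=body,
#                  MessageGroupId="orders",
#                  MessageDeduplicationId=dedup_id)
```

Because the ID is deterministic, a producer retry of the same body yields the same `MessageDeduplicationId`, and SQS drops the duplicate within the window.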
Frequently Asked Questions
Is exactly-once delivery truly possible?
In a strict theoretical sense across arbitrary distributed systems, no — the Two Generals' Problem proves this. However, within a controlled system boundary (like Kafka's consume-process-produce pipeline), you can achieve "effectively exactly-once" through idempotent producers and transactional writes. The key insight is that most systems do not need true exactly-once delivery — they need exactly-once processing, which is achievable with at-least-once delivery plus idempotent consumers.
How do I choose between at-least-once and exactly-once?
Default to at-least-once with idempotent consumers. It is simpler, faster, and sufficient for 95% of use cases. Use exactly-once only when: deduplication at the consumer is impractical (e.g., sending external API calls that are not idempotent), or when the processing pipeline is entirely within Kafka (consume-transform-produce). The performance overhead and operational complexity of exactly-once should be justified by the business requirement.
How do I make an API call idempotent in a consumer?
Use an idempotency key. Before making the external API call, check your deduplication store. If the message has been processed, skip it. If not, make the API call, record the result, and mark the message as processed — all in a single database transaction if possible. Many payment APIs (Stripe, PayPal) accept idempotency keys natively, handling deduplication on their end.
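The idempotency-key pattern can be sketched with an in-memory dict standing in for the deduplication store and a stub for the external API. `charge_api` and the key format are illustrative, not any specific provider's SDK:

```python
import uuid

processed = {}  # idempotency_key -> API result; stands in for a DB table

def charge_api(amount, idempotency_key):
    # Stub for an external payment API that accepts an idempotency key
    return {"charge_id": str(uuid.uuid4()), "amount": amount}

def handle_payment(message):
    key = f"pay:{message['id']}"
    if key in processed:
        return processed[key]  # duplicate delivery: reuse the recorded result
    result = charge_api(message["amount"], idempotency_key=key)
    processed[key] = result    # record the result before acking the message
    return result

msg = {"id": "m-1", "amount": 25.0}
first = handle_payment(msg)
second = handle_payment(msg)   # redelivery returns the cached result, no new charge
```

In production the check-call-record sequence should share one database transaction where possible, so a crash cannot record a result without the call (or vice versa).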
What happens to delivery guarantees during network partitions?
During network partitions, at-least-once systems may deliver duplicates (retries from the producer side while the broker already stored the message). At-most-once systems may lose messages (no retry). Exactly-once systems may become unavailable (waiting for all participants in the transaction to be reachable). This is a manifestation of the CAP theorem — during partitions, you trade between consistency (no duplicates) and availability (keep processing).
How long should I retain deduplication IDs?
At minimum, retain IDs for longer than the maximum retry window of your messaging system. If your message queue retries for up to 24 hours, retain deduplication IDs for at least 24 hours. A common practice is 7 days, matching typical Kafka retention. Use Redis with TTL for fast lookups and automatic cleanup, or a database table with periodic pruning.