# Two-Phase Commit (2PC): Distributed Transaction Coordination
Two-Phase Commit (2PC) is a distributed algorithm that ensures all participants in a transaction either commit or abort atomically. It is the classic solution for achieving atomicity across multiple databases or services. Despite its limitations, 2PC remains widely used in enterprise systems, XA transactions, and database replication.
## How 2PC Works
2PC involves a coordinator and multiple participants. The protocol runs in two phases:
### Phase 1: Prepare (Voting Phase)
- The coordinator sends a PREPARE request to all participants
- Each participant executes the transaction locally up to the point of committing
- Each participant writes the transaction data to its write-ahead log (WAL)
- Each participant responds with VOTE_COMMIT (ready) or VOTE_ABORT (cannot)
### Phase 2: Commit/Abort (Decision Phase)
- If ALL participants voted COMMIT: coordinator sends GLOBAL_COMMIT
- If ANY participant voted ABORT: coordinator sends GLOBAL_ABORT
- Participants execute the decision and acknowledge
```python
class TwoPhaseCoordinator:
    def __init__(self, participants):
        self.participants = participants
        self.transaction_log = TransactionLog()
        self.retry_queue = RetryQueue()  # redelivers decisions to unreachable participants

    def execute(self, transaction):
        tx_id = generate_transaction_id()

        # Phase 1: Prepare
        self.transaction_log.write("PREPARE", tx_id)
        votes = {}
        for participant in self.participants:
            try:
                vote = participant.prepare(tx_id, transaction)
                votes[participant.id] = vote
            except Exception:
                # An unreachable participant counts as an ABORT vote
                votes[participant.id] = "ABORT"

        # Decision: commit only on a unanimous COMMIT vote
        if all(v == "COMMIT" for v in votes.values()):
            decision = "COMMIT"
        else:
            decision = "ABORT"

        # Phase 2: Commit or Abort
        self.transaction_log.write(decision, tx_id)
        for participant in self.participants:
            try:
                if decision == "COMMIT":
                    participant.commit(tx_id)
                else:
                    participant.abort(tx_id)
            except Exception:
                # Retry until the participant acknowledges the decision
                self.retry_queue.add(participant.id, tx_id, decision)
        self.transaction_log.write("COMPLETE", tx_id)
```
```python
class TwoPhaseParticipant:
    def __init__(self, participant_id, database):
        self.id = participant_id
        self.db = database
        self.prepared_transactions = {}

    def prepare(self, tx_id, transaction):
        try:
            self.db.begin_transaction()
            self.db.execute(transaction.get_sql(self.id))
            self.db.write_wal(tx_id, transaction)  # must be durable before voting
            self.prepared_transactions[tx_id] = transaction
            return "COMMIT"
        except Exception:
            self.db.rollback()
            return "ABORT"

    def commit(self, tx_id):
        self.db.commit_prepared(tx_id)
        self.prepared_transactions.pop(tx_id, None)

    def abort(self, tx_id):
        self.db.rollback_prepared(tx_id)
        self.prepared_transactions.pop(tx_id, None)
```
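The voting rule the coordinator applies can be seen in isolation with a minimal, self-contained simulation. `MockParticipant` and `run_2pc` are illustrative names, not part of any real database API:

```python
# Minimal sketch of the 2PC voting rule: commit requires unanimity.
class MockParticipant:
    def __init__(self, pid, will_vote_commit=True):
        self.id = pid
        self.will_vote_commit = will_vote_commit
        self.state = "INIT"

    def prepare(self, tx_id):
        # A real participant would do work and write its WAL here
        self.state = "PREPARED" if self.will_vote_commit else "ABORTED"
        return "COMMIT" if self.will_vote_commit else "ABORT"

    def commit(self, tx_id):
        self.state = "COMMITTED"

    def abort(self, tx_id):
        self.state = "ABORTED"


def run_2pc(participants, tx_id="tx-1"):
    # Phase 1: collect votes
    votes = [p.prepare(tx_id) for p in participants]
    # Decision: a single ABORT vote forces a global abort
    decision = "COMMIT" if all(v == "COMMIT" for v in votes) else "ABORT"
    # Phase 2: broadcast the decision
    for p in participants:
        p.commit(tx_id) if decision == "COMMIT" else p.abort(tx_id)
    return decision


# Unanimous COMMIT votes -> global commit
assert run_2pc([MockParticipant("p1"), MockParticipant("p2")]) == "COMMIT"
# One ABORT vote -> global abort
assert run_2pc([MockParticipant("p1"),
                MockParticipant("p2", will_vote_commit=False)]) == "ABORT"
```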
## The Coordinator Failure Problem
The most critical weakness of 2PC is coordinator failure. If the coordinator crashes after participants have voted COMMIT but before it broadcasts the decision, participants are stuck in an uncertain state: they cannot safely commit or abort on their own.
| Failure Scenario | Impact | Resolution |
|---|---|---|
| Coordinator fails before PREPARE | None — transaction never started | Client retries with new coordinator |
| Coordinator fails after PREPARE, before decision | Participants blocked — holding locks | Wait for coordinator recovery or elect new one |
| Coordinator fails after decision, before all ACKs | Some participants unaware of decision | Coordinator replays decision from log on recovery |
| Participant fails before voting | Coordinator times out, aborts | Transaction is safely aborted |
| Participant fails after voting COMMIT | Must commit when it recovers | Reads decision from coordinator on recovery |
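The participant-side recovery rule in the last two table rows can be sketched as follows. `query_coordinator` is a hypothetical lookup (in a real system, an RPC against the coordinator or a read of its log):

```python
# Hedged sketch of participant recovery after a crash. A participant that
# voted COMMIT must not decide unilaterally; it asks the coordinator for
# the recorded outcome. query_coordinator is a hypothetical callable that
# returns "COMMIT", "ABORT", or None if the coordinator is unreachable.

def recover_participant(prepared_tx_ids, query_coordinator):
    """Resolve every transaction left in the PREPARED state on restart."""
    outcomes = {}
    for tx_id in prepared_tx_ids:
        decision = query_coordinator(tx_id)
        if decision is None:
            # Coordinator still down: remain blocked, keep locks held
            outcomes[tx_id] = "UNCERTAIN"
        else:
            outcomes[tx_id] = decision
    return outcomes


# Example: the coordinator logged COMMIT for tx-1, nothing for tx-2
coordinator_log = {"tx-1": "COMMIT"}
print(recover_participant(["tx-1", "tx-2"], coordinator_log.get))
# {'tx-1': 'COMMIT', 'tx-2': 'UNCERTAIN'}
```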
## The Blocking Problem
2PC is a blocking protocol. When the coordinator is down and a participant has voted COMMIT, it cannot proceed. It must hold all locks and wait, potentially blocking other transactions. This is the fundamental motivation for 3PC and non-blocking alternatives.
```text
# Scenario: Coordinator fails after collecting votes
#
# Timeline:
#   T1: Coordinator sends PREPARE to P1, P2, P3
#   T2: P1 votes COMMIT, P2 votes COMMIT, P3 votes COMMIT
#   T3: Coordinator crashes before sending decision
#
# State: All participants are UNCERTAIN
#   - P1 holds locks, cannot commit or abort
#   - P2 holds locks, cannot commit or abort
#   - P3 holds locks, cannot commit or abort
#
# Resolution: Wait for coordinator recovery
#   - Coordinator reads its WAL on startup
#   - If a decision was logged: resend the decision
#   - If no decision was logged: send ABORT (safe default)
```
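The coordinator-side recovery rule in that resolution can be written down directly. This is a sketch assuming a WAL of `(record_type, tx_id)` entries in append order; the record names match the logging calls in the coordinator class above:

```python
# Sketch of coordinator recovery: replay the WAL and decide what to
# (re)send for each in-flight transaction. Record names are illustrative.

def recover_coordinator(wal_records):
    """Return the decision to (re)send for each transaction in the WAL.

    - COMMIT/ABORT logged: resend that decision (some ACKs may be missing).
    - Only PREPARE logged (crash before deciding): ABORT is the safe
      default, since no participant can have committed yet.
    - COMPLETE logged: all ACKs were received; nothing to do.
    """
    decisions = {}
    for record_type, tx_id in wal_records:
        if record_type == "PREPARE":
            decisions.setdefault(tx_id, "ABORT")  # safe default
        elif record_type in ("COMMIT", "ABORT"):
            decisions[tx_id] = record_type
        elif record_type == "COMPLETE":
            decisions.pop(tx_id, None)
    return decisions


wal = [
    ("PREPARE", "tx-1"), ("COMMIT", "tx-1"),  # decided, not yet COMPLETE
    ("PREPARE", "tx-2"),                       # crashed before deciding
    ("PREPARE", "tx-0"), ("COMMIT", "tx-0"), ("COMPLETE", "tx-0"),  # done
]
print(recover_coordinator(wal))
# {'tx-1': 'COMMIT', 'tx-2': 'ABORT'}
```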
## Performance Impact
2PC introduces significant overhead:
- Latency: At least 2 round trips (prepare + commit), plus disk flushes at each participant
- Lock duration: Locks are held from prepare through commit — much longer than a local transaction
- Throughput: Reduced due to lock contention and synchronous disk writes
- Availability: A single participant or coordinator failure blocks the entire transaction
| Metric | Local Transaction | 2PC Transaction |
|---|---|---|
| Latency | 1-5 ms | 50-200 ms |
| Lock Duration | Transaction duration | Prepare + commit duration |
| Disk Flushes | 1 | 2N + 2 (N participants) |
| Network Round Trips | 0 | 2 (minimum) |
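The "Disk Flushes" row follows from counting the durable writes in the protocol; a one-line model makes the arithmetic explicit (the breakdown in the comment reflects the logging calls in the code above):

```python
# Arithmetic behind the "Disk Flushes" row: each of the N participants
# flushes once to vote (WAL write in prepare) and once to apply the
# decision; the coordinator flushes its decision and completion records.

def two_pc_disk_flushes(n_participants: int) -> int:
    return 2 * n_participants + 2


print(two_pc_disk_flushes(3))  # 8 flushes for a 3-participant transaction
```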
## XA Transactions
XA is a standard interface for 2PC, defined by the X/Open consortium (now The Open Group). Most enterprise databases (PostgreSQL, MySQL, Oracle) and some message brokers (such as ActiveMQ) support XA transactions.
```java
// Java XA transaction example (JTA)
UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
utx.begin();
try {
    // These use different databases via XA
    orderDao.createOrder(order);          // Database 1
    inventoryDao.decrementStock(item);    // Database 2
    paymentDao.chargeCustomer(payment);   // Database 3
    utx.commit();                         // 2PC across all three databases
} catch (Exception e) {
    utx.rollback();
    throw e;
}
```
## When to Use 2PC
- Use when: You need strict atomicity across databases within a trusted network (same data center)
- Avoid when: Services span multiple organizations, networks, or require high availability
- Alternative: Consider the Saga pattern for long-lived transactions across microservices
For improvements to 2PC's blocking problem, see Three-Phase Commit. For alternatives that avoid distributed transactions entirely, see distributed transaction patterns.
## Frequently Asked Questions
**Q: Is 2PC the same as database transactions?**
No. A local database transaction uses a write-ahead log for atomicity within a single database. 2PC coordinates atomicity across multiple databases or services. Local transactions are much faster and simpler. 2PC is only needed when a single operation must modify data in multiple independent systems atomically.
**Q: Why not just use 2PC for everything?**
2PC has significant drawbacks: it blocks on coordinator failure, adds latency, reduces throughput, and creates tight coupling between services. In microservices, where services should be independently deployable and loosely coupled, 2PC is usually avoided in favor of eventual consistency patterns like Sagas.
**Q: Can 2PC guarantee safety during network partitions?**
2PC guarantees safety (no inconsistency) but sacrifices liveness (availability). During a partition, participants that cannot reach the coordinator remain blocked. This aligns with CP in the CAP theorem — consistency is preserved at the cost of availability.
**Q: How does Google Spanner handle distributed transactions?**
Spanner uses 2PC for cross-shard transactions but avoids the blocking problem by using Paxos groups instead of a single coordinator. Each shard is a Paxos group, so coordinator failure is handled by Paxos leader election. This combines the atomicity of 2PC with the fault tolerance of consensus.