Distributed Transactions in Microservices: Why Consistency Gets Complicated Fast

Short description:
Distributed transactions sound straightforward in theory: multiple services should either succeed together or fail together. In practice, things become messy very quickly. Network failures, retries, partial commits, and service crashes make consistency one of the hardest problems in distributed systems. This post takes a deep dive into distributed transactions, why traditional approaches struggle in microservices, and how Two-Phase Commit and Saga patterns behave in real production systems.

The Monolith Advantage Nobody Appreciates Enough

In a monolith, transactions feel easy.

You update multiple tables, wrap everything inside a database transaction, and either commit or rollback.


BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;

The database guarantees atomicity.

If something fails halfway through, everything rolls back automatically.

Most engineers grow up with this mental model.

Then microservices arrive.

Why Transactions Become Hard in Microservices

Microservices intentionally split data ownership across services.

Each service has:

Its own database
Its own deployment lifecycle
Its own failure modes

This improves scalability and team independence.

But it destroys the simplicity of local database transactions.

Now imagine a checkout flow:

Order Service creates order
Payment Service charges customer
Inventory Service reserves stock
Notification Service sends confirmation

What happens if payment succeeds but inventory fails?

You no longer have a single database transaction protecting consistency.

You have a distributed systems problem.

The Core Problem: Partial Success

Distributed transactions are difficult because partial success is normal.

In distributed systems:

Networks fail
Services restart
Requests timeout
Messages arrive late

The dangerous state is not total failure.

The dangerous state is when half the system thinks the operation succeeded and the other half thinks it failed.

This is where consistency breaks.

The Two Main Approaches

Modern distributed systems usually solve consistency using one of two patterns:

Two-Phase Commit (2PC)
Saga Pattern

Both attempt to coordinate changes across services.

Both involve trade-offs.

Neither is perfect.

Two-Phase Commit (2PC): The Traditional Approach

Two-Phase Commit tries to preserve strong consistency across distributed systems.

It works using a coordinator.

The flow looks like this:


Step 1: Prepare Phase
Coordinator asks all services:
"Can you commit?"

Step 2: Commit Phase
If all say YES:
"Commit transaction"

Else:
"Rollback transaction"

This sounds elegant.

And under ideal conditions, it works.

How 2PC Works Internally

During the prepare phase, each participant:

Executes the transaction locally
Locks required resources
Waits for coordinator decision

Nothing is fully committed yet.

Then the coordinator decides:

If all participants are ready → commit
If even one fails → rollback

This guarantees atomicity across services.

Why 2PC Looks Great on Whiteboards

2PC provides:

Strong consistency
Clear transactional guarantees
Predictable rollback behavior

From a business perspective, this is attractive.

Especially in financial systems, consistency matters deeply.

The Real Problems With Two-Phase Commit

The problems appear under failure.

1. Blocking Behavior

During the prepare phase, participants lock resources.

If the coordinator crashes before sending commit or rollback, participants remain blocked waiting for instructions.

This creates:

Stuck transactions
Resource contention
Reduced throughput

In high-scale systems, this becomes dangerous quickly.

2. Coordinator Becomes a Single Point of Failure

The coordinator controls transaction state.

If it becomes unavailable, the entire transaction pipeline suffers.

Even with replication, complexity increases significantly.

3. Poor Scalability

2PC performs poorly in highly distributed environments.

Why?

Multiple synchronous network round trips
Long-lived locks
Cross-service coordination overhead

Latency compounds rapidly.

4. Availability Suffers

2PC prioritizes consistency over availability.

Under network partitions, systems often pause instead of risking inconsistent state.

This aligns with CP systems in the CAP theorem.

The Industry Shift Toward Sagas

Because of these limitations, many microservice architectures moved toward eventual consistency.

This is where the Saga pattern became popular.

Saga Pattern: Distributed Transactions Through Compensation

Instead of one large atomic transaction, a Saga breaks the workflow into smaller local transactions.

Each service commits independently.

If something fails later, compensating actions undo previous steps.


Order Created
   |
   v
Payment Processed
   |
   v
Inventory Reserved
   |
   v
Notification Sent

If inventory reservation fails:


Compensation:
Refund Payment
Cancel Order

This fundamentally changes how consistency is handled.

The Core Philosophy Behind Sagas

Sagas accept that distributed systems fail.

Instead of preventing partial success, they embrace it and recover afterward.

This trades immediate consistency for resilience and scalability.

Two Saga Models

Sagas are generally implemented in two ways.

1. Choreography-Based Saga

Services communicate through events.


Order Service
   |
   v
OrderCreated Event
   |
   v
Payment Service
   |
   v
PaymentProcessed Event

No central coordinator exists.

Each service reacts independently.

Advantages

Loosely coupled
Highly scalable
No central bottleneck

Disadvantages

Harder debugging
Complex event chains
Difficult observability

2. Orchestration-Based Saga

A central orchestrator controls workflow execution.


Saga Orchestrator
   |
   +--> Payment Service
   +--> Inventory Service
   +--> Notification Service

The orchestrator tracks state and triggers compensations.

Advantages

Easier observability
Centralized control flow
Simpler debugging

Disadvantages

More coupling
Coordinator complexity
Potential bottleneck

The Hidden Complexity of Compensation

Compensation sounds simple in diagrams.

Reality is harder.

Some operations are difficult or impossible to reverse:

Emails already sent
External bank transfers
SMS notifications

Compensation often means “business correction,” not true rollback.

This distinction matters enormously.

Idempotency Becomes Mandatory

Saga systems rely heavily on retries.

Messages may be delivered multiple times.

Services must handle duplicate requests safely.

Without idempotency:

Payments may double-charge
Inventory may over-reserve
Notifications may duplicate

Idempotency is not optional in Saga-based systems.

Observability Is Much Harder Than Traditional Transactions

In monoliths, a transaction is visible inside one database.

In distributed systems, a transaction spans:

Multiple services
Multiple queues
Multiple databases

Tracing becomes significantly harder.

Mature systems use:

Distributed tracing
Correlation IDs
Centralized event logging

Without observability, debugging distributed transactions becomes nearly impossible.

When Two-Phase Commit Makes Sense

Despite its problems, 2PC is not obsolete.

It still makes sense when:

Strong consistency is mandatory
Transaction volume is relatively low
Participants are tightly controlled

Financial settlement systems are common examples.

When Sagas Make More Sense

Sagas work well when:

High scalability matters
Temporary inconsistency is acceptable
Services are loosely coupled

This is why Sagas dominate modern microservice architectures.

The Most Important Mindset Shift

The biggest lesson in distributed transactions is this:

Consistency is not free.

Every consistency guarantee introduces trade-offs:

Latency
Availability
Operational complexity

The real engineering challenge is choosing which trade-offs your business can tolerate.

Final Thoughts

Distributed transactions are one of the clearest examples of why distributed systems are fundamentally different from traditional application development.

Two-Phase Commit gives stronger guarantees but struggles with scalability and availability.

Sagas improve resilience and scalability but introduce eventual consistency and compensation complexity.

Neither pattern is universally better.

The right choice depends entirely on system requirements, business guarantees, and operational realities.

And once systems scale, understanding those trade-offs becomes more important than the implementation itself.