
Ever wondered what happens when you make a purchase online and your payment fails halfway through? How do distributed systems ensure that your money isn’t deducted while the order remains incomplete? These questions led my friends and me down a rabbit hole of implementing the two-phase commit protocol from scratch, choosing Rust for the coordinator and Go for the microservices.
What is Two-Phase Commit?
Two-phase commit (2PC) is a distributed algorithm that ensures transaction atomicity across multiple nodes. Think of it as a voting system: either all nodes agree to commit a transaction, or none of them do.
Architecture Overview
We built three main components to demonstrate the protocol:
- Coordinator (Rust): Orchestrates the commit protocol
- Wallet Service (Go): Handles user balances
- Order Service (Go): Manages product inventory
The Coordinator
The coordinator is the brain of our system. Here’s the core Rust implementation:
struct Coordinator { /* ... */ }
The coordinator implements two key phases:
- Prepare Phase: The coordinator sends prepare messages to all participants and waits for their votes. If any participant votes “no” or times out, the transaction is aborted.
- Commit Phase: If all participants voted “yes”, the coordinator tells everyone to commit. Otherwise, it sends abort messages.
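To make the two phases concrete, here is a minimal sketch of the control flow described above. It is not our actual coordinator: the names `Vote`, `Outcome`, `Participant`, `Scripted`, and `run_transaction` are invented for this example, and the participants are in-memory stand-ins rather than remote services.

```rust
/// A participant's answer to a prepare request.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Vote {
    Yes,
    No,
}

/// Final outcome of a transaction.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Committed,
    Aborted,
}

/// Hypothetical participant interface; a real one would make network calls.
pub trait Participant {
    fn prepare(&mut self, tx_id: u64) -> Vote;
    fn commit(&mut self, tx_id: u64);
    fn abort(&mut self, tx_id: u64);
}

/// Runs both phases against every participant and returns the outcome.
pub fn run_transaction(tx_id: u64, participants: &mut [&mut dyn Participant]) -> Outcome {
    // Phase 1 (prepare): collect a vote from every participant.
    let votes: Vec<Vote> = participants.iter_mut().map(|p| p.prepare(tx_id)).collect();
    let all_yes = votes.iter().all(|v| *v == Vote::Yes);

    // Phase 2 (commit or abort): send the same decision to everyone.
    for p in participants.iter_mut() {
        if all_yes {
            p.commit(tx_id);
        } else {
            p.abort(tx_id);
        }
    }
    if all_yes {
        Outcome::Committed
    } else {
        Outcome::Aborted
    }
}

/// In-memory stand-in that votes as scripted and records what it was told.
pub struct Scripted {
    pub vote: Vote,
    pub log: Vec<&'static str>,
}

impl Scripted {
    pub fn new(vote: Vote) -> Self {
        Scripted { vote, log: Vec::new() }
    }
}

impl Participant for Scripted {
    fn prepare(&mut self, _tx_id: u64) -> Vote {
        self.log.push("prepare");
        self.vote
    }
    fn commit(&mut self, _tx_id: u64) {
        self.log.push("commit");
    }
    fn abort(&mut self, _tx_id: u64) {
        self.log.push("abort");
    }
}
```

Calling `run_transaction(1, &mut [&mut wallet, &mut order])` with two `Scripted` participants returns `Outcome::Committed` only when both were created with `Vote::Yes`; a single `Vote::No` makes every participant receive `abort`. Timeouts are left out of this sketch.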
Microservices in Go
The microservices handle the actual business logic. Here’s a simplified version of our wallet service:
type WalletService struct { /* ... */ }
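Since the Go source is not reproduced in full here, the following is only a sketch of what the participant side of a wallet might look like, written in Rust for illustration rather than taken from our service. It assumes a common approach: reserve the funds during prepare, make the deduction final on commit, and return the funds on abort. The `Wallet` type, its fields, and its method names are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory wallet acting as a 2PC participant.
#[derive(Default)]
pub struct Wallet {
    /// user -> spendable balance (in cents)
    balances: HashMap<String, u64>,
    /// tx_id -> (user, amount reserved during prepare)
    pending: HashMap<u64, (String, u64)>,
}

impl Wallet {
    pub fn deposit(&mut self, user: &str, amount: u64) {
        *self.balances.entry(user.to_string()).or_insert(0) += amount;
    }

    pub fn balance(&self, user: &str) -> u64 {
        self.balances.get(user).copied().unwrap_or(0)
    }

    /// Prepare: vote "yes" (true) only if the funds can be reserved.
    pub fn prepare(&mut self, tx_id: u64, user: &str, amount: u64) -> bool {
        // In this sketch a transaction id may only be prepared once.
        if self.pending.contains_key(&tx_id) {
            return false;
        }
        match self.balances.get_mut(user) {
            Some(bal) if *bal >= amount => {
                // Move the money out of the spendable balance into a reservation.
                *bal -= amount;
                self.pending.insert(tx_id, (user.to_string(), amount));
                true
            }
            _ => false,
        }
    }

    /// Commit: drop the reservation, making the deduction permanent.
    pub fn commit(&mut self, tx_id: u64) {
        self.pending.remove(&tx_id);
    }

    /// Abort: give the reserved funds back; unknown ids are ignored.
    pub fn abort(&mut self, tx_id: u64) {
        if let Some((user, amount)) = self.pending.remove(&tx_id) {
            *self.balances.entry(user).or_insert(0) += amount;
        }
    }
}
```

Reserving at prepare time is what lets the service keep its "yes" promise: once it has voted, no other transaction can spend the same money before the decision arrives.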
Handling Failures
The interesting part comes when things go wrong. We implemented several failure scenarios.
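One failure the protocol has to handle is a participant that never answers. A minimal sketch of treating a missing vote as a “no” is shown below. It assumes votes arrive as `bool` values on a standard-library channel, which is a simplification for illustration; `collect_votes` and `Decision` are names made up for this example, not part of our codebase.

```rust
use std::sync::mpsc::Receiver;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum Decision {
    Commit,
    Abort,
}

/// Waits for `expected` votes, with one shared deadline for the whole phase.
/// A "no" vote, a timeout, or a closed channel all lead to an abort.
pub fn collect_votes(votes: &Receiver<bool>, expected: usize, timeout: Duration) -> Decision {
    let deadline = Instant::now() + timeout;
    for _ in 0..expected {
        // Time left before the deadline; zero once it has passed.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match votes.recv_timeout(remaining) {
            Ok(true) => {}
            Ok(false) => return Decision::Abort,
            // Timed out or all senders disconnected: treat as a "no".
            Err(_) => return Decision::Abort,
        }
    }
    Decision::Commit
}
```

Using a single deadline rather than a fresh timeout per vote keeps the prepare phase bounded no matter how many participants there are.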
Performance Considerations
While 2PC ensures consistency, it comes with some drawbacks:
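- It’s blocking: a participant that has voted “yes” must hold its resources until it hears the coordinator’s decision;
- Network overhead: every transaction needs at least two round trips to each participant (prepare and vote, then decision and acknowledgement);
- Single point of failure: if the coordinator crashes after collecting votes, prepared participants cannot decide on their own and stay blocked until it recovers.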
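A back-of-the-envelope model makes the network overhead concrete. It assumes a failure-free run in which every participant acknowledges the decision and the coordinator contacts all participants in parallel. Disk writes and retries are ignored, and the function names are invented for this example.

```rust
/// Messages in a failure-free run with `n` participants:
/// one prepare, one vote, one decision, and one acknowledgement each.
pub fn messages_per_transaction(n: u64) -> u64 {
    4 * n
}

/// Lower bound on latency when participants are contacted in parallel:
/// two round trips, each as slow as the slowest participant.
pub fn min_latency_ms(round_trip_ms: &[u64]) -> u64 {
    2 * round_trip_ms.iter().copied().max().unwrap_or(0)
}
```

For two participants with round-trip times of 5 ms and 20 ms, this gives 8 messages and at least 40 ms per transaction, which shows how the slowest participant dominates the cost.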
Distributed Deployment
We deployed our system on Google Cloud Platform, using separate VMs for each component. This revealed interesting challenges around network latency and partial failures.
Testing Distributed Transactions
Testing distributed systems requires special care because they are concurrent and asynchronous. We built a comprehensive test suite that simulates various failure scenarios.
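One way to test for atomicity is to enumerate every combination of participant behaviours and check that all participants end in the same state. The sketch below illustrates that idea only and is not our actual test suite. `Behaviour`, `FinalState`, `find_violation`, and the two toy coordinators `correct` and `buggy` are hypothetical names, and the coordinators are pure functions rather than networked processes.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Behaviour {
    VoteYes,
    VoteNo,
    Timeout,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FinalState {
    Committed,
    Aborted,
}

/// Every combination of behaviours for `n` participants (3^n scenarios).
pub fn scenarios(n: usize) -> Vec<Vec<Behaviour>> {
    let mut out: Vec<Vec<Behaviour>> = vec![Vec::new()];
    for _ in 0..n {
        let mut next = Vec::new();
        for prefix in &out {
            for b in [Behaviour::VoteYes, Behaviour::VoteNo, Behaviour::Timeout] {
                let mut s = prefix.clone();
                s.push(b);
                next.push(s);
            }
        }
        out = next;
    }
    out
}

/// Atomicity invariant: every participant ends in the same state.
pub fn is_atomic(states: &[FinalState]) -> bool {
    states.windows(2).all(|w| w[0] == w[1])
}

/// Runs `coordinator` on every scenario and returns the first one that
/// breaks atomicity, if any.
pub fn find_violation<F>(n: usize, coordinator: F) -> Option<Vec<Behaviour>>
where
    F: Fn(&[Behaviour]) -> Vec<FinalState>,
{
    scenarios(n)
        .into_iter()
        .find(|s| !is_atomic(&coordinator(s.as_slice())))
}

/// Toy coordinator: commits only if everyone voted yes in time.
pub fn correct(behaviours: &[Behaviour]) -> Vec<FinalState> {
    let commit = behaviours.iter().all(|b| *b == Behaviour::VoteYes);
    let state = if commit { FinalState::Committed } else { FinalState::Aborted };
    vec![state; behaviours.len()]
}

/// Deliberately broken: each participant acts on its own vote.
pub fn buggy(behaviours: &[Behaviour]) -> Vec<FinalState> {
    behaviours
        .iter()
        .map(|b| match b {
            Behaviour::VoteYes => FinalState::Committed,
            _ => FinalState::Aborted,
        })
        .collect()
}
```

With three participants, `find_violation(3, correct)` finds nothing across all 27 scenarios, while `find_violation(3, buggy)` returns a scenario in which one participant committed and another aborted. Timing-dependent bugs need more than this kind of enumeration, but it is a cheap first check.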
Lessons Learned
Building a distributed transaction system taught us several things:
- Rust’s ownership model is a good fit for handling complex distributed state
- Go’s goroutines make concurrent transaction handling elegant
- Network failures are the norm, not the exception
- Testing distributed systems requires careful consideration of timing
For those interested in exploring the implementation details further, the complete code is available on GitHub. Note that the README is currently in Norwegian.