Paxos is the predecessor of the Raft algorithm. Both algorithms have the same consensus mechanism: Approved by a majority of nodes. Paxos is more difficult than Raft, because Paxos is running in a peer-to-peer network without Leader. For understanding Paxos well, I strongly recommend you to understand the Raft algorithm first.
Raft node has three actors: Candidate, Leader, Follower. The leader performs all requests and sorts the operation log. The follower receives the log and votes the final consensus. The Candidate is only set on leader election. For learning more Raft details, you can read my previous article.
Paxos node has three actors: Suggester, Voter, Learner. The Suggester receives a request and creates it as a proposal. The Voter grants permission to the proposal and votes the final consensus state in the next stage. The Learner saves the final state permanently.
Paxos algorithm has three committing phases:
- suggest a proposal.
- reject or grant the proposal.
- vote for consensus.
The Suggester makes a proposal ID and sends it to Voter. Each proposal ID represents the node's identity and two IDs can be compared. If two Suggesters had created a different proposal simultaneously, the proposal ID would be:
Node A creates a proposal and the ID is: [1, A]
Node B creates a proposal and the ID is: [1, B]
These two IDs can be compared:
[1,A] < [1,B]. In a peer-to-peer network with Paxos running, there is no single leader who can handle all requests and sort them. The comparable ID is used in which who can be performed first.
When a voter has granted permission to a proposal, e.g. [1, B], the later proposals which are lower than [1, B] will be rejected. The Suggester will know the rejection and reassign a higher proposal ID to send the Voter and obtain permission again.
The next phase is voting the final state. Because of the peer-to-peer network, each node may hold a different proposal. They must vote for the unique proposal that granted by majority of nodes. Then the final consensus state will be sent to Learner, and the Learner save it permanently.
There has a trap that we must resolve. Suppose Voter A and B are granting two proposals. They might like this:
A obtains permission for [1,A] B obtains permission for [2,B] A suggests [1,A] and is denied A obtains permission for [3,A] B suggests [2,B] and is denied B obtains permission for [3,B] A suggests [3,A] and is denied … repeat infinitely …
This will enter an infinite loop. To resolve this problem, we can use a timeout retransmission algorithm like TCP.
The Paxos paper didn't mention the details about error handling. If a node crashed, how to continue the consensus reaching? The leaderless peer-to-peer network adds more difficulties to development. Some Paxos variants improve the error handling by adding a leader with an election algorithm(Raft is an excellent example). When the leader handles all reqeusts, we can ensure the order of all requests. The remaing processing is to synchronize and vote the sequence of operations. and then commit the final consensus state.