Log Entries Cannot Be Committed until They Are Accepted by a Majority Quorum
As seen above, entries like B1 can be overwritten if they haven’t been successfully replicated to a Majority Quorum of nodes in the cluster. So the leader cannot apply a request to its data store right after appending it to its own log; it has to wait until enough other nodes have acknowledged it. An update added to the local log is uncommitted until the leader and the nodes that have acknowledged it together form a Majority Quorum, at which point it becomes committed. In the example above, Neptune cannot commit B1 until it hears that at least one other node has accepted it. That node plus Neptune itself makes two out of three nodes: a majority, and thus a Majority Quorum.
When Neptune, the leader, receives an update, either from a user (Bob) directly or via a follower, it adds the uncommitted update to its log and then sends replication messages to the other nodes. Once Saturn (for example) replies, two of the three nodes, Neptune and Saturn, have accepted the update, which is a Majority Quorum. At that point Neptune can commit the update (Figure 2.20).
![FIGURE 2.20](/content/images/chap2_9780138221980/elementLinks/overview-log-quorum-replicate.jpg)
Figure 2.20 Log entries are committed once they are accepted by a Majority Quorum.
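The commit rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a full Replicated Log: the `Leader` and `LogEntry` classes and the `on_ack` method are hypothetical names, and networking, retries, and the generation clock are left out.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    index: int
    update: str
    accepted_by: set = field(default_factory=set)  # names of nodes holding this entry
    committed: bool = False


class Leader:
    """Hypothetical leader that commits an entry once a majority has accepted it."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = list(cluster)              # every node, the leader included
        self.quorum = len(self.cluster) // 2 + 1  # majority of the whole cluster
        self.log = []

    def _maybe_commit(self, entry):
        if len(entry.accepted_by) >= self.quorum:
            entry.committed = True

    def append(self, update):
        # The leader's own copy counts as the first acceptance.
        entry = LogEntry(len(self.log) + 1, update, {self.name})
        self.log.append(entry)
        self._maybe_commit(entry)
        return entry

    def on_ack(self, index, follower):
        # Called when a follower confirms it has appended the entry at `index`.
        entry = self.log[index - 1]
        entry.accepted_by.add(follower)
        self._maybe_commit(entry)
        return entry.committed


neptune = Leader("Neptune", ["Neptune", "Saturn", "Jupiter"])
b1 = neptune.append("B1")
committed_before_ack = b1.committed                       # only Neptune has B1
committed_after_ack = neptune.on_ack(b1.index, "Saturn")  # Neptune + Saturn
```

With three nodes the quorum size is two, so B1 stays uncommitted after the local append and becomes committed as soon as Saturn's acknowledgment arrives.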
The importance of the Majority Quorum is that it applies to every decision made by the cluster. Should a node fail, any leadership election must involve a Majority Quorum of nodes. Since any committed update has also been accepted by a Majority Quorum of nodes, and any two Majority Quorums share at least one node, we can be sure that committed updates will be visible during the election.
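The reason committed updates stay visible is that two majorities of the same cluster always overlap. The following sketch checks this by brute force for the three-node cluster used in this chapter; the variable names are illustrative only.

```python
from itertools import combinations

nodes = ["Neptune", "Saturn", "Jupiter"]
quorum_size = len(nodes) // 2 + 1

# Every subset of the cluster that is large enough to count as a majority.
majorities = [
    set(group)
    for size in range(quorum_size, len(nodes) + 1)
    for group in combinations(nodes, size)
]

# Any two majorities share at least one node.
all_overlap = all(bool(a & b) for a, b in combinations(majorities, 2))

# B1 was accepted by Neptune and Saturn. After Neptune crashes,
# the election can only be decided by Saturn and Jupiter.
commit_quorum = {"Neptune", "Saturn"}
election_quorum = {"Saturn", "Jupiter"}
shared = commit_quorum & election_quorum
```

Here `shared` contains Saturn, the node that carries B1 into the election.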
If Neptune receives Bob’s update (B1), replicates, gets an acknowledgment from Saturn, and then crashes, Saturn still has a copy of B1. If the nodes then elect Jupiter as the leader, Jupiter must first obtain any uncommitted updates held by the other nodes (here B1, from Saturn) and commit them before it can start accepting new ones (Figure 2.21).
![FIGURE 2.21](/content/images/chap2_9780138221980/elementLinks/overview-jupiter-gets-log-entry-from-quorum.jpg)
Figure 2.21 New leader commits uncommitted log entries.
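One way to picture this recovery step is shown below. It is a deliberately simplified sketch: the function `entries_to_recover` is hypothetical, and it assumes that every log in the quorum is a prefix of the longest one, which ignores the conflicting entries that a generation clock is needed to resolve.

```python
def entries_to_recover(leader_log, quorum_logs):
    """Entries the new leader lacks, taken from the longest log in the quorum.

    Assumes all logs agree on their common prefix (no conflicting entries).
    """
    longest = max(quorum_logs.values(), key=len, default=[])
    return longest[len(leader_log):]


jupiter_log = []                   # Jupiter never received B1
responses = {"Saturn": ["B1"]}     # logs reported by the rest of the election quorum

missing = entries_to_recover(jupiter_log, responses)
jupiter_log.extend(missing)        # Jupiter now holds B1 and can commit it
```

Only after `jupiter_log` has caught up would the new leader begin accepting fresh updates.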
When the log is large, moving the log across nodes for leader election can be costly. The most commonly used algorithm for Replicated Log, Raft [Ongaro2014], optimizes this by electing the leader with the most up-to-date log. In the above example this would elect Saturn as the leader.
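In Raft, a node grants its vote only if the candidate's log is at least as up-to-date as its own, comparing the term of the last entry first and the log length second. The sketch below illustrates that comparison; `LastEntry` and `grants_vote` are hypothetical names, the field `generation` stands in for what Raft calls a term, and the generation value 1 for B1 is assumed for the example.

```python
from typing import NamedTuple


class LastEntry(NamedTuple):
    generation: int  # generation (Raft: term) of the last log entry, 0 if empty
    index: int       # index of the last log entry, 0 if empty


def grants_vote(candidate: LastEntry, voter: LastEntry) -> bool:
    # Tuples compare field by field: a higher generation wins outright,
    # and the longer log wins when generations are equal.
    return candidate >= voter


saturn = LastEntry(generation=1, index=1)   # holds B1
jupiter = LastEntry(generation=0, index=0)  # empty log

jupiter_votes_for_saturn = grants_vote(saturn, jupiter)
saturn_votes_for_jupiter = grants_vote(jupiter, saturn)
```

Saturn refuses to vote for Jupiter, so with Neptune down Jupiter cannot gather a majority, while Saturn can. The elected leader then already holds B1, and no log entries need to be moved during the election.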