Multiple Failures Need a Generation Clock
We assumed here that Jupiter and Saturn can figure out whose log is most up to date. But things can get trickier. Let’s say Neptune accepted a request from Bob to move 40 widgets from Boston to Pune but failed before replicating it (Figure 2.13).
Figure 2.13 Leader fails before replication.
Jupiter is elected as a new leader, and accepts a request from Alice to move 30 widgets from Boston to London. But it also crashes before replicating the request to other nodes (Figure 2.14).
Figure 2.14 New leader fails before replication.
In a while, Neptune and Jupiter come back, but before they can talk, Saturn crashes. Neptune is elected as a leader. Neptune checks with itself and Jupiter for the log entries. It will see two separate requests at index 1, the one from Bob which it had accepted and the one from Alice that Jupiter has accepted. Neptune can’t tell which one it should pick (Figure 2.15).
Figure 2.15 Leader needs to resolve existing log entries.
To solve this kind of situation, we use a Generation Clock. This is a number that increments with each leadership election. It is a key requirement of Leader and Followers.
Looking at the previous scenario again, Neptune was leader for generation 1. It adds Bob’s entry in its log marking it with its generation (Figure 2.16).
Figure 2.16 Leader adds generation to log entries.
When Jupiter gets elected as a leader, it increments the generation to 2. So when it adds Alice’s entry to its log, it’s marked for generation 2 (Figure 2.17).
Figure 2.17 New leader increments generation.
Now, when Neptune is again elected as a leader, it will be for generation 3. Before it starts serving the client requests, it checks the logs of all the available nodes for entries which are not replicated on the Majority Quorum. We call these entries as “uncommitted,” as they are not yet applied to data. We will see how each node figures out which entries are incompletely replicated in a while. But once the leader knows about these entries, it completes the replication for those entries. In case of conflict, it safely picks up the entry with higher generation (Figure 2.18).
Figure 2.18 Conflicting log entries are resolved based on generation.
After selecting the entry with the latest generation, Neptune overwrites the uncommitted entry in its own log with its current generation number and replicates with Jupiter.
Every node tracks the latest generation it knows of the leader. This is helpful in another problem that might occur, as Figure 2.19 demonstrates. When Jupiter became leader, the previous leader, Neptune, might not have crashed, but just temporarily disconnected. It might come back online and send the requests to Jupiter and Saturn. If Jupiter and Saturn have elected a new leader and accepted requests from Alice, what should they do when they suddenly start getting requests from Neptune? Generation Clock is useful in this case as well. Every request is sent to cluster nodes, along with the generation clock. So every node can always choose the requests with the higher generation and reject the ones with the lower generation.
Figure 2.19 Generation helps detecting stale requests from old leader.