Overview of the Patterns
by Unmesh Joshi and Martin Fowler
As discussed in the last chapter, distributing data means at least one of two things: partitioning and replication. To start our journey through the patterns in this book, we’ll focus on replication first.
Imagine a very minimal data record that captures how many widgets we have in four locations (Figure 2.1).
Figure 2.1 An example data record
We replicate it on three nodes: Jupiter, Saturn, and Neptune (Figure 2.2).
Figure 2.2 Replicated data record
Keeping Data Resilient on a Single Server
The first area of potential inconsistency appears with no distribution at all. Consider a case where the data for Boston, London, and Pune are held on different files. In this case, performing a transfer of 40 widgets means changing bos.json to reduce its count to 10 and changing pnq.json to increase its count to 115. But what happens if Neptune crashes after changing Boston’s file but before updating Pune’s? In that case we would have inconsistent data, destroying 40 widgets (Figure 2.3).
Figure 2.3 Node crash causes inconsistency
An effective solution to this is Write-Ahead Log (Figure 2.4). With this, the message handler first writes all the information about the required update to a log file. This is a single write, so is simple to ensure it’s done atomically. Once the write is done, the handler can acknowledge to its caller that it has handled the request. Then the handler, or other component, can read the log entry and carry out the updates to the underlying files.
Figure 2.4 Using WAL
Should Neptune crash after updating Boston, the log should contain enough information for Neptune, when it restarts, to figure out what happened and restore the data to a consistent state, as shown in Figure 2.5. (In this case it would store the previous values in the log before any updates are made to the data file.)
Figure 2.5 Recovery using WAL
The log gives us resilience because, for a known prior state, the linear sequence of changes determines the state after the log is executed. This property is important for resilience in a single node scenario but, as we’ll see, it’s also very valuable for replication. If multiple nodes start at the same state, and they all play the same log entries, we know they will end up at the same state too.
Databases use a Write-Ahead Log, as discussed in the above example, to implement transactions.