Planning for Mutable Documents
Things change. Things have been changing since the Big Bang. Things will most likely continue to change. It helps to keep these facts in mind when designing databases.
Some documents will change frequently, and others will change infrequently. A document that keeps a counter of the number of times a web page is viewed could change hundreds of times per minute. A document that stores server event log data may change only when there is an error in the load process that copies event data from a server to the document database. When designing a document database, consider not just how frequently a document will change, but also how the size of the document may change.
Incrementing a counter or correcting an error in a field will not significantly change the size of a document. However, consider the following scenarios:
- Trucks in a company fleet transmit location, fuel consumption, and other operating metrics every three minutes to a fleet management database.
- The price of every stock traded on every exchange in the world is checked every minute. If there is a change since the last check, the new price information is written to the database.
- Social networking posts are streamed to an application, which summarizes the number of posts; the overall sentiment of the posts; and the names of any companies, celebrities, public officials, or organizations mentioned. The database is continuously updated with this information.
Over time, the number of data sets written to the database increases. How should an application designer structure the documents to handle such input streams? One option is to create a new document for each new set of data. In the case of the trucks transmitting operational data, this would include a truck ID, time, location data, and so on:
{
  truck_id: 'T87V12',
  time: '08:10:00',
  date: '27-May-2015',
  driver_name: 'Jane Washington',
  fuel_consumption_rate: '14.8 mpg',
  ...
}
Each truck would transmit 20 data sets per hour (one every three minutes) or, assuming a 10-hour operations day, 200 data sets per day. The truck_id, date, and driver_name would be the same for all 200 documents. This looks like an obvious candidate for embedding the operational data in a single document about the truck used on a particular day. This could be done with an array holding the operational data documents:
{
  truck_id: 'T87V12',
  date: '27-May-2015',
  driver_name: 'Jane Washington',
  operational_data: [
    {time: '00:01', fuel_consumption_rate: '14.8 mpg', ...},
    {time: '00:04', fuel_consumption_rate: '12.2 mpg', ...},
    {time: '00:07', fuel_consumption_rate: '15.1 mpg', ...},
    ...
  ]
}
The document would start with a single operational record in the array, and at the end of the 10-hour shift, it would have 200 entries in the array.
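The difference between the two layouts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names make_readings and embed are hypothetical, the fuel value is a constant placeholder rather than real telemetry, and the length of the JSON text is used as a rough stand-in for storage size, not as a measurement of any particular database.

```python
import json
from datetime import datetime, timedelta


def make_readings(truck_id, date, driver_name, start="08:10", count=200, minutes=3):
    """Build one flat document per transmission (the first option in the text)."""
    t0 = datetime.strptime(start, "%H:%M")
    return [
        {
            "truck_id": truck_id,
            "date": date,
            "driver_name": driver_name,
            "time": (t0 + timedelta(minutes=minutes * i)).strftime("%H:%M"),
            # Placeholder value (assumption); real readings come from the truck.
            "fuel_consumption_rate": "14.8 mpg",
        }
        for i in range(count)
    ]


def embed(readings):
    """Fold flat per-reading documents into one document per truck per day."""
    shared = ("truck_id", "date", "driver_name")
    first = readings[0]
    doc = {key: first[key] for key in shared}
    # Keep only the fields that vary from reading to reading.
    doc["operational_data"] = [
        {k: v for k, v in r.items() if k not in shared} for r in readings
    ]
    return doc


readings = make_readings("T87V12", "27-May-2015", "Jane Washington")
truck_day = embed(readings)

# Rough size comparison: 200 separate documents versus one embedded document.
flat_bytes = sum(len(json.dumps(r)) for r in readings)
embedded_bytes = len(json.dumps(truck_day))
print(len(truck_day["operational_data"]), flat_bytes, embedded_bytes)
```

The embedded form stores truck_id, date, and driver_name once instead of 200 times, which is why it comes out smaller in this sketch.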
From a logical modeling perspective, this is a perfectly fine way to structure the document, assuming this approach fits your query requirements. From a physical modeling perspective, however, there is a potential performance problem.
When a document is created, the database management system allocates a certain amount of space for the document. This is usually enough to fit the document as it exists plus some room for growth. If the document grows larger than the space allocated for it, the document may be moved to another location. Moving a document requires the database management system to read the existing document, copy it to the new location, and free the previously used storage space (see Figure 8.7).
Figure 8.7 When documents grow larger than the amount of space allocated for them, they may be moved to another location. This puts additional load on the storage systems and can adversely affect performance.
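The cost of repeated relocation can be illustrated with a toy model. The sketch below is not how any particular database allocates storage; the function count_relocations, the byte sizes, and the padding_factor parameter are all assumptions chosen only to show the pattern of a document outgrowing its slot as array entries are appended.

```python
def count_relocations(entry_bytes, entries, base_bytes, padding_factor):
    """Count how often a growing document outgrows its allocated slot.

    Toy model (assumption): every allocation reserves
    current_size * padding_factor bytes.
    """
    size = base_bytes
    allocated = int(size * padding_factor)
    relocations = 0
    bytes_copied = 0
    for _ in range(entries):
        size += entry_bytes
        if size > allocated:
            # The document no longer fits: it is read, copied to a new
            # location, and the old space is freed.
            relocations += 1
            bytes_copied += size
            allocated = int(size * padding_factor)
    return relocations, bytes_copied


# A document that starts small and gains 200 entries of about 60 bytes each.
grown = count_relocations(entry_bytes=60, entries=200, base_bytes=100, padding_factor=1.5)

# The same document created at its full final size and updated in place.
preallocated = count_relocations(
    entry_bytes=0, entries=200, base_bytes=100 + 200 * 60, padding_factor=1.0
)

print("append as you go:", grown)
print("preallocated:    ", preallocated)
```

In this model the appended document is copied several times over the shift, while the document created at full size is never moved.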
Avoid Moving Oversized Documents
One way to avoid this problem of moving oversized documents is to allocate sufficient space for the document at the time the document is created. In the case of the truck operations document, you could create the document with an array of 200 embedded documents with the time and other fields specified with default values. When the actual data is transmitted to the database, the corresponding array entry is updated with the actual values (see Figure 8.8).
Figure 8.8 Creating documents with sufficient space for anticipated growth reduces the need to relocate documents.
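A minimal sketch of this preallocation approach follows. The function names new_truck_day and record_reading are hypothetical, the placeholder values are assumed to have the same width as real readings, and the length of the JSON text is only a rough proxy for on-disk size; a real storage engine may encode documents differently.

```python
import json

READINGS_PER_HOUR = 60 // 3               # one transmission every three minutes
SHIFT_HOURS = 10                          # 10-hour operations day
SLOTS = READINGS_PER_HOUR * SHIFT_HOURS   # 200 entries per truck per day


def new_truck_day(truck_id, date, driver_name):
    """Create the document at its full size, with default values in every slot."""
    # Placeholders are the same width as real values (assumption).
    placeholder = {"time": "00:00", "fuel_consumption_rate": "00.0 mpg"}
    return {
        "truck_id": truck_id,
        "date": date,
        "driver_name": driver_name,
        "operational_data": [dict(placeholder) for _ in range(SLOTS)],
    }


def record_reading(doc, slot, time, fuel_consumption_rate):
    """Overwrite a placeholder slot in place instead of appending a new entry."""
    doc["operational_data"][slot] = {
        "time": time,
        "fuel_consumption_rate": fuel_consumption_rate,
    }


doc = new_truck_day("T87V12", "27-May-2015", "Jane Washington")
size_before = len(json.dumps(doc))

record_reading(doc, 0, "00:01", "14.8 mpg")
record_reading(doc, 1, "00:04", "12.2 mpg")
record_reading(doc, 2, "00:07", "15.1 mpg")
size_after = len(json.dumps(doc))

print(SLOTS, size_before, size_after)
```

Because each update replaces a placeholder of the same width, the serialized document does not grow in this sketch, so nothing would force it to be relocated.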
Consider the life cycle of a document and, when possible, plan for anticipated growth. Creating a document with sufficient space for its full life can help to avoid the I/O overhead of relocating it.