Choosing Design Patterns for Your Document Databases
Document databases are a popular alternative to relational databases because they offer greater flexibility and scalability. With their distinct characteristics, document databases have led to several useful design patterns. This article provides tips for modeling one-to-many relationships, many-to-many relationships, trees, and hierarchies, along with guidelines for choosing an appropriate pattern for various needs.
Modeling One-to-Many Using Embedded Documents
The embedded approach to modeling one-to-many relationships reduces the complexity of retrieving data, at the cost of additional storage space. With this model, multiple documents are embedded within one, in order to simplify data access and possibly improve performance.
Consider an order document for an e-commerce website. Using the one-to-many model, the order document has an ID and fields indicating the details of the order, such as payment and shipping information. The data for each item in the order, such as product name, quantity, and sales price, is stored in item documents within the order document.
Embedding the attributes of a "many" entity in a one-to-many document allows application developers to take advantage of single fetch operations that return both primary and related data. If used properly, these fetch operations improve performance by reducing latency associated with retrieving additional data blocks when the many entities are stored in a separate collection.
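As a rough illustration of the order example, the sketch below uses plain Python dictionaries rather than any particular database's API. The field names, the sample values, and the `order_total` helper are all assumptions made for this example; a dictionary stands in for the orders collection.

```python
# Hypothetical order document with its line items embedded in an array.
order = {
    "_id": "order-1001",
    "payment": {"method": "card", "status": "paid"},
    "shipping": {"city": "Portland", "state": "OR"},
    "items": [
        {"product_name": "USB cable", "quantity": 2, "sales_price": 4.50},
        {"product_name": "Phone case", "quantity": 1, "sales_price": 12.00},
    ],
}

# A dict keyed by ID stands in for the orders collection.
orders = {order["_id"]: order}


def order_total(order_id):
    """Fetch one document and total its embedded items."""
    doc = orders[order_id]  # a single lookup returns the order and its items
    return sum(item["quantity"] * item["sales_price"] for item in doc["items"])


total = order_total("order-1001")
print(total)
```

Because the items travel with the order, computing the total needs only one lookup. The trade-off is that every item added makes the order document larger.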
There are some disadvantages to embedding "many" attributes:
- Embedded one-to-many models increase document size. Such a model must be implemented carefully to prevent performance degradation.
- Embedding is easy to overuse. Only attributes of the "many" entity that are frequently used with the parent entity should be embedded; otherwise, data is read more often than necessary, slowing the application.
Modeling Many-to-Many Using References
Many-to-many relationships occur when entities on each side of a relationship can be associated with multiple entities on the other side, so neither side acts as a single parent. For example, employees may work on many projects, and projects may have many employees assigned. In document databases, this type of relationship can be modeled with references or with embedded documents.
Document identifiers (references) provide an efficient way to organize information with a many-to-many model. References are essentially embedded IDs that refer to other documents; they function much like foreign keys in relational databases. As with embedding documents, using references has advantages and disadvantages:
- Because a reference is just an ID for a separate piece of information, redundancy is minimized. Using the customer order example, if two or more customers order the same item, each order can hold a reference to that item rather than a redundant copy of its data. This design keeps the documents relatively small and free of redundancy.
- On the other hand, references sometimes force multiple read operations in order to fetch all referenced documents.
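The trade-off in the list above can be sketched with the employee and project example. This is a minimal in-memory sketch, not a real database client: the two dictionaries stand in for collections, and the field names (`project_ids`, `employee_ids`) and the `project_with_employees` helper are hypothetical.

```python
# Dicts keyed by ID stand in for the employee and project collections.
employees = {
    "e1": {"_id": "e1", "name": "Ana", "project_ids": ["p1", "p2"]},
    "e2": {"_id": "e2", "name": "Ben", "project_ids": ["p1"]},
}
projects = {
    "p1": {"_id": "p1", "title": "Billing", "employee_ids": ["e1", "e2"]},
    "p2": {"_id": "p2", "title": "Search", "employee_ids": ["e1"]},
}


def project_with_employees(project_id):
    """Resolve a project's employee references, counting the reads needed."""
    project = projects[project_id]
    reads = 1  # one read for the project document itself
    staff = []
    for employee_id in project["employee_ids"]:
        staff.append(employees[employee_id]["name"])
        reads += 1  # one more read per referenced employee
    return project["title"], staff, reads


title, staff, reads = project_with_employees("p1")
print(title, staff, reads)
```

Each employee's details are stored once, however many projects refer to them, but assembling the full picture of one project takes one read for the project plus one per referenced employee. Many databases can batch such lookups, so the count here is a worst case for this sketch.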
Modeling Many-to-Many Using Embedded Documents
Many-to-many models can also utilize document embedding. When implemented within the many-to-many model, this method stores multiple documents within an outer document. For example, this method may be useful when organizing the employee-to-project relationship and workflow.
As employees work on multiple projects, and projects take on multiple employees, the many-to-many architecture helps to streamline document storage. The project collection may include project documents, each containing embedded employee documents that describe the employees working on the project. Similarly, the employee collection might contain employee documents. Each employee document would have an embedded document describing each project on which that particular employee worked.
Embedded documents are useful when you need to capture point-in-time data that may change in the future. For example, if a project document contains only employee IDs of those currently working on a project, then past employees on the project will not be represented. By storing embedded employee documents along with start and end dates, you can store a history of who has worked on a project.
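A minimal sketch of the project history idea follows, again using plain Python data rather than a database API. The field names (`start`, `end`), the sample dates, and the `staff_on` helper are assumptions for illustration; an `end` of `None` is used here to mean the employee is still on the project.

```python
from datetime import date

# Hypothetical project document with embedded employee records and dates.
project = {
    "_id": "p1",
    "title": "Billing",
    "employees": [
        {"employee_id": "e1", "name": "Ana",
         "start": date(2020, 1, 6), "end": date(2021, 3, 31)},
        {"employee_id": "e2", "name": "Ben",
         "start": date(2021, 2, 1), "end": None},  # None: still assigned
    ],
}


def staff_on(project_doc, day):
    """Return names of employees assigned to the project on the given day."""
    return [
        e["name"]
        for e in project_doc["employees"]
        if e["start"] <= day and (e["end"] is None or day <= e["end"])
    ]


past = staff_on(project, date(2020, 6, 1))
overlap = staff_on(project, date(2021, 3, 1))
current = staff_on(project, date(2022, 1, 1))
print(past, overlap, current)
```

Because the embedded records keep their dates, the document can answer who was on the project at any point, including employees who have since left.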
The one major disadvantage of the embedded document approach, much like with the one-to-many example, is the large size of documents. In general, using a many-to-many model with embedded documents works best with small to medium-sized documents and collections.
Modeling Trees
Using trees in document databases is an efficient way to organize both reference-based and embedding-based document models. Trees are commonly used to represent "is-a" and "part-of" relationships.
For example, a product hierarchy might include the type "electronics" with a set of subtypes such as mobile, home audio, and home video equipment. These subtypes may in turn be broken down further, with mobile products including phones, tablets, and so on. An example "part-of" relationship is an automobile that has an electrical system and a transmission, and under the electrical system are the ignition and lighting systems.
If trees are of arbitrary depth, using references is a good choice. This approach will keep the size of any given document under control by preventing it from being overrun with embedded documents. At any point in the tree, you can embed a reference to the parent node or a list of references to child nodes.
Using child references promotes top-down navigation of documents, whereas parent references promote bottom-up navigation. A combination of the two referencing methods allows for comprehensive navigation of stored documents.
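The product hierarchy from earlier can illustrate both directions of navigation. In this sketch each node document carries a `parent` reference and a `children` list; those field names, the node IDs, and the two helper functions are assumptions for the example, and a dictionary stands in for the collection.

```python
# Each node stores a reference to its parent and a list of child references.
nodes = {
    "electronics": {"parent": None,
                    "children": ["mobile", "home_audio", "home_video"]},
    "mobile": {"parent": "electronics", "children": ["phones", "tablets"]},
    "home_audio": {"parent": "electronics", "children": []},
    "home_video": {"parent": "electronics", "children": []},
    "phones": {"parent": "mobile", "children": []},
    "tablets": {"parent": "mobile", "children": []},
}


def ancestors(node_id):
    """Bottom-up navigation: follow parent references to the root."""
    path = []
    parent = nodes[node_id]["parent"]
    while parent is not None:
        path.append(parent)
        parent = nodes[parent]["parent"]
    return path


def descendants(node_id):
    """Top-down navigation: follow child references, depth first."""
    found = []
    stack = list(reversed(nodes[node_id]["children"]))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(nodes[current]["children"]))
    return found


up = ancestors("phones")
down = descendants("electronics")
print(up)
print(down)
```

Neither function depends on how deep the tree is, and no node document grows beyond its own list of direct children.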
Avoiding Too Much of a Good Thing
It's important to watch for adverse effects when using these patterns. For instance, look out for large arrays or significant growth in document size after adding embedded documents. Documents can grow larger than the space allocated to them, which can trigger the database to copy the data to a new location. Relocating documents in this way takes more space than necessary and can have a significant impact on performance.
A second word of warning: Avoid fetching more data than needed, which can slow down reads and unnecessarily increase response times for your queries. If any of these symptoms occur, consider revising your data model.
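Both warnings can be checked roughly in application code. The sketch below is only a heuristic: the thresholds are hypothetical, the JSON-encoded length is a stand-in for whatever size your database actually stores, and `size_warnings` and `project_fields` are names invented for this example.

```python
import json

MAX_DOC_BYTES = 1024   # hypothetical size budget for one document
MAX_ARRAY_ITEMS = 100  # hypothetical limit for any embedded array


def size_warnings(doc):
    """Flag documents that exceed the size budget or hold large arrays."""
    warnings = []
    size = len(json.dumps(doc).encode("utf-8"))  # rough proxy for stored size
    if size > MAX_DOC_BYTES:
        warnings.append(f"document is {size} bytes")
    for field, value in doc.items():
        if isinstance(value, list) and len(value) > MAX_ARRAY_ITEMS:
            warnings.append(f"array '{field}' has {len(value)} items")
    return warnings


def project_fields(doc, fields):
    """Keep only the fields a query needs instead of the whole document."""
    return {name: doc[name] for name in fields if name in doc}


small = {"_id": "p1", "title": "Billing", "employees": [{"name": "Ana"}]}
large = {"_id": "p2", "title": "Search",
         "employees": [{"name": f"emp{i}"} for i in range(150)]}

print(size_warnings(small))
print(size_warnings(large))
print(project_fields(large, ["_id", "title"]))
```

A document that trips either check is a candidate for moving some embedded data into a separate collection and replacing it with references. Where the database supports it, requesting only the needed fields avoids returning a large embedded array to a query that never uses it.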
Summary
The various design patterns for document databases offer different advantages and disadvantages, and rarely is one answer right for every situation. When choosing a document database pattern, always keep in mind your database's query requirements and the volume of data in your collections.
Dan Sullivan, author of NoSQL for Mere Mortals, is an enterprise architect and consultant with over 20 years of IT experience with engagements in advanced analytics, systems architecture, database design, enterprise security and business intelligence.
James Sullivan is a business technology writer with concentrations in mobile, security and database services. He is based in Portland, Oregon.