Modeling Common Relations
As you gather requirements and design a document database, you will likely find the need for one or more of three common relations:
- One-to-many relations
- Many-to-many relations
- Hierarchies
The first two involve relations between two collections, whereas the third can entail an arbitrary number of related documents within a collection. You learned about one-to-one and one-to-many relations previously in the discussion of normalization. At that point, the focus was on the need for joins when normalizing data models. Here, the focus is on how to efficiently implement such relationships in document databases. The following sections discuss design patterns for modeling these three kinds of relations.
One-to-Many Relations in Document Databases
One-to-many relations are the simplest of the three relations. This relation occurs when an instance of an entity has one or more related instances of another entity. The following are some examples:
- One order can have many order items.
- One apartment building can have many apartments.
- One organization can have many departments.
- One product can have many parts.
This is an example in which the typical model of document database differs from that of a relational database. In the case of a one-to-many relation, both entities are modeled using a document embedded within another document. For example:
{ customer_id: 76123, name: 'Acme Data Modeling Services', person_or_business: 'business', address : [ { street: '276 North Amber St', city: 'Vancouver', state: 'WA', zip: 99076} , { street: '89 Morton St', city: 'Salem', state: 'NH', zip: 01097} ] }
The basic pattern is that the one entity in a one-to-many relation is the primary document, and the many entities are represented as an array of embedded documents. The primary document has fields about the one entity, and the embedded documents have fields about the many entities.
Many-to-Many Relations in Document Databases
A many-to-many relation occurs when instances of two entities can both be related to multiple instances of another entity. The following are some examples:
- Doctors can have many patients and patients can have many doctors.
- Operating system user groups can have many users and users can be in many operating system user groups.
- Students can be enrolled in many courses and courses can have many students enrolled.
- People can join many clubs and clubs can have many members.
Many-to-many relations are modeled using two collections—one for each type of entity. Each collection maintains a list of identifiers that reference related entities. For example, a document with course data would include an array of student IDs, and a student document would include a list of course IDs, as in the following:
Courses:
{ { courseID: 'C1667', title: 'Introduction to Anthropology', instructor: 'Dr. Margret Austin', credits: 3, enrolledStudents: ['S1837', 'S3737', 'S9825' ... 'S1847'] }, { courseID: 'C2873', title: 'Algorithms and Data Structures', instructor: 'Dr. Susan Johnson', credits: 3, enrolledStudents: ['S1837','S3737', 'S4321', 'S9825' ... 'S1847'] }, { courseID: C3876, title: 'Macroeconomics', instructor: 'Dr. James Schulen', credits: 3, enrolledStudents: ['S1837', 'S4321', 'S1470', 'S9825' ... 'S1847'] }, ...
Students:
{ {studentID:'S1837', name: 'Brian Nelson', gradYear: 2018, courses: ['C1667', C2873,'C3876']}, {studentID: 'S3737', name: 'Yolanda Deltor', gradYear: 2017, courses: [ 'C1667','C2873']}, ... }
The pattern minimizes duplicate data by referencing related documents with identifiers instead of embedded documents.
Care must be taken when updating many-to-many relationships so that both entities are correctly updated. Also remember that document databases will not catch referential integrity errors as a relational database will. Document databases will allow you to insert a student document with a courseID that does not correspond to an existing course.
Modeling Hierarchies in Document Databases
Hierarchies describe instances of entities in some kind of parent-child or part-subpart relation. The product_category attribute introduced earlier is an example where a hierarchy could help represent relations between different product categories (see Figure 8.11).
Figure 8.11 Hierarchies describe parent-child or part-subpart relations.
There are a few different ways to model hierarchical relations. Each works well with particular types of queries.
Parent or Child References
A simple technique is to keep a reference to either the parent or the children of an entity. Using the data depicted in Figure 8.11, you could model product categories with references to their parents:
{ {productCategoryID: 'PC233', name:'Pencils', parentID:'PC72'}, {productCategoryID: 'PC72', name:'Writing Instruments', parentID: 'PC37''}, {productCategoryID: 'PC37', name:'Office Supplies', parentID: 'P01'}, {productCategoryID: 'P01', name:'Product Categories' } }
Notice that the root of the hierarchy, 'Product Categories', does not have a parent and so has no parent field in its document.
This pattern is useful if you frequently have to show a specific instance and then display the more general type of that category.
A similar pattern works with child references:
{ {productCategoryID: 'P01', name:'Product Categories', childrenIDs: ['P37','P39','P41']}, {productCategoryID: 'PC37', name:'Office Supplies', childrenIDs: ['PC72','PC73','PC74'']}, {productCategoryID: 'PC72', name:'Writing Instruments', childrenIDs: ['PC233','PC234']'}, {productCategoryID: 'PC233', name:'Pencils'} }
The bottom nodes of the hierarchy, such as 'Pencils', do not have children and therefore do not have a childrenIDs field.
This pattern is useful if you routinely need to retrieve the children or subparts of the instance modeled in the document. For example, if you had to support a user interface that allowed users to drill down, you could use this pattern to fetch all the children or subparts of the current level of the hierarchy displayed in the interface.
Listing All Ancestors
Instead of just listing the parent in a child document, you could keep a list of all ancestors. For example, the 'Pencils' category could be structured in a document as
{productCategoryID: 'PC233', name:'Pencils', ancestors:['PC72', 'PC37', 'P01']}
This pattern is useful when you have to know the full path from any point in the hierarchy back to the root.
An advantage of this pattern is that you can retrieve the full path to the root in a single read operation. Using a parent or child reference requires multiple reads, one for each additional level of the hierarchy.
A disadvantage of this approach is that changes to the hierarchy may require many write operations. The higher up in the hierarchy the change is, the more documents will have to be updated. For example, if a new level was introduced between 'Product Category' and 'Office Supplies', all documents below the new entry would have to be updated. If you added a new level to the bottom of the hierarchy—for example, below 'Pencils' you add 'Mechanical Pencils' and 'Non-mechanical Pencils'—then no existing documents would have to change.