Principles of LDAP Schema Design
An LDAP-accessible Directory may be thought of as an object store. In LDAP terminology, the stored objects are known as entries. The entries are arranged in a hierarchy: every entry other than the root has exactly one parent entry and zero or more child entries. Entries with no children are termed leaf entries. All of the children of an entry are siblings and are said to reside in the same container. Each entry stores its information as a set of attribute-value pairs. In every entry, at least one of the attribute-value pairs is used to uniquely identify the entry among all of its siblings; the attribute used this way is the naming attribute. For example, in a Directory storing information about people, the email address attribute could be used as the naming attribute. This scheme assumes that everyone in the Directory has an email address, which is often but not always the case. For example, in a manufacturing division of a company, you typically don't find many employees with email, yet applications need to have information on these employees. Since the entries are hierarchical, the naming attributes of all of an entry's ancestors up the tree can be strung together with its own to create a name that is unique among all entries in the Directory. This unique name is known as the entry's distinguished name (dn for short).
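The way a distinguished name is assembled from naming attributes can be sketched in a few lines of Python. This is only an illustration: the Entry class and the distinguished_name function are hypothetical names, not part of any LDAP library, and the sample names are borrowed from the example Directory used later in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    rdn: str                          # naming attribute and value, e.g. "cn=pablo"
    parent: Optional["Entry"] = None  # None marks the top of the example tree


def distinguished_name(entry: Entry) -> str:
    """Join the naming attributes from the entry up to the top, leaf first."""
    parts = []
    node = entry
    while node is not None:
        parts.append(node.rdn)
        node = node.parent
    return ", ".join(parts)


uc = Entry("departmentName=uc")
art = Entry("departmentName=art", parent=uc)
pablo = Entry("cn=pablo", parent=art)

print(distinguished_name(pablo))
# prints: cn=pablo, departmentName=art, departmentName=uc
```

Because siblings must differ in their naming attribute, two entries can share the same name component only if they sit in different containers, which is what keeps the full name unique.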
Not only is the LDAP data hierarchical, but the LDAP metadata is hierarchical as well. LDAP metadata is defined by creating object class and attribute type definitions. The entry's object class defines the different attributes that may be stored in an entry. An object class is defined by listing:
Name. The string of characters by which the object class is known
Mandatory attributes. Attributes that must be present in any entry of the object class
Optional attributes. Attributes that may be present in any entry of the object class
Superclass. The name of an object class from which this object class inherits all mandatory and optional attributes
Type. Indicates whether objects of the type can be created in the Directory (structural) and whether the object class can be used only as a superclass in the creation of other object classes (abstract). The type also indicates whether the object class is used to augment an entry that is already stored in the Directory (auxiliary).
For example, consider the following object class definitions, taken from RFC 2256:
( 2.5.6.0 NAME 'top' ABSTRACT MUST objectClass )
( 2.5.6.6 NAME 'person' SUP top STRUCTURAL MUST ( sn $ cn )
MAY ( userPassword $ telephoneNumber $ seeAlso $ description ) )
In these definitions, each object class is given two names: the numerical object identifier, followed by a textual name. Then the superclass, if any, is given, followed by the object class type. Finally, the mandatory and optional attributes are listed, with the "$" character used as a separator. Notice that entries with the object class "person" inherit the "objectClass" attribute from the superclass. Since the "top" object class is abstract, no entries can be created of that class. However, entries of the "person" object class may be created. In the "person" object class, the "cn" attribute is short for common name and is normally used to give the entry its unique name within the container.
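The inheritance described above can be modeled with a small lookup table. The following Python sketch is an assumption-laden illustration, not a schema parser: the dictionary layout and the function names effective_attributes and can_instantiate are invented for this example, and only the two definitions quoted above are included.

```python
# Hand-transcribed from the two definitions above; only the fields discussed.
OBJECT_CLASSES = {
    "top": {"sup": None, "kind": "ABSTRACT",
            "must": {"objectClass"}, "may": set()},
    "person": {"sup": "top", "kind": "STRUCTURAL",
               "must": {"sn", "cn"},
               "may": {"userPassword", "telephoneNumber",
                       "seeAlso", "description"}},
}


def effective_attributes(name, classes=OBJECT_CLASSES):
    """Return (must, may) for a class, including attributes inherited via SUP."""
    must, may = set(), set()
    while name is not None:
        definition = classes[name]
        must |= definition["must"]
        may |= definition["may"]
        name = definition["sup"]
    # An attribute that is mandatory somewhere in the chain is not optional.
    return must, may - must


def can_instantiate(name, classes=OBJECT_CLASSES):
    """Entries can be created only for structural classes."""
    return classes[name]["kind"] == "STRUCTURAL"


must, may = effective_attributes("person")
```

Here must contains objectClass in addition to sn and cn, which is the inheritance from top that the text points out, and can_instantiate("top") is false because top is abstract.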
Attribute types are defined similarly. The most important parts of an attribute type are
Name. The string of characters by which the attribute is known
Syntax. The definition of the legal values for an attribute (e.g., character string, Boolean, etc.)
Number of values allowed. Indicates whether there can be more than one value for the attribute in a single entry
In LDAP, most attributes are multivalued. For example, any entry with the object class "person" would have an attribute type of "objectClass" with two values:
"top"
"person"
Now, consider the following object class definition, which will be used in the creation of an example Directory tree:
( NAME 'department' SUP top STRUCTURAL MUST departmentName MAY description )
Notice that the numerical name of the object class has been omitted for brevity. Figure 4.1 shows a pictorial view of the information in an example Directory.
Figure 4.1 Example Directory information.
In the figure, the object class names are given using the oc attribute type. In this example, there are seven entries, with the following dn's, with object class names given in parentheses after the dn:
departmentName = uc (department, top)
departmentName = cs, departmentName = uc (department, top)
departmentName = art, departmentName = uc (department, top)
departmentName = cafe, departmentName = uc (department, top)
cn = pablo, departmentName = art, departmentName = uc (person, top)
cn = henri, departmentName = art, departmentName = uc (person, top)
cn = augustus, departmentName = art, departmentName = uc (person, top)
Note that the description attribute in the entry for "cn = pablo" has two distinct values. Information is retrieved from a Directory by using the LDAP Search operation. A Search operation can be used to retrieve attributes from a single entry, from entries in the container immediately below an entry, or from an entire subtree of entries. There are four parameters of interest to the Search operation (there are actually eight parameters, but the others don't affect the normalization discussion):
Base object. The starting point for the search. This is a distinguished name.
Scope. Indicates whether the search covers a single object, a container, or a subtree
Filter. Describes the conditions that must be fulfilled for an entry to be retrieved by the Search operation. The filter either matches or doesn't match an entry.
Attributes. Gives the list of attributes that are to be returned from entries that match the filter. If an attribute is listed, then all of the values for that attribute are returned in the Search result. If no attributes are listed, then this is an indication that all attributes in the matching entries are to be returned.
Consider the following example Search operations that are applied to the sample Directory information in Figure 4.1:
- Base Object = "departmentName = uc", Scope = subtree, Filter = "description = *sculptor"
  This search would match the one entry: cn = augustus, departmentName = art, departmentName = uc.
- Base Object = "departmentName = uc", Scope = single level, Filter = "description = *sculptor"
  This search would not match any entries at all.
- Base Object = "departmentName = uc", Scope = subtree, Filter = "description = *artist"
  This search would match the two entries:
  cn = pablo, departmentName = art, departmentName = uc (person, top)
  cn = henri, departmentName = art, departmentName = uc (person, top)
- Base Object = "departmentName = uc", Scope = subtree, Filter = "description = *"
  This search would match every entry in the example Directory tree.
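The interaction of base object, scope, and filter in the four searches above can be reproduced with a small in-memory model. This Python sketch is not an LDAP client; the function names are hypothetical, the filter handling covers only the forms used in the examples, and the description values are placeholders chosen to be consistent with the matches described in the text, since the contents of Figure 4.1 are not reproduced here.

```python
# Entries keyed by DN, written as a tuple of name components, leaf first.
# Description values are assumed placeholders, not the figure's real data.
UC = ("departmentName=uc",)
ART = ("departmentName=art",) + UC

DIRECTORY = {
    UC: {"description": ["example organization"]},
    ("departmentName=cs",) + UC: {"description": ["computer science"]},
    ART: {"description": ["fine arts"]},
    ("departmentName=cafe",) + UC: {"description": ["coffee"]},
    ("cn=pablo",) + ART: {"description": ["painter", "cubist artist"]},
    ("cn=henri",) + ART: {"description": ["artist"]},
    ("cn=augustus",) + ART: {"description": ["sculptor"]},
}


def in_scope(dn, base, scope):
    """Scope is 'base', 'one' (immediate children) or 'sub' (base and below)."""
    under_base = dn[-len(base):] == base
    if scope == "base":
        return dn == base
    if scope == "one":
        return under_base and len(dn) == len(base) + 1
    return under_base


def value_matches(value, pattern):
    """Supports only an exact value, '*suffix', and the presence test '*'."""
    if pattern == "*":
        return True
    if pattern.startswith("*"):
        return value.endswith(pattern[1:])
    return value == pattern


def search(base, scope, attribute, pattern, directory=DIRECTORY):
    """Return the DNs of in-scope entries with at least one matching value."""
    return sorted(
        dn for dn, attrs in directory.items()
        if in_scope(dn, base, scope)
        and any(value_matches(v, pattern) for v in attrs.get(attribute, []))
    )
```

With this model, search(UC, "sub", "description", "*sculptor") finds only the augustus entry, while the same filter with scope "one" finds nothing, because the only entries one level below the base are the three departments.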
Typical Problems with LDAP Schema Design
The typical problems that can afflict an LDAP schema design are similar to those that arise in the design of a relational database schema. These problems are
- Data redundancy
- Delete anomalies
- Update anomalies
- Retrieval of unwanted data
Data redundancy occurs when the same information is repeated in many objects throughout the Directory. Collecting the information in common into a separate entry can often eliminate this data redundancy. Thus, when the common information needs to be changed, it has to be changed only in one entry, not in many entries throughout the Directory.
A delete anomaly occurs when a source object points to a target object and the target object is deleted from the Directory. This can happen frequently in Directories, since many entries have attributes that are the distinguished names of other entries in the Directory.
An update anomaly occurs when the source or target object is modified and the relationship implied by the pointer is no longer valid. Consider the situation in which an entry has an attribute that indicates a user's department number and department administrator. If the user switches departments, both of these attributes must be changed for the entry to remain valid. Similarly, whenever the department changes administrators, the entry for each user in that department must be updated with the new administrator's name.
Retrieval of unwanted data occurs when the LDAP server returns attribute values that are not needed by the LDAP client. This occurs in LDAP because the standard LDAP search operation does not allow for the retrieval of individual attribute values. In LDAP, all of the values of a particular multivalued attribute are returned to the client or none of them are. In the following sections, examples of LDAP schemas with these problems, and suggested solutions to resolve these problems, are presented.
Relational Database Normalization
In relational databases, normalizing the relational tables solves these problems. A relational database is made up of tables. The data types of a table are defined by the column definition. Each row in the table must conform to the definitions of the column. For example, consider a table used to represent suppliers of parts that can be ordered. In its simplest form there might just be a supplier name column and a city name column. Both of these columns are strings. In this case, the supplier name column would be considered a primary key since it uniquely identifies the row. This means that there can't be two rows in the table with the same supplier name.
There are many normalization rules in database theory, but the most basic and widely used are the first, second, and third normal forms. These rules are summarized here:
First normal form. A table is said to be in first normal form if all of the cells in the table contain only atomic values. This means that sets of values are not allowed in individual cells.
Second normal form. A table is said to be in second normal form if every nonkey attribute is fully dependent on the primary key. It must also be in first normal form.
Third normal form. A table is said to be in third normal form if all nonkey attributes are dependent only on the primary key. If a nonkey attribute is dependent on an attribute in addition to the primary key attribute, this can lead to the update anomalies mentioned above.
In moving from second normal form to third normal form an additional table (or more) is created. The typical example uses a table to hold user address information. This table would have the following five columns:
- User Name (key)
- Street Address
- City
- State
- Zip Code
This table is in second normal form since the user name determines columns 2 through 5. It is not in third normal form since columns 2 through 4 always determine the zip code in column 5. Thus, to move to third normal form, this table must be split into two separate tables, each of which obeys the rules of third normal form above.
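The split into two tables can be illustrated with plain Python dictionaries. This is a sketch under stated assumptions: the rows are invented placeholders, not real addresses, and the function names to_third_normal_form and full_row are hypothetical.

```python
# Placeholder rows: (user name, street address, city, state, zip code).
ROWS = [
    ("alice", "1 Example Street", "Example City", "EX", "00001"),
    ("bob", "1 Example Street", "Example City", "EX", "00001"),
    ("carol", "9 Sample Road", "Sample Town", "SA", "00002"),
]


def to_third_normal_form(rows):
    """Split the table into user -> address and address -> zip code."""
    user_address = {}
    address_zip = {}
    for user, street, city, state, zip_code in rows:
        address = (street, city, state)
        user_address[user] = address
        address_zip[address] = zip_code  # stored once per distinct address
    return user_address, address_zip


def full_row(user, user_address, address_zip):
    """Rebuild the original five-column row by joining the two tables."""
    address = user_address[user]
    return (user, *address, address_zip[address])


user_address, address_zip = to_third_normal_form(ROWS)
```

In the first table the user name is the key; in the second, columns 2 through 4 together form the key. The zip code for the shared address is now stored once, so correcting it is a single update rather than one per user.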
These three rules of normalization can be applied to LDAP schema design in order to eliminate some of the common problems. In order to apply the normalization rules to LDAP schema design, simply replace table in the rules with object class, replace primary key with relative distinguished name (RDN), and replace cell with attribute value. This gives these rules for LDAP Schema Normalization:
First normal form. An object class is said to be in first normal form if all of the attribute values in the object class contain only atomic values. This means that sets of values are not allowed in an individual attribute value.
Second normal form. An object class is said to be in second normal form if every nonkey attribute is fully dependent on the RDN. It must also be in first normal form.
Third normal form. An object class is said to be in third normal form if all nonkey attributes are dependent only on the RDN. If a nonkey attribute is dependent on an attribute in addition to the RDN attribute, this can lead to the update anomalies mentioned above.
Data Redundancy
Consider the following enhanced person object class definition:
( NAME 'enhancedPerson' SUP person STRUCTURAL MUST ( email )
MAY ( streetAddress $ city $ state $ postalCode ) )
Using this schema definition, every person in the Directory would have data stored about their mailing address. In organizational directories where virtually all users share the same address information, this is a tremendous waste of space and has the potential for inconsistent data. A better solution is to eliminate the redundancy by normalizing the postal information into a separate object class.
( NAME 'enhancedPerson' SUP person STRUCTURAL MUST ( email )
MAY postalInformationDN )
( NAME 'postalInformation' SUP top STRUCTURAL MUST
( cn $ streetAddress $ city $ state $ postalCode ) )
Notice that the person information only stores the name of some other object in the Directory that holds the actual information. The postalInformation object class is specifically designed to hold this information.
Figure 4.2 shows an example Directory with this information.
Figure 4.2 Eliminating data redundancy.
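A client reading the normalized schema follows the postalInformationDN pointer to reach the shared address. The Python sketch below models that lookup with a dictionary; the entry names, the address values, and the postal_address function are all placeholders invented for illustration.

```python
# Attribute values are placeholders; only the shape of the data matters.
OFFICE = "cn=main office, departmentName=uc"

directory = {
    OFFICE: {
        "objectClass": ["postalInformation", "top"],
        "streetAddress": ["1 Example Street"],
        "city": ["Example City"],
        "state": ["EX"],
        "postalCode": ["00000"],
    },
    "cn=pablo, departmentName=art, departmentName=uc": {
        "postalInformationDN": [OFFICE],
    },
    "cn=henri, departmentName=art, departmentName=uc": {
        "postalInformationDN": [OFFICE],
    },
}


def postal_address(directory, person_dn):
    """Follow postalInformationDN and return the shared address attributes."""
    target_dn = directory[person_dn]["postalInformationDN"][0]
    target = directory[target_dn]
    return {name: target[name][0]
            for name in ("streetAddress", "city", "state", "postalCode")}


# A change of address is recorded once, in the shared entry.
directory[OFFICE]["streetAddress"] = ["2 Example Street"]
```

After the single update, postal_address returns the new street for both people. The trade-off is that postalInformationDN is itself a reference to another entry's distinguished name, so it is exposed to the delete and update anomalies described in this chapter.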
Retrieval of Unwanted Data
In LDAP, all values of an attribute are returned in a search result if the attribute type name is listed in the attributes field of the Search operation. This can be a problem if an entry has numerous values for an attribute and the LDAP client is really only interested in one or two of the values. Consider the scenario of secure email. In a typical public-key technology implementation, if one user named Alice wants to send an encrypted message to another user named Bob, Alice must first retrieve Bob's public key. When the public-key information is stored in a Directory, it is often stored in a special format, known as a certificate.
A typical LDAP schema definition allows Bob's certificate to be stored in his Directory entry using the following object class definition:
( NAME 'strongAuthenticationUser' SUP top AUXILIARY MAY ( userCertificate ) )
( 2.5.4.36 NAME 'userCertificate' SYNTAX 1.3.6.1.4.1.1466.115.121.1.8 )
Since the definition of userCertificate doesn't restrict the number of values (it is not marked SINGLE-VALUE), the attribute can hold any number of values. In certain military and highly secure environments, a single user can hold many hundreds of certificates. In this situation, even if the LDAP client wants to retrieve only a single certificate, all of the user's certificates are retrieved. Moreover, the LDAP server returns them unordered, so the client may have to examine each certificate in the entry to find the desired one. A typical certificate is about 2K bytes, so an LDAP result containing 250 certificates would contain about 500K of data. In addition to the computational overhead of examining each certificate, the network overhead would certainly slow down the response time. A better solution is to enhance the schema and revise the DIT (Directory Information Tree). An alternate schema to solve this problem is proposed in a current Internet Draft. This schema contains the following object class definition:
( NAME 'certificateType' SUP top STRUCTURAL MUST typeName MAY ( serialNumber $
issuer $ validityNotBefore $ validityNotAfter $ subject $ subjectPublicKeyInfo $
certificateExtension $ otherInfo ))
This definition extracts many fields from the certificate data structure in order that they may be easily searchable by standard LDAP search operations. All of the fields except for certificateExtension are defined as SINGLE-VALUE. Notice that it does not include a certificate attribute. This is because the certificate is still attached to the certificateType entry using an auxiliary object class, such as strongAuthenticationUser. This new design for the DIT places all of a user's certificates in a container beneath the user's entry in the DIT rather than directly attached to the entry as in the previous design. Figure 4.3 illustrates this new DIT.
Figure 4.3 Normalized DIT for holding user certificates.
In this DIT the user henri has three certificates, which are found in the entries immediately beneath that entry in the DIT. They have these distinguished names:
- tn = Visa, cn = henri, departmentName = art, departmentName = uc
- tn = Master Card, cn = henri, departmentName = art, departmentName = uc
- tn = American Express, cn = henri, departmentName = art, departmentName = uc
This allows an LDAP search operation to retrieve exactly the certificates that it wants and no more. For example, to retrieve henri's Visa certificate, this search could be issued:
- Base Object = "cn = henri, departmentName = art, departmentName = uc", Scope = single level, Filter = "typeName = Visa"
  This search would match only tn = Visa, cn = henri, departmentName = art, departmentName = uc.
If the original schema had been used, all of the certificates for the user henri would have been stored in the userCertificate attribute. The LDAP client would have to retrieve all of the certificates and parse through them to find the one that had been issued by Visa. Notice that the new design still allows for the easy retrieval of all of henri's certificates. This is done using the following search operation:
- Base Object = "cn = henri, departmentName = art, departmentName = uc", Scope = single level, Filter = "objectClass = certificateType"
  This search would match all three certificateType entries in the DIT below henri's entry.
This same mechanism of restructuring the DIT and redefining classes can be used anywhere that attributes can have numerous values and LDAP clients need to retrieve the values only a few at a time. This mechanism also makes it simple to find all of the certificates from a single issuer or a certain type. For example, to find all of the certificates in the art department that are from Visa, the following search operation is used:
- Base Object = "departmentName = art, departmentName = uc", Scope = subtree, Filter = "typeName = Visa"
  This would match the three Visa certificateType entries in the DIT:
  tn = Visa, cn = henri, departmentName = art, departmentName = uc
  tn = Visa, cn = pablo, departmentName = art, departmentName = uc
  tn = Visa, cn = augustus, departmentName = art, departmentName = uc
In these examples, the entire certificateType entry is retrieved. Note that it is possible to just retrieve the certificate itself by naming the userCertificate attribute type in the attributes field of the Search operation. Note that if there are multiple Visa certificates for a single user, then the userCertificate attribute would have multiple values for this certificateType entry. If this situation arises, a new naming scheme for the certificateType entries should be employed.
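The difference in retrieved data between the two designs can be made concrete with a rough Python model. This is a sketch only: the certificate bytes are zero-filled placeholders sized at the "about 2K bytes" figure quoted earlier, and the function names fetch_flat, fetch_by_type, and payload are hypothetical.

```python
CERT_SIZE = 2 * 1024  # "about 2K bytes" per certificate, as stated in the text

HENRI = "cn=henri, departmentName=art, departmentName=uc"

# Original design: every certificate is a value of a single attribute.
flat_entry = {"userCertificate": [bytes(CERT_SIZE) for _ in range(3)]}

# Normalized design: one child entry per certificate, named by its type.
children = {
    f"tn={name}, {HENRI}": {"typeName": name,
                            "userCertificate": [bytes(CERT_SIZE)]}
    for name in ("Visa", "Master Card", "American Express")
}


def fetch_flat(entry):
    """All values come back together; the client must sift through them."""
    return entry["userCertificate"]


def fetch_by_type(entries, base, type_name):
    """Model a single-level search under base with filter typeName=<type_name>."""
    suffix = ", " + base
    return [
        cert
        for dn, attrs in entries.items()
        if dn.endswith(suffix) and dn.count(",") == base.count(",") + 1
        and attrs["typeName"] == type_name
        for cert in attrs["userCertificate"]
    ]


def payload(certs):
    """Total certificate bytes carried in a result."""
    return sum(len(c) for c in certs)
```

With three certificates the flat design returns all three values while the typeName search returns one; at the 250 certificates mentioned earlier the flat result would carry 250 * CERT_SIZE bytes of certificate data for the same single-certificate request.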
Delete and Update Anomalies
Delete and update anomalies occur in LDAP when there is a reference in one entry to the distinguished name of another entry. When the referenced entry is deleted or renamed, the referencing entry's references are no longer valid. Some LDAP implementations go to great pains when entries are deleted or moved to make sure that all objects that reference the modified or deleted entry are updated as appropriate.
A better solution that eliminates these anomalies is to restructure the DIT to take advantage of the hierarchy. Some object classes use the technique of placing the distinguished names of the referenced entries directly in the entry as in this standard LDAP object class:
( 2.5.6.9 NAME 'groupOfNames' SUP top STRUCTURAL MUST
( member $ cn ) MAY ( businessCategory $ seeAlso $ owner $ ou $ o $ description ) )
where the member attribute has this definition:
( 2.5.4.31 NAME 'member' SUP distinguishedName )
The use of the SUP designation in the attribute type definition is similar to its use in the object class definition. It is an indication that the syntax and matching rules in the specified SUP attribute type are to be used in this attribute type definition. Using this definition, each time a member in the groupOfNames is deleted or renamed, the groupOfNames object must be updated so that all of the member attribute values are valid. A better solution is to remove the member attribute from the groupOfNames object class and to place all member entries in the DIT beneath the groupOfNames object. Unfortunately, this is not a general solution since it does not allow the same member entry to enjoy membership in multiple groups. However, there are many applications in which restructuring the DIT in this way can be achieved.
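The contrast between the two group designs can be sketched in Python with dictionaries standing in for the Directory. The group name "painters", the entry names, and the helper functions are assumptions made for this illustration; no real server behavior is modeled.

```python
# Design 1: the group lists member DNs, as groupOfNames does.
GROUP = "cn=painters, departmentName=uc"
entries = {
    "cn=pablo, departmentName=art, departmentName=uc": {},
    "cn=henri, departmentName=art, departmentName=uc": {},
    GROUP: {
        "member": [
            "cn=pablo, departmentName=art, departmentName=uc",
            "cn=henri, departmentName=art, departmentName=uc",
        ]
    },
}


def dangling_members(directory, group_dn):
    """Member values that no longer name an existing entry."""
    return [dn for dn in directory[group_dn]["member"] if dn not in directory]


# Deleting henri leaves a stale pointer unless the group is updated as well.
del entries["cn=henri, departmentName=art, departmentName=uc"]
stale = dangling_members(entries, GROUP)

# Design 2: members live beneath the group, so the hierarchy is the membership.
tree = {
    GROUP: {},
    "cn=pablo, " + GROUP: {},
    "cn=henri, " + GROUP: {},
}


def members_of(directory, group_dn):
    """Immediate children of the group entry."""
    suffix = ", " + group_dn
    return sorted(
        dn for dn in directory
        if dn.endswith(suffix) and dn.count(",") == group_dn.count(",") + 1
    )


del tree["cn=henri, " + GROUP]
remaining = members_of(tree, GROUP)
```

In the first design the delete leaves one stale member value to clean up; in the second there is nothing to clean up, at the cost noted above that an entry can sit beneath only one group.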