Directory Services: Storing Directory Information
Chapter 3: Storing Directory Information
In this Chapter
The Directory Database
Partitioning the Directory
Directory Replication
At its core, a directory is an information repository, and how that information is stored and managed is of critical importance. Less obvious is that the methods used to store and manage directory information affect many aspects of directory functionality. How effectively information is stored, retrieved, and distributed can directly impact the overall scalability, performance, and reliability of the directory.
This chapter examines how directory data is stored and managed in a distributed directory environment, and it discusses how directory information is subdivided and replicated to multiple directory servers. This chapter also examines the methods of maintaining data consistency among the distributed portions of the directory.
The Directory Database
The unified collection of objects managed by the directory is stored in a database, commonly called the Directory Information Base (DIB). The DIB contains the directory objects representing information and entities such as users, network resources, and applications, along with the administrative data (such as security settings) needed to manage and control access to those objects.
The X.500 standards do not specify how the information is stored and retrieved; the database storage mechanisms are not considered part of the scope of the X.500 standards.
The methods used in directory database storage and retrieval mechanisms are implementation specific and can vary widely. Some vendors, such as Novell, write proprietary database engines to meet the specific needs of their directory service. Other vendors have chosen to use a relational database as the repository for directory information.
Because a directory typically handles far more queries than updates, many vendors optimize the search engine and provide highly available catalogs to speed up resolution of client queries.
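The read-heavy trade-off can be illustrated with a small sketch. The Python class below is hypothetical (the names `IndexedDirectory`, `put`, and `search` are illustrative, not any vendor's API): it assumes that every update also maintains a per-attribute index, so that an equality search on an indexed attribute becomes a dictionary lookup instead of a scan of every entry. Writes do a little more work so that reads do much less, which pays off when queries greatly outnumber updates.

```python
from collections import defaultdict


class IndexedDirectory:
    """Toy directory store that trades slightly slower writes for faster reads.

    Hypothetical sketch: each update also maintains a per-attribute index,
    so equality searches on indexed attributes avoid a full scan.
    """

    def __init__(self, indexed_attrs):
        self.entries = {}  # DN -> {attribute: value}
        self.indexed_attrs = set(indexed_attrs)
        # attribute -> value -> set of DNs holding that value
        self.index = defaultdict(lambda: defaultdict(set))

    def put(self, dn, attrs):
        # Remove stale index entries before overwriting an existing object.
        old = self.entries.get(dn, {})
        for attr in self.indexed_attrs.intersection(old):
            self.index[attr][old[attr]].discard(dn)
        self.entries[dn] = dict(attrs)
        # Index the new values (this is the extra cost paid on every write).
        for attr in self.indexed_attrs.intersection(attrs):
            self.index[attr][attrs[attr]].add(dn)

    def search(self, attr, value):
        if attr in self.indexed_attrs:
            # Indexed path: a dictionary lookup instead of a full scan.
            return sorted(self.index[attr].get(value, set()))
        # Unindexed path: fall back to scanning every entry.
        return sorted(dn for dn, a in self.entries.items() if a.get(attr) == value)


d = IndexedDirectory(indexed_attrs=["mail"])
d.put("cn=alice,o=acme", {"mail": "alice@acme.example", "title": "engineer"})
d.put("cn=bob,o=acme", {"mail": "bob@acme.example", "title": "engineer"})
print(d.search("mail", "bob@acme.example"))  # ['cn=bob,o=acme'] (indexed)
print(d.search("title", "engineer"))  # ['cn=alice,o=acme', 'cn=bob,o=acme'] (scan)
```

Which attributes to index is the design choice: each extra index speeds up one kind of query and adds work to every update.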
What Is Stored Depends on Focus
What information is stored depends entirely on what the directory service is being used for. For example, a directory service that is used to manage an enterprise network stores information not only about users but also about servers, network services and resources, applications, and other information necessary to administer the network. Likewise, a directory that is used to manage an e-commerce site stores information about users and their preference data.
Storing the Directory Database on Disk
Keep in mind that this description is about how the information is written to disk (the actual data storage mechanism), not about partitioning and replication, which are discussed throughout the rest of this chapter. The description that immediately follows assumes a unified (nonpartitioned) directory, or a single partition of a larger, partitioned directory.
Although the DIB is generally spoken of in the singular, this is true only in a logical sense: frequently more than one file makes up the DIB. A directory database may be contained in a range of storage structures, from a single file containing all the directory information to a collection of files containing subsets of directory data, with a table of pointers used to link and organize the files.
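As a rough illustration of the multi-file approach, the sketch below keeps an object table of pointers (file number, offset, length) into several append-only value files. Everything here is a hypothetical model, not the on-disk format of any product: `io.BytesIO` objects stand in for files on disk, JSON is an assumed value encoding, and the rule that assigns each attribute to a particular value file is a toy.

```python
import io
import json


class MultiFileDIB:
    """Hypothetical layout: one object table plus several value files.

    The object table maps each DN to pointers (file_no, offset, length)
    into append-only value files; io.BytesIO stands in for real files.
    """

    def __init__(self, num_value_files=2):
        self.value_files = [io.BytesIO() for _ in range(num_value_files)]
        self.object_table = {}  # DN -> {attribute: (file_no, offset, length)}

    def _file_for(self, attr):
        # Toy placement rule: spread attributes across files deterministically.
        return sum(attr.encode()) % len(self.value_files)

    def write(self, dn, attrs):
        pointers = self.object_table.setdefault(dn, {})
        for attr, value in attrs.items():
            file_no = self._file_for(attr)
            f = self.value_files[file_no]
            data = json.dumps(value).encode()
            f.seek(0, io.SEEK_END)  # always append; never overwrite in place
            offset = f.tell()
            f.write(data)
            pointers[attr] = (file_no, offset, len(data))

    def read(self, dn):
        # Follow each pointer in the object table into its value file.
        result = {}
        for attr, (file_no, offset, length) in self.object_table[dn].items():
            f = self.value_files[file_no]
            f.seek(offset)
            result[attr] = json.loads(f.read(length))
        return result


dib = MultiFileDIB()
dib.write("cn=alice,o=acme", {"mail": "alice@acme.example", "phone": ["555-0100"]})
print(dib.read("cn=alice,o=acme"))
# {'mail': 'alice@acme.example', 'phone': ['555-0100']}
```

Rewriting an attribute appends a new value and moves the pointer, leaving the old bytes as garbage; a real engine would also need compaction, locking, and crash recovery, none of which is shown.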
A directory that has a specialized and small information set is more likely to use a single file than a large general-purpose directory. A quick look at two extremes will help illustrate why.
Single file: At one end of the spectrum is a directory with a relatively simple and small dataset, the Domain Name System (DNS). In most implementations, DNS stores its information in small text files, each holding a dataset that can be searched quickly. Of course, DNS has one of these text files for each zone (a zone is analogous to a partition); therefore, the DNS directory as a whole is a distributed database made up of millions of these small text files.
Multiple files: At the other extreme is Novell's eDirectory, a more general-purpose directory that commonly contains a large amount of information of varied types. To support the flexibility needed by eDirectory, Novell has devised a storage method that uses a series of files, each containing a particular portion of the data that makes up the DIB. One file stores basic information about directory objects; a variable number of others store most of the actual property value information (that is, the data) associated with those objects.
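The single-file end of the spectrum is easy to picture. The sketch below parses one zone's worth of text into an in-memory table and answers lookups against it. It assumes a deliberately simplified one-record-per-line format (owner name, TTL, class, type, data); real DNS master files (RFC 1035) also allow directives such as `$ORIGIN` and `$TTL`, relative names, omitted fields, and parenthesized multi-line records, none of which this toy parser handles. The zone data is invented, using the `example.com` domain and `192.0.2.0/24` addresses reserved for documentation.

```python
# Hypothetical, simplified zone data: one fully qualified record per line.
ZONE_TEXT = """\
; simplified example zone for example.com
example.com.      3600 IN SOA   ns1.example.com. admin.example.com. 1 7200 900 1209600 300
example.com.      3600 IN NS    ns1.example.com.
www.example.com.  3600 IN A     192.0.2.10
mail.example.com. 3600 IN A     192.0.2.25
example.com.      3600 IN MX    10 mail.example.com.
"""


def parse_zone(text):
    """Parse one zone's text into {(owner_name, type): [rdata, ...]}."""
    records = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, ttl, rclass, rtype, *rdata = line.split()
        records.setdefault((name.lower(), rtype.upper()), []).append(" ".join(rdata))
    return records


def lookup(records, name, rtype):
    # Normalize to a lowercase, fully qualified name with a trailing dot.
    key = (name.lower().rstrip(".") + ".", rtype.upper())
    return records.get(key, [])


zone = parse_zone(ZONE_TEXT)
print(lookup(zone, "www.example.com", "A"))  # ['192.0.2.10']
print(lookup(zone, "example.com", "MX"))  # ['10 mail.example.com.']
```

Because each zone is small, loading the whole file into memory and searching it directly is cheap; the scale of DNS comes from distributing many such zones, not from making any one of them large.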
Distributing the Directory Database
Whatever the specifics of a particular DIB implementation, directory service designers must contend with some fundamental data management issues. Obviously, support for the distribution of the DIB must be explicitly defined for a directory to function with a distributed datastore. The support for a distributed datastore is implemented via partitioning and replication of the DIB. Concomitantly, a directory service must define a method of linking the partitions into a complete directory tree, as well as a means to pass queries and other information between partitions.
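One way to link partitions into a single tree is for each server to know which server holds the partition rooted at each naming context. The sketch below assumes a hypothetical map from partition root DN to server name and routes a request to the partition with the longest matching DN suffix. If another server holds that partition, it returns a referral instead of an answer, in the spirit of the referral mechanism used by LDAP (X.500 also supports chaining, in which the server forwards the query itself). The DN handling is simplified and ignores escaped commas and multi-valued RDNs; the function names are illustrative.

```python
def parse_dn(dn):
    """Split a DN like 'cn=alice,ou=sales,o=acme' into lowercase RDNs.

    Simplified: ignores escaped commas and multi-valued RDNs.
    """
    return [rdn.strip().lower() for rdn in dn.split(",") if rdn.strip()]


def is_under(dn_rdns, root_rdns):
    # A DN lies inside a partition if the partition root is a suffix of it.
    n = len(root_rdns)
    return len(dn_rdns) >= n and dn_rdns[len(dn_rdns) - n:] == root_rdns


def route(dn, partitions):
    """Return the server holding the most specific partition for dn, or None.

    partitions: hypothetical map {partition_root_dn: server_name}.
    """
    rdns = parse_dn(dn)
    best_root, best_server = None, None
    for root, server in partitions.items():
        root_rdns = parse_dn(root)
        if is_under(rdns, root_rdns) and (best_root is None or len(root_rdns) > len(best_root)):
            best_root, best_server = root_rdns, server
    return best_server


def resolve(dn, partitions, local_server):
    server = route(dn, partitions)
    if server is None:
        return ("unknown", None)
    if server == local_server:
        return ("local", server)  # this server can answer from its own partition
    return ("referral", server)  # point the client at the server that can answer


partitions = {
    "o=acme": "srv-hq",
    "ou=sales,o=acme": "srv-sales",
    "ou=eng,o=acme": "srv-eng",
}
print(resolve("cn=alice,ou=sales,o=acme", partitions, "srv-hq"))  # ('referral', 'srv-sales')
print(resolve("cn=ceo,o=acme", partitions, "srv-hq"))  # ('local', 'srv-hq')
```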
DIB replication presents other critical issues of data access and information consistency between the distributed replicas. When using multiple copies of the directory datastore (replicas), data integrity between the copies of the information must be maintained. All replicas must (eventually) be updated whenever changes are made to any replica. Additionally, to maintain the consistency of the directory information, some form of data synchronization must be performed.
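To make "eventually updated" concrete, the sketch below models a simple multimaster scheme in which every attribute value carries a timestamp and replicas exchange changes pairwise, keeping the newer value (last-writer-wins). This is an assumed, simplified technique for illustration, not the synchronization protocol of any particular directory: it uses a logical (Lamport-style) clock with the replica ID as a tie-breaker, keeps everything in memory, and ignores deletions and network failures. The `Replica` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica: each value is stamped with (logical_time, replica_id)."""

    replica_id: str
    data: dict = field(default_factory=dict)  # (dn, attr) -> (stamp, value)
    clock: int = 0

    def update(self, dn, attr, value):
        # Advance the local logical clock and stamp the new value.
        self.clock += 1
        self.data[(dn, attr)] = ((self.clock, self.replica_id), value)

    def merge_from(self, other):
        """Pull changes from another replica; the newer stamp wins."""
        for key, (stamp, value) in other.data.items():
            current = self.data.get(key)
            if current is None or stamp > current[0]:
                self.data[key] = (stamp, value)
            # Keep our clock ahead of anything we have seen, so later
            # local writes are stamped after it.
            self.clock = max(self.clock, stamp[0])

    def read(self, dn, attr):
        entry = self.data.get((dn, attr))
        return None if entry is None else entry[1]


a, b = Replica("A"), Replica("B")
a.update("cn=alice,o=acme", "title", "engineer")  # stamped (1, 'A')
b.update("cn=alice,o=acme", "title", "manager")  # stamped (1, 'B'), concurrent
a.merge_from(b)
b.merge_from(a)
print(a.read("cn=alice,o=acme", "title"), b.read("cn=alice,o=acme", "title"))
# manager manager
```

Because the comparison is deterministic, any two replicas that have exchanged all of their changes hold the same value, which is the eventual consistency property described above. The cost of this simple rule is that one of two concurrent updates is silently discarded.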
The next section describes partitioning, and later sections examine replication and data consistency concepts and operations.