Basic Concepts
Seven basic concepts form the foundation of content management practice. Each concept embodies a single technique or idea that plays a vital role in a comprehensive solution for content management.
Identify stakeholders.
Recognize the Chaos Zone.
Separate development and production.
Identify assets (source, derived, and deployed).
Use direct feedback.
Exploit parallel development.
Use both file-versioning and site-versioning.
Identify control mechanisms.
Stakeholder Identification
The cooperation of major stakeholders in the web operation is vital to successful implementation of a content-management solution. A stakeholder is an individual or a group that either participates in the web development, deployment, or production processes, or is a sponsor of web initiatives. Sometimes stakeholders are even located outside your organization. The major stakeholders usually include the content contributor, the business owner, the content administrator, the production manager, and the executive. The diverse stakeholders have different needs and hold different perspectives that need consideration.
Each group of stakeholders benefits from an effective web development infrastructure in different ways. Occasionally, stakeholder groups work at cross-purposes. This adds an interesting dimension to our exposition. When we introduce a particular feature of the content-management solution, we'll highlight the benefits derived by each stakeholder group.
The web developer needs a development environment and a fast track to speed intellectual contribution to the target audience. But at the same time, the developers' efforts should be fully supported with the checks and balances that only an organization can provide. They need to have their work reviewed both technically and from a business perspective, to have their work versioned, and to have it tested. We'll define a web developer as a contributor who typically changes multiple web assets to complete a task.
The content contributor also needs a productive development environment. Unlike a web developer, a content contributor typically changes a single web asset. The content contributors also need support in the form of technical and business-centric reviews of their work. They require effective testing and review of assets they contribute while rendering their work in a realistic context. Their contributions are highly structured; examples include artwork of precise shape and size, or highly structured, tagged text.
The business owners of a web initiative need a way to express their priorities, with the ability to advise and comment on the end product as it moves through development. Does the content look right? Does it convey the right impression? The business owner cares that the business goals are achieved, according to both tangible measures and intangible factors that require visual inspection. For example, a business goal might be to convey a seamless linkage from a company web site to a business partner's web site. To make this determination, the business owner of the company's partnership needs to grasp the physical layout of the page, and the "feel" and the placement of the required clicks.
The content administrator wants to ensure that scarce company resources are applied effectively to manage and evolve the web property. The overall goal is to build a solid infrastructure so that all groups work effectively. In addition, the content administrator must make sure that the talent that took so much effort to recruit and retain is productive, and that busy-work and throwaway work is absolutely minimized. All assets need to be present and accounted for. Remember that the talent goes home at the end of the day, but the organization is responsible for managing and keeping safe the fruits of everyone's daily labors: namely, the web assets.
The production manager wants the production version of the web property to be of high quality and to move to the correct production location at the right time. In addition, the web property must be resilient so that if a problem is found in the web property, there is an immediately available backup copy. Because mistakes can occur at any time, there needs to be a prearranged way to revert to a last-known good copy of a web property.
The business executive wants infrastructure and processes to allow the business to react swiftly to changes, whether they are competitive pressures, market pressures, or simply the need to move quickly, surely, and with measured steps on a business initiative.
All of these stakeholders have needs that are vital to the successful operation of a web-based business. Any solution must take into account their needs and motivations. This is our first principle of content management.
Are We in the Chaos Zone?
"Hey, someone clobbered my changes!"
Although multiple people change an asset, this needn't be a problem if this conflict condition is detected and handled appropriately. Without this detection, one person's change overwrites another's. The downfall is the inability to deal with multiple modifiers of the same asset, or possibly to preclude multiple modifiers in the first place.
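One common way to detect this conflict condition is to remember which version each contributor started from and to refuse a save when the stored asset has since moved on. The sketch below assumes that approach. The names `Asset`, `save`, and `ConflictError` are hypothetical and are not taken from any particular content-management product.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when a save would overwrite someone else's newer change."""


@dataclass
class Asset:
    content: str
    version: int = 1  # incremented on every successful save


def save(asset: Asset, new_content: str, base_version: int) -> None:
    """Save new_content only if the editor started from the current version."""
    if base_version != asset.version:
        raise ConflictError(
            f"asset is at version {asset.version}, edit was based on {base_version}"
        )
    asset.content = new_content
    asset.version += 1


page = Asset("original text")
alice_base = page.version  # both contributors open version 1
bob_base = page.version

save(page, "Alice's edit", alice_base)  # succeeds, asset moves to version 2
try:
    save(page, "Bob's edit", bob_base)  # based on the stale version 1
    clobbered = True
except ConflictError:
    clobbered = False  # Bob is told about the conflict instead of overwriting
```

Here the second save is rejected, so Alice's change survives and Bob learns that he must reconcile his edit with hers.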
"I didn't know that was changing!"
This is a milder form of the previous problem. The surprise reveals that some change should have been known. Someone should have been notified. A succession of surprises of this sort undermines the team's confidence in undertaking tasks with assurance. Occasionally the problem leads to loss of data.
"No single person can keep track of all the changes anymore."
This person wishes that a single person could keep track of all changes, but that's no longer practical. The problem is now out of hand. There's a sense of resignation. The organization must act quickly before morale suffers irreparably or the business sustains damage.
"Hey, my change wasn't supposed to go out yet!"
Running a web site requires deployment in deft synchrony with time-sensitive and periodic events. This process is definitely out of control because the content owner's change went live prematurely. Perhaps deployment procedures are error-prone. Perhaps content owners don't have a way to specify the required deployment timing on an asset. Perhaps content with different timing requirements is mixed together.
"I'm stuck because your changes broke my stuff!"
When one person's change "breaks" someone else's, this reveals that there is inadequate integration of the changes. Who went first, or what broke what, isn't really the point. This is a process breakdown. The organization needs to take steps to strengthen the process and build an infrastructure that makes it natural for integration to occur. Encourage early and frequent integration of changes.
These signs indicate a realm that we refer to as the Chaos Zone. Managing a development effort becomes increasingly challenging when it enters this mode of operation. The Chaos Zone is defined when a web operation can be characterized as follows:
More than 5,000 assets
More than seven developers, reviewers, testers
More than one deployment per week
Scale of business exceeds $1 million annually
Although these figures should be interpreted as approximate guidelines, they are chosen to reflect the scale at which a web operation begins to experience the productivity-sapping effects of chaos. If three or more of these factors are true for your organization, then the content-management solution recommended in this article should be seriously considered. As a web operation increases in size, pace, and importance, most organizations reach a point at which existing informal processes and tools become inadequate for the challenge at hand.
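The four guidelines and the "three or more" rule can be expressed as a small self-assessment. This is only a sketch: the thresholds come from the list above, while the `WebOperation` fields and the function names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WebOperation:
    asset_count: int
    team_size: int               # developers, reviewers, and testers combined
    deployments_per_week: float
    annual_business_usd: float


def chaos_factors(op: WebOperation) -> List[str]:
    """Return the names of the Chaos Zone factors that hold for this operation."""
    checks = {
        "more than 5,000 assets": op.asset_count > 5000,
        "more than seven developers, reviewers, testers": op.team_size > 7,
        "more than one deployment per week": op.deployments_per_week > 1,
        "business exceeds $1 million annually": op.annual_business_usd > 1_000_000,
    }
    return [name for name, holds in checks.items() if holds]


def in_chaos_zone(op: WebOperation) -> bool:
    """Three or more factors suggest formal content management is warranted."""
    return len(chaos_factors(op)) >= 3


site = WebOperation(asset_count=12000, team_size=9,
                    deployments_per_week=3, annual_business_usd=400_000)
print(chaos_factors(site))
print(in_chaos_zone(site))  # True: three of the four factors hold
```

Because the figures are approximate guidelines, a result near the boundary is a prompt for discussion rather than a verdict.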
Recognize the signs of the Chaos Zone. Existing processes differ, and they become inadequate at different thresholds. For example, a web team might use the practice of making changes directly on the production web server, known as the "direct edit" approach. Another team might make changes on a staging server and enlist the help of a small team of people to selectively copy changes to the production web server; this is also known as the "staging server" approach. Each of these approaches has different advantages and drawbacks, relative to the approach that will be presented in this article. Regardless of the approach, the benefits of formal content management become especially dramatic in the Chaos Zone.
Development and Production Separation
Web development in the Chaos Zone involves large numbers of contributors, and the process must move quickly. Each day represents a seething chaotic mass of activity that has the potential to spin out of control without warning. With chaos a possible outcome, the final output from the activity is critical to the organization and must be carefully controlled.
The primary goal of development is to arrange people and processes to efficiently and rapidly allow an organization to create, edit, review, and test successive changes to a web property. Development combines editorial, creative, and programming inputs, to produce a released version. A released version of a web site is the complete set of web assets corresponding to a given point in time. We'll also refer to the released version as simply a version of a web site. Here we use the term "web asset" in a broad sense; a web asset is a file, a directory, or a row in a database table.
The next principle divides the difficult job of managing a web property into two smaller subproblems: development and production. Usually one organization manages development and a different organization manages production. Deploying controlled, released versions of our web property defines a clear handoff between the two organizations. (See Figure 1.)
Figure 1 Deployment moves developed web assets to one or more production servers.
Asset Identification
Content management is all about managing, protecting, nurturing, and evolving an ever-increasing collection of assets in a web property. We retain previous versions of a web property. We often need previous versions as archival copies or as references to assist in ongoing development, and we need to retain the ability to revert an individual asset or an entire web property to a designated last-known good version.
Because they are entirely in digital form, assets can be copied, moved, manipulated, and transformed with ease, thanks to the help of powerful computers and storage devices. But storage capacity and processing power aren't limitless. The key to success is to save precisely what we need, but no more. We may retrieve some previous version of the entire web property or one particular version of a specific asset contained within it.
If we save more than what is minimally required, we will eventually regret that choice. At least at first, storage capacity seems limitless, like the empty rooms of a house after you first move in. You're eager to fill the rooms with furniture. But like the house filled with clutter, a storage device filled with superfluous copies of assets becomes a burdensome chore to manage. Add to that the difficulty of finding what we're looking for. The same happens to a web property as unnecessary historical copies add clutter, making it more difficult to find specific assets for specific dates.
Our task is to define the minimal set of assets required to regenerate a web property. A source asset is a special category of asset that is valuable because it is the result of a skilled individual using a tool. We strive to maintain possession of this asset because its contents are difficult or impossible to reproduce. A derived asset is an asset that can be automatically generated through a tool or other automatic procedure, possibly from another asset. The property that distinguishes a source asset from a derived asset is that a derived asset can be easily reproduced, whereas a source asset cannot. For example, a Photoshop .psd file typically represents the work of a skilled artist, and if we were to lose a Photoshop file, it might be difficult or impossible to re-create the original image. In other words, a Photoshop .psd file is a source asset. In contrast, we consider a .gif file a derived asset when it can be regenerated from a source asset.
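The distinction can be made concrete by recording, for each asset, what it can be regenerated from. The following is a minimal sketch under that assumption. The `AssetRecord` type, the `generated_from` field, and the example paths are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AssetRecord:
    path: str
    generated_from: Optional[str] = None  # path of the source asset, if any

    @property
    def is_derived(self) -> bool:
        # An asset is derived only when we know what it can be regenerated from.
        return self.generated_from is not None


def minimal_set(records: List[AssetRecord]) -> List[str]:
    """Paths that must be kept because they cannot be regenerated."""
    return sorted(r.path for r in records if not r.is_derived)


records = [
    AssetRecord("art/logo.psd"),                                   # artist's work
    AssetRecord("images/logo.gif", generated_from="art/logo.psd"),
    AssetRecord("images/photo.gif"),  # no known source, so treat it as a source
]
print(minimal_set(records))  # ['art/logo.psd', 'images/photo.gif']
```

Note that the file extension alone does not decide the category: a .gif with no recorded source is treated as a source asset, because losing it would mean losing work that cannot be reproduced.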
For the purposes of web content management, we version source assets. We reuse them, or use them as the basis for follow-up work. For example, we put an Active Server Page (.asp) or a Java Server Page (.jsp) source file under version control. First, we may need to refer to the logic that it contains. Second, we can revert to a previous version of the file if we discover a critical bug in the current version. Third, we can create an improved version of the file by using some previous version as the starting point.
Here's our third principle of web content management: identify assets as source, derived, or deployed, and version the source assets.
We'll see later that, under certain circumstances, deployed assets, including some derived assets, ought to be versioned as well.
Direct Feedback (WYSIWYG)
A web property's value depends on the experience that it provides for the user. Visual elements, together with active elements and embedded logic, respond to requests, such as search and personalization engines. In the same way that a web visitor responds to a page layout or to the tactile feel of a mouse-over navigation element, the creator, tester, and reviewer most effectively and rapidly judges the suitability of those elements by experiencing them in the context of a functioning web site. Identical factors govern the experience of developing, testing, and reviewing the underlying assets in the first place.
The direct feedback paradigm says that the activities within a work cycle should minimize the perceptual distance between making a change and viewing the results of the change. In other words, apply the what-you-see-is-what-you-get (WYSIWYG) principle to web assets. For example, change a common included navigation file; then view how the modified look affects the user experience throughout a web site. (See Figure 2.)
Figure 2 A developer gets direct feedback from the results of her changes.
Parallel Development
Multiple concurrent projects can mean that you are in the Chaos Zone. At any single time, projects are commencing, some are in midflight, and others are in the completion phases of testing and review. This is inevitable, given the size of the team, the number of assets, and the frequency of updates to the production web site.
In an environment with multiple concurrent projects, it is essential to provide ongoing tasks with separate copies of the web property so that you can make edits, test the result, and solicit review and approval. We define a task as a set of interrelated changes to the web property. Here are some examples of tasks:
A single developer makes an HTML change.
A web designer and a graphic designer collaborate on new pages.
A developer changes the logic in C++ files to fix a bug.
Several marketing managers create press releases, all of which are scheduled to go live on the same day.
Notice that in items 1 and 3, a single developer uses a workarea. (See Figure 3.) In items 2 and 4, several people work in the same workarea. In item 2, a web designer and a graphic artist collaborate on new pages, so, in this case, it makes sense for them to work in the same workarea because their changes don't interfere with each other's. For example, the web designer might change the HTML, while the artist will change the images. In item 4, the marketing managers make changes independently (see Figure 3), but the commonality is that all of their changes go live on the same day.
Arrange to conduct development on a single task in each workarea. This is our fifth principle.
This typically means that each content contributor has a separate workarea, or possibly a small handful of people working closely together use a common workarea. This minimizes the overlap and potential interference between the tasks.
Figure 3 Tasks proceed independently in separate workareas.
Versioning
Versioning means that there are earlier versions to refer to and that earlier versions are available to fall back to, as an insurance policy. Although mistakes always occur, people are more productive, daring, and innovative when they know that there is a safety net to rescue them.
Content management relies on two kinds of versioning: file-level and site-level. Submit is a composite operation that copies an asset from a workarea to the staging area, makes an immutable snapshot of the asset, records the current time and submitter, and collects a textual comment from the submitter. In file-level versioning, an asset is submitted to the staging area when the asset has been tested and reviewed; it is now ready to be incorporated into the workareas of other developers. Once in the staging area, a submitted asset is read-only in the following sense. It can be superseded only by a newer version from one of the workareas, or it can be deleted from the staging area. The submission of an asset is an important event because subsequent testing and review takes place on a fixed copy of an asset.
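The parts of the submit operation described above (copy, immutable snapshot, time and submitter, comment) can be sketched in a few lines. This is an illustration only, with workareas modeled as in-memory dictionaries. `StagingArea`, `FileVersion`, and their fields are hypothetical names, not the interface of an actual product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)  # frozen: a recorded version cannot be altered
class FileVersion:
    path: str
    content: bytes
    submitter: str
    comment: str
    submitted_at: datetime


class StagingArea:
    def __init__(self) -> None:
        self.history: Dict[str, List[FileVersion]] = {}

    def submit(self, workarea: Dict[str, bytes], path: str,
               submitter: str, comment: str) -> FileVersion:
        """Copy one asset from a workarea and record it as a new version."""
        version = FileVersion(
            path=path,
            content=bytes(workarea[path]),  # snapshot of the asset's content
            submitter=submitter,
            comment=comment,
            submitted_at=datetime.now(timezone.utc),
        )
        # A newer submission supersedes, but never edits, earlier versions.
        self.history.setdefault(path, []).append(version)
        return version

    def current(self, path: str) -> FileVersion:
        return self.history[path][-1]


staging = StagingArea()
workarea = {"index.html": b"<h1>Welcome</h1>"}
staging.submit(workarea, "index.html", "pat", "Initial home page")

workarea["index.html"] = b"<h1>Welcome back</h1>"  # later edits stay in the workarea
staging.submit(workarea, "index.html", "pat", "Friendlier heading")
```

After the second submit, the staging area holds the newer version as current while the first version remains available to refer to or fall back on.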
Publish is a composite operation that creates a read-only copy of the entire web staging area, records the current time and publisher, and collects a textual comment from the publisher. In site-level versioning, the current content of the staging area is published as an edition. This operation records an edition, which is an immutable snapshot of the contents of the staging area. The read-only snapshots of the staging area become known reference points on which subsequent development is based.
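Publish can be sketched in the same spirit: copy the whole staging area, freeze the copy, and record who published it, when, and why. The `Site` and `Edition` names are hypothetical, and the `revert_to` helper is an added assumption that shows how an edition might serve as a last-known good reference point.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Dict, List, Mapping


@dataclass(frozen=True)
class Edition:
    number: int
    files: Mapping[str, bytes]  # read-only view of the staged content
    publisher: str
    comment: str
    published_at: datetime


class Site:
    def __init__(self) -> None:
        self.staging: Dict[str, bytes] = {}
        self.editions: List[Edition] = []

    def publish(self, publisher: str, comment: str) -> Edition:
        """Record the entire staging area as a new immutable edition."""
        edition = Edition(
            number=len(self.editions) + 1,
            files=MappingProxyType(dict(self.staging)),  # copy, then freeze
            publisher=publisher,
            comment=comment,
            published_at=datetime.now(timezone.utc),
        )
        self.editions.append(edition)
        return edition

    def revert_to(self, number: int) -> None:
        """Restore staging from a last-known good edition (assumed helper)."""
        self.staging = dict(self.editions[number - 1].files)


site = Site()
site.staging["index.html"] = b"v1"
site.publish("lee", "Launch")
site.staging["index.html"] = b"v2 with a bug"
site.publish("lee", "Spring update")
site.revert_to(1)  # staging now matches the first edition again
```

Both editions remain on record after the revert, so the faulty second edition can still be inspected while development continues from the first.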
File- and site-level versioning encourages multiple streams of work conducted in parallel, whereas file- and site-versioning operations record the contents of assets individually or as an entirety. Versioning boosts productivity and throughput because developers make changes confidently, knowing that previous asset versions provide a safety net in case of a misstep.
Control Mechanisms: Auditing and Enforcement
The size of an organization strongly influences auditing and enforcement requirements. Auditing records the results of important activities, such as submitting a file or deploying changes to production. Enforcement is an activity that allows and denies the ability to carry out key activities, such as reading a file, modifying a file, changing a field in a data record, or deploying content to a particular production server during a certain time period.
To illustrate these concepts, we'll introduce three broad categories of human organizations: a tribe, a chiefdom, and a state. As you will see, these will serve as metaphors for types of web development organizations. Distinguished mainly by size, each kind of organization adopts particular ways of communicating within itself, making decisions, and influencing the behavior of its members.
A tribe consists of a small, tightly knit group of people. In a tribe, everybody knows everyone else, and information flows freely. Despite the presence of a tribal leader, decision-making tends to be informal or even communal. In a tribe, consensus tends to be an important part of the decision-making process. There is minimal process or bureaucracy. If we extend this idea to the context of web development, an organization with tribe-like characteristics focuses on moving quickly and safely. This means that work routing and version control are strong needs. Communication within the tribe tends to be effective. Because of the level of trust and communication, permission systems, access control, formal authorization, and auditing aren't necessary.
A chiefdom consists of a group of people who have exceeded the ability of tribe-like behaviors to suit their needs. Too large for free-flowing informality, a chiefdom installs a chief to initiate jobs that carry out the chief's intentions. Specialization occurs within functional areas. However, because this specialization doesn't negate the need for information to continue to flow between areas, there's an important need to gather, record, and pass information between functional areas. Despite the criticality of information flow, there's a countervailing need to control the flow of certain kinds of information and to restrict the flow of other kinds of information. Most of the operation remains informal, but assistant chiefs or trusted tribal members are given additional latitude. They become the "glue" that makes the loosely organized system work.
We can extend this idea to web development. With the larger number of members, a chiefdom cannot be as informal or freewheeling as a tribe. There's a need to track the activities of members. A chiefdom institutes audit mechanisms to record important activities, such as deploying new content to the production servers. In case something goes wrong, audit trails enable the group to determine the cause and to provide a remedy. It is important to note that the ability to audit does not itself prevent intentional misbehavior, nor does it preclude accidents. Instead, social pressures are sufficient to encourage good behavior, without resorting to heavy-handed enforcement.
A state stands in contrast to a tribe or a chiefdom in that a significant amount of collective energy focuses on making normal activities uniform and predictable. Although a state has more resources at its disposal, because of its scale, it has little ability to bestow blanket trust on significant segments of its populace with certain knowledge or certain capabilities. Instead, it expends effort to deny certain knowledge and capabilities to defined segments and to enforce such restrictions in a provable manner. A state relies on enforcement. It has exceeded the scale at which simple trust is sufficient, as in a tribe.
In a web development context, a state consists of a group of people so large that no single person or even group of trusted individuals can make it all work. Distinct functions have evolved, each with a defined charter. There's a continual effort to refine and redefine department charters. It is an ongoing struggle to codify practices and disseminate knowledge in order to repeat successes and to avoid mistakes of the past. The flow of information is more strictly controlled. There's a need to meet audit requirements, for instance, to prove that so-and-so did something or, conversely, to be able to prove that so-and-so could not possibly have done something. Rules and regulations prevent unauthorized access to forbidden resources and block the ability to initiate a prohibited action.
A state exceeds the ability of simple auditing of key activities to achieve its goals. A state's larger population contains both newcomers and old-timers who aren't fully aware of carefully honed rules and regulations. Practicality dictates that a state clearly defines responsibilities and capabilities, and that the use of rights to exercise those capabilities be enforced instead of merely audited.
We see then that the size of an organization imposes constraints on the mechanisms at its disposal to carry out its activities. A tribe has the luxury of adopting informal procedures and consensus decision-making. A chiefdom relies more heavily on functional specialization, and widescale communication becomes less efficient. Auditing is essential. In addition, a chiefdom relies on social pressures to avoid enforcement as a tool. A state, just by virtue of its scale, can no longer rely on social pressures and auditing. Instead, enforcement becomes essential. Make no mistake, enforcement adds necessary overhead, such as adding a group to administer access controls or requiring that a deployment script check that the initiator has the proper authorization before commencing the deployment itself.
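The difference between auditing and enforcement shows up clearly in a deployment script. The sketch below is illustrative only: the role names, the server name, and the `enforce` flag are assumptions, and the actual copy to production is left out.

```python
from datetime import datetime, timezone
from typing import List, Set, Tuple

audit_log: List[Tuple[datetime, str, str, str]] = []  # (time, user, action, outcome)
AUTHORIZED_DEPLOYERS: Set[str] = {"production-manager"}  # hypothetical role list


def record(user: str, action: str, outcome: str) -> None:
    """Auditing: note what happened, without preventing anything."""
    audit_log.append((datetime.now(timezone.utc), user, action, outcome))


def deploy(user: str, server: str, enforce: bool) -> bool:
    """Deploy to a server; with enforce=True, check authorization first."""
    action = f"deploy to {server}"
    if enforce and user not in AUTHORIZED_DEPLOYERS:
        record(user, action, "denied")
        return False
    # ... the actual copy to the production server would happen here ...
    record(user, action, "completed")
    return True


# Chiefdom: anyone can deploy, but the audit trail shows who did.
deploy("web-developer", "www1", enforce=False)
# State: the same request is blocked before the deployment starts.
deploy("web-developer", "www1", enforce=True)
deploy("production-manager", "www1", enforce=True)
```

With auditing alone, the log can explain a problem after the fact. With enforcement, the unauthorized deployment never begins, at the cost of maintaining the list of authorized roles.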
We can extend the tribe-chiefdom-state idea as we analyze an online issue such as knowledge of the root or administrator password. In a tribe, each member is trusted to use the root password in a responsible manner. The practical reality is that everyone at some time will need to use it, and everyone is trusted to use that knowledge and capability wisely. If an unintended consequence or misuse occurs, the tribe gathers to pool their collective knowledge to identify and repair the breach.
In a chiefdom, most people don't need to know the root password, but it is common knowledge because there's a loosely organized substructure that routinely needs to use it. If some activity needs the root password, everyone is but a handful of steps away from someone who possesses that knowledge. Efficiency takes precedence over precise control.
In a state, the authority to know and use critical information, such as the root password, must be granted only to selected job functions. It is important to prove that large segments cannot possibly know, and therefore cannot possibly misuse, the root password. Any breach of this rule is itself a problem because the state loses its ability to eliminate certain possibilities if a problem were to occur. By this line of reasoning, it follows that someone must have explicit responsibility to deny root password access to nearly everyone in the state.
This discussion applies to other access and capabilities as well: for example, the ability to see or change a file in a certain area of the web site, or the ability to deploy specific content to a particular production server at a certain time.
Table 1 shows a checklist to categorize the character of an organization. Knowing this, it is possible to infer the issues and requirements that will be important to that organization and thereby know how to focus the implementation effort.
Table 1 Characteristics of organizations and their control mechanisms.