1.5 Policy-Based Self-Healing
In some cases, the primary role of policies is to ensure that the operational state of the system satisfies the policies defined within it. If the system does not satisfy those constraints, it should take corrective action or raise an alert. Thus, both action policies and alert policies can be used to implement self-healing systems, although one may argue that an alert policy merely lets the system call for assistance when it detects an issue rather than healing itself.
An example of this is the following policy:
The temperature in the blade center must be maintained at less than 65 degrees.
If, for instance, one of the fans in the blade center fails, this policy can trigger an action that puts some of the blades to sleep to reduce the temperature. If that action does not reduce the temperature sufficiently, an alert policy can be triggered to request the attention of the system administrator.
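The escalation from an action policy to an alert policy can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a real management API: the BladeCenter class, the fixed cooling effect per sleeping blade, and the max_sleep limit are all hypothetical, and the threshold is used without a unit because the policy does not state one.

```python
from dataclasses import dataclass, field

# Threshold taken from the policy text; the policy does not specify the unit.
MAX_TEMPERATURE = 65.0


@dataclass
class BladeCenter:
    """Toy stand-in for a blade center (hypothetical model, not a real API)."""
    temperature: float
    active_blades: list[str]
    sleeping_blades: list[str] = field(default_factory=list)
    # Assumed, simplified cooling effect of putting one blade to sleep.
    heat_per_blade: float = 2.0

    def sleep_blade(self) -> str:
        """Put the most recently listed active blade to sleep and lower the temperature."""
        blade = self.active_blades.pop()
        self.sleeping_blades.append(blade)
        self.temperature -= self.heat_per_blade
        return blade


def enforce_temperature_policy(center: BladeCenter, max_sleep: int,
                               alerts: list[str]) -> list[str]:
    """Apply the action policy first; escalate to the alert policy if it is not enough."""
    slept = []
    # Action policy: sleep blades while the policy is violated and the action budget allows.
    while (center.temperature >= MAX_TEMPERATURE
           and len(slept) < max_sleep
           and center.active_blades):
        slept.append(center.sleep_blade())
    # Alert policy: the corrective action did not restore compliance.
    if center.temperature >= MAX_TEMPERATURE:
        alerts.append(f"temperature {center.temperature:.1f} still not below "
                      f"{MAX_TEMPERATURE}; notifying administrator")
    return slept


alerts = []
center = BladeCenter(temperature=70.0, active_blades=["b1", "b2", "b3", "b4"])
print(enforce_temperature_policy(center, max_sleep=2, alerts=alerts), alerts)
```

With four active blades at 70 degrees and max_sleep=2, the sketch puts two blades to sleep, the temperature only falls to 66, and an alert is recorded; with max_sleep=3 the temperature falls to 64 and no alert is raised. Bounding the action with max_sleep is one way to keep the self-healing step from shutting down the whole system before a human is asked to intervene.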
Another good example comes from storage area network (SAN) configuration management. SAN configuration is a complex problem because of the interactions among many different devices and software systems. A configuration must ensure that the proper device drivers are installed, that incompatible devices are neither connected to each other nor configured in the same zone³, that redundancy requirements are met, and so on. To cope with this complexity, experts develop sets of policies that capture best practices for interoperability and reliability. One policy from such a set is the following:
The same host bus adapter (HBA) cannot be used to access both tape and disk devices.
This policy can be verified automatically if an appropriate software module is installed on the computers and devices in the storage area network. Data about the system configuration is collected, and the policies are evaluated against that data. If the configuration is not compliant, the policy management service reports the violations and may isolate parts of the system to avoid errors or failures. In some cases, it may even be able to modify the system automatically to restore compliance, thereby achieving self-healing.
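The collect-then-evaluate step for the HBA policy might look like the following sketch. It assumes, hypothetically, that the collected configuration data has already been normalized into (host, HBA, device, device type) records; real SAN management tools expose this information in their own formats, and the Path record and find_hba_violations function are illustrative names only.

```python
from collections import defaultdict
from typing import NamedTuple


class Path(NamedTuple):
    """One collected host-to-device path (hypothetical record format)."""
    host: str
    hba: str
    device: str
    device_type: str  # assumed values: "disk" or "tape"


def find_hba_violations(paths):
    """Return sorted (host, hba) pairs whose HBA reaches both tape and disk devices."""
    types_by_hba = defaultdict(set)
    for path in paths:
        types_by_hba[(path.host, path.hba)].add(path.device_type)
    # An HBA violates the policy if its device types include both tape and disk.
    return sorted(key for key, types in types_by_hba.items()
                  if {"tape", "disk"} <= types)


collected = [
    Path("host1", "hba0", "disk-array-A", "disk"),
    Path("host1", "hba0", "tape-lib-1", "tape"),
    Path("host1", "hba1", "disk-array-B", "disk"),
    Path("host2", "hba0", "tape-lib-1", "tape"),
]

for host, hba in find_hba_violations(collected):
    print(f"violation: {host}/{hba} accesses both tape and disk devices")
```

A policy management service could report these pairs or, where automated remediation is permitted, isolate or rezone the offending paths; that remediation step is outside this sketch.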
As previously mentioned, a machine-readable language must be developed for defining policies, and an information model is an important part of the specification process. The software module that validates compliance must be a component of the system's management software, which collects the state information and checks it for policy violations.
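To make the idea of a machine-readable policy language and an information model more concrete, the sketch below encodes both policies from this section as JSON data and checks them against a snapshot of collected state. The JSON fields (class, attribute, op, value, on_violation) and element attributes such as mixes_tape_and_disk are hypothetical; they illustrate the separation between policy definitions and the compliance-checking module, not any standard policy language.

```python
import json
import operator

# Comparison operators supported by this hypothetical policy format.
OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       "!=": operator.ne, ">": operator.gt, ">=": operator.ge}

# Policies expressed as data rather than code (illustrative format only).
POLICIES = json.loads("""
[
  {"id": "max-temp", "class": "BladeCenter", "attribute": "temperature",
   "op": "<", "value": 65, "on_violation": "action"},
  {"id": "no-mixed-hba", "class": "HBA", "attribute": "mixes_tape_and_disk",
   "op": "==", "value": false, "on_violation": "alert"}
]
""")

# A snapshot of collected state; "class" names the information-model class.
STATE = [
    {"class": "BladeCenter", "name": "bc1", "temperature": 70},
    {"class": "HBA", "name": "host1/hba0", "mixes_tape_and_disk": True},
    {"class": "HBA", "name": "host1/hba1", "mixes_tape_and_disk": False},
]


def check_compliance(policies, state):
    """Return (policy id, element name, response) for every violated policy."""
    violations = []
    for policy in policies:
        compare = OPS[policy["op"]]
        for element in state:
            if element["class"] != policy["class"]:
                continue  # the policy does not apply to this kind of element
            if not compare(element[policy["attribute"]], policy["value"]):
                violations.append((policy["id"], element["name"],
                                   policy["on_violation"]))
    return violations


for violation in check_compliance(POLICIES, STATE):
    print(violation)
```

Because the policies are data, new ones can be added without changing check_compliance. In a fuller design, the information model would also define which classes and attributes a policy is allowed to reference, so that malformed policies can be rejected before they are evaluated.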