The Service Operations Domain
"Civilization advances by extending the number of important operations which we can perform without thinking about them."
Alfred North Whitehead
This section covers governance of services and automated business processes in a production environment. SOA offers both benefits and challenges to IT operations. The benefits include the following:
- It's much easier to monitor the usage and QoS provided by services, and there are some excellent commercial products that support this activity, some of which even provide real-time "dashboards" to display items such as QoS and usage statistics.
- Services offer much richer deployment opportunities and much more information about who uses them than do applications.
- A generic security mechanism can protect all services from security exposures or malicious threats.
The challenges include the following:
- SLAs need to be continuously monitored, and SLA violations need to be recorded.
- Frequent deployment of individual services can represent a threat to the stability of the production environments unless the service quality, build management, and deployment processes are highly effective.
Key SOA Governance Tasks Involved in Operating Services
Key tasks in ensuring that SOA operations are governed effectively include the following:
- Providing effective technical support for services and resolving software quality incidents efficiently and rapidly
- Monitoring the execution of services and automated business processes, most especially in terms of monitoring their compliance with the terms of their SLAs
- Maintaining the vitality, reliability, availability, and performance of the operational systems
Most organizations have a separate IT operations organization to manage the production IT systems. Table 4.2 describes the key capabilities that an IT operations organization needs to enable effective governance of services.
Table 4.2. Service Operations Domain: Capabilities, Risks, and Remedial Work Products
Capability | Associated Issues and Risks | Risk Level | Governance Work Products | Cost of Remedy
O01. Service execution monitoring | Service consumers expect high availability and good QoS, and the SLAs guarantee that they will receive them. | Critical | | Fairly high
O02. Service operational vitality | Service consumers do not want to have to modify their applications every time new service versions appear, unless those versions contain changes that they need. | Critical | | Moderate
O03. Service support | Service consumers may need technical support if there are any bugs or other problems with services. | Critical | | Fairly high
Service Operations Domain Work Product Definitions
This section describes the work products that help govern the Service Operations domain. Most of these work products should exist in any moderately well-governed IT production environment, but some need extensions for SOA. Again, they are organized in alphabetical order, and work products that have already been defined in Chapter 3 are not repeated. The roles involved in creating these products were also defined in that chapter.
Deprecated or Decommissioned Services Work Product
Description: This is a list of obsolete services that have been or will eventually be discontinued. Deprecated services should continue to be supported, but no new consumers should be able to access them. The number of obsolete or obsolescent service versions helps indicate how effective the service version management approach is; in an ideal world there would be few, if any.
Purpose: Monitor SOA operational vitality.
When needed: Every four to six months.
Responsible: Service registrar.
Accountable: Business service champions.
Consulted: Service consumers, PMO.
Informed: SOA enablement team, PMO, service consumers, IT operations.
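The access rule for deprecated services described above can be illustrated in code. The following is a minimal sketch, assuming a hypothetical in-memory registry; the names `ServiceStatus`, `ServiceEntry`, `may_invoke`, and `obsolete_services` are invented for this example and do not belong to any particular registry product. It also ignores ordinary authorization checks for active services.

```python
from dataclasses import dataclass, field
from enum import Enum


class ServiceStatus(Enum):
    ACTIVE = "active"
    DEPRECATED = "deprecated"          # still supported, closed to new consumers
    DECOMMISSIONED = "decommissioned"  # discontinued for everyone


@dataclass
class ServiceEntry:
    name: str
    version: str
    status: ServiceStatus
    consumers: set = field(default_factory=set)  # consumers already registered


def may_invoke(entry: ServiceEntry, consumer: str) -> bool:
    """Deprecated services serve existing consumers only."""
    if entry.status is ServiceStatus.ACTIVE:
        return True  # normal authorization is out of scope for this sketch
    if entry.status is ServiceStatus.DEPRECATED:
        return consumer in entry.consumers
    return False  # decommissioned


def obsolete_services(registry):
    """Return the entries that belong in the work product list."""
    return [e for e in registry if e.status is not ServiceStatus.ACTIVE]
```

The length of the list returned by `obsolete_services` is the figure that the description suggests tracking as an indicator of version management efficacy.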
Problem Incident Log Work Product
Description: This is a basic IT governance work product that should be updated to reflect the specific needs of service consumers and users of automated business processes. It records details of any technical problems that have occurred and how they were resolved.
Purpose: Used to provide invaluable information about the quality of deployed services. The SOA governance lead and lead SOA architect should look for any system quality issues and correct them as a matter of urgency.
When needed: Should be completed before any SLAs are published, and the SLAs should contain clauses that reflect the level of technical support that will be provided.
Responsible: Help desk / technical support staff.
Accountable: IT operations manager, technical support manager.
Consulted: SOA governance lead.
Informed: Problem reporter, PMO.
Quality of Service Goals Work Product
Description: QoS goals set explicit targets for performance and operational efficiency that services and automated processes will be measured against. The goals should exceed those specified in individual service nonfunctional requirements and SLAs by a small increment, so that a service that narrowly misses its QoS goal still meets its contractual conditions.
Purpose: Monitor performance of service operations.
When needed: Created as the first services are deployed, then updated every six months or so.
Responsible: Lead service architect, lead SOA architect, SOA governance lead.
Accountable: IT operations manager.
Consulted: Business service champions, PMO, individual business units.
Informed: IT operations, SOA enablement team, PMO.
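The idea of setting QoS goals slightly tighter than the SLA terms can be shown with a small calculation. This is a sketch only: the function name `qos_goal_from_sla` and the default 10 percent margin are assumptions made for illustration, not values recommended by the text.

```python
def qos_goal_from_sla(sla_limit: float, margin: float = 0.10,
                      higher_is_better: bool = False) -> float:
    """Derive an internal QoS goal slightly stricter than the SLA term.

    margin is a hypothetical tightening fraction (10 percent by default).
    """
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    if higher_is_better:
        # For a percentage such as availability: close part of the gap to 100.
        return sla_limit + (100.0 - sla_limit) * margin
    # For a ceiling such as response time: aim below the contractual limit.
    return sla_limit * (1.0 - margin)
```

For example, an SLA response-time ceiling of 2000 ms yields a goal of about 1800 ms, and an SLA availability of 99.0 percent yields a goal of about 99.1 percent.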
QoS Monitoring Plan Work Product
Description: This is the plan to monitor execution of services and automated business processes against the QoS goals. This plan should include monitoring the business impact of any SOA infrastructure problem. The breakdown of a single network element might seem relatively insignificant, but if it represents a single point of failure in a major service or function of the SOA infrastructure, it may have significant business impact, unless the operational model includes full redundancy.
Purpose: Ensure that QoS goals are met.
When needed: Created as the first services are deployed, and then updated every six months or so.
Responsible: Lead service architect, lead SOA architect, SOA governance lead, monitoring developer.
Accountable: IT operations.
Consulted: Business service champions, QA, PMO.
Informed: SOA enablement team.
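The single-point-of-failure concern in the description can be made concrete with a simple dependency model. The sketch below assumes that each service requires one working element from each of several redundancy groups; the function name, the data layout, and the element names in the usage note are all hypothetical.

```python
def single_points_of_failure(dependencies):
    """Map each element whose loss alone breaks a service to those services.

    dependencies: {service: [group, ...]} where each group is a set of
    interchangeable infrastructure elements; a service needs at least one
    working element from every group.
    """
    impact = {}
    for service, groups in dependencies.items():
        for group in groups:
            if len(group) == 1:  # no redundancy for this requirement
                (element,) = group
                impact.setdefault(element, set()).add(service)
    return impact
```

Given `{"OrderEntry": [{"esb-1", "esb-2"}, {"router-7"}]}`, the function reports `router-7` as a single point of failure for `OrderEntry`, while the redundant pair `esb-1` and `esb-2` is not flagged. A monitoring plan could use such a map to rank infrastructure alerts by the services they put at risk.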
QoS Report Work Product
Description: This is a regular report (or ideally a component on a real-time governance dashboard) that compares actual QoS with the QoS goals on a service-by-service basis.
Purpose: Ensure that QoS goals are met.
When needed: At least monthly, ideally updated online in real time.
Responsible: Lead service architect, SOA architect, SOA governance lead, monitoring developer, IT operations.
Accountable: SOA governance lead or IT operations manager.
Consulted: IT operations.
Informed: SOA enablement team, SOA executive sponsor.
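A service-by-service comparison of this kind might be produced along the following lines. This is an illustrative sketch that assumes response time in milliseconds is the only QoS measure; the function name `qos_report` and the status labels are invented for the example.

```python
def qos_report(goals_ms, observed_ms):
    """Compare observed response times with goals, service by service."""
    rows = []
    for service in sorted(goals_ms):
        goal = goals_ms[service]
        actual = observed_ms.get(service)  # None if nothing was measured
        if actual is None:
            status = "NO DATA"
        elif actual <= goal:
            status = "MET"
        else:
            status = "MISSED"
        rows.append((service, goal, actual, status))
    return rows
```

Each returned row could feed either a monthly document or a dashboard widget. Services with a goal but no measurements are reported explicitly rather than silently omitted.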
Service Operational Vitality Report Work Product
Description: This is a regular summary report that provides an overview of the status of SOA operational vitality, including growth in service usage (both in terms of new service consumers and growth in service execution requests), QoS reporting, and new services deployed during each reporting period. It draws from several sources, such as the QoS report, SLA compliance report, and deployed services and service usage data provided by IT operations.
Purpose: Communicate SOA operational vitality.
Responsible: SOA governance lead, monitoring developer, and PMO define the structure and layout of the report; IT operations and service registrar provide the data.
Accountable: IT operations manager or SOA governance lead.
Consulted: Business service champions, IT operations.
Informed: SOA enablement team, SOA executive sponsor.
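The growth figures that this report summarizes can be derived by comparing two reporting periods. The sketch below assumes a hypothetical dictionary layout for each period's data; none of the key names come from the text.

```python
def vitality_summary(previous, current):
    """Summarize growth between two reporting periods.

    Each period is a dict with 'consumers' (set of consumer ids),
    'requests' (int), and 'services' (set of deployed service names).
    """
    prev_req = previous["requests"]
    # Growth is undefined when the previous period had no requests.
    growth = None if prev_req == 0 else (current["requests"] - prev_req) / prev_req
    return {
        "new_consumers": sorted(current["consumers"] - previous["consumers"]),
        "new_services": sorted(current["services"] - previous["services"]),
        "request_growth": growth,
    }
```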
Service Usage Data Work Product
Description: One of the major advantages of SOA over more traditional software development styles is that it is possible to capture a great deal of operational information about usage of services and automated processes. In fact, it is impossible to govern the Service Operations domain, or to bill third parties for service usage, unless this information is captured and recorded.
It is impossible to capture usage statistics manually, so it is essential that the SOA infrastructure itself automates this task. Data that should be captured for both services and automated processes include the following:
- Who invoked each service operation or automated process?
- Were there any attempts to invoke a service or process by unauthorized requesters?
- What was the total time taken to execute the request, and was it within the SLA and QoS targets?
- What were the overall transaction rates for each service, hour by hour?
In the case of automated processes, additional data needs to be recorded:
- Which of the several possible logical branches were taken in each execution?
- When an automated process invoked a manual task, how long before that task completed?
- Are any manual tasks "hung up"—that is, not completed within the QoS thresholds for that task?
The volume of data this represents will be formidable, and it will need to be summarized in a report or dashboard display.
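One way to picture the captured data and its summarization is sketched below. The record layout and the names `UsageRecord` and `summarize` are assumptions for illustration; a real SOA infrastructure would supply its own schema and would normally aggregate in a monitoring tool rather than in application code.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class UsageRecord:
    service: str
    requester: str        # who invoked the operation
    started: datetime
    elapsed_ms: float     # total time taken to execute the request
    authorized: bool      # False for attempts by unauthorized requesters


def summarize(records, sla_ms):
    """Roll raw usage records up into hourly rates and exception counts."""
    hourly = Counter()        # (service, hour) -> executions
    unauthorized = Counter()  # service -> rejected attempts
    over_sla = Counter()      # service -> executions slower than the SLA
    for r in records:
        if not r.authorized:
            unauthorized[r.service] += 1
            continue  # rejected requests do not count as executions
        hour = r.started.replace(minute=0, second=0, microsecond=0)
        hourly[(r.service, hour)] += 1
        if r.elapsed_ms > sla_ms[r.service]:
            over_sla[r.service] += 1
    return hourly, unauthorized, over_sla
```

The three counters correspond to the hour-by-hour transaction rates, the unauthorized invocation attempts, and the executions outside the SLA target listed above.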
Purpose: Capture and summarize essential data on performance and utilization of services and automated processes.
Responsible: SOA governance lead, monitoring developer, and PMO define the structure and layout of the report or dashboard display; IT operations provides the raw data.
Accountable: IT operations manager or SOA governance lead.
Consulted: Business service champions, IT operations.
Informed: SOA enablement team, SOA executive sponsor.
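For automated processes, the check for manual tasks that are "hung up" reduces to comparing each open task's age with the QoS threshold for its task type. The following is a minimal sketch under that assumption; the tuple layout and the name `hung_tasks` are hypothetical.

```python
from datetime import datetime, timedelta


def hung_tasks(open_tasks, thresholds, now):
    """Return ids of manual tasks open longer than their QoS threshold.

    open_tasks: iterable of (task_id, task_type, started) tuples.
    thresholds: {task_type: timedelta} maximum expected completion time.
    """
    return [
        task_id
        for task_id, task_type, started in open_tasks
        if now - started > thresholds[task_type]
    ]
```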
SLA Compliance Report Work Product
Description: There are two types of SLA compliance report:
- Regular SLA compliance confirmation. Most, and ideally all, SLA compliance monitoring reports should show actual QoS numbers achieved that are well within the SLA terms.
- The more important—but rare, we hope—SLA compliance failure reports. In the event of SLA compliance failures, notifications should be sent urgently to the service consumers, the lead SOA architect, business service champions, and the SOA governance lead, apprising them of the situation and the actions taken to prevent recurrence.
Purpose: Monitor SLA compliance and handle incidents.
When needed: Regular SLA compliance reporting should be performed monthly, in support of the SLA monitoring plan described next. In the event of an SLA violation, immediate corrective action should be taken, and all affected service consumers informed as soon as possible.
Responsible: IT operations, monitoring developer, SOA architect, service architects.
Accountable: IT operations manager.
Consulted: Service architect, service designer, QA in the event of SLA violations.
Informed: PMO, SOA enablement team, help desk / technical support staff, service consumers.
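The distinction between the two report types can be sketched as a simple classification. The recipient list below is taken from the description of failure reports; the function name, the dictionary layout, and the use of response time as the only SLA term are assumptions made for illustration.

```python
# Parties to notify urgently on an SLA compliance failure (from the description).
FAILURE_RECIPIENTS = (
    "service consumers",
    "lead SOA architect",
    "business service champions",
    "SOA governance lead",
)


def compliance_report(service, sla_ms, measured_ms):
    """Build a confirmation or failure report for one service.

    measured_ms: non-empty list of response times for the reporting period.
    """
    worst = max(measured_ms)
    if worst <= sla_ms:
        return {"service": service, "type": "confirmation",
                "headroom_ms": sla_ms - worst, "notify": ()}
    violations = sum(1 for m in measured_ms if m > sla_ms)
    return {"service": service, "type": "failure",
            "violations": violations, "notify": FAILURE_RECIPIENTS}
```

The `headroom_ms` field in a confirmation shows how far inside the SLA terms the worst measurement fell, which is the figure the regular report is meant to highlight.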
SLA Monitoring Plan Work Product
Description: This is the plan to monitor all service and process executions, consumer by consumer, to ensure that they are executed within the terms of the SLAs that apply to that individual consumer. SLA monitoring needs to be performed using a formal systems management tool or framework; it is not a task that can be performed manually.
Purpose: There is no point in having SLAs if you don't know whether you are meeting them. There is little value in SOA governance without formal SLAs being in place and enforced.
When needed: At the same time as the SLAs are defined.
Responsible: PMO, lead service architect, lead SOA architect, IT operations, SOA governance lead, monitoring developer.
Accountable: SOA governance lead or PMO.
Consulted: Business service champions, IT operations.
Informed: IT operations, QA, PMO, SOA enablement team.
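Because SLA terms can differ from one consumer to another, each execution has to be checked against the agreement for that particular consumer. The sketch below illustrates only the lookup logic, not a substitute for the systems management tooling the plan calls for; the function name and the keying of SLAs by consumer and service are assumptions.

```python
def check_execution(slas, consumer, service, elapsed_ms):
    """Check one execution against the SLA agreed with that consumer.

    slas: {(consumer, service): max_response_ms}
    Returns True if within terms, False if violated, None if no SLA exists.
    """
    limit = slas.get((consumer, service))
    if limit is None:
        return None
    return elapsed_ms <= limit
```

The same 600 ms execution of a service may therefore satisfy one consumer's SLA and violate another's, which is why aggregate averages alone are not sufficient for SLA monitoring.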
Technical Support Approach and Targets Work Product
Description: This is a basic IT governance work product that should be updated to reflect the specific needs of service consumers and users of automated business processes. It defines the level and speed of technical support that internal and external service consumers can expect. Enhanced levels of support should be provided for high-severity problems, and for high-value services. Assured levels of technical support should be included in service and automated process SLAs.
Purpose: Ensure that SLAs are honored and that service consumers receive adequate technical support.
When needed: Should be completed before any SLAs are published, and the SLAs should contain clauses that reflect the level of technical support that will be provided.
Responsible: Lead service architect, SOA governance lead, PMO.
Accountable: IT operations manager.
Consulted: Business service champions.
Informed: Help desk / technical support manager, QA, PMO.
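Support targets that vary by problem severity and service value can be expressed as a small lookup. The figures and the halving rule below are purely hypothetical placeholders; actual targets would be set in this work product and reflected in the SLAs.

```python
from datetime import timedelta

# Hypothetical base response targets by problem severity (1 = most severe).
BASE_TARGETS = {
    1: timedelta(hours=1),
    2: timedelta(hours=4),
    3: timedelta(hours=24),
}


def response_target(severity: int, high_value_service: bool) -> timedelta:
    """Return the promised first-response time for a support request."""
    target = BASE_TARGETS[severity]
    if high_value_service:
        target = target / 2  # hypothetical enhancement for high-value services
    return target
```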
The dependencies among work products are shown in Figure 4.4. Again, the format is the same as for Figure 3.3 (in Chapter 3) and Figure 4.3.
Figure 4.4. Service Operations Domain: Work Product Dependencies