1.2 The Evolution of CERT-RMM
The CERT Resilience Management Model is the result of an evolutionary development path that incorporates concepts from other CERT tools, techniques, methods, and activities.
In 1999, CERT officially released the Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE) method for information security risk management. OCTAVE provided a new way to look at information security risk from an operational perspective and asserted that business people are in the best position to identify and analyze security risk. This effectively repositioned IT's role in security risk assessment and placed the responsibility closer to the operations activity in the organization [Alberts 1999].
In October 2003, a group of 20 IT and security professionals from financial, IT, and security services, defense organizations, and the SEI met at the SEI to begin to build an executive-level community of practice for IT operations and security. The desired outcome for this Best in Class Security and Operations Roundtable (BIC-SORT) was to better capture and articulate the relevant bodies of knowledge that enable and accelerate IT operational and security process improvement. The bodies of knowledge identified included IT and information security governance, audit, risk management, IT operations, security, project management, and process management (including benchmarking), as depicted in Figure 1.2.
Figure 1.2 Bodies of Knowledge Related to Security Process Improvement
In Figure 1.2, the upper four capabilities (white text) include processes that provide oversight and top-level management. Governance and audit serve as enablers and accelerators. Risk management informs decisions and choices. Strategy serves as the explicit link to business drivers to ensure that value is being delivered. The lower four capabilities (black text) include processes that provide detailed management and execution in accordance with the policies, procedures, and guidelines established by higher-level management. We observed that these capabilities were all connected in high-performing IT operations and security organizations.
Workshop topics and results included defining what it means to be best in class, areas of pain and promise (potential solutions), how to use improvement frameworks and models in this domain, the applicability of Six Sigma, and emerging frameworks for enterprise security management (precursors of CERT-RMM) [Allen 2004].
In December 2004, CERT released a technical note entitled Managing for Enterprise Security that described security as a process reliant on many organizational capabilities. In essence, the security challenge was characterized as a business problem owned by everyone in the organization, not just IT [Caralli 2004]. This technical note also introduced operational resilience as the objective of security activities and began to describe the convergence of security management, business continuity management, and IT operations management as essential for managing operational risk.
In March 2005, CERT hosted a meeting with representatives of the Financial Services Technology Consortium (FSTC). At the time of this meeting, FSTC's Business Continuity Standing Committee was actively organizing a project to explore the development of a reference model to measure and manage operational resilience capability. Although our approaches to operational resilience had different starting points (security versus business continuity), our efforts were clearly focused on solving the same problem: How can an organization predictably and systematically control operational resilience through activities such as security and business continuity?
In April 2006, CERT introduced the concept of a process improvement model for operational resilience in the technical report Sustaining Operational Resiliency: A Process Improvement Approach to Security Management [Caralli 2006]. This technical report defined fundamental resilience and process improvement concepts and detailed candidate focus areas (called "capability areas") that could be included in an eventual model. This document was the foundation for developing the first instantiation of the model.
In May 2007, as a result of work with FSTC, CERT published an initial framework for managing operational resilience in the technical report Introducing the CERT Resiliency Engineering Framework: Improving the Security and Sustainability Processes [Caralli 2007]. This document presented the initial outline of a process improvement model for managing operational resilience.
In March 2008, a preview version of a process improvement model for managing operational resilience was released by CERT under the title CERT Resiliency Engineering Framework, v0.95R [REF Team 2008a]. This model included an articulation of 21 "capability areas" that described high-level processes and practices for managing operational resilience and, more significantly, provided an initial set of elaborated generic goals and practices that defined capability levels for each capability area.
In early 2009, the name of the model was changed to the CERT Resilience Management Model to reflect the managerial nature of the processes and to properly position the "engineering" aspects of the model. Common CMMI-related taxonomy was applied (including the use of the term process areas), and generic goals and practices were expanded with more specific elaborations in each process area. CERT began releasing CERT-RMM process areas individually in 2009, leading up to the "official" release of v1.0 of the model in a technical report published in 2010. The model continues to be available by process area at www.cert.org/resilience.
The publication of this book marks the official release of CERT-RMM v1.1. Version 1.1 includes minor changes to process areas resulting from field use and piloting of the model. In addition, version 1.1 introduces the concept of the operational resilience management system, which broadly defines the organization's collective capability and mechanism for managing operational resilience. More about the operational resilience management system can be found in Section 2.2.
CERT-RMM
CERT-RMM draws upon and is influenced by many bodies of knowledge and models. Figure 1.3 illustrates these relationships. (See Tables 1.1 and 1.2 for details about the connections between CERT-RMM and CMMI models.)
Figure 1.3 CERT-RMM Influences
Table 1.1. Process Areas in CERT-RMM and CMMI Models
CMMI Models Process Areas | Equivalent CERT-RMM Process Areas
CAM: Capacity and Availability Management (CMMI-SVC only) | TM: Technology Management. CERT-RMM addresses capacity management from the perspective of technology assets. It does not address the capacity of services. Availability management is a central theme of CERT-RMM, significantly expanded from CMMI-SVC. Service availability is addressed in CERT-RMM by managing the availability requirement for people, information, technology, and facilities. Thus, the process areas that drive availability management include
IRP: Incident Resolution and Prevention (CMMI-SVC only) | IMC: Incident Management and Control. In CERT-RMM, IMC expands IRP to address a broader incident management system and incident life cycle at the asset level. Workarounds in IRP are expanded in CERT-RMM to address incident response practices.
MA: Measurement and Analysis | MA: Measurement and Analysis. MA is carried over intact from CMMI. In CERT-RMM, MA is directly connected to MON (Monitoring), which explicitly addresses data collection that can be used for MA activities.
OPD: Organizational Process Definition | OPD: Organizational Process Definition. OPD is carried over from CMMI, but development-life-cycle–related activities and examples are deemphasized or eliminated.
OPF: Organizational Process Focus | OPF: Organizational Process Focus. OPF is carried over intact from CMMI.
OT: Organizational Training | OTA: Organizational Training and Awareness. OT is expanded to include awareness activities in OTA.
REQM: Requirements Management | RRM: Resilience Requirements Management. Basic elements of REQM are included in RRM, but the focus is on managing the resilience requirements for assets and services, regardless of where they are in their development cycle.
RD: Requirements Development | RRD: Resilience Requirements Development. Basic elements of RD are included in RRD, but practices differ substantially.
RSKM: Risk Management | RISK: Risk Management. Basic elements of RSKM are reflected in RISK, but the focus is on operational risk management activities and the enterprise risk management capabilities of the organization.
SAM: Supplier Agreement Management | EXD: External Dependencies Management. In CERT-RMM, SAM is expanded to address all external dependencies, not only suppliers. EXD practices differ substantially.
SCON: Service Continuity (CMMI-SVC only) | SC: Service Continuity. In CERT-RMM, SC is positioned as an operational risk management activity that addresses what is required to sustain assets and services balanced with preventive controls and strategies (as defined in CTRL).
TS: Technical Solution | RTSE: Resilient Technical Solution Engineering. RTSE uses TS as the basis for conveying the consideration of resilience attributes as part of the technical solution.
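For readers who maintain CMMI-based process assets, the correspondences in Table 1.1 can be captured in a simple lookup structure. The following Python sketch is illustrative only. The names CMMI_TO_CERT_RMM, CMMI_SVC_ONLY, and cert_rmm_equivalent are hypothetical and are not part of CERT-RMM or CMMI; only the abbreviations come from Table 1.1. The table's qualifications (for example, that practices differ substantially) are not represented.

```python
# Illustrative lookup built from Table 1.1. All identifiers below are
# hypothetical; they are not defined by CERT-RMM or CMMI.
from typing import Optional

# CMMI process area abbreviation -> CERT-RMM process area named in Table 1.1.
# CAM maps to TM for capacity; the table notes that availability management
# is driven by further CERT-RMM process areas not captured here.
CMMI_TO_CERT_RMM = {
    "CAM": "TM",
    "IRP": "IMC",
    "MA": "MA",
    "OPD": "OPD",
    "OPF": "OPF",
    "OT": "OTA",
    "REQM": "RRM",
    "RD": "RRD",
    "RSKM": "RISK",
    "SAM": "EXD",
    "SCON": "SC",
    "TS": "RTSE",
}

# Process areas that Table 1.1 marks as "CMMI-SVC only".
CMMI_SVC_ONLY = {"CAM", "IRP", "SCON"}


def cert_rmm_equivalent(cmmi_abbrev: str) -> Optional[str]:
    """Return the CERT-RMM process area listed for a CMMI process area.

    Returns None when Table 1.1 lists no equivalent.
    """
    return CMMI_TO_CERT_RMM.get(cmmi_abbrev.strip().upper())


if __name__ == "__main__":
    for abbrev in ("SAM", "ot", "PP"):
        print(abbrev, "->", cert_rmm_equivalent(abbrev))
```

A lookup such as this records only which process areas correspond. The differences in scope and practice described in the table still have to be read from the model itself.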
Table 1.2. Other Connections Between CERT-RMM and the CMMI Models
Element | Connection
Generic goals and practices | The generic goals and practices have been adapted mostly intact from CMMI. Slight modifications have been made as follows:
Continuous representation | CERT-RMM adopts the continuous representation concept from CMMI intact.
Capability levels | CERT-RMM defines four capability levels, up to capability level 3 ("defined"). Definitions of capability levels in CMMI are carried over for CERT-RMM.
Appraisal process | The CERT-RMM capability appraisal process uses many of the elements of the SCAMPI process. The "project" concept in CMMI is implemented in CERT-RMM as an "organizational unit." CERT-RMM capability appraisals have constructs inherited from SCAMPI. See Section 6.4.1 for the use of SCAMPI in CERT-RMM capability appraisals.
At the descriptive level of the model, the process areas in CERT-RMM have been either developed specifically for the model or sourced from existing CMMI models and modified to be used in the context of operational resilience management. CERT-RMM also draws upon concepts and codes of practice from other security, business continuity, and IT operations models, particularly at the typical work products and subpractices level. This allows users of these codes of practice to incorporate model-based process improvement without significantly altering their installed base of practices. The CERT Resiliency Engineering Framework: Code of Practice Crosswalk, Preview Version, v0.95R [REF Team 2008b] details the relationships between common codes of practice and the specific practices in the CERT-RMM process areas. The Crosswalk is periodically updated to incorporate new and updated codes of practice as necessary. The Crosswalk can be found at www.cert.org/resilience.
Familiarity with common codes of practice or CMMI models is not required to comprehend or use CERT-RMM. However, familiarity with these practices and models will aid in understanding and adoption.
As a descriptive model, CERT-RMM focuses on the process description level and does not necessarily address how an organization would achieve the intent and purpose of the description through deployed practices. However, the subpractices contained in each CERT-RMM process area describe actions that an organization might take to implement a process, and these subpractices can be directly linked to one or more tactical practices used by the organization. Thus, the range of material in each CERT-RMM process area spans from highly descriptive processes to more prescriptive subpractices.
In terms of scope, CERT-RMM covers the activities required to establish, deliver, and manage operational resilience activities in order to ensure the resilience of services. A resilient service is one that can meet its mission whenever necessary, even under degraded circumstances. Services are broadly defined in CERT-RMM. At a simple level, a service is a helpful activity that brings about some intended result. People and technology can perform services; for example, people can deliver mail, and so can an email application. A service can also produce a tangible product.
From an organizational perspective, services can provide internal benefits (such as paying employees) or have an external focus (such as delivering newspapers). Any service in the organization that is of value to meeting the organization's mission should be made resilient.
Services rely on assets to achieve their missions. In CERT-RMM, assets are limited to people, information, technology, and facilities. A service that produces a product may also rely on raw materials, but these assets are outside of the immediate scope of CERT-RMM. However, the use of CERT-RMM in a production environment is not precluded, since people, information, technology, and facilities are a critical part of delivering a product, and their operational resilience can be managed through the practices in CERT-RMM.
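The relationship between services and the four asset types can be made concrete with a small data model. The Python sketch below is only one possible representation and is not prescribed by CERT-RMM. The class names AssetType, Asset, and Service, the method names, and the payroll example assets are all hypothetical. The check for asset types with no recorded asset is an illustrative inventory aid, not a CERT-RMM practice.

```python
# A minimal sketch of "services rely on assets", assuming the four asset
# types named in the text. All class, method, and asset names are hypothetical.
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class AssetType(Enum):
    PEOPLE = "people"
    INFORMATION = "information"
    TECHNOLOGY = "technology"
    FACILITIES = "facilities"


@dataclass(frozen=True)
class Asset:
    name: str
    kind: AssetType


@dataclass
class Service:
    name: str
    assets: List[Asset] = field(default_factory=list)

    def assets_by_type(self) -> Dict[AssetType, List[str]]:
        """Group the names of this service's assets by asset type."""
        grouped: Dict[AssetType, List[str]] = {t: [] for t in AssetType}
        for asset in self.assets:
            grouped[asset.kind].append(asset.name)
        return grouped

    def missing_asset_types(self) -> Set[AssetType]:
        """Return the asset types for which no asset has been recorded."""
        return set(AssetType) - {asset.kind for asset in self.assets}


if __name__ == "__main__":
    # Hypothetical inventory for an internal service such as paying employees.
    payroll = Service(
        "payroll",
        [
            Asset("payroll staff", AssetType.PEOPLE),
            Asset("employee records", AssetType.INFORMATION),
            Asset("payroll application", AssetType.TECHNOLOGY),
        ],
    )
    print(sorted(t.value for t in payroll.missing_asset_types()))
```

In this example no facility has been recorded for the service, so the inventory is visibly incomplete. Raw materials are deliberately absent from AssetType, mirroring the scope described above.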
CERT-RMM does not cover the activities required to establish, deliver, and manage services. In other words, CERT-RMM does not address the development of a service from requirements or the establishment of a service management system. These activities are covered in the CMMI for Services model (CMMI-SVC) [CMMI Product Team 2009]. However, to the extent that the "management" of the service requires a strong resilience consideration, CERT-RMM can be used with CMMI-SVC to extend the definition of high-quality service delivery to include resilience as an attribute of quality.
CERT-RMM contains practices that cover enterprise management, resilience engineering, operations management, process management, and other supporting processes for ensuring active management of operational resilience. The "enterprise" orientation of CERT-RMM does not mean that it is an enterprise-focused model or that it must be adopted at an enterprise level; on the contrary, CERT-RMM is focused on the operations level of the organization, where services are typically executed. Enterprise aspects of CERT-RMM describe how horizontal functions of the organization, such as managing people, training, financial resource management, and risk management, affect operations. For example, if an organization is generally poor at risk management, the effects typically manifest at an operational level in poor risk identification, prioritization, and mitigation, misalignment with risk appetite and tolerances, and diminished service resilience.
CERT-RMM was developed to be scalable across organizations in various industries, regardless of their size. Every organization has an operational component and executes services that require a degree of operational resilience commensurate with achieving the mission. Although CERT-RMM was constructed in the financial services industry, it is already being piloted and used in other industrial sectors and government organizations, both large and small.
Finally, understanding the process improvement focus of CERT-RMM can be tricky. An example from software engineering is a useful place to start. In the CMMI for Development model (CMMI-DEV), the focus of improvement is software engineering activities performed by a "project" [CMMI Product Team 2006]. In CERT-RMM, the focus of improvement is operational resilience management activities to achieve service resilience as performed by an "organizational unit." This concept can become quite recursive (but no less effective) if the "organizational unit" happens to be a unit of the organization that has primary responsibility for operational resilience management "services," such as the information security department or a business continuity team. In this context, the operational resilience management activities are also the services of the organizational unit.