3.10 Restorability
A simple GOF-type measure that is widely used in the design and characterization of transport networks is the restorability, also sometimes called the restoration ratio. Restorability is the most basic indication of survivability because it directly reflects the extent to which "single points of failure" have been removed as outage-causing circumstances. The biggest single step toward survivability is to eliminate single-span failures as a cause of service outage. This produces a step change in service availability, because service outage can then arise only from much less frequent dual failures or node failures. As most commonly used, the restorability is defined as the fraction of payload-bearing (i.e., "working") signal units that are subsequently restored, or that are topologically capable of being restored, by replacement routes through the network. That is, for a specific failure scenario X,

$$R_s(X) = \frac{\sum_{(i,j) \in X} \min(w_{i,j}, k_{i,j})}{\sum_{(i,j) \in X} w_{i,j}} \tag{3.4}$$
where (most generally) $w_{i,j}$ is the number of service paths between nodes i,j that are failed in the failure scenario X. This way of stipulating a failure scenario is totally general; any number of span and/or node failures can be represented in terms of the set X of i,j node pairs that simultaneously have one or more failed paths in scenario X. Thus the denominator of Equation 3.4 can be thought of as a "total damage" sum in terms of the number of transport signal units that are severed in the failure scenario X. The numerator is the sum of what is restored (or can be restored) for each subset of failed signal units corresponding to a damaged span. Here $k_{i,j}$ represents the number of replacement (restoration) paths that can be provided for (i,j). The $\min(\cdot)$ operator ensures that no credit is given for providing more restoration than is actually needed for any subgroup of failed working signals.
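As a minimal sketch of the ratio described above, the following Python function computes the restorability of one failure scenario. The function name, the dictionary layout, and the example numbers are illustrative assumptions rather than anything taken from the text, and the restoration path counts k are supplied as inputs rather than derived from spare capacity.

```python
from typing import Dict, Tuple

NodePair = Tuple[str, str]


def scenario_restorability(scenario: Dict[NodePair, Tuple[int, int]]) -> float:
    """Restorability of one failure scenario X.

    `scenario` maps each affected node pair (i, j) to (w, k), where w is the
    number of failed working paths and k is the number of restoration paths
    that can be provided for that pair. (Hypothetical data layout.)
    """
    total_failed = sum(w for w, _ in scenario.values())
    if total_failed == 0:
        raise ValueError("scenario contains no failed working paths")
    # min(w, k) gives no credit for restoration beyond what is needed.
    total_restored = sum(min(w, k) for w, k in scenario.values())
    return total_restored / total_failed


# Illustrative numbers only: two node pairs are affected at once.
example = {("A", "B"): (10, 12), ("B", "C"): (8, 5)}
print(f"{scenario_restorability(example):.3f}")  # 15 of 18 failed paths -> 0.833
```

In this example the surplus of two restoration paths on (A, B) does not offset the shortfall of three on (B, C), which is exactly the effect of the min operator.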
One set of failure scenarios that is of particular practical interest is the set of all single and complete span failures, that is, the set of all X that contain just one (i,j). In this case the restorability for any one scenario m = (i,j) simplifies to:

$$R_{s,m} = \frac{\min(w_m, k_m)}{w_m}$$

and the network restorability is defined as the average restorability of all working paths that are failed under each single-span failure scenario. That is:

$$R_n = \frac{\sum_{m \in S} R_{s,m}\, w_m}{\sum_{m \in S} w_m} = \frac{\sum_{m \in S} \min(w_m, k_m)}{\sum_{m \in S} w_m}$$
where S is the set of all spans in the network. A network with $R_n = 1$ is often referred to as "fully restorable": it can withstand any single-span failure without any service path outage. As a single figure of merit for network survivability, $R_n$ is of considerable practical interest because:
- The likelihood of failure scenarios containing more than one (independent) span failure at a time is much lower than that of a single failure.
- It is generally considered economically feasible (or at least necessary and reasonable) to design for $R_n = 1$, whereas it may be economically infeasible to protect against all possible multi-span or node failures by design.
- $R_n$ is a property of the network design, or current network state, that is independent of any knowledge or assumptions about actual failure frequencies or mechanisms.
- Given the much higher failure rate of cables (outside plant structures in general) relative to node failures, achieving $R_n = 1$ by design is the most significant single step that can be taken in practice toward improvement of service availability.
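The network restorability over all single-span failures can be sketched in the same way. This is a minimal illustration, assuming each span is described only by its number of working paths w and the number of restoration paths k available if it fails; the function names, span labels, and figures are hypothetical.

```python
from typing import Dict, List, Tuple


def network_restorability(spans: Dict[str, Tuple[int, int]]) -> float:
    """Working-path-weighted restorability over all single-span failures.

    `spans` maps a span label to (w, k): w working paths crossing the span
    and k restoration paths available if that span fails. (Hypothetical layout.)
    """
    total_working = sum(w for w, _ in spans.values())
    if total_working == 0:
        raise ValueError("no working paths on any span")
    return sum(min(w, k) for w, k in spans.values()) / total_working


def unrestorable_spans(spans: Dict[str, Tuple[int, int]]) -> List[str]:
    """Spans whose failure leaves at least one working path unrestored."""
    return [name for name, (w, k) in spans.items() if k < w]


# Illustrative design: span s3 has only 3 restoration paths for 5 working paths.
design = {"s1": (12, 12), "s2": (7, 9), "s3": (5, 3)}
print(network_restorability(design))  # 22 of 24 working paths are restorable
print(unrestorable_spans(design))     # spans that keep the design below 1
```

A design is fully restorable in the sense used here exactly when the second function returns an empty list, since every span then has at least as many restoration paths as working paths.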
A variety of purpose-specific variants of the basic definition of restorability are common. Examples are the "prompt restorability," which is the restorability level arising before a certain elapsed time from failure onset, and the "dual-failure restorability," which is as the name suggests and is considered further in Chapter 8. Other measures can apply prioritized demand weightings to $R_n$. These are all valid measures as long as their specifics are fully stipulated in terms of the specific set of failure scenarios being considered and the criteria being employed to define survivability against those failures.
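To illustrate how one such variant might be stipulated, here is a hedged sketch of a prompt restorability calculation over single-span failures. It assumes, purely for illustration, that each restoration path has a known completion time measured from failure onset and that a path counts as prompt if it completes at or before the deadline; the data layout, names, and timings are hypothetical.

```python
from typing import Dict, List, Tuple


def prompt_restorability(
    spans: Dict[str, Tuple[int, List[float]]], deadline_s: float
) -> float:
    """Fraction of failed working paths restored within `deadline_s` seconds.

    `spans` maps a span label to (w, times): w working paths on the span and
    the completion time, in seconds after failure onset, of each restoration
    path found for that span. (Hypothetical layout.)
    """
    total_working = 0
    total_prompt = 0
    for w, times in spans.values():
        # Count only restoration paths that complete by the deadline.
        in_time = sum(1 for t in times if t <= deadline_s)
        total_working += w
        total_prompt += min(w, in_time)
    if total_working == 0:
        raise ValueError("no working paths on any span")
    return total_prompt / total_working


# Illustrative timings only.
timed = {"s1": (4, [0.05, 0.08, 0.5, 1.2]), "s2": (2, [0.04, 0.06])}
print(prompt_restorability(timed, deadline_s=0.1))  # 4 of 6 paths in time
print(prompt_restorability(timed, deadline_s=2.0))  # all 6 paths in time
```

The same network is fully restorable at the two-second deadline but not at 100 ms, which is why the deadline must be stated alongside any quoted figure.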
Restorability, and GOF measures in general, are relatively simple to compute and to understand, because they reflect simple measures of recovery levels for a specific set of assumed failure scenarios. In contrast, ROF measures can be much more involved and/or require simulation. A grounding in reliability and availability theory is required for their appreciation. Let us therefore now cover the basic concepts of reliability and availability which underlie ROF measures, and are also highly relevant to work in network survivability in general.