3.7 Comparative Advantages of Different Layers for Survivability
The layered view we have just worked through shows that survivability measures at each layer are for the most part complementary, not competitive. Physical layer measures are essential, and service layer measures always help. We should always have at least one technique implemented at the system or logical layers, but there is really no need to employ both, especially when cost is considered. An important planning decision is thus whether to employ a system layer or a logical layer recovery scheme. Two of the main factors in this decision are flexibility and efficiency. With rings, or 1+1 diverse routing, there will be an investment of over 100% in redundant transmission capacity because (by definition) both diverse routes cannot be equally shortest routes. With logical layer mesh alternatives this may often be reduced to 50-70% redundancy. In addition, an OXC-based (i.e., logical layer) implementation offers complete flexibility (1) to adapt the protection stance to changing demand patterns, (2) to evolve the entire protection strategy from one scheme to another, and/or (3) to implement prioritized protection service classes. A summary of other comparative aspects is offered in Table 3-7.
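The "over 100%" figure for 1+1 diverse routing can be checked with a small calculation. The following is a minimal sketch, assuming a hypothetical six-node ring with unit-length spans (it is not a network from this chapter). The helper names `build_adjacency`, `shortest_path`, and `one_plus_one_redundancy` are invented for illustration, and the backup route is found with a simple two-step heuristic (route the working path, remove its spans, route again) rather than an optimal disjoint-pair algorithm.

```python
from collections import deque


def build_adjacency(spans):
    """Turn a list of undirected spans into an adjacency mapping."""
    adj = {}
    for u, v in spans:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def shortest_path(adj, src, dst, banned_spans=frozenset()):
    """Minimum-hop path by breadth-first search, avoiding banned spans."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev and frozenset((u, v)) not in banned_spans:
                prev[v] = u
                queue.append(v)
    return None  # no route exists


def one_plus_one_redundancy(adj, src, dst):
    """Return (working, backup, redundancy) for a span-disjoint 1+1 pair.

    Redundancy is backup hops divided by working hops. The backup is
    searched in a subgraph of the original, so it can never be shorter
    than the working route and the ratio is at least 1.0 (100%).
    """
    working = shortest_path(adj, src, dst)
    used = {frozenset(span) for span in zip(working, working[1:])}
    backup = shortest_path(adj, src, dst, used)
    if backup is None:
        return working, None, None
    ratio = (len(backup) - 1) / (len(working) - 1)
    return working, backup, ratio


# Hypothetical six-node ring A-B-C-D-E-F-A, all spans of equal length.
ring = build_adjacency(
    [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("F", "A")]
)
working, backup, ratio = one_plus_one_redundancy(ring, "A", "C")
print(working, backup, f"{ratio:.0%}")
```

Here the working route takes two hops and the only span-disjoint alternative takes four, so the protection capacity for this demand is twice the working capacity. The two-step heuristic can fail to find a disjoint pair in some topologies even when one exists, so it should be read as an illustration of the arithmetic only.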
Table 3-7. Comparative Strengths and Weaknesses of Layers for Survivability
| Attribute | Transmission System Layer | Logical Cross-Connection Layer | Services (or IP Transport) Layer |
|---|---|---|---|
| Example | BLSR Rings | Span Restoration | MPLS SBPP |
| Capacity required | Highest | Middle | Least |
| Speed | Highest (~50 ms) | High (~100–300 ms typ.) | Slowest (seconds–minutes) |
| Certainty / predictability | Highest | High | Lower |
| Multiple Quality of Protection (QoP) | None | Easily supported on per path basis | Easily supported |
| Provisioning view (working) | Ring-constrained shortest path | Shortest path | Shortest path coordinated to be disjoint with protection |
| Provisioning view (survivability) | Inherent once routed | Checked upon shortly after routing | Coordinate protection sharing arrangements network-wide |
| Degradation characteristics (if restoration fails) | Abrupt and total outage | Abrupt on affected channels, may be partial | More graceful degradation; congestion not outage |
| Oversubscription strategies | No | SONET or WDM: no; ATM VP: yes | Yes |
| Customer control | Least | Through VPN services | Most |
| Database and protocol dependencies | Least: a "hardwired" implementation | Little: event-driven protocols in firmware interacting on overhead bytes, network state is database | Highest: large databases of global network state, dissemination protocols, software dependent |
| Susceptibility to SRLG effects and fault escalation | Least, controlled during planning | Low, especially with adaptive distributed restoration | Highest vulnerability to SRLG effects and physical-to-logical fault expansion |
Multi-Layer Protection: Containing the Inheritance of Dependencies
In thinking about the different layers where we can implement survivability, the issue of physical-to-logical fault multiplication is critical. Adequate knowledge of SRLG relationships may be extremely difficult to obtain (or maintain) if there are several steps of the emergence and inheritance of failure dependencies, to use the terms introduced in [OePu98]. At every layer of routing abstraction, new fault dependencies emerge and are inherited by all higher levels. The growth in complexity of determining physical diversity between paths as one goes higher up the hierarchy from physical toward service layers is conveyed in Figure 3-14, based on [OePu98]. Graph G shows the layout of cables, which in this case includes some degree-1 nodes. As mentioned, a first step is to create a biconnected physical graph. Doing so in this example would remove some of the dependencies in G' emerging from G, but not all. Even when G is biconnected, dependencies between transmission systems are impossible to avoid as long as the systems are allowed to pass through nodes without terminating. They are especially frequent if least-cost routing of each system is desired. For example, we could close G with respect to stub-node 7 by adding a cable (7-8), but transmission systems (6-5) and (7-5) would likely remain dependent because span (5-7) in G is on both of their shortest routes. Observe also that node 11 is a junction in the cable graph but has no corresponding appearance in the higher-level logical graphs. This is the classic case of a common duct (here, 5-11) creating dependency between what are otherwise viewed as separate transmission spans (6-5) and (7-5) in G'. When one routes lightpaths over these transmission systems, still further dependencies emerge where lightpaths share transmission systems, and the prior dependencies from the cables-to-systems layer are inherited.
Figure 3-14. Illustrating the fault dependencies that emerge and are inherited by higher levels (adapted from [OePu98]).
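The way dependencies are inherited upward can be made concrete by expanding each higher-layer path into the set of lowest-layer resources it ultimately rides on. The following is a minimal sketch under stated assumptions: the figure is not reproduced here, so the cable routes assigned to systems (6-5) and (7-5) are an assumed reading of the common-duct case described above, the two lightpaths `lp_A` and `lp_B` are hypothetical, and the function names `flatten` and `shared_risks` are invented for illustration.

```python
# Assumed mapping from transmission systems (G') to the cable segments (G)
# they traverse. Both systems pass through junction node 11 and share the
# duct between nodes 5 and 11.
CABLES_OF_SYSTEM = {
    "sys(6-5)": {"cable(6-11)", "cable(5-11)"},
    "sys(7-5)": {"cable(7-11)", "cable(5-11)"},
}

# Hypothetical lightpaths (G'') and the transmission systems they use.
SYSTEMS_OF_LIGHTPATH = {
    "lp_A": {"sys(6-5)"},
    "lp_B": {"sys(7-5)"},
}


def flatten(entity, layer_maps):
    """Expand an entity down through the given maps, ordered top-down.

    Returns the set of resources reached at the lowest layer supplied.
    """
    resources = {entity}
    for mapping in layer_maps:
        expanded = set()
        for resource in resources:
            expanded |= mapping[resource]
        resources = expanded
    return resources


def shared_risks(a, b, layer_maps):
    """Resources common to both entities at the lowest layer supplied."""
    return flatten(a, layer_maps) & flatten(b, layer_maps)


# Viewed one layer down, the two lightpaths appear fully diverse.
one_down = shared_risks("lp_A", "lp_B", [SYSTEMS_OF_LIGHTPATH])

# Viewed two layers down, the inherited common-duct dependency appears.
two_down = shared_risks(
    "lp_A", "lp_B", [SYSTEMS_OF_LIGHTPATH, CABLES_OF_SYSTEM]
)

print("shared systems:", one_down)
print("shared cables:", two_down)
```

The check is only as good as the mapping tables behind it. Each additional layer requires another accurate, up-to-date map, which is the practical difficulty of verifying diversity from high in the hierarchy.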
The example in Figure 3-14 goes up only two layers above the cables and considers only eight top-layer nodes. In practice if the G'' shown is the lightpath service layer, then service paths at the STS-3c level routed over them have at least one or two more layers of dependency emergence and inheritance. The overwhelming impression, extrapolating from Figure 3-14 as a simple example, is that it may be difficult to give a robust assurance of full survivability against a cable cut if operating higher up in the hierarchy. Diverse STS-3c level paths would be able to protect the corresponding service against same-level failures or a single lightpath failure (one layer down), or perhaps also a single transmission system failure (two layers down). For example, STS-3c level 1+1 diversity can protect against an STS-3c interface port failure on the host router or against a lightpath failure (including access multiplexing) one layer below. But at three layers of reach-down (to the cables) it seems far less plausible that we would always be certain that STS-3c primary and backup paths would have no inherited dependencies.
In practice this is a compelling reason to use protection strategies at the service layer and at either the system or logical layers. Diversity measures at one level can realistically be expected to protect against single failures with known dependencies one or two levels below, or at the same level, but it is probably unrealistic to expect a services layer diversity mapping to retain complete physical disjointness more than two layers below. One set of options to consider is rings, p-cycles, or mesh protection implemented with whole-fiber cross-connects directly over a biconnected physical cable graph, G. In this case there are no emergent dependencies to be inherited by higher layers, since each cable span becomes a directly protected single-failure entity. This is simple, robust, and requires relatively low-cost devices for whole-fiber protection switching. It is not very fine-grained, however, and the devices used for protection have no secondary use such as for dynamic service provisioning (other than provisioning dark fiber services).
Alternatively, logical layer measures implemented in G' using cross-connects for mesh protection at the channel level are more agile, multi-purpose, and fine-grained in capacity handling, and only have to cope with one level of known dependencies, such as arise in G' from the cable junction node 11 in G. Using SBPP in G'' (instead of a protection scheme in G') is not infeasible, but we see a major complexity associated with this alternative because we are now two layers above the level at which physical faults occur, so the complete map of dependencies is far more complex. If we go yet another layer up and rely on MPLS autoreprovisioning or MPLS-level SBPP, it becomes hard to imagine that we could support a claim of protection (implemented at that level) against failures stemming from the physical layer, because G is a full three levels below the MPLS layer. This all suggests a practical principle that the emergence and inheritance of SRLG-like effects may need to be "contained" by an appropriate protection arrangement every two layers. If followed, this leads to a strategy of choosing some basic "infrastructure" protection scheme at the system or logical layers and complementing it (possibly only for high-priority service paths) with an additional technique at the corresponding service layer itself. For example, system-level p-cycles and service-level 1:1 APS would be one combination. Lightpath-level SBPP complemented by MPLS-layer p-cycles would be another viable combination, and so on. Note in this regard that even in an ideal "IP over WDM" network, the three layers (G, G', G'') in Figure 3-14 all still exist. Where the reduction of levels occurs in IP over WDM is actually in the levels above the lightpath layer.
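The containment principle can be expressed as a simple coverage check over the layer stack. The following is a minimal sketch, assuming that a protection scheme at a given layer can be trusted against failures at its own layer and up to two layers below (the "one or two levels below" reach discussed above, encoded here as the parameter `reach`). The layer labels and the function name `uncovered_layers` are invented for illustration and are not a planning method from the text.

```python
# Layer stack ordered from the physical bottom to the service top.
LAYERS = [
    "cables (G)",
    "transmission systems (G')",
    "lightpaths (G'')",
    "MPLS / service",
]


def uncovered_layers(layers, protected, reach=2):
    """List the layers whose failures no protection scheme can vouch for.

    A scheme at index i is assumed to cover failures originating at
    indices i - reach through i. The reach of 2 is an assumption taken
    from the rule of thumb in the text, not a measured property.
    """
    covered = set()
    for i, name in enumerate(layers):
        if name in protected:
            covered.update(range(max(0, i - reach), i + 1))
    return [name for i, name in enumerate(layers) if i not in covered]


# Protection only at the MPLS layer: the cable layer is three levels down.
mpls_only = uncovered_layers(LAYERS, {"MPLS / service"})

# An infrastructure scheme at the system layer plus a service-layer scheme.
combined = uncovered_layers(
    LAYERS, {"transmission systems (G')", "MPLS / service"}
)

print("MPLS only leaves uncovered:", mpls_only)
print("System + service leaves uncovered:", combined)
```

Under these assumptions, protection at the MPLS layer alone leaves cable failures uncovered, while pairing a system-layer scheme with a service-layer scheme covers the whole stack, which matches the combinations suggested above.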