3.6 Service Layer Survivability Schemes
Service layer techniques are the last safeguards before physical failures become apparent to user applications, and they are usually worth having in addition to a lower layer scheme. Unlike lower layer schemes, which incur costs for extra ports and explicit protection capacity, service layer schemes are usually software-based implementations that attempt rerouting within the working, but only partly utilized, capacity that is visible at the service layer. A service layer rerouting response can also complement an incomplete lower layer response by logically reconfiguring its paths and/or applying service priorities to reduce delay or packet loss.
In addition, a service layer node failure, or an interface failure on a switch or router, is best dealt with among the peer network elements of the same service layer. Methods at the logical or system layers tend to be very fast-acting but all-or-nothing in their benefit for any given path, i.e., they either fully protect the traffic-bearing signals (so the failure is invisible) or do not (so the effect is total outage). Service layer methods, in contrast, are generally more gradual and provide a shared, "graceful degradation" type of network response. Typically, blocking or congestion and delay levels rise, but basic functionality continues. Thus, except in extreme cases, service layer restoration methods tend to prevent hard outage per se, trading it for a performance degradation instead.
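To make the contrast concrete, the toy model below compares, for a single affected path, the fraction of displaced traffic carried by an all-or-nothing lower layer response and by a service layer response that squeezes the traffic into leftover working capacity. The function names and the linear "carried fraction" model are illustrative assumptions, not a performance model from this chapter.

```python
def all_or_nothing(displaced, protection_capacity):
    """Toy lower layer response for one path: fully restored or fully lost."""
    return 1.0 if protection_capacity >= displaced else 0.0


def graceful(displaced, residual_capacity):
    """Toy service layer response: displaced traffic shares whatever working
    but unused capacity remains; the shortfall appears as blocking or delay."""
    if displaced <= 0:
        return 1.0
    return min(1.0, residual_capacity / displaced)


# 10 units of traffic displaced; vary the capacity available to each response
for spare in (0, 4, 8, 10, 12):
    print(spare, all_or_nothing(10, spare), graceful(10, spare))
```

With 10 units displaced, the first function jumps from 0 to 1 only once protection capacity reaches 10, whereas the second rises gradually (0.4 with 4 spare units, 0.8 with 8).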
Table 3-6 identifies a number of options for service layer survivability. Dynamic routing in circuit-switched networks and link-state adaptive routing schemes, such as OSPF in the Internet, are the two most traditional service layer schemes. With the advent of an IP-centric control plane, several of the logical layer schemes, in particular SBPP and p-cycles, have direct counterparts for use in the service layer as well. The main difference is only that a physical circuit-like path entity in the data plane is replaced with a virtual path construct such as a VP or LSP. With IP-centric protocols, an essentially identical control plane implementation can establish these service layer constructs, just as GMPLS establishes transport layer constructs. In an MPLS/IP service layer, label-switched paths simply replace lightpaths in the prior descriptions of logical layer SBPP and p-cycles. Other, more service-specific forms of restoration are also possible in the service layer. For instance, circuit-switched telephony networks have long used centralized adaptive call routing (called dynamic routing or dynamic non-hierarchical routing (DNHR)) to recalculate routing plans in the face of congestion [WoCh99], [Topk88], [Ash91], [IEEE95]. And, of course, message retransmission and adaptive routing protocols apply in all data networks. These mechanisms, the basic ability of OSPF to update its routing tables following link-withdrawal LSAs, and GMPLS auto-reprovisioning of MPLS paths are all possible forms of service layer restoration. The same basic proviso applies, however: by itself, mass independent reprovisioning by every affected end-node pair has no assured or predictable outcome.
But when used in the service layer to complement a logical layer restoration response (if needed), such reprovisioning raises much less concern, because any auto-reprovisioning activity in the service layer is then understood as only a best-effort activity to improve performance following the logical layer response.
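The "no assured outcome" proviso can be seen in a toy experiment. In the minimal sketch below, each affected end-node pair independently "redials" over the hop-count shortest route that still has enough spare capacity, seizing it first come, first served. The topology, the demand names X and Y, and the helper names are hypothetical; real OSPF-TE or CR-LDP behavior also involves timers, signaling races, and constraint-based path computation that this ignores.

```python
from collections import deque


def span(u, v):
    """Key for an undirected span."""
    return tuple(sorted((u, v)))


def shortest_feasible_path(spare, src, dst, bw):
    """Hop-count shortest path using only spans with at least bw spare units."""
    adj = {}
    for (u, v), cap in spare.items():
        if cap >= bw:
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no route with enough spare capacity


def redial(spare, demands, order):
    """Each affected demand independently seizes its shortest feasible route,
    first come, first served. Returns the demands that were restored."""
    spare = dict(spare)  # every trial starts from the same spare capacity
    restored = []
    for name in order:
        src, dst, bw = demands[name]
        path = shortest_feasible_path(spare, src, dst, bw)
        if path is None:
            continue  # this end-node pair simply fails to re-establish
        for u, v in zip(path, path[1:]):
            spare[span(u, v)] -= bw
        restored.append(name)
    return restored


# Hypothetical post-failure spare capacity (units per span) and affected demands
SPARE = {span("A", "B"): 1, span("B", "C"): 1, span("A", "D"): 1,
         span("D", "E"): 1, span("C", "E"): 1}
DEMANDS = {"X": ("A", "C", 1), "Y": ("B", "C", 1)}
print(redial(SPARE, DEMANDS, ["X", "Y"]))  # ['X']
print(redial(SPARE, DEMANDS, ["Y", "X"]))  # ['Y', 'X']
```

When X redials first, it takes the two-hop route A-B-C and leaves Y with no feasible route; when Y redials first, both demands are restored, with X detouring over A-D-E-C. The failure and the capacity are identical; only the order differs.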
Table 3-6. Schemes for Service-Layer Restoration or Protection
Scheme or Principle | Short Description | Notes |
---|---|---|
MPLS p-cycles | IP-link protecting p-cycles formed using LSPs | Conceptually the same as span-protecting p-cycles in the logical layer, but formed in the MPLS layer and amenable to oversubscription-based planning (Chapters 7, 10) |
Node-encircling p-cycles | p-cycles formed as LSPs to protect against node (router or LSR) failure | p-cycles for which all flows through a node are straddling flows, hence restorable in the event of node loss (Chapter 10) |
MPLS SBPP | Equivalent to ATM Backup VP Protection | Oversubscription-based capacity design (Chapter 7) |
MPLS SLSP | SBPP on redefined sub-path segments | Short leap shared protection on LSPs with overlapped SBPP sub-path setups |
OSPF (for routed IP flows) | Routing table reconvergence | No assured recovery level if used without a lower layer scheme; uncontrolled oversubscription |
OSPF-TE / CR-LDP (for label-switched paths) | Independent LSP "redial" | No assured recovery level if used without a lower layer scheme; uncontrolled oversubscription |
Dynamic call routing | Centrally recomputed alternate routing policies | Minimizes circuit-switched trunk group blocking |
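The p-cycle entries in Table 3-6 rest on the standard span-protecting p-cycle property: one unit of cycle capacity offers one restoration path to a failed on-cycle span (the rest of the cycle) and two restoration paths to a failed straddling span (both arcs between its end nodes). A minimal sketch of that classification follows; the function name and the example cycle are hypothetical, and the logic is the same whether the cycle is built from lightpaths or from LSPs.

```python
def classify_spans(cycle, spans):
    """Map each span to the number of restoration paths one unit of p-cycle
    capacity offers it: 1 if on-cycle, 2 if straddling, 0 otherwise."""
    if len(cycle) < 3:
        raise ValueError("a p-cycle needs at least three nodes")
    on_cycle = {
        frozenset((cycle[i], cycle[(i + 1) % len(cycle)])) for i in range(len(cycle))
    }
    cycle_nodes = set(cycle)
    paths = {}
    for u, v in spans:
        if frozenset((u, v)) in on_cycle:
            paths[(u, v)] = 1  # the rest of the cycle is the single backup route
        elif u in cycle_nodes and v in cycle_nodes:
            paths[(u, v)] = 2  # straddler: either arc of the cycle can be used
        else:
            paths[(u, v)] = 0  # this p-cycle offers the span no protection
    return paths


# Hypothetical cycle A-B-C-D-E-A and a few spans to classify
print(classify_spans(["A", "B", "C", "D", "E"],
                     [("A", "B"), ("A", "C"), ("B", "F"), ("E", "A")]))
# {('A', 'B'): 1, ('A', 'C'): 2, ('B', 'F'): 0, ('E', 'A'): 1}
```

In the node-encircling case, the cycle passes through the neighbors of the protected node but not the node itself, so a flow transiting that node enters and leaves via two cycle nodes and can be rerouted along a cycle arc, which is why the table describes such flows as straddling.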
Dynamic routing schemes for circuit-switched networks are an evolution of alternate routing in teletraffic networks, wherein a direct "high-usage" trunk group is supplemented by shared overflow "final routes." The routing of each individual call is determined at call setup time by first testing the direct route and then possibly one or more alternate routes, subject to loop-avoidance constraints. Today's dynamic routing schemes follow this basic pattern but are centrally managed: each site reports the traffic level on each of its trunk groups to a central site, typically about every 10 seconds. Centralized algorithms can then update the first- and/or second-choice outgoing trunk group recommended at each node for each destination of the calls it is handling. Because the central recommendations take into account the current congestion states in various parts of the network, they inherently divert traffic flows around areas of failure. A main benefit of adaptively updating the routing tables is that it exploits the non-coincidence of busy-hour loads across the network.
Note that such updates to the routing plan do not imply rerouting existing connections; the aim is to improve the situation for new call (or packet or LSP) arrivals only. This is a natural approach in a pure data or telephony service layer, where calls or sessions come and go on a minute-by-minute timescale, where users can re-establish their calls if need be, and where data protocols retransmit lost packets. Thus, the aim is to find adaptations that improve aggregate performance, without much concern for the fate of any particular call or session. This contrasts with the lower layer restoration environment, where the emphasis is on re-establishing existing paths, which may have been in place for years and may bear the entire traffic between two cities rather than a single call or data session.
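The call-by-call logic of centrally managed dynamic routing can be sketched as follows. The central step uses a hypothetical rule, choosing for each origin-destination pair the via node whose two-link route has the largest bottleneck of idle circuits; deployed schemes such as DNHR use their own, more elaborate rules, and the names `tg`, `recommend_alternates`, and `route_call` are illustrative only. `route_call` decides the route of a new arrival and never moves an existing call.

```python
def tg(a, b):
    """Key for the (undirected) trunk group between switches a and b."""
    return tuple(sorted((a, b)))


def recommend_alternates(idle, nodes):
    """Central step, rerun on each periodic report of idle circuits per trunk
    group. Hypothetical rule: pick the via node with the largest bottleneck."""
    rec = {}
    for o in nodes:
        for d in nodes:
            if o == d:
                continue
            best, best_idle = None, 0
            for k in nodes:
                if k in (o, d):
                    continue  # two-link alternates only, which rules out loops
                bottleneck = min(idle.get(tg(o, k), 0), idle.get(tg(k, d), 0))
                if bottleneck > best_idle:
                    best, best_idle = k, bottleneck
            rec[(o, d)] = best
    return rec


def route_call(o, d, idle, rec):
    """Per-call decision at setup time for a NEW arrival: direct route first,
    then the centrally recommended alternate; existing calls are never moved."""
    if idle.get(tg(o, d), 0) > 0:
        idle[tg(o, d)] -= 1
        return [o, d]
    via = rec.get((o, d))
    if via is not None and idle.get(tg(o, via), 0) > 0 and idle.get(tg(via, d), 0) > 0:
        idle[tg(o, via)] -= 1
        idle[tg(via, d)] -= 1
        return [o, via, d]
    return None  # call is blocked


# Hypothetical snapshot: the direct A-B group has no idle circuits (e.g., after a failure)
idle = {tg("A", "B"): 0, tg("A", "C"): 3, tg("B", "C"): 1, tg("A", "D"): 2, tg("B", "D"): 2}
rec = recommend_alternates(idle, ["A", "B", "C", "D"])
print(route_call("A", "B", idle, rec))  # ['A', 'D', 'B']
```

Here the central step prefers via D (bottleneck 2) over via C (bottleneck 1), so the new A-B call is carried on A-D-B. Between reports the recommendation is not refreshed, so once the A-D-B route runs out of idle circuits further A-B calls are blocked until the next update.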
In circuit-switched services, efforts may also be made to split the realization of a single logical trunk group between two nodes over physically diverse paths. In addition, optimization algorithms can be used to slightly overprovision the trunk quantities in each group as a general margin against failures or congestion. Another attractive aspect of service layer restoration in general is that different priority statuses for various users or services can be much more easily established and finely assigned. Moreover, capacity is managed at a much finer granularity, so that small amounts of available capacity within the larger working channel units manipulated by lower layer schemes can be accessed to enhance performance.
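A small calculation shows how splitting a trunk group over diverse paths interacts with overprovisioning. Assuming the trunks are split as evenly as possible over k physically diverse paths and that only one path fails at a time, the sketch below finds the smallest total that still leaves a chosen percentage of the required trunks in service. The function name and its parameters are hypothetical, introduced only for illustration.

```python
def trunks_to_provision(required, k, retain_pct=100):
    """Smallest total trunk count that, split as evenly as possible over k
    physically diverse paths, still leaves retain_pct% of `required` trunks
    working after the loss of any single path (illustrative assumption)."""
    if k < 2:
        raise ValueError("need at least two diverse paths to survive a single failure")
    target = -(-required * retain_pct // 100)  # ceil(required * retain_pct / 100)
    total = target
    # Worst case, the failed path carried the largest share, ceil(total / k).
    while total - -(-total // k) < target:
        total += 1
    return total


for k in (2, 3, 4):
    print(k, trunks_to_provision(48, k))
```

For a group needing 48 trunks, fully surviving a single path failure takes 96 trunks with a two-way split, 72 with a three-way split, and 64 with a four-way split; accepting 50% retention with a two-way split needs no overprovisioning at all.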
Service layer schemes may also involve establishing a full logical mesh of trunk groups or MPLS paths, or a full mesh over a subset of key nodes, so that any connection is routed through at most one intermediate node of the same service layer. Such high-degree logical connectivity is possible because the resources that support it in service layer networks are essentially virtual, i.e., VPI numbers or LSP labels, not physical cables and routes. A consequence, however, is that when a high-degree logical network is established over a sparse physical network, one physical cut can escalate into a large and hard-to-predict number of logical link failures in the service layer, making sole reliance on a service layer restoration scheme rather uncertain. This is when it is especially useful to ensure that a system or logical layer scheme is in place to hide the whole event from the service layers. This is later referred to as the "fault escalation" issue.
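Fault escalation is easy to quantify for a simple case. The sketch below routes every logical link of a full mesh over its hop-count shortest physical path and records which logical links ride on each physical span. The 7-node ring is a hypothetical, deliberately sparse topology, and shortest-path routing of the logical links is an assumption; the helper names are illustrative.

```python
from collections import deque
from itertools import combinations


def bfs_path(adj, src, dst):
    """Hop-count shortest path; assumes dst is reachable from src."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def logical_links_per_span(physical_spans, nodes):
    """Route every logical link of a full mesh over the physical graph and
    list the logical links that would fail if each physical span were cut."""
    adj = {n: [] for n in nodes}
    for u, v in physical_spans:
        adj[u].append(v)
        adj[v].append(u)
    hits = {frozenset(span): [] for span in physical_spans}
    for a, b in combinations(nodes, 2):  # full logical mesh
        path = bfs_path(adj, a, b)
        for u, v in zip(path, path[1:]):
            hits[frozenset((u, v))].append((a, b))
    return hits


n = 7  # hypothetical sparse physical ring
ring = [(i, (i + 1) % n) for i in range(n)]
hits = logical_links_per_span(ring, list(range(n)))
print(max(len(links) for links in hits.values()))  # 6 of the 21 logical links
```

On the 7-node ring, each of the 7 physical spans carries 6 of the 21 logical links, so a single cable cut takes down 6 logical links at once.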