- Bringing the Managed Data to the Code
- Scalability: Today's Network Is Tomorrow's NE
- MIB Note: Scalability
- Light Reading Trials
- Large NEs
- Expensive (and Scarce) Development Skill Sets
- Linked Overviews
- Elements of NMS Development
- Expensive (and Scarce) Operational Skill Sets
- MPLS: Second Chunk
- MPLS and Scalability
- Summary
Expensive (and Scarce) Development Skill Sets
Building management systems for the devices of today and tomorrow is increasingly difficult. (The same is true of device development, where new technologies such as MPLS and Gigabit Ethernet are being added to new and legacy layer 2 NEs.) Some vendor organizations have completely separate groups devoted to NE and management system development, which creates a need for clear communication between the groups. Beyond this, the skill set required of NMS software developers is growing and in some cases spans what have traditionally been separate disciplines:
- Object-oriented development and modeling using Unified Modeling Language (UML) for capturing requirements, defining actors (system users) and use cases (the principal transactions and features), and mapping them into software classes
- Java/C++
- GUI, often packaged as part of a browser and providing access to network diagrams, provisioning facilities, faults, accounting, and so on
- Server software for long-running, multiclient FCAPS processes
- Specific support for mature/developing features, such as ATM/MPLS
- CORBA for multiple programming languages and remote object support across heterogeneous environments
- Database design/upgrade: matching MIB to database schema across numerous NMS/NE software releases²
- Deployment and installation issues: performance is always an important deployment issue, as is ease of installation
- IP routing
- MPLS
- Layer 2 technologies such as ATM, FR, and Gigabit Ethernet
- Legacy technologies such as voice-over-TDM and X.25
- Ability to develop generic software components and models: the management system can hide much of the complex underlying detail of running the network
- Client/server design
- Managed object design, part of the modeling phase for the management system
- MIB design: often there is a need for new objects in the managed devices to support the management system
This is an impressive set of skills for even the most experienced engineer: it calls for an excellent overall knowledge of these areas along with the ability to focus on any one of them. The general migration to a layer 3 infrastructure further widens the gap between available development skills and required product features. Natural attrition, promotions, and new entrants to the industry ensure a steady supply of engineers who are unlikely to have all the required skills. Added to this is customers' need to see rapid ROI on all infrastructure purchases. A different approach to developing management systems seems to be needed, one that involves adopting:
- A solution mindset
- Distributed, creative problem solving
- Taking ownership
- Acquiring domain expertise
- Embracing short development cycles
- Minimizing code changes
- Strong testing capability
Acquiring skills like these would enhance the development process. We examine strategies for developing these capabilities in the next chapter; for now, we describe their elements and advantages.
Developer Note: A Solution Mindset
Adopting a solution mindset is an important first step in effective NMS development. It reflects a move away from the purely technological aspects of products toward the way enterprises and service providers now look at overall solutions to business problems. The days of box shifting (selling NEs with little or no management facilities) are probably gone forever. This is no bad thing, because the real cost of adding NEs to a network is the time and money required to debug them and integrate them into its fabric. This is why network operators require new NEs to be easy to manage. An additional problem for vendors is that hardware is increasingly sold at heavily discounted prices, so hardware revenue may no longer match the development effort, though upgrades and enhancements may help offset this. Customers will pay for overall solutions that make their networks easier to manage and operate. An example is the consolidation of incompatible NMS into a single NMS, as we saw in Chapter 1. Solutions have a number of characteristics:
- Clear economic value
- Fulfillment of important requirements
- Resolution of one or more end-user problems
An issue facing many network operators is what to do with legacy layer 2 equipment. Should the operator simply throw away its existing hardware? This is a difficult question, and providing a migration path for such users is a good example of a solution. The vendor should also maintain existing deployed device software for as long as possible in order to protect the network operator's investment. (This is often easier said than done, as devices such as PABXs are increasingly discontinued at the end of their lifecycle.) Large networks don't change overnight, so management systems should be written to accommodate both legacy and new equipment. The MPLS ships-in-the-night (SIN) option that we discussed in Chapter 2 is an example of such an approach. SIN is a special mode of operation on MPLS nodes. It allows ATM users to upgrade their firmware (some devices may also need hardware upgrades) to MPLS and then use both the ATM and MPLS control planes simultaneously on the same switch. The two technologies do not interact but pass each other like ships in the night (hence the name). The logical extension of this is to allow any layer 2 service to cross an MPLS cloud. This is a good example of solutions thinking because it saves money, protects existing investments, and addresses important user problems.
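The ships-in-the-night arrangement can be pictured as two control planes sharing one switch port without sharing any of its circuits. The Java sketch below assumes, for illustration only, that the port's VCI space is split into two disjoint ranges, one allocated by ATM signaling and the other by MPLS. The class names (`SinPartition`, `ControlPlane`) and the range values are hypothetical and are not taken from any vendor's implementation.

```java
import java.util.BitSet;

/** Which control plane owns a VCI on a ships-in-the-night port (illustrative). */
enum ControlPlane { ATM, MPLS }

/**
 * Hypothetical SIN label-space partition: each control plane allocates VCIs
 * only from its own disjoint range, so neither ever touches the other's circuits.
 */
final class SinPartition {
    private final int atmLow, atmHigh;   // inclusive VCI range owned by ATM signaling
    private final int mplsLow, mplsHigh; // inclusive VCI range owned by MPLS
    private final BitSet inUse = new BitSet();

    SinPartition(int atmLow, int atmHigh, int mplsLow, int mplsHigh) {
        if (atmLow < 0 || mplsLow < 0 || atmLow > atmHigh || mplsLow > mplsHigh) {
            throw new IllegalArgumentException("invalid VCI range");
        }
        if (atmLow <= mplsHigh && mplsLow <= atmHigh) {
            throw new IllegalArgumentException("ATM and MPLS VCI ranges must not overlap");
        }
        this.atmLow = atmLow;
        this.atmHigh = atmHigh;
        this.mplsLow = mplsLow;
        this.mplsHigh = mplsHigh;
    }

    /** Allocates the lowest free VCI in the plane's own range, or returns -1 if exhausted. */
    int allocate(ControlPlane plane) {
        int low = (plane == ControlPlane.ATM) ? atmLow : mplsLow;
        int high = (plane == ControlPlane.ATM) ? atmHigh : mplsHigh;
        int vci = inUse.nextClearBit(low);
        if (vci > high) {
            return -1;
        }
        inUse.set(vci);
        return vci;
    }

    /** Frees a VCI so its owning plane can reuse it. */
    void release(int vci) {
        inUse.clear(vci);
    }

    /** Reports which control plane a VCI belongs to. */
    ControlPlane owner(int vci) {
        if (vci >= atmLow && vci <= atmHigh) return ControlPlane.ATM;
        if (vci >= mplsLow && vci <= mplsHigh) return ControlPlane.MPLS;
        throw new IllegalArgumentException("VCI " + vci + " is outside both partitions");
    }
}

class SinDemo {
    public static void main(String[] args) {
        SinPartition port = new SinPartition(32, 99, 100, 199); // illustrative ranges
        int pvc = port.allocate(ControlPlane.ATM);
        int label = port.allocate(ControlPlane.MPLS);
        System.out.println(port.owner(pvc) + " " + pvc + ", " + port.owner(label) + " " + label);
    }
}
```

Because each allocation is confined to its own range, a new MPLS label cannot collide with an existing ATM circuit, which is the "no interaction" property the name refers to.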
Well-engineered management solutions also benefit vendors when they are built from components, because the elements of such solutions can be reused in other areas and products, and the vendor can leverage the solution for many purposes. Examples of management system solutions include the following:
- Providing minimal management support for third-party devices. Many NMS are proprietary, supporting only the equipment vendor's hardware. Because networks may contain multivendor NEs, separate NMS are often required to support very similar devices from different vendors. It is better for end users if the incumbent NMS provides limited (rather than no) support for third-party NEs. NMS vendors should be prepared to offer this support even if it extends only to device discovery and notification/trap handling.
- Creating generic management system components that can be used across numerous products and technologies, such as ATM and MPLS connections. An ATM virtual circuit is not the same thing as an MPLS LSP, but the management software can still provide a technology-independent GUI abstraction for both. The user is then freed from the complexity of the underlying technologies and can perform similar management operations on both, which also reduces training time.
- Aiming for a technology-independent software infrastructure built on standard middleware, such as CORBA-based products, rather than custom-built facilities.
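The generic-component idea in the second item above can be illustrated with a small Java sketch: ATM virtual circuits and MPLS LSPs both implement one hypothetical `Connection` interface, and a display routine is written once against that interface rather than once per technology. All class and method names are illustrative assumptions, and a real connection model would carry far more attributes (traffic descriptors, QoS, path details).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical technology-independent view of a connection, used by GUI and FCAPS code. */
interface Connection {
    String technology();
    String endpointA();
    String endpointZ();
    /** Technology-specific identifier, rendered as text for display only. */
    String nativeId();
}

/** ATM virtual circuit, identified by its VPI/VCI pair. */
final class AtmVirtualCircuit implements Connection {
    private final String a, z;
    private final int vpi, vci;

    AtmVirtualCircuit(String a, String z, int vpi, int vci) {
        this.a = a; this.z = z; this.vpi = vpi; this.vci = vci;
    }
    public String technology() { return "ATM"; }
    public String endpointA() { return a; }
    public String endpointZ() { return z; }
    public String nativeId() { return vpi + "/" + vci; }
}

/** MPLS label switched path, identified here by a tunnel number. */
final class MplsLsp implements Connection {
    private final String ingress, egress;
    private final int tunnelId;

    MplsLsp(String ingress, String egress, int tunnelId) {
        this.ingress = ingress; this.egress = egress; this.tunnelId = tunnelId;
    }
    public String technology() { return "MPLS"; }
    public String endpointA() { return ingress; }
    public String endpointZ() { return egress; }
    public String nativeId() { return "LSP-" + tunnelId; }
}

/** A GUI-facing operation written once against the abstraction. */
final class ConnectionTable {
    static List<String> rows(List<? extends Connection> connections) {
        List<String> out = new ArrayList<>();
        for (Connection c : connections) {
            out.add(c.technology() + " " + c.endpointA() + " -> " + c.endpointZ()
                    + " (" + c.nativeId() + ")");
        }
        return out;
    }
}

class ConnectionDemo {
    public static void main(String[] args) {
        List<Connection> all = List.of(
                new AtmVirtualCircuit("nodeA", "nodeB", 0, 32),
                new MplsLsp("pe1", "pe2", 7));
        ConnectionTable.rows(all).forEach(System.out::println);
    }
}
```

Adding support for another connection type (a Frame Relay PVC, say) would mean one new class implementing `Connection`; the display code stays unchanged.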
As far as possible, the management system should also provide code encapsulation for functions such as SNMP access, network message transport, and network protocols. This is illustrated in Figure 3-4, where the FCAPS areas are shielded from the complexities of the underlying SNMP, messaging services, and network technologies.
Figure 3-4. FCAPS software layers.
While this seems an obvious point of good software development practice, it's surprising how often low-level code (such as SNMP API calls) is invoked directly from the FCAPS layer. In many cases, this is simply poor coding practice caused by inexperience or excessive haste. The result is that the smallest change in the low-level code requires a full FCAPS rebuild. Changes to MIBs or underlying protocols should not necessitate a full rebuild of the management system. Loose coupling between components (via APIs and layering) makes it easier for developers to take ownership of substantial product areas. In turn, it helps avoid situations in which a change to one component breaks the code in another.
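The layering in Figure 3-4 can be sketched as an interface boundary. In the hypothetical Java example below, the configuration code (`PortConfigurator`) depends only on a narrow `ManagedObjectAccess` interface and never on an SNMP library. An SNMP-backed implementation (not shown) would translate these calls into getRequest and setRequest messages, while the in-memory implementation lets the FCAPS code be built and tested without real NEs. The object naming (`ifAdminStatus.<ifIndex>` holding the string "up") is a simplification: in IF-MIB, ifAdminStatus is an enumerated integer.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical narrow boundary between FCAPS code and the management protocol.
 * FCAPS classes see only this interface, never SNMP API calls.
 */
interface ManagedObjectAccess {
    String get(String ne, String object);
    void set(String ne, String object, String value);
}

/** Configuration code (the "C" in FCAPS) written purely against the interface. */
final class PortConfigurator {
    private final ManagedObjectAccess access;

    PortConfigurator(ManagedObjectAccess access) {
        this.access = access;
    }

    /** Administratively enables a port; object naming is illustrative only. */
    void enablePort(String ne, int ifIndex) {
        access.set(ne, "ifAdminStatus." + ifIndex, "up");
    }

    boolean isEnabled(String ne, int ifIndex) {
        return "up".equals(access.get(ne, "ifAdminStatus." + ifIndex));
    }
}

/** In-memory stand-in for the SNMP layer, for testing without real NEs. */
final class InMemoryAccess implements ManagedObjectAccess {
    private final Map<String, String> store = new HashMap<>();

    public String get(String ne, String object) {
        return store.get(ne + ":" + object);
    }

    public void set(String ne, String object, String value) {
        store.put(ne + ":" + object, value);
    }
}

class FcapsDemo {
    public static void main(String[] args) {
        PortConfigurator config = new PortConfigurator(new InMemoryAccess());
        config.enablePort("ne1", 3);
        System.out.println(config.isEnabled("ne1", 3));
    }
}
```

If the MIB or the SNMP stack changes, only the class implementing `ManagedObjectAccess` needs to be rebuilt, which is the loose coupling described above.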
Developer Note: Distributed, Creative Problem Solving
Once management systems have been built and integrated into a vendor test network, they often present complex problems. The distributed nature of managed networks (NEs with local agents reporting to a centralized NMS) and the broad functional composition of NMS create difficult logistical development problems. Solving such problems is where the expanded skill set comes into its own. Typical problems include:
- Software bugs
- NE bugs (can be very hard to identify)
- Performance bottlenecks in any of the FCAPS applications due to congestion in the network, DBMS, agent, manager, and so on
- Database problems such as deadlocks, client disconnections, log files filling up, and so on
- Client applications crashing intermittently
- MIB table corruption, such as a group of set operations that only partially succeeds. For example, three setRequests are sent against a MIB table; one results in an agent timeout and the other two succeed, which could leave the table in an inconsistent state.
- SNMP agent exceptions
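The partial-set problem in the MIB table item above can be mitigated on the manager side. SNMP requires the variable bindings within a single SetRequest to be applied as a unit, so placing related objects in one request is often the first remedy. Where separate setRequests must be sent, as in the example given, one option is a compensating rollback: read the old values first, apply the sets in order, and write the old values back if any set fails. The Java sketch below assumes a hypothetical `Agent` interface standing in for real SNMP calls. It is not a complete solution, since the rollback writes can themselves time out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical agent access; a real implementation would issue SNMP get/set requests. */
interface Agent {
    String get(String oid);

    /** Returns false if the set times out or is rejected by the agent. */
    boolean set(String oid, String value);
}

final class TableUpdater {
    /**
     * Applies a group of sets in iteration order. If any set fails, the sets
     * that already succeeded are undone (in reverse order) by writing back the
     * values read beforehand. Returns true only if every set succeeded.
     */
    static boolean applyAll(Agent agent, Map<String, String> changes) {
        Map<String, String> before = new LinkedHashMap<>();
        for (String oid : changes.keySet()) {
            before.put(oid, agent.get(oid));
        }
        List<String> done = new ArrayList<>();
        for (Map.Entry<String, String> e : changes.entrySet()) {
            if (agent.set(e.getKey(), e.getValue())) {
                done.add(e.getKey());
                continue;
            }
            // The result of each undo is not checked here; a real NMS would
            // report a failed undo to the operator as a possible inconsistency.
            for (int i = done.size() - 1; i >= 0; i--) {
                String oid = done.get(i);
                agent.set(oid, before.get(oid));
            }
            return false;
        }
        return true;
    }
}
```

Passing a `LinkedHashMap` keeps the sets in a predictable order, which matters when some table objects must be written before others.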
Solving these and other problems requires a wide-ranging view of the system components and an excellent understanding of technologies such as SNMP, databases, and networking. It also requires creative use of debuggers, MIB browsers, trace files, and so on. Part of creative problem solving is a high aptitude for testing: such developers leverage their product and software knowledge to test the system comprehensively before delivering it to QA, which helps provide solid software builds. Organizations can facilitate this by providing some of the many excellent tools available, including:
- UML support packages
- Java/C++/SDL products
- Version control
- Debuggers
The ultimate goal is zero-defect software. Because NMS code is complex, a bug fix can take a long time, often a day or more. Taking the time to do this is nearly always a good investment, provided any changes are properly tested.
Developer Note: Taking Ownership
Taking ownership is another important part of a solution mindset. Here, engineers strive to produce a complete feature without handing off part of the development to others. A broad task can be ring-fenced by a small group of developers who take responsibility for design, development, and delivery. Given the skill set required for management system development, this is a difficult undertaking. It means that traditional development boundaries are removed: no more pure GUI, backend, or database developers. All NMS software developers should strive to extend their portfolio of skills to achieve this.
Another aspect of taking ownership is being prepared to fix bugs in old code produced in earlier projects. This can be done in conjunction with maintenance and support developers. The important point is that ownership is retained even as new projects are undertaken. This has the additional merit of extending institutional memory and reducing coding errors during support bug fixes. Institutional memory resides in individual developers with key knowledge of the product infrastructure. It equips the organization to migrate products smoothly over numerous release cycles and is an essential asset for long-term development. The end result is more robust management software in customer networks.
Developer Note: Acquiring Domain Expertise and Linked Overviews
Many service providers employ domain experts to produce documents such as bid requests and requests for proposal, which are highly detailed documents sent to vendors. Service provider domain experts may be permanent staff or external consultants. Vendors tend to employ sales and marketing executives as in-house domain experts. The interplay between these two groups ultimately drives much of the vendor's engineering effort. Both groups tend to have impressive expertise. It is important that these skills are also available in engineering, because domain experts [JavaDev] tend to be in great demand. In other words, engineers need to become domain experts as well.
Domain expertise is a body of detailed knowledge (for example, IP/MPLS) that can be readily applied to the needs of an organization. Service providers leverage the knowledge of their domain experts to structure bid and proposal documents and, more generally, to formulate short-, medium-, and long-term strategies. Such knowledge might cover areas like:
- Layer 2 and layer 3 traffic engineering
- Layer 2 and layer 3 QoS
- Network management
- Convergence of legacy technologies into IP. Many service providers have built large IP networks in anticipation of forecasted massive demand. These IP networks are, in many cases, not profitable, so service providers are keen to push existing revenue-generating services (such as layer 2) over them.
- Backward and forward compatibility of new technologies, such as MPLS. An example is that of a service provider with existing, revenue-generating services such as ATM, FR, TDM, and Ethernet. The service provider wants to retain customers but migrate the numerous incoming services into a common MPLS core.
The choice of technology, systems, and devices in each of the above areas is critical and is an opportunity for one or more domain experts.
Engineers need domain expertise when, for example, adding technologies such as Gigabit Ethernet or MPLS to a suite of IP routers or ATM switches. The acquisition of domain expertise is an essential component of solutions engineering. This is easier said than done, because the number of technologies is increasing at the same time as layers 2 and 3 converge. The boundaries of modern networks are also shifting: devices that were in the core a few years ago are now moving out to the edge, and devices that were in the access layer can be enhanced and moved into the distribution layer. In many cases, the different network layers have separate management systems, so the movement of devices across the layers means that support for a given NE may have to be removed from one management system and added to another. This adds to the knowledge burden of developers and makes acquiring domain expertise a necessity for hard-pressed developers.
A key to becoming a domain expert lies in what we call linked overviews, described in the next section.