Testing Challenges in Packet Telephony
Testing communications equipment and networks has never been an easy task. Even a moderately complex data network topology can result in myriad permutations of test scenarios between the protocols and the applications they serve. To appreciate the level of complexity involved in the general case, you must start from the bottom of the protocol stacks supported by the network and move upward, toward the applications themselves. A network topology can consist of switches, routers, and Integrated Access Devices (IADs). In turn, those devices can employ OC-n/SONET and Ethernet interfaces of various flavors, in addition to physical access interfaces such as DSL, cable, or simple twisted-pair wire. The permutations of protocols at layers 2 and 3 make the test scenarios exciting, and the applications complete the scene with their own requirements and quirks in using the underlying protocols.
The tricky part is identifying and enumerating the test scenarios that must be executed to ensure complete coverage of functionality and performance testing. In functionality testing, we must make sure that we include failover (rainy-day) scenarios; in performance testing, we must ascertain that the system's performance characteristics remain invariant when failover conditions occur.
It is, however, a tedious but necessary process to enumerate all the possible operating conditions of a system that must be reached before we "hit" it with a failover scenario. What does this mean? Consider this example: A system completing a single call while no other calls are active will most likely not behave the same way as a system that is carrying hundreds or thousands of stable calls, is in the process of completing a few dozen more when the failure occurs, and whose switch must then undertake recovery procedures. Simply speaking, you must make sure that the "state" of the switch is preserved and that it continues to operate "normally," however the product specifications define "normally" under a failure scenario.
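Much of that enumeration can be mechanized. The sketch below uses placeholder load levels and failure events rather than figures from any particular product specification; it simply illustrates how the cross product of stable-call load, calls in setup, and injected failures yields the scenario list that a failover test plan has to walk through.

# Sketch of enumerating failover test scenarios as the cross product of
# system states and failure events. All names and numbers here are
# hypothetical placeholders; a real plan would draw them from the spec.
from itertools import product

stable_call_loads = [0, 1, 100, 1_000, 10_000]       # calls already up and stable
calls_in_setup    = [0, 1, 10, 50]                    # calls mid-setup when the failure hits
failure_events    = ["link_down", "card_failover",    # failure to inject
                     "process_restart", "power_cycle"]

scenarios = [
    {"stable_calls": stable, "calls_in_setup": in_setup, "failure": event}
    for stable, in_setup, event in product(stable_call_loads, calls_in_setup, failure_events)
]

# Every scenario checks the same invariants: switch state is preserved,
# stable calls survive, and the system keeps operating "normally" as the
# product specification defines "normally" under failure.
for s in scenarios:
    print(f"Drive {s['stable_calls']} stable calls + {s['calls_in_setup']} in setup, "
          f"then inject {s['failure']}")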
Designers of switching systems will always be looking for newer and better test tools to ascertain complete conformance at the functional and performance levels of any product before it leaves the lab. Functional and design problems usually do not manifest themselves as hardware issues in the system test phase, except in cases of VLSI functionality, which might not have been fully tested in the unit test phase. Life gets interesting when platform and architectural issues surface in integration testing:
The wrong CPU platform was picked to meet the specified performance in calls per second and call capacity.
A distributed switch architecture reaches a plateau in performance and stops scaling.
The custom VLSI devices either have bugs or are not up to par with specifications.
The best defense against such issues is prevention: elaborate due diligence on the system in the design phase, and thorough unit testing with a comprehensive test suite that spans all aspects of the design except those that need other modules to ensure complete system functionality. Architectural problems can be virtually impossible to solve after the fact and can doom a system while it is still in the lab. Due diligence on an architecture is very hard to perform while it exists only on paper or in someone's mind. That's why it is a good idea to construct detailed scenarios that drill into the operation of the system before pen meets paper to start the design phase. Simulation is a good approach for gaining confidence at this stage, but the simulation needs to be fed the correct protocol behavior(s), the complete set of protocols that will be supported, the permutations of protocol interactions that will be encountered during call setup, the feature call flows, the queuing behavior and stability of the system under heavy load, and the impact of the underlying operating system on performance.
The last part, the OS, is tricky to account for because its impact can be hard to simulate without a good understanding of the incremental effect of the OS on system performance, such as disk operations in the middle of call setup, which might or might not be related to the call setup itself. Missed performance expectations are often caused by the impact of "other" things running on the platform (such as billing, FCAPS, database ops, and so on), which must be accounted for when setting the expectations in advance. Therefore, an accurate assessment involves a complete understanding of everything that is running on the system while it performs call processing.
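One way to quantify that incremental impact is to measure call-setup latency with and without each background activity running and compare the results. The sketch below assumes a hypothetical test harness: set_up_call() drives a single call setup against the system under test, and each background task exposes start/stop hooks. None of these names refer to a real API.

# Sketch of isolating the incremental impact of "other" platform activity
# (billing, FCAPS, database ops) on call-setup latency. The callables
# passed in are hypothetical stand-ins for whatever harness drives the
# real system under test.
import statistics
import time

def measure_setup_latency(set_up_call, samples=200):
    """Time a number of call setups and return the median latency in ms."""
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        set_up_call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(latencies)

def compare(set_up_call, background_tasks):
    """Report setup latency with and without each background task running."""
    baseline = measure_setup_latency(set_up_call)
    print(f"baseline: {baseline:.1f} ms")
    for name, start, stop in background_tasks:   # e.g. ("billing run", start_fn, stop_fn)
        start()
        loaded = measure_setup_latency(set_up_call)
        stop()
        print(f"with {name}: {loaded:.1f} ms (+{loaded - baseline:.1f} ms)")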
Establishing system stability early in the architecture and design process is vital because a well-designed system must never crash under traffic conditions of heavy load or unexpected events. A lot of this footwork can be done during simulation by feeding excessive traffic into the various parts of the platform, verifying continued system operation, and documenting the expected behavior. For example, if certain packet-discard policies have been designed into the system, the only way to ensure that they are functional is to cause the conditions that will invoke them. If the requirement is for the system to accept a 911 call regardless of current system load, then load the system to 100% of its traffic capacity and send a 911 call through it to see what happens. Successful simulation gives the architect and project managers the confidence they need to proceed with the next step: the design phase.
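As a concrete illustration of the 911 example, the following sketch drives the system to its rated capacity and then asserts that an emergency call is still accepted. The names generate_load(), place_call(), and RATED_CAPACITY_CPS are assumptions standing in for whatever load generator and call driver the lab actually uses.

# Sketch of the 911-under-load check described above: saturate the system
# at 100% of its rated traffic capacity, then place an emergency call and
# assert that it completes. All harness names here are hypothetical.
RATED_CAPACITY_CPS = 500          # calls per second, taken from the product spec

def test_911_accepted_at_full_load(generate_load, place_call):
    # Saturate the system with background call attempts at its rated capacity.
    load = generate_load(calls_per_second=RATED_CAPACITY_CPS)
    try:
        result = place_call(dialed_digits="911")
        # The requirement: the emergency call must be accepted regardless of load.
        assert result.accepted, "911 call rejected while system was at 100% load"
    finally:
        load.stop()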
After such in-advance testing has been done, via simulation first and in the lab later, your customer will get an equally warm feeling about the system.
Unit Testing
In unit testing, in which the various system components are tested for functionality and performance at an early (and mostly standalone) phase, it is important to cover as much of the functionality as possible before proceeding with system integration. For example, if a switch consists of a CPU platform and a variety of gateways, with all sorts of physical interfaces and capacity specifications, a thorough test of a single type of interface with all the supported signaling protocols and media transport methods will reveal whether the subsequent integration test will be successful and quick or whether trouble should be expected. Furthermore, if sufficient test resources are available (such as time and personnel to write simulated call processing scripts on the real CPU), protocol interworking can be checked out well before connecting to a gateway, via call arrival and processing over simulated physical interfaces.
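A simple way to organize that coverage is to run one interface type against the full matrix of supported signaling protocols and media transport methods before integration begins. The sketch below is illustrative only: the protocol and transport lists are examples, and run_call_scenario() is a hypothetical harness entry point, not a real API.

# Sketch of a unit-test matrix for a single interface type, covering every
# supported signaling protocol and media transport before integration.
from itertools import product

INTERFACE           = "DS1"                           # the one interface type under test
signaling_protocols = ["SIP", "H.323", "MGCP"]        # illustrative, not exhaustive
media_transports    = ["RTP over UDP/IP", "AAL2"]

def unit_test_interface(run_call_scenario):
    failures = []
    for signaling, transport in product(signaling_protocols, media_transports):
        ok = run_call_scenario(interface=INTERFACE,
                               signaling=signaling,
                               transport=transport)
        if not ok:
            failures.append((signaling, transport))
    return failures   # an empty list means this interface is ready for integration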
If firewalls are to be used, it wouldn't be a bad idea at this point to check the viability of the performance specification with a single firewall, a single physical interface, and as many protocol interworking scenarios as possible. If this stage shows weaknesses in a centralized configuration, system integration in a distributed environment is certain to reveal even more problems. The rule of thumb is that the more coverage is achieved in unit test, the fewer headaches will be caused during system integration.
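A corresponding sketch for the firewall check might measure the achieved call rate per interworking scenario against the specified target. The target figure and measure_calls_per_second() are assumptions for illustration, not part of any real specification or API.

# Sketch of checking the performance specification against a single firewall
# on a single physical interface, across the available interworking scenarios.
SPEC_CALLS_PER_SECOND = 100       # hypothetical target from the performance specification

def check_firewall_viability(measure_calls_per_second, interworking_scenarios):
    shortfalls = {}
    for scenario in interworking_scenarios:       # e.g. "SIP-to-H.323", "SIP-to-MGCP"
        achieved = measure_calls_per_second(scenario)
        if achieved < SPEC_CALLS_PER_SECOND:
            shortfalls[scenario] = achieved
    # Any shortfall seen here, with only one firewall in the path, will only
    # get worse once the system is distributed across multiple devices.
    return shortfalls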